What is the Phoenix

Simply put, Phoenix is a framework that allows you to operate HBase databases using SQL.

HBase is a NoSQL database. The shell client only supports simple operations and is prone to dizziness.

For example, the following graph returns all the data in a table:

Is it ugly? In addition, if complex HBase query is performed, only the native API of HBase can be used to write query programs, which is quite troublesome.

Using Phoenix, you can use SQL to query data efficiently, add secondary indexes to HBase, improve query performance, and other features.

Phoenix supports writing JDBC code to operate HBase, which is more convenient than the native API.

Usage examples

Build table

CREATE TABLE IF NOT EXISTS us_population (
      state CHAR(2) NOT NULL,
      city VARCHAR NOT NULL,
      population BIGINT
      CONSTRAINT my_pk PRIMARY KEY (state, city));
Copy the code

See the table

0: jdbc:phoenix:localhost> ! tablesCopy the code

Go to HBase.

hbase(main):041:0> list
Copy the code

Result information:

TABLE
SYSTEM.CATALOG
SYSTEM.FUNCTION
SYSTEM.SEQUENCE
SYSTEM.STATS
US_POPULATION    
...
Copy the code

The US_POPULATION table was successfully generated.

Add data

Create a new test data file us_population.csv with the following contents:

NY,New York,8143197
CA,Los Angeles,3844829
IL,Chicago,2842518
TX,Houston,2016582
PA,Philadelphia,1463281
AZ,Phoenix,1461575
TX,San Antonio,1256509
CA,San Diego,1255540
TX,Dallas,1213825
CA,San Jose,912332
Copy the code

Run the following command to import the data from the file into the database:

./psql.py localhost us_population.csv
Copy the code

Query table data:

0: jdbc:phoenix:localhost> select * from US_POPULATION; 
Copy the code

The sample

Phoenix comes with a small example of web statistics. Run the following command to import it:

bin/psql.py localhost examples/WEB_STAT.sql examples/WEB_STAT.csv
Copy the code

The first implementation of the SQL table, and then import data file CSV.

Query table data:

0: jdbc:phoenix:localhost> select * from WEB_STAT;
Copy the code

The CORE and DB fields are CPU and database usage.

Check the average USAGE of CUP and database in each group according to DOMAIN groups:

SELECT DOMAIN, AVG(CORE) Average_CPU_Usage, AVG(DB) Average_DB_Usage
FROM WEB_STAT
GROUP BY DOMAIN
ORDER BY DOMAIN DESC;
Copy the code

View the number of accesses to each domain in descending order:

select domain,count(1) num 
from web_stat 
group by domain 
order by num desc;  
Copy the code

summary

The basic function of Phoenix is to add an SQL layer to HBase to facilitate HBase use.

Phoenix has many excellent features, such as: secondary index, namespace mapping, view, multi-tenant, dynamic column, transaction…

And now it has developed very well, can integrate Spark, Hive, Pig, MapReduce, and Flume plug-ins.