HBase, MongoDB, ElasticSearch

The results of our Spark jobs need to be stored somewhere that supports fast queries. To choose a store, I compared three distributed databases, HBase, MongoDB, and ElasticSearch, on write speed, query speed, and disk usage.

The results were previously stored in PostgreSQL, split across multiple tables, but a query took about a minute and a half, which was unacceptable.

Write speed

Time to write one year of data:

- HBase: 10 minutes
- MongoDB: 17 minutes
- ES: 8 minutes

Query speed

500 queries by latitude and longitude; all statistics are in milliseconds.

HBase: most queries take 200-300 milliseconds (median 214 ms), but outliers pull the mean up to about 376 ms, and the slowest query took more than 7 seconds
```
mean     375.526000
std      973.780084
min      136.000000
25%      192.000000
50%      214.000000
75%      256.000000
max     7534.000000
```
MongoDB: typically 3-4 seconds (median about 3.0 s, mean about 4.1 s), with a maximum of more than 14 seconds
```
mean     4106.846000
std      2370.718396
min      2188.000000
25%      2597.750000
50%      2983.500000
75%      4721.250000
max     14680.000000
```
ES: the first query takes about 40 seconds; the second query takes about 3-4 seconds
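
The summary fields above (mean, std, min, quartiles, max) can be produced by a small timing harness. The following is only a sketch using the Python standard library, not the script used for the measurements: `run_query` is a hypothetical callable that issues one query against whichever store is under test, and `points` is a hypothetical list of (latitude, longitude) pairs.

```python
import statistics
import time


def time_queries(run_query, points):
    """Call run_query(lat, lon) once per point; return latencies in milliseconds."""
    latencies_ms = []
    for lat, lon in points:
        start = time.perf_counter()
        run_query(lat, lon)  # hypothetical: one lookup against HBase, MongoDB, or ES
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms):
    """Return the same summary fields as the result blocks above."""
    # "inclusive" interpolates linearly between the observed min and max.
    q1, median, q3 = statistics.quantiles(latencies_ms, n=4, method="inclusive")
    return {
        "mean": statistics.mean(latencies_ms),
        "std": statistics.stdev(latencies_ms),  # sample standard deviation
        "min": min(latencies_ms),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(latencies_ms),
    }
```

Reporting the median and quartiles alongside the mean matters here: a handful of multi-second outliers can move the mean well away from what a typical query costs.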

Disk usage

Disk usage for 32 years of data:

- HBase: 36.1 GB
- MongoDB: 120 GB
- ES: 110.9 GB

Conclusion
- HBase suits large data volumes and simple query conditions. It can only Get or Scan by row key, or fetch a small amount of data through a secondary index. If a secondary-index query matches too much data, HBase becomes slow
- MongoDB supports more complex queries than HBase and suits scenarios where the schema is not fixed. However, once the data reached tens of millions of records, two MongoDB processes were using 30 GB of memory
- ElasticSearch suits full-text search and should store only the fields that queries use. For example, ES can act as the secondary index for HBase while the real data stays in HBase. Fast queries require a lot of memory, so it is resource-hungry
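
The "ES as a secondary index for HBase" idea from the last point can be sketched as a two-step lookup: search the index for matching row keys, then fetch the full rows by key. The sketch below uses in-memory stand-ins rather than real ES or HBase client calls; every class and function name in it is hypothetical and only illustrates the data flow.

```python
from collections import defaultdict


class IndexStandIn:
    """In-memory stand-in for the ES side: (field, value) -> row keys."""

    def __init__(self):
        self._postings = defaultdict(set)

    def index(self, rowkey, field, value):
        self._postings[(field, value)].add(rowkey)

    def search(self, field, value):
        return sorted(self._postings.get((field, value), ()))


class RowStoreStandIn:
    """In-memory stand-in for the HBase side: row key -> full row."""

    def __init__(self):
        self._rows = {}

    def put(self, rowkey, row):
        self._rows[rowkey] = row

    def get(self, rowkey):
        return self._rows.get(rowkey)


def write(index, store, rowkey, row, indexed_fields):
    """Store the full row once; index only the fields that queries use."""
    store.put(rowkey, row)
    for field in indexed_fields:
        index.index(rowkey, field, row[field])


def query(index, store, field, value):
    """Step 1: index lookup returns row keys. Step 2: Get each row by key."""
    return [store.get(rowkey) for rowkey in index.search(field, value)]
```

Only the indexed fields are duplicated, so the index stays much smaller than the data itself, at the cost of keeping the two stores in sync on every write.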
  
At present there are only two query patterns, both with strict speed requirements. In the end I stored two copies of the data in HBase (70 GB in total), one organized by latitude and longitude and the other by time, and queries against both complete within 1 second
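
Since HBase only reads efficiently by row key, keeping two copies amounts to writing each record under two different keys. The post does not give the actual key layout, so the following is a hypothetical sketch: the fixed-width encoding, the `SCALE` resolution, and the hourly timestamp format are all assumptions made for illustration.

```python
from datetime import datetime

SCALE = 10_000  # assumed resolution: 4 decimal places of a degree


def _grid(lat, lon):
    """Encode a coordinate as fixed-width, non-negative digits.

    Shifting by 90/180 removes negative values, and zero-padding makes
    lexicographic byte order agree with numeric order.
    """
    lat_i = round((lat + 90.0) * SCALE)
    lon_i = round((lon + 180.0) * SCALE)
    return f"{lat_i:07d}_{lon_i:07d}"


def location_rowkey(lat, lon, ts):
    """Key for the location-first copy: all times for one point share a prefix."""
    return f"{_grid(lat, lon)}_{ts:%Y%m%d%H}".encode("ascii")


def time_rowkey(lat, lon, ts):
    """Key for the time-first copy: all points for one hour share a prefix."""
    return f"{ts:%Y%m%d%H}_{_grid(lat, lon)}".encode("ascii")
```

With keys like these, a query for one point becomes a prefix Scan on the first copy and a query for one time becomes a prefix Scan on the second, which is the kind of access HBase handles well. The price is roughly double the disk usage.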
