

1. Hadoop

Hadoop combines distributed computing with a distributed file system. The former is MapReduce and the latter is HDFS (Hadoop Distributed File System). HDFS can run on its own, while MapReduce is optional: you use it only when you need it.
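To make the distributed-computing half more concrete, here is a minimal single-process sketch of the MapReduce programming model. It only mimics the three phases (map, shuffle, reduce) on an in-memory list of lines. It is not Hadoop's Java API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Map: emit (word, 1) for every word in one input line.
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all intermediate values by key, which the framework
    # does between the map tasks and the reduce tasks.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: collapse all values for one key into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    # Each string stands in for one line of a file stored in HDFS.
    print(word_count(["hive reads hdfs", "hbase writes hdfs"]))
    # {'hbase': 1, 'hdfs': 2, 'hive': 1, 'reads': 1, 'writes': 1}
```

On a real cluster, the map and reduce functions run as tasks on many nodes, each reading its share of the input from HDFS, and the framework performs the shuffle between them.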


2. Hive

The data in the warehouse are data files managed by HDFS. Hive provides a SQL-like query language (HiveQL), and the statements you write in it are executed as computations in the distributed environment. These computations are aimed at lookup and analysis rather than at row-level updates, inserts, and deletes (newer Hive versions add limited support for these on transactional tables).
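As a rough illustration of how a SQL-like query becomes distributed computation, the sketch below hand-translates one hypothetical aggregation query over a hypothetical `logs` table into map and reduce steps. Hive's real compiler builds an optimized plan of jobs. This only shows the underlying idea.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical HiveQL query being modelled:
#   SELECT level, COUNT(*) FROM logs WHERE service = 'api' GROUP BY level;
# Conceptually, the WHERE filter and the GROUP BY key extraction happen in
# the map step, and COUNT(*) is computed per group in the reduce step.

Row = Dict[str, str]


def map_task(row: Row) -> List[Tuple[str, int]]:
    # WHERE service = 'api', then emit (group key, 1).
    if row["service"] == "api":
        return [(row["level"], 1)]
    return []


def reduce_task(level: str, counts: List[int]) -> Tuple[str, int]:
    # COUNT(*) for one group.
    return level, sum(counts)


def run_query(rows: List[Row]) -> List[Tuple[str, int]]:
    groups: Dict[str, List[int]] = defaultdict(list)
    for row in rows:
        for key, value in map_task(row):
            groups[key].append(value)  # shuffle: route each pair by its key
    return [reduce_task(k, v) for k, v in sorted(groups.items())]


if __name__ == "__main__":
    logs = [
        {"service": "api", "level": "ERROR"},
        {"service": "api", "level": "INFO"},
        {"service": "web", "level": "ERROR"},
        {"service": "api", "level": "ERROR"},
    ]
    print(run_query(logs))  # [('ERROR', 2), ('INFO', 1)]
```

Writing the same logic directly against MapReduce means implementing mapper and reducer code by hand. With Hive, the one-line query is enough, and Hive derives the equivalent jobs from it.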

Its strength is processing historical data, which is nowadays called offline (batch) computing. Because it runs on MapReduce underneath, it performs poorly for real-time computing. Hive exposes data files as Hive tables (managed or external). This gives the impression that your SQL is operating on a traditional database table.
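The table illusion relies on schema-on-read. The files stay as they are, and the column definitions are applied only when a query reads them. The sketch below models this for a tab-delimited text file. The table name, columns, and HDFS path in the comment are hypothetical. Turning values that cannot be parsed into NULL (`None`) is meant to mirror how Hive typically treats bad fields in delimited text.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

# Hypothetical table definition this sketch models (HiveQL, for reference):
#   CREATE EXTERNAL TABLE logs (ts STRING, level STRING, latency_ms INT)
#   ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
#   LOCATION '/data/logs';
Schema = List[Tuple[str, Callable[[str], object]]]
SCHEMA: Schema = [("ts", str), ("level", str), ("latency_ms", int)]


def read_table(lines: Iterable[str], schema: Schema = SCHEMA,
               sep: str = "\t") -> Iterator[Dict[str, object]]:
    """Apply the schema to raw text lines at read time (schema-on-read)."""
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        row: Dict[str, object] = {}
        for i, (name, cast) in enumerate(schema):
            # Missing trailing fields are read as NULL; extra fields are ignored.
            raw: Optional[str] = fields[i] if i < len(fields) else None
            try:
                row[name] = None if raw is None else cast(raw)
            except ValueError:
                row[name] = None  # unparseable value is read as NULL
        yield row


if __name__ == "__main__":
    raw_file = ["2024-01-01\tINFO\t12\n", "2024-01-02\tERROR\tn/a\n"]
    for row in read_table(raw_file):
        print(row)
    # {'ts': '2024-01-01', 'level': 'INFO', 'latency_ms': 12}
    # {'ts': '2024-01-02', 'level': 'ERROR', 'latency_ms': None}
```

The raw lines are never rewritten. Changing the schema only changes how later reads interpret the same files.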


3. HBase

Broadly speaking, HBase is similar to a database. A traditional database manages data files stored centrally on a local machine. HBase instead manages data files distributed across HDFS and supports adding, deleting, modifying, and querying the data in them. In other words, HBase uses only Hadoop's HDFS to store its persistent data files (HFiles), and its core read/write path does not depend on MapReduce.
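HBase's data model differs from a relational table. Each value is addressed by a row key plus a `family:qualifier` column, and a cell can keep several timestamped versions. The hypothetical `ToyHTable` class below sketches put, get, and delete on that model. It is not the HBase client API, which in Java exposes these operations through classes such as `Put`, `Get`, and `Delete`.

```python
import itertools
from typing import Dict, List, Optional, Tuple


class ToyHTable:
    """Toy in-memory model of one HBase table (not the real client API).

    Cells are addressed by (row key, "family:qualifier"); each cell keeps
    up to max_versions timestamped values, newest first.
    """

    def __init__(self, max_versions: int = 3) -> None:
        self.rows: Dict[str, Dict[str, List[Tuple[int, str]]]] = {}
        self.max_versions = max_versions
        self._clock = itertools.count(1)  # stand-in for cell timestamps

    def put(self, row: str, column: str, value: str) -> None:
        versions = self.rows.setdefault(row, {}).setdefault(column, [])
        versions.insert(0, (next(self._clock), value))
        del versions[self.max_versions:]  # keep only the newest versions

    def get(self, row: str, column: str) -> Optional[str]:
        versions = self.rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def delete(self, row: str, column: Optional[str] = None) -> None:
        # Real HBase writes a tombstone marker and purges data during
        # compaction; this toy simply removes the entry.
        if column is None:
            self.rows.pop(row, None)
        else:
            self.rows.get(row, {}).pop(column, None)


if __name__ == "__main__":
    t = ToyHTable(max_versions=2)
    t.put("user1", "info:name", "Ann")
    t.put("user1", "info:name", "Anna")  # an update is just a newer version
    print(t.get("user1", "info:name"))   # Anna
```

Modelling an update as a newer version, rather than an in-place overwrite, matches how HBase can write changes quickly on top of HDFS, where existing files are not modified in place.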

HBase's strength is real-time (low-latency) computing. Real-time data is written directly into HBase, and clients access it through HBase's APIs for real-time processing. As a NoSQL store with a column-oriented (column-family) structure, it offers fast lookups even in big data scenarios, which sets it apart from MapReduce.
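Part of HBase's lookup speed comes from keeping rows sorted by row key. A point read or a range scan can therefore jump straight to the right position instead of reading everything. The sketch below models only that idea with Python's `bisect` module. The real storage engine, with its in-memory MemStore plus HFiles on HDFS, is considerably more involved, and the class name is hypothetical.

```python
import bisect
from typing import Any, List, Optional, Tuple


class SortedRowStore:
    """Toy model of rows kept sorted by row key, as HBase does."""

    def __init__(self) -> None:
        self.keys: List[str] = []
        self.values: List[Any] = []

    def put(self, key: str, value: Any) -> None:
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value  # existing row: overwrite
        else:
            self.keys.insert(i, key)  # new row: insert in key order
            self.values.insert(i, value)

    def get(self, key: str) -> Optional[Any]:
        # Point lookup: binary search on the sorted keys, O(log n) comparisons.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

    def scan(self, start: str, stop: str) -> List[Tuple[str, Any]]:
        # Range scan over start <= key < stop (HBase scans exclude the
        # stop row by default); only the matching slice is touched.
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, stop)
        return list(zip(self.keys[lo:hi], self.values[lo:hi]))


if __name__ == "__main__":
    store = SortedRowStore()
    for key in ["user#003", "user#001", "order#9", "user#002"]:
        store.put(key, {"id": key})
    print([k for k, _ in store.scan("user#001", "user#003")])
    # ['user#001', 'user#002']
```

Because rows with similar keys sit next to each other, the choice of row key (here a hypothetical `user#` prefix) determines which queries are cheap.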


Conclusion

Hadoop is the foundation of both Hive and HBase. Hive relies on Hadoop as a whole (HDFS for storage, MapReduce for execution), while HBase relies only on Hadoop's HDFS module.

Hive is suited to offline data analysis over files in common formats (such as ordinary log files) and other data files managed by Hadoop. It supports SQL, which is more convenient than writing MapReduce jobs in Java. Hive is positioned as a data warehouse for storing and analyzing historical data.

HBase is suited to real-time computing and is a NoSQL store with a column-oriented structure. It operates on HFiles, a special format it generates itself, which are stored as data files managed by Hadoop (HDFS). It is positioned as a database, or DBMS.

Hive can use HDFS files directly as its table data, or it can use HBase tables as its table data through its HBase storage handler.
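For the HBase case, Hive maps each of its columns either to the HBase row key or to a `family:qualifier` column through the `hbase.columns.mapping` property of the HBase storage handler. The sketch below models only that projection step in plain Python. The DDL in the comment uses hypothetical table and column names, and type conversion is left out by declaring every column as STRING.

```python
from typing import Dict, List, Optional

# Hypothetical Hive DDL over an existing HBase table named "users":
#   CREATE EXTERNAL TABLE users (id STRING, name STRING, age STRING)
#   STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
#   WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,info:name,info:age')
#   TBLPROPERTIES ('hbase.table.name' = 'users');


def hbase_row_to_hive(row_key: str, cells: Dict[str, str],
                      hive_columns: List[str],
                      mapping: str) -> Dict[str, Optional[str]]:
    """Project one HBase row onto Hive columns using a columns mapping."""
    entries = [entry.strip() for entry in mapping.split(",")]
    if len(entries) != len(hive_columns):
        raise ValueError("each Hive column needs exactly one mapping entry")
    row: Dict[str, Optional[str]] = {}
    for column, entry in zip(hive_columns, entries):
        # ":key" means the HBase row key; otherwise "family:qualifier".
        row[column] = row_key if entry == ":key" else cells.get(entry)
    return row


if __name__ == "__main__":
    print(hbase_row_to_hive("u1", {"info:name": "Ann"},
                            ["id", "name", "age"], ":key,info:name,info:age"))
    # {'id': 'u1', 'name': 'Ann', 'age': None}
```

A missing HBase cell simply shows up as NULL in the Hive row, which fits HBase's sparse rows, where most columns are absent for any given key.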