01 Introduction to Sqoop
Sqoop is an open-source Apache tool for transferring data between relational databases and a Hadoop cluster. It can import data from a relational database into the cluster (that is, into HDFS), and it can export data from the cluster (HDFS) back to a relational database. Sqoop is an ETL tool built for efficient, high-volume data transfer. Native Sqoop is driven by commands: tasks are submitted and triggered from the command line. If that makes you a little sad, well, the command line is simply how it is operated, so learning the commands is important.
02 Sqoop Functions
Sqoop itself provides a number of functions, each exposed as a command.
Tip: run sqoop help COMMAND, replacing COMMAND with the name of a function, to see its detailed description. For example, sqoop help import shows the details of the import command.
- 1. sqoop import command
Imports a single table from a relational database into HDFS on the Hadoop cluster. The receiving table on the cluster side can be created automatically during the import, null values can be handled, and so on.
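As a rough illustration of a single-table import, here is a minimal sketch wrapped in a shell function so that nothing runs until the function is called. The JDBC URL, user name, table name, and HDFS path are hypothetical placeholders. The --null-string and --null-non-string options choose how NULL values are written; the value '\\N' is an assumption that suits Hive-style data.

```shell
# Hypothetical connection details; replace with real values.
JDBC_URL="jdbc:mysql://db.example.com:3306/sales"
DB_USER="etl_user"

# Import the (hypothetical) orders table into an HDFS directory.
# -P prompts for the database password on the console.
import_orders() {
  sqoop import \
    --connect "$JDBC_URL" \
    --username "$DB_USER" \
    -P \
    --table orders \
    --target-dir /data/sales/orders \
    --null-string '\\N' \
    --null-non-string '\\N' \
    --num-mappers 4
}
```

Calling import_orders on a machine with Sqoop installed starts the import with four parallel map tasks.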
- 2. sqoop import-all-tables command
Imports every table of a database from the relational database into the Hadoop cluster. The import comes with restrictions: every table must have a primary key, all columns of every table are imported (a subset of columns cannot be chosen), and no WHERE condition can be applied to any table.
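A whole-database import might look like the sketch below. The connection details, the parent directory, and the excluded table names are hypothetical; --warehouse-dir sets the parent HDFS directory under which each table gets its own subdirectory, and --exclude-tables skips the listed tables.

```shell
# Hypothetical connection details; replace with real values.
JDBC_URL="jdbc:mysql://db.example.com:3306/sales"
DB_USER="etl_user"

# Import every table of the database, except two hypothetical tables
# that we assume should be left out.
import_all_sales_tables() {
  sqoop import-all-tables \
    --connect "$JDBC_URL" \
    --username "$DB_USER" \
    -P \
    --warehouse-dir /data/sales \
    --exclude-tables audit_log,tmp_staging
}
```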
- 3. sqoop export command
Exports data from the Hadoop cluster to a relational database. Both import and export can run in parallel, but the degree of parallelism should not be set too high, because the database may not be able to bear the load.
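The export direction can be sketched the same way. The table, directory, and field delimiter below are hypothetical, and the target table is assumed to already exist in the database. The number of parallel tasks defaults to a deliberately small value (2) through the made-up EXPORT_MAPPERS variable, to keep the load on the database modest.

```shell
# Hypothetical connection details; replace with real values.
JDBC_URL="jdbc:mysql://db.example.com:3306/sales"
DB_USER="etl_user"

# Export comma-delimited files from HDFS into an existing database table.
# EXPORT_MAPPERS is an illustrative variable, not a Sqoop setting.
export_daily_totals() {
  sqoop export \
    --connect "$JDBC_URL" \
    --username "$DB_USER" \
    -P \
    --table daily_totals \
    --export-dir /data/sales/daily_totals \
    --input-fields-terminated-by ',' \
    --num-mappers "${EXPORT_MAPPERS:-2}"
}
```

Setting EXPORT_MAPPERS=4 before calling export_daily_totals raises the parallelism if the database can take it.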
- 4. sqoop job command
The job command saves a finalized import or export command under a name. To run it again, execute the saved job by name through sqoop job, which spares you from retyping a long command.
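To make the idea concrete, here is a small sketch that saves an import as a job and runs it again by name. The job name orders_import, the connection details, and the password file path are all hypothetical. Note the standalone -- that separates the job options from the saved command.

```shell
# Hypothetical connection details; replace with real values.
JDBC_URL="jdbc:mysql://db.example.com:3306/sales"
DB_USER="etl_user"

# Save an import command under the (hypothetical) name orders_import.
create_orders_job() {
  sqoop job --create orders_import -- import \
    --connect "$JDBC_URL" \
    --username "$DB_USER" \
    --password-file /user/etl/.db_password \
    --table orders \
    --target-dir /data/sales/orders
}

# Run the saved job again by name, or list all saved jobs.
run_orders_job() { sqoop job --exec orders_import; }
list_jobs()      { sqoop job --list; }
```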
- 5. sqoop metastore command
Lets Sqoop jobs saved on the local machine be used as shared jobs. A remote machine can connect with sqoop job --meta-connect and execute a shared job, which makes remote invocation possible.
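A rough sketch of the shared setup follows. The host name metastore.example.com is hypothetical, and the URL assumes the metastore's default HSQLDB port 16000; adjust both if the metastore is configured differently.

```shell
# Hypothetical metastore address; assumes the default port 16000.
METASTORE_URL="jdbc:hsqldb:hsql://metastore.example.com:16000/sqoop"

# Run on the machine that hosts the shared metastore.
start_metastore() { sqoop metastore; }

# Run on a remote machine: list the shared jobs, or execute one by name.
list_shared_jobs() { sqoop job --meta-connect "$METASTORE_URL" --list; }
run_shared_job()   { sqoop job --meta-connect "$METASTORE_URL" --exec "$1"; }
```

For example, run_shared_job orders_import would execute a shared job named orders_import, assuming such a job has been saved in the metastore.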
- 6. sqoop list-databases command
Lists all databases available through the connection, which makes it easy to confirm the connection source.
- 7. sqoop list-tables command
Lists all tables available through the connection.
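The two listing commands are handy for a quick connectivity check. In this sketch the server URL, the database name sales, and the user name are hypothetical placeholders.

```shell
# Hypothetical database server and user; replace with real values.
DB_SERVER_URL="jdbc:mysql://db.example.com:3306/"
DB_USER="etl_user"

# List the databases visible on the server.
show_databases() {
  sqoop list-databases --connect "$DB_SERVER_URL" --username "$DB_USER" -P
}

# List the tables inside the (hypothetical) sales database.
show_tables() {
  sqoop list-tables --connect "${DB_SERVER_URL}sales" --username "$DB_USER" -P
}
```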
- 8. sqoop eval command
Runs a query or other DML statements against the database through eval, which helps to further verify that the data source is correct.
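A quick sanity check with eval could look like this sketch. The connection details and the orders table are hypothetical, and the LIMIT clause assumes a database such as MySQL that supports it.

```shell
# Hypothetical connection details; replace with real values.
JDBC_URL="jdbc:mysql://db.example.com:3306/sales"
DB_USER="etl_user"

# Print the first rows of the (hypothetical) orders table to the console.
preview_orders() {
  sqoop eval \
    --connect "$JDBC_URL" \
    --username "$DB_USER" \
    -P \
    --query "SELECT * FROM orders LIMIT 10"
}
```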
- 9. sqoop merge command
Merges different batches of data from the same table that have already been imported into the cluster, so that the data stays up to date. In most cases this Sqoop feature is not used for data consolidation; most data developers write their own SQL instead.
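For completeness, a merge might be invoked as in the sketch below. All paths, the jar file, the class name, and the key column id are hypothetical; the jar and class are assumed to come from an earlier import of the same table (or from sqoop codegen). Rows in the newer dataset replace rows in the older one that share the same merge key.

```shell
# Merge a newer import onto an older one, keyed on the (hypothetical) id column.
merge_orders() {
  sqoop merge \
    --new-data /data/sales/orders_new \
    --onto /data/sales/orders_old \
    --target-dir /data/sales/orders_merged \
    --jar-file orders.jar \
    --class-name orders \
    --merge-key id
}
```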