Welcome to my GitHub

Github.com/zq2599/blog…

Content: a categorized index of all my original articles with companion source code, covering Java, Docker, Kubernetes, DevOps, and more.

About Sqoop

Sqoop is an Apache open source project for efficiently transferring bulk data between Hadoop and relational databases. This article walks through the following:

  1. Deploy Sqoop
  2. Export Hive table data to MySQL with Sqoop
  3. Import MySQL data into a Hive table with Sqoop

Deploying Sqoop

  1. Download version 1.4.7 of Sqoop in your Hadoop account’s home directory:
wget https://mirror.bit.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
  2. Extract the archive:
tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
  3. After extracting, copy mysql-connector-java-5.1.47.jar (the MySQL JDBC driver) into sqoop-1.4.7.bin__hadoop-2.6.0/lib
  4. Go to sqoop-1.4.7.bin__hadoop-2.6.0/conf and rename sqoop-env-template.sh to sqoop-env.sh:
mv sqoop-env-template.sh sqoop-env.sh
  5. Open sqoop-env.sh in an editor and add the following three settings; HADOOP_COMMON_HOME and HADOOP_MAPRED_HOME are the full Hadoop path, and HIVE_HOME is the full Hive path:
export HADOOP_COMMON_HOME=/home/hadoop/hadoop-2.7.7
export HADOOP_MAPRED_HOME=/home/hadoop/hadoop-2.7.7
export HIVE_HOME=/home/hadoop/apache-hive-1.2.2-bin
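A typo in any of these paths makes Sqoop fail at startup, so it is worth a quick sanity check. The helper below is hypothetical (not part of Sqoop): it scans a config file for `export …_HOME` lines and flags any whose directory does not exist:

```shell
# Hypothetical helper: flag any *_HOME assignment in a config file
# whose directory does not exist on disk.
check_homes() {
  grep -E '^export [A-Z_]+_HOME=' "$1" | while IFS='=' read -r name dir; do
    [ -d "$dir" ] || echo "missing: ${name#export } -> $dir"
  done
}

# Example: check_homes sqoop-env.sh   (prints nothing when every path exists)
```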
  6. Verify the installation: go to sqoop-1.4.7.bin__hadoop-2.6.0/bin and run ./sqoop version. A successful install prints output like this:
[hadoop@node0 bin]$ ./sqoop version
Warning: /home/hadoop/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop-1.4.7.bin__hadoop-2.6.0/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
20/11/02 12:02:58 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Sqoop 1.4.7
git commit id 2328971411f57f0cb683dfb79d19d4d19d185dd8
Compiled by maugli on Thu Dec 21 15:59:58 STD 2017
  • Sqoop is now installed; next, let's try out its features.

Preparing MySQL

You need MySQL ready for the hands-on steps that follow. Here is the MySQL configuration used in this article, for reference:

  1. MySQL version: 5.7.29
  2. IP address of the MySQL server: 192.168.50.43
  3. MySQL service port: 3306
  4. Account: root
  5. Password: 123456
  6. Database name: sqoop

To keep things simple, I deployed MySQL with Docker; for details, see "CDH DS218+ MySQL deployment".
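One thing to prepare before the export: sqoop export writes into an existing MySQL table, so the target table has to be created first. Below is a minimal sketch of the MySQL-side DDL; the column types are assumptions chosen to mirror the Hive address table used later (addressid int, province string, city string), and the VARCHAR lengths are arbitrary:

```sql
-- Target table for the export; column types and lengths are assumptions
-- chosen to mirror the Hive address table.
CREATE DATABASE IF NOT EXISTS sqoop;
USE sqoop;
CREATE TABLE address (
  addressid INT,
  province  VARCHAR(32),
  city      VARCHAR(32)
);
```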

Exporting Hive data to MySQL (export)

  • Run the following command to export the Hive address table's data to MySQL:
./sqoop export \
--connect jdbc:mysql://192.168.50.43:3306/sqoop \
--table address \
--username root \
--password 123456 \
--export-dir '/user/hive/warehouse/address' \
--fields-terminated-by ','
  • When the command succeeds, query the address table in MySQL (select * from address;) to confirm the data was exported.
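The --export-dir above points at the files Hive keeps for the table: plain text, one record per line, fields separated by the table's delimiter. Here is a local simulation of the format Sqoop parses (the /tmp directory and sample rows are made up; real files live in HDFS):

```shell
# Simulate a Hive warehouse file (hypothetical sample rows).
mkdir -p /tmp/warehouse/address
printf '1,guangdong,guangzhou\n2,guangdong,shenzhen\n' > /tmp/warehouse/address/000000_0

# Sqoop splits each line on the delimiter given via --fields-terminated-by;
# field 2 here is the province column.
cut -d',' -f2 /tmp/warehouse/address/000000_0   # prints "guangdong" twice
```

This is why --fields-terminated-by must match the delimiter in the Hive table's DDL; with the wrong delimiter, each whole line lands in the first column.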

Importing MySQL data into Hive (import)

  1. In the Hive CLI, execute the following statement to create a table named address2 with the same structure as address:
create table address2 (addressid int, province string, city string) 
row format delimited 
fields terminated by ',';
  2. Run the following command to import data from the MySQL address table into the Hive address2 table; -m 2 means two map tasks are started:
./sqoop import \
--connect jdbc:mysql://192.168.50.43:3306/sqoop \
--table address \
--username root \
--password 123456 \
--target-dir '/user/hive/warehouse/address2' \
-m 2
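About -m 2: Sqoop runs two map tasks, each importing a slice of the table, and by default it derives the slices from SELECT MIN(..), MAX(..) on the split column (here the primary key, addressid). The shell sketch below is a rough illustration of that range split; the MIN/MAX values are hypothetical and Sqoop's real computation differs in detail:

```shell
# Hypothetical MIN/MAX that Sqoop would fetch from the split column.
min=1; max=6; tasks=2

# Ceiling of range/tasks, so every id falls into exactly one slice.
range=$(( (max - min + tasks) / tasks ))
for i in $(seq 0 $((tasks - 1))); do
  lo=$(( min + i * range ))
  hi=$(( lo + range - 1 ))
  echo "map task $i: addressid BETWEEN $lo AND $hi"
done
```

Sqoop builds analogous WHERE clauses for each map task, so the slices cover the table without overlapping.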
  3. When it finishes, the console prints output like the following:
Virtual memory (bytes) snapshot=4169867264
Total committed heap usage (bytes)=121765888
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=94
20/11/02 16:09:22 INFO mapreduce.ImportJobBase: Transferred 94 bytes in 16.8683 seconds (5.5726 bytes/sec)
20/11/02 16:09:22 INFO mapreduce.ImportJobBase: Retrieved 5 records.
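The Transferred line in the log is plain division, bytes over seconds; a quick check of the numbers above:

```shell
# 94 bytes over 16.8683 seconds, matching the "5.5726 bytes/sec" in the log.
awk 'BEGIN { printf "%.4f bytes/sec\n", 94 / 16.8683 }'
```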
  4. Query the address2 table in Hive to confirm the data was imported successfully:
hive> select * from address2;
OK
1	guangdong	guangzhou
2	guangdong	shenzhen
3	shanxi	xian
4	shanxi	hanzhong
6	jiangshu	nanjing
Time taken: 0.049 seconds, Fetched: 5 row(s)
  • This completes the deployment and basic operations of Sqoop. I hope this article offers a useful reference for your own data import and export work.

You are not alone: Xinchen's original series accompany you all the way

  1. Java series
  2. Spring series
  3. Docker series
  4. Kubernetes series
  5. Database + middleware series
  6. Conversation series

Welcome to follow the WeChat official account: Programmer Xinchen

Search WeChat for "Programmer Xinchen". I am Xinchen, looking forward to exploring the Java world with you…
