Introduction to Sqoop

Apache Sqoop is a tool designed to efficiently transfer bulk data between Apache Hadoop and structured data stores such as relational databases. Sqoop jobs are ultimately submitted to YARN for execution as map-only MapReduce jobs (Map tasks only, no Reduce).

A typical production data warehouse workflow

Business data is stored in MySQL and needs to be moved onto the big data platform for further processing:

1) Import from MySQL into Hadoop (Hive/HBase...)
2) Process with distributed big data engines: Hive/Spark/Flink...
3) Obtain the processed result (the result data is still stored in Hadoop)
4) Analyze the processed result and export it to the RDBMS (MySQL)
5) Display the RDBMS data through a UI: Spring Boot + Spring Data + React

Sqoop

Deployment

Extract the archive and create a symbolic link

[hadoop@xinxingdata001 software]$ tar -zxvf sqoop-1.4.6-cdh5.16.2.tar.gz -C ../app
[hadoop@xinxingdata001 app]$ ln -s sqoop-1.4.6-cdh5.16.2/ sqoop

Configuring environment variables

[hadoop@xinxingdata001 app]$ vim ~/.bashrc

#set sqoop environment
export SQOOP_HOME=/home/hadoop/app/sqoop
export PATH=$SQOOP_HOME/bin:$PATH

[hadoop@xinxingdata001 app]$ source ~/.bashrc
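
To confirm the variables took effect, a quick sanity check (not part of the original steps) is to verify that the sqoop binary is now on the PATH:

[hadoop@xinxingdata001 app]$ which sqoop
[hadoop@xinxingdata001 app]$ echo $SQOOP_HOME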

Configure sqoop-env.sh

[hadoop@xinxingdata001 ~]$ cd /home/hadoop/app/sqoop/conf/
[hadoop@xinxingdata001 conf]$ cp sqoop-env-template.sh  sqoop-env.sh
[hadoop@xinxingdata001 conf]$ vim sqoop-env.sh
# Set path to where bin/hadoop is available (Hadoop directory)
export HADOOP_COMMON_HOME=/home/hadoop/app/hadoop

# Set path to where hadoop-*-core.jar is available (Hadoop directory)
export HADOOP_MAPRED_HOME=/home/hadoop/app/hadoop

# Set path to where bin/hive is available (Hive directory)
export HIVE_HOME=/home/hadoop/app/hive

Copy the JDBC driver to the sqoop/lib directory

[hadoop@xinxingdata001 conf]$ cd ~/software/
[hadoop@xinxingdata001 software]$ cp mysql-connector-java-5.1.47.jar ../app/sqoop/lib/mysql-connector-java.jar
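
Optionally, verify the driver landed in Sqoop's lib directory (a quick check, not in the original steps):

[hadoop@xinxingdata001 software]$ ls ../app/sqoop/lib/ | grep mysql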

Usage

Sqoop help command

[hadoop@xinxingdata001 software]$ sqoop help

usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

Viewing the Version Number
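
The command for this step is not shown in the original; presumably it is the version tool listed in the help output above (the exact version string printed depends on your build):

[hadoop@xinxingdata001 software]$ sqoop version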

To see how to use any of the commands listed by sqoop help, run sqoop help <command>:

[hadoop@xinxingdata001 software]$ sqoop help list-databases

usage: sqoop list-databases [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
   --connect <jdbc-uri>      Specify JDBC connect string
   --driver <class-name>     Manually specify JDBC driver class to use
   --password <password>     Set authentication password
   --username <username>     Set authentication username
[hadoop@xinxingdata001 software]$ sqoop list-databases \
--connect jdbc:mysql://192.168.66.66:3306 \
--password xinxingdata \
--username root
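
Similarly, the list-tables tool (also listed in sqoop help) shows the tables of a database; here assuming the same connection details and the xinxingdata database used later in this article:

[hadoop@xinxingdata001 software]$ sqoop list-tables \
--connect jdbc:mysql://192.168.66.66:3306/xinxingdata \
--password xinxingdata \
--username root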


[hadoop@xinxingdata001 software]$ sqoop help import

usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]

Import control arguments:
  --append                        Imports data in append mode
  --as-textfile                   Imports data as plain text (default)
  --columns <col,col,col...>      Columns to import from table
  --compression-codec <codec>     Compression codec to use for import
  --delete-target-dir             Imports data in delete mode
  -e,--query <statement>          Import results of SQL 'statement'
  -m,--num-mappers <n>            Use 'n' map tasks to import in parallel
  --split-by <column-name>        Column of the table used to split work units
  --mapreduce-job-name <name>     Set name for generated mapreduce job
  --table <table-name>            Table to read
  --target-dir <dir>              HDFS plain table destination
  --warehouse-dir <dir>           HDFS parent for table destination
  --where <where clause>          WHERE clause to use during import

Output line formatting arguments:
  --enclosed-by <char>            Sets a required field enclosing character
  --escaped-by <char>             Sets the escape character
  --fields-terminated-by <char>   Sets the field separator character
  --lines-terminated-by <char>    Sets the end-of-line character

Hive arguments:
  --create-hive-table                        Fail if the target hive table exists
  --hive-database <database-name>            Sets the database name to use when importing to hive
  --hive-import                              Import tables into Hive (Uses Hive's default delimiters if none are set.)
  --hive-overwrite                           Overwrite existing data in the Hive table
  --hive-partition-key <partition-key>       Sets the partition key to use when importing to hive
  --hive-partition-value <partition-value>   Sets the partition value to use when importing to hive
  --hive-table <table-name>                  Sets the table name to use when importing to hive
  --map-column-hive <arg>                    Override mapping for specific column to hive types.

Code generation arguments:
  --null-non-string <null-str>    Null non-string representation
  --null-string <null-str>        Null string representation

Import a MySQL table to HDFS

sqoop import \
--connect jdbc:mysql://192.168.66.66:3306/xinxingdata \
--password ********** \
--username root \
--table emp

If the import fails with an error about a missing JSON class, download java-json.jar and add it to Sqoop's lib directory.
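
A minimal sketch of that fix, assuming the jar was downloaded into ~/software (the filename and location are illustrative):

[hadoop@xinxingdata001 software]$ cp java-json.jar ~/app/sqoop/lib/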

View the imported emp table on HDFS

[hadoop@xinxingdata001 ~]$ hdfs dfs -text /user/hadoop/emp/part*

Import a MySQL table to HDFS using a query

sqoop import \
--connect jdbc:mysql://192.168.66.66:3306/xinxingdata \
--password xinxingdata \
--username root \
--mapreduce-job-name xinxing \
--target-dir emp \
--delete-target-dir \
--fields-terminated-by '\t' \
--null-non-string '0' \
--null-string ' ' \
--query 'select * from emp where sal > 2000 and $CONDITIONS' \
-m 1
[hadoop@xinxingdata001 ~]$ hdfs dfs -text /user/hadoop/emp/p*
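
The SQL in a query-based import can also be sanity-checked directly against MySQL with the eval tool listed in sqoop help; a hedged example using the same connection details ($CONDITIONS is only required in the actual import statement):

sqoop eval \
--connect jdbc:mysql://192.168.66.66:3306/xinxingdata \
--password xinxingdata \
--username root \
--query 'select * from emp where sal > 2000'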

Passing Sqoop parameters through an options file

[hadoop@xinxingdata001 shell]$ vim sqoop_file
import
--connect
jdbc:mysql://192.168.66.66:3306/xinxingdata
--password
xinxingdata
--username
root
--target-dir
emp
--delete-target-dir
--fields-terminated-by
'\t'
--query
'select * from emp where sal > 2000 and $CONDITIONS'
-m
1
[hadoop@xinxingdata001 shell]$ sqoop --options-file sqoop_file 

Exporting from Hadoop to MySQL with Sqoop

[hadoop@xinxingdata001 ~]$ sqoop help export
usage: sqoop export [GENERIC-ARGS] [TOOL-ARGS]

Export control arguments:
...
  --export-dir <dir>                                         HDFS source  path for the export
...
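Note that sqoop export requires the target table to already exist in MySQL. A hypothetical DDL for emp_demo, assuming it mirrors the classic emp layout (column names and types are illustrative, not taken from the original):

# hypothetical schema for emp_demo; adjust to match your actual emp table
mysql -uroot -p xinxingdata -e "
CREATE TABLE IF NOT EXISTS emp_demo (
  empno    INT,
  ename    VARCHAR(10),
  job      VARCHAR(9),
  mgr      INT,
  hiredate DATE,
  sal      DECIMAL(7,2),
  comm     DECIMAL(7,2),
  deptno   INT
);"
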
sqoop export \
--connect jdbc:mysql://localhost:3306/xinxingdata \
--password xinxingdata \
--username root \
--table emp_demo \
--mapreduce-job-name FromMySQL2HDFS \
--fields-terminated-by '\t' \
--export-dir /user/hadoop/emp \
-m 2

Query MySQL to verify that the export succeeded
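
For example, a quick count from the MySQL client (assuming the xinxingdata database and emp_demo table used above):

# hypothetical check; adjust the database and table names as needed
mysql -uroot -p xinxingdata -e "SELECT COUNT(*) FROM emp_demo;"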

Sqoop import/export with Hive

usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]

Hive arguments:
  --create-hive-table                        Fail if the target hive table exists
  --hive-database <database-name>            Sets the database name to use when importing to hive
  --hive-drop-import-delims                  Drop Hive record \0x01 and row delimiters (\n\r) from imported string fields
  --hive-import                              Import tables into Hive (Uses Hive's default delimiters if none are set.)
  --hive-overwrite                           Overwrite existing data in the Hive table
  --hive-partition-key <partition-key>       Sets the partition key to use when importing to hive
  --hive-partition-value <partition-value>   Sets the partition value to use when importing to hive
  --hive-table <table-name>                  Sets the table name to use when importing to hive
  --map-column-hive <arg>                    Override mapping for specific column to hive types.

Import to Hive table

Setting --create-hive-table is not recommended because it can lead to incorrect data type mappings; it is better to create the table structure in Hive first and then import the data, as sketched below.
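
A minimal sketch of pre-creating the target table, assuming hive_emp mirrors the classic emp layout (column names and types are illustrative, not taken from the original). If you pre-create the table, drop --create-hive-table from the import command below:

# hypothetical schema for hive_emp; no ROW FORMAT clause, so Hive's default delimiters apply,
# matching the --hive-import below (which does not set --fields-terminated-by)
hive -e "
CREATE TABLE IF NOT EXISTS hive_emp (
  empno    INT,
  ename    STRING,
  job      STRING,
  mgr      INT,
  hiredate STRING,
  sal      DOUBLE,
  comm     DOUBLE,
  deptno   INT
);"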

sqoop import \
--connect jdbc:mysql://localhost:3306/xinxingdata \
--password xinxingdata \
--username root \
--table emp \
--create-hive-table \
--hive-table hive_emp \
--hive-import \
--delete-target-dir \
--mapreduce-job-name Xinxing2HDFS \
-m 2

If the import fails with the error "Make sure HIVE_CONF_DIR is set correctly", copy the Hive exec jar into Sqoop's lib directory:

[hadoop@xinxingdata001 ~]$ cp ~/app/hive/lib/hive-exec-1.1.0-cdh5.16.2.jar ~/app/sqoop/lib/

Export data from a Hive table to a MySQL table

sqoop export \
--connect jdbc:mysql://localhost:3306/xinxingdata \
--password xinxingdata \
--username root \
--export-dir /user/hive/warehouse/hive_emp \
--table emp_hive \
--fields-terminated-by '\t' \
--mapreduce-job-name Xinxing2HDFS \
-m 2