Introduction to Sqoop
Apache Sqoop is a tool designed to efficiently transfer bulk data between Apache Hadoop and structured data stores such as relational databases. Sqoop jobs are ultimately submitted to YARN and run as map-only MapReduce jobs (there is no reduce phase).
Typical production data warehouse workflow
Business data is stored in MySQL and must be moved onto the big data platform for further processing:
1) Import from MySQL into Hadoop (Hive/HBase, ...)
2) Process with distributed big data engines: Hive/Spark/Flink, ...
3) Collect the processed results (the result data is still stored in Hadoop)
4) Analyze the processed results and export them to the RDBMS (MySQL)
5) Display the RDBMS data through a UI (Spring Boot + Spring Data + React)
SQOOP
Deployment
Extract the archive and create a symbolic link
[hadoop@xinxingdata001 software]$ tar -zxvf sqoop-1.4.6-cdh5.16.2.tar.gz -C ../app
[hadoop@xinxingdata001 app]$ ln -s sqoop-1.4.6-cdh5.16.2/ sqoop
Configure environment variables
[hadoop@xinxingdata001 app]$ vim ~/.bashrc
#set sqoop environment
export SQOOP_HOME=/home/hadoop/app/sqoop
export PATH=$SQOOP_HOME/bin:$PATH
[hadoop@xinxingdata001 app]$ source ~/.bashrc
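To confirm that the new variables are picked up, check that the sqoop binary now resolves from the updated PATH:

[hadoop@xinxingdata001 app]$ echo $SQOOP_HOME
[hadoop@xinxingdata001 app]$ which sqoop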
Configure sqoop-env.sh
[hadoop@xinxingdata001 ~]$ cd /home/hadoop/app/sqoop/conf/
[hadoop@xinxingdata001 conf]$ cp sqoop-env-template.sh sqoop-env.sh
[hadoop@xinxingdata001 conf]$ vim sqoop-env.sh
# Set path to where bin/hadoop is available (Hadoop directory)
export HADOOP_COMMON_HOME=/home/hadoop/app/hadoop

# Set path to where hadoop-*-core.jar is available (Hadoop directory)
export HADOOP_MAPRED_HOME=/home/hadoop/app/hadoop

# Set path to where bin/hive is available (Hive directory)
export HIVE_HOME=/home/hadoop/app/hive
Copy the JDBC driver to the sqoop/lib directory
[hadoop@xinxingdata001 conf]$ cd ~/software/
[hadoop@xinxingdata001 software]$ cp mysql-connector-java-5.1.47.jar ../app/sqoop/lib/mysql-connector-java.jar
Usage
Sqoop help command
[hadoop@xinxingdata001 software]$ sqoop help
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information
Viewing the Version Number
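The version subcommand listed above prints the Sqoop build details (the exact output depends on your build):

[hadoop@xinxingdata001 software]$ sqoop version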
See how the commands listed by sqoop help are used
[hadoop@xinxingdata001 software]$ sqoop help list-databases
usage: sqoop list-databases [GENERIC-ARGS] [TOOL-ARGS]
Common arguments:
--connect <jdbc-uri> Specify JDBC connect string
--driver <class-name> Manually specify JDBC driver class to use
--password <password> Set authentication password
--username <username> Set authentication username
[hadoop@xinxingdata001 software]$ sqoop list-databases \
--connect jdbc:mysql://192.168.66.66:3306 \
--password xinxingdata \
--username root
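The list-tables command from the same help listing works the same way; a sketch against the xinxingdata database used in the examples below, with the same credentials:

[hadoop@xinxingdata001 software]$ sqoop list-tables \
--connect jdbc:mysql://192.168.66.66:3306/xinxingdata \
--password xinxingdata \
--username root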
[hadoop@xinxingdata001 software]$ sqoop help import
usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]

Import control arguments:
  --append                       Imports data in append mode
  --as-textfile                  Imports data as plain text (default)
  --columns <col,col,col...>     Columns to import from table
  --compression-codec <codec>    Compression codec to use for import
  --delete-target-dir            Imports data in delete mode
  -e,--query <statement>         Import results of SQL 'statement'
  -m,--num-mappers <n>           Use 'n' map tasks to import in parallel
  --split-by <column-name>       Column of the table used to split work units
  --mapreduce-job-name <name>    Set name for generated mapreduce job
  --table <table-name>           Table to read
  --target-dir <dir>             HDFS plain table destination
  --warehouse-dir <dir>          HDFS parent for table destination
  --where <where clause>         WHERE clause to use during import

Output line formatting arguments:
  --enclosed-by <char>           Sets a required field enclosing character
  --escaped-by <char>            Sets the escape character
  --fields-terminated-by <char>  Sets the field separator character
  --lines-terminated-by <char>   Sets the end-of-line character

Hive arguments:
  --create-hive-table                        Fail if the target hive table exists
  --hive-database <database-name>            Sets the database name to use when importing to hive
  --hive-import                              Import tables into Hive (Uses Hive's default delimiters if none are set.)
  --hive-overwrite                           Overwrite existing data in the Hive table
  --hive-partition-key <partition-key>       Sets the partition key to use when importing to hive
  --hive-partition-value <partition-value>   Sets the partition value to use when importing to hive
  --hive-table <table-name>                  Sets the table name to use when importing to hive
  --map-column-hive <arg>                    Override mapping for specific column to hive types.

Code generation arguments:
  --null-non-string <null-str>   Null non-string representation
  --null-string <null-str>       Null string representation
Import a MySQL table into HDFS
sqoop import \
--connect jdbc:mysql://192.168.66.66:3306/xinxingdata \
--password ********** \
--username root \
--table emp
If the job fails with a NoClassDefFoundError for org/json/JSONObject, download java-json.jar and place it in $SQOOP_HOME/lib.
View the imported EMP table
[hadoop@xinxingdata001 ~]$ hdfs dfs -text /user/hadoop/emp/part*
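Since the command above did not set -m, Sqoop falls back to its default of four map tasks, so the target directory will normally contain four part-m-* files; a quick listing confirms it:

[hadoop@xinxingdata001 ~]$ hdfs dfs -ls /user/hadoop/emp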
Import a MySQL table into HDFS with a query
sqoop import \
--connect jdbc:mysql://192.168.66.66:3306/xinxingdata \
--password xinxingdata \
--username root \
--mapreduce-job-name xinxing \
--target-dir emp \
--delete-target-dir \
--fields-terminated-by '\t' \
--null-non-string '0' \
--null-string ' ' \
--query 'select * from emp where sal >2000 and $CONDITIONS' \
-m 1
[hadoop@xinxingdata001 ~]$ hdfs dfs -text /user/hadoop/emp/p*
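The literal $CONDITIONS token is mandatory in every --query import: Sqoop substitutes it with the row-range predicate assigned to each map task. With -m 1 the range is trivial, but raising the parallelism also requires telling Sqoop which column to split on. A sketch with two mappers, assuming emp has a numeric key column named empno:

sqoop import \
--connect jdbc:mysql://192.168.66.66:3306/xinxingdata \
--password xinxingdata \
--username root \
--target-dir emp \
--delete-target-dir \
--fields-terminated-by '\t' \
--query 'select * from emp where sal >2000 and $CONDITIONS' \
--split-by empno \
-m 2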
Passing Sqoop parameters through an options file
[hadoop@xinxingdata001 shell]$ vim sqoop_file
import
--connect
jdbc:mysql://192.168.66.66:3306/xinxingdata
--password
xinxingdata
--username
root
--target-dir
emp
--delete-target-dir
--fields-terminated-by
'\t'
--query
'select * from emp where sal >2000 and $CONDITIONS'
-m
1
[hadoop@xinxingdata001 shell]$ sqoop --options-file sqoop_file
Exporting from Hadoop to MySQL with Sqoop
[hadoop@xinxingdata001 ~]$ sqoop help export
usage: sqoop export [GENERIC-ARGS] [TOOL-ARGS]
Export control arguments:
...
--export-dir <dir> HDFS source path for the export
...
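Note that sqoop export only writes into an existing table; it never creates one. Before running the command below, make sure emp_demo exists in MySQL with columns matching the exported files. A hypothetical way to create it, assuming emp is already present in the xinxingdata database:

mysql> USE xinxingdata;
mysql> CREATE TABLE emp_demo LIKE emp;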
sqoop export \
--connect jdbc:mysql://localhost:3306/xinxingdata \
--password xinxingdata \
--username root \
--table emp_demo \
--mapreduce-job-name FromMySQL2HDFS \
--fields-terminated-by '\t' \
--export-dir /user/hadoop/emp \
-m 2
Query MySQL to verify that the export succeeded
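For example, from the MySQL client:

mysql> SELECT COUNT(*) FROM emp_demo;
mysql> SELECT * FROM emp_demo LIMIT 5;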
Hive import/export with Sqoop
usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]

Hive arguments:
  --create-hive-table                        Fail if the target hive table exists
  --hive-database <database-name>            Sets the database name to use when importing to hive
  --hive-drop-import-delims                  Drop Hive record \0x01 and row delimiters (\n\r) from imported string fields
  --hive-import                              Import tables into Hive (Uses Hive's default delimiters if none are set.)
  --hive-overwrite                           Overwrite existing data in the Hive table
  --hive-partition-key <partition-key>       Sets the partition key to use when importing to hive
  --hive-partition-value <partition-value>   Sets the partition value to use when importing to hive
  --hive-table <table-name>                  Sets the table name to use when importing to hive
  --map-column-hive <arg>                    Override mapping for specific column to hive types.
Import to Hive table
Setting --create-hive-table is not recommended because the auto-generated column types may not match what you want; it is better to create the table structure in Hive first and then import the data.
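A minimal DDL sketch for the pre-created table, assuming emp has the classic columns (empno, ename, job, mgr, hiredate, sal, comm, deptno); adjust the column types and delimiter to match your source data:

hive> CREATE TABLE hive_emp (
        empno    INT,
        ename    STRING,
        job      STRING,
        mgr      INT,
        hiredate STRING,
        sal      DOUBLE,
        comm     DOUBLE,
        deptno   INT
      )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

If you create the table yourself, drop --create-hive-table from the import command and add --fields-terminated-by '\t' so the file delimiter matches the table definition.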
sqoop import \
--connect jdbc:mysql://localhost:3306/xinxingdata \
--password xinxingdata \
--username root \
--table emp \
--create-hive-table \
--hive-table hive_emp \
--hive-import \
--delete-target-dir \
--mapreduce-job-name Xinxing2HDFS \
-m 2
If the job fails with the error "Make sure HIVE_CONF_DIR is set correctly", copy the Hive exec jar into Sqoop's lib directory:

[hadoop@xinxingdata001 ~]$ cp ~/app/hive/lib/hive-exec-1.1.0-cdh5.16.2.jar ~/app/sqoop/lib/
Export data from Hive to a MySQL table
sqoop export \
--connect jdbc:mysql://localhost:3306/xinxingdata \
--password xinxingdata \
--username root \
--export-dir /user/hive/warehouse/hive_emp \
--table emp_hive \
--fields-terminated-by '\t' \
--mapreduce-job-name Xinxing2HDFS \
-m 2
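Two details commonly trip this step up: the emp_hive table must already exist in MySQL, and --fields-terminated-by must match the actual delimiter of the files under /user/hive/warehouse/hive_emp (Hive's default field delimiter is \001, i.e. Ctrl-A, unless the table was created with a tab delimiter). Inspecting the raw files first avoids surprises:

[hadoop@xinxingdata001 ~]$ hdfs dfs -text /user/hive/warehouse/hive_emp/* | head -5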