Source code for this article: GitHub | GitEE
I. Overview of SQOOP
SQOOP is an open-source big data component used to transfer data between Hadoop (Hive, HBase, etc.) and traditional relational databases (MySQL, PostgreSQL, Oracle, etc.).
Like most data-transfer components, its basic functions are import and export.
Since SQOOP sits on the big data side of the architecture, moving data from a relational database into the Hadoop storage system is called an import, and the reverse direction is called an export.
SQOOP is a command-line tool that converts an import or export command into a MapReduce program; the generated MapReduce job mainly customizes the InputFormat and OutputFormat.
II. Environment deployment
To test the SQOOP component, you need at least the Hadoop stack, a relational database (MySQL here), the JDK, and the other basic environment dependencies.
Since SQOOP is a tool-type component, a single-node installation is sufficient.
1. Upload the installation package
Installation package and version: sqoop-1.4.6
[root@hop01 opt]# tar -zxf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
[root@hop01 opt]# mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha sqoop1.4.6
2. Modify the configuration file
File location: sqoop1.4.6/conf
[root@hop01 conf]# pwd
/opt/sqoop1.4.6/conf
[root@hop01 conf]# mv sqoop-env-template.sh sqoop-env.sh
Configuration content: the home paths of the related Hadoop components and the scheduling component ZooKeeper.
[root@hop01 conf]# vim sqoop-env.sh
export HADOOP_COMMON_HOME=/opt/hadoop2.7
export HADOOP_MAPRED_HOME=/opt/hadoop2.7
export HIVE_HOME=/opt/hive1.2
export HBASE_HOME=/opt/hbase-1.3.1
export ZOOKEEPER_HOME=/opt/zookeeper3.4
export ZOOCFGDIR=/opt/zookeeper3.4
3. Configure environment variables
[root@hop01 opt]# vim /etc/profile
export SQOOP_HOME=/opt/sqoop1.4.6
export PATH=$PATH:$SQOOP_HOME/bin
[root@hop01 opt]# source /etc/profile
4. Introduce the MySQL driver
[root@hop01 opt]# cp mysql-connector-java-5.1.27-bin.jar sqoop1.4.6/lib/
5. Environment check
Key points: import and export
Run the help command to list the available commands and check the version number with the version command. SQOOP is a command-line tool, and these commands are the ones used below.
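A minimal check, assuming SQOOP_HOME/bin has been added to the PATH as configured above:

[root@hop01 ~]# sqoop help
[root@hop01 ~]# sqoop version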
6. Related environment
At this point, check the related services on the SQOOP deployment node; they basically run in cluster mode:
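One simple way to do this (a sketch, assuming the Hadoop, Hive, HBase, and ZooKeeper services have been started on this node) is to list the running Java processes:

[root@hop01 ~]# jps

Processes such as NameNode, DataNode, HMaster, and QuorumPeerMain would typically appear if the corresponding services are running.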
7. Test MySQL connection
sqoop list-databases --connect jdbc:mysql://hop01:3306/ --username root --password 123456
This command lists the MySQL databases; if they are printed correctly, the connection works.
III. Data import cases
1. MySQL data script
CREATE TABLE `tb_user` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'id',
  `user_name` varchar(100) DEFAULT NULL COMMENT 'user name',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='user table';

INSERT INTO `sq_import`.`tb_user`(`id`, `user_name`) VALUES (1, 'spring');
INSERT INTO `sq_import`.`tb_user`(`id`, `user_name`) VALUES (2, 'c++');
INSERT INTO `sq_import`.`tb_user`(`id`, `user_name`) VALUES (3, 'java');
2. SQOOP import script
Specify the database table and import it into the Hadoop file system in full; note that the Hadoop services need to be started first:
sqoop import \
--connect jdbc:mysql://hop01:3306/sq_import \
--username root \
--password 123456 \
--table tb_user \
--target-dir /hopdir/user/tbuser0 \
-m 1
3. Hadoop queries
[root@hop01 ~]# hadoop fs -cat /hopdir/user/tbuser0/part-m-00000
4. Specify columns and conditions
The SQL statement used for the query must contain WHERE $CONDITIONS:
sqoop import \
--connect jdbc:mysql://hop01:3306/sq_import \
--username root \
--password 123456 \
--target-dir /hopdir/user/tbname0 \
--num-mappers 1 \
--query 'select user_name from tb_user where 1=1 and $CONDITIONS;'
View the import result:
[root@hop01 ~]# hadoop fs -cat /hopdir/user/tbname0/part-m-00000
5. Import into Hive
If no Hive database is specified, the data is imported into the default library, and the table is created automatically with the same name:
sqoop import \
--connect jdbc:mysql://hop01:3306/sq_import \
--username root \
--password 123456 \
--table tb_user \
--hive-import \
-m 1
Two steps can be observed in the SQOOP execution log:
Step 1: Import data from MySQL into the default path of HDFS;
Step 2: Migrate the data from the temporary directory to the Hive table.
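As a sanity check (a sketch, assuming the Hive CLI is available on this node and the table landed in the default library), the imported rows can be queried directly:

[root@hop01 ~]# hive -e 'select * from tb_user;'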
6. Import into HBase
The HBase cluster version here is 1.3, and the target table needs to exist before the data import can run properly; create it as shown below, then run the import:
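A minimal sketch of the table creation in the HBase shell, assuming the column family name "info" used by the import command below:

hbase(main):001:0> create 'tb_user', 'info'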
sqoop import \
--connect jdbc:mysql://hop01:3306/sq_import \
--username root \
--password 123456 \
--table tb_user \
--columns "id,user_name" \
--column-family "info" \
--hbase-table tb_user \
--hbase-row-key id \
--split-by id
View table data in HBase:
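For example, in the HBase shell (assuming the table name tb_user created above):

hbase(main):002:0> scan 'tb_user'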
IV. Data export cases
Create a new MySQL database and table, then export the data from HDFS back to MySQL. Here you can reuse the data generated by the first import script; a possible target schema is sketched below, followed by the export command:
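A minimal sketch of the target database and table, mirroring the structure of the import table (the database name sq_export matches the connection string in the export command below):

CREATE DATABASE `sq_export` DEFAULT CHARACTER SET utf8;
CREATE TABLE `sq_export`.`tb_user` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'id',
  `user_name` varchar(100) DEFAULT NULL COMMENT 'user name',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='user table';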
sqoop export \
--connect jdbc:mysql://hop01:3306/sq_export \
--username root \
--password 123456 \
--table tb_user \
--num-mappers 1 \
--export-dir /hopdir/user/tbuser0/part-m-00000 \
--input-fields-terminated-by ","
Check the data in MySQL again; the records are exported completely. Here the comma is the separator between data fields, matching the delimiter seen when querying the HDFS data produced by the first import script.
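A simple verification query, assuming the sq_export.tb_user table sketched above:

SELECT * FROM `sq_export`.`tb_user`;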
V. Source code address
GitHub address: https://github.com/cicadasmile/big-data-parent
GitEE address: https://gitee.com/cicadasmile/big-data-parent