Disclaimer: This is an original article; please credit the author when reproducing it.
Author: Handsome Chen eats an apple
First, install Sqoop
1. Download Sqoop, unzip it, and rename the folder
wget http://mirror.bit.edu.cn/apache/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C /root/hadoop/
mv /root/hadoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha /root/hadoop/sqoop-1.4.6
2. Configure environment variables
vim /etc/profile
export SQOOP_HOME=/root/hadoop/sqoop-1.4.6
export PATH=$PATH:$SQOOP_HOME/bin
source /etc/profile
3. Validation
If the Sqoop version information is displayed correctly, the installation and configuration succeeded:
[root@localhost ~]# sqoop version
Second, create the database and table
After installing MySQL, create the test database and a test table.
Database name: test
create database test;
use test;
create table point(pointId int(10) primary key, pointName varchar(16), pointValue int(10));
Third, generate test data with a shell script
Write a shell script; adjust the insert statement to match the table structure created above:
#!/bin/bash
i=1
MAX_INSERT_ROW_COUNT=$1
while (( $i <= $MAX_INSERT_ROW_COUNT ))
do
    mysql -uhive -phive test -e "insert into test.point(pointId,pointName,pointValue) values($i,'point$i',$i);"
    i=$(($i+1))
done
exit 0
The script above generates test data very slowly; generating 10 million rows took the author about as long as a pregnancy. Suggestions for speeding it up are welcome, thanks!
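One common way to speed this up is to batch many rows into a single INSERT so the mysql client is not launched once per row. A minimal sketch, assuming the same point table and hive/hive credentials as above (BATCH_SIZE is an illustrative parameter, not part of the original script):
#!/bin/bash
# Hypothetical batched variant: builds one multi-row INSERT per batch
# instead of invoking mysql once per row.
MAX_INSERT_ROW_COUNT=$1
BATCH_SIZE=1000   # assumed value; tune as needed
i=1
while (( i <= MAX_INSERT_ROW_COUNT ))
do
    values=""
    for (( j = 0; j < BATCH_SIZE && i <= MAX_INSERT_ROW_COUNT; j++, i++ ))
    do
        values+="($i,'point$i',$i),"
    done
    # strip the trailing comma and run one multi-row insert for the whole batch
    mysql -uhive -phive test -e "insert into test.point(pointId,pointName,pointValue) values ${values%,};"
done
exit 0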
Fourth, MySQL data import
With MySQL as the data source, Sqoop needs the MySQL JDBC connection driver package. Download address: https://dev.mysql.com/get/Dow…
After downloading, unzip it and copy mysql-connector-java-5.1.44-bin.jar from the mysql-connector-java-5.1.45 directory to $SQOOP_HOME/lib.
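For reference, the copy step could look like this, using the file and directory names quoted above (adjust the paths to the connector version actually downloaded):
# assumed paths, following the names mentioned in the text above
cp mysql-connector-java-5.1.45/mysql-connector-java-5.1.44-bin.jar $SQOOP_HOME/lib/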
1. Import into HDFS
sqoop import --connect jdbc:mysql://localhost:3306/test --username hive --password hive --table point
Parameter analysis:
import: imports data from a traditional database into HDFS/Hive/HBase, etc.;
--connect: establishes the database connection;
jdbc:mysql://localhost:3306/test: connects to the MySQL database named test via JDBC;
--username: specifies the database username;
--password: specifies the database password;
--table: specifies the table name.
Note:
A) The HDFS output directory cannot already exist;
B) When -m or --split-by is not specified, that is, when the degree of parallelism is not set, the table being imported must have a primary key, otherwise an error occurs (a single-mapper example follows these notes);
C) Import into a specified directory: sqoop import --connect jdbc:mysql://localhost:3306/test --username hive --password hive --table point --target-dir /directory
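As note B) mentions, a table without a primary key can still be imported by fixing the number of map tasks to one. A hedged sketch (the -m flag is standard Sqoop; with a single mapper the import is not parallelized):
sqoop import --connect jdbc:mysql://localhost:3306/test --username hive --password hive --table point -m 1 --target-dir /directory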
If no output directory is specified, a subdirectory named after the table is created under /user/root/ by default. After the import, check whether the imported files are present in HDFS:
hdfs dfs -ls /user/root/point/
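To look at the imported records themselves, cat the generated part files (the part-m-* names are the standard output of a Sqoop map-only job; how many files appear depends on the number of mappers):
hdfs dfs -cat /user/root/point/part-m-*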
2. Import into HBase
sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password root --table point --hbase-table HPoint --column-family info --hbase-row-key pointId --hbase-create-table
Parameter analysis:
--hbase-table: specifies the target table in HBase;
--column-family: specifies the column family name;
--hbase-row-key: specifies the row key column;
--hbase-create-table: creates the HBase table if it does not already exist.
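To spot-check the HBase import, the new table can be scanned from the HBase shell (assuming the hbase command is on the PATH; HPoint is the table created by the command above):
hbase shell
scan 'HPoint'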
Finally, corrections are welcome. If you like this article, please give it a “like” and I'll treat you to an apple.