Disclaimer: This is an original article; please credit the author when reproducing it.
Author: Handsome Chen eats an apple
First, install Sqoop
1. Download Sqoop, unzip it, and rename the folder
wget http://mirror.bit.edu.cn/apache/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C /root/hadoop/
mv /root/hadoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha /root/hadoop/sqoop-1.4.6
2. Configure environment variables
vim /etc/profile
export SQOOP_HOME=/root/hadoop/sqoop-1.4.6
export PATH=$PATH:$SQOOP_HOME/bin
source /etc/profile
3. Validation
If the Sqoop version information is displayed correctly, the installation and configuration succeeded:
[root@localhost ~]# sqoop version
Second, create the database and table
After installing MySQL, create the test database and a test table.
Database name: test
create database test;
use test;
create table point(pointId int(10) primary key, pointName varchar(16), pointValue int(10));
Third, generate test data with a shell script
Write a shell script; adjust the insert statement to match the table structure created above:
#!/bin/bash
i=1
MAX_INSERT_ROW_COUNT=$1
while (( $i <= $MAX_INSERT_ROW_COUNT ))
do
    mysql -uhive -phive test -e "insert into test.point(pointId,pointName,pointValue) values($i,'point$i',$i);"
    i=$(($i+1))
done
exit 0
The script above generates test data very slowly; generating 10 million rows took the author about as long as a pregnancy. Suggestions for speeding it up are welcome, thanks!
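One common way to speed this up is to batch many rows into a single INSERT so the mysql client is not launched once per row. A minimal sketch, assuming the same point table and hive/hive credentials as above (BATCH_SIZE is an illustrative parameter, not part of the original script):
#!/bin/bash
# Hypothetical batched variant: builds one multi-row INSERT per batch
# instead of invoking mysql once per row.
MAX_INSERT_ROW_COUNT=$1
BATCH_SIZE=1000   # assumed value; tune as needed
i=1
while (( i <= MAX_INSERT_ROW_COUNT ))
do
    values=""
    for (( j = 0; j < BATCH_SIZE && i <= MAX_INSERT_ROW_COUNT; j++, i++ ))
    do
        values+="($i,'point$i',$i),"
    done
    # strip the trailing comma and run one multi-row insert for the whole batch
    mysql -uhive -phive test -e "insert into test.point(pointId,pointName,pointValue) values ${values%,};"
done
exit 0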
Fourth, MySQL data import
With MySQL as the data source, Sqoop needs the MySQL JDBC connection driver package. Download address: https://dev.mysql.com/get/Dow…
After downloading, unzip it and copy mysql-connector-java-5.1.44-bin.jar from the mysql-connector-java-5.1.45 directory to $SQOOP_HOME/lib.
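For reference, the copy step could look like this, using the file and directory names quoted above (adjust the paths to the connector version actually downloaded):
# assumed paths, following the names mentioned in the text above
cp mysql-connector-java-5.1.45/mysql-connector-java-5.1.44-bin.jar $SQOOP_HOME/lib/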
1. Import into HDFS
sqoop import --connect jdbc:mysql://localhost:3306/test --username hive --password hive --table point
Parameter analysis:
import: imports data from a traditional database into HDFS/Hive/HBase, etc.;
--connect: establishes the database connection;
jdbc:mysql://localhost:3306/test: connects to the MySQL database named test via JDBC;
--username: specifies the database username;
--password: specifies the database password;
--table: specifies the table name.
Note:
A) The HDFS output directory cannot already exist;
B) When -m or --split-by is not specified, that is, when the degree of parallelism is not set, the table being imported must have a primary key, otherwise an error occurs (a single-mapper example follows these notes);
C) Import into a specified directory: sqoop import --connect jdbc:mysql://localhost:3306/test --username hive --password hive --table point --target-dir /directory
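As note B) mentions, a table without a primary key can still be imported by fixing the number of map tasks to one. A hedged sketch (the -m flag is standard Sqoop; with a single mapper the import is not parallelized):
sqoop import --connect jdbc:mysql://localhost:3306/test --username hive --password hive --table point -m 1 --target-dir /directory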
If no output directory is specified, a subdirectory named after the table is created under /user/root/ by default. After the import, check whether the imported files are present in HDFS:
hdfs dfs -ls /user/root/point/
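To look at the imported records themselves, cat the generated part files (the part-m-* names are the standard output of a Sqoop map-only job; how many files appear depends on the number of mappers):
hdfs dfs -cat /user/root/point/part-m-*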
2. Import into HBase
sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password root --table point --hbase-table HPoint --column-family info --hbase-row-key pointId --hbase-create-table
Parameter analysis:
--hbase-table: specifies the target table in HBase;
--column-family: specifies the column family name;
--hbase-row-key: specifies the row key column;
--hbase-create-table: creates the HBase table if it does not already exist.
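To spot-check the HBase import, the new table can be scanned from the HBase shell (assuming the hbase command is on the PATH; HPoint is the table created by the command above):
hbase shell
scan 'HPoint'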
Finally, corrections are welcome. If you like this article, please give it a “like” and I'll treat you to an apple.