1. Introduction to Sqoop

Sqoop is a common data migration tool, mainly used to import and export data between different storage systems:

  • Import data: Import data from relational databases such as MySQL and Oracle into Hadoop-based storage such as HDFS, Hive, and HBase.

  • Export data: Export data from a distributed file system to a relational database.
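To make the two directions concrete, here is a minimal sketch of what an import and an export invocation might look like. The JDBC URL, user name, table name, and HDFS path are hypothetical placeholders, not values from this article. The functions only print the commands, so they can be inspected without a running cluster.

```shell
#!/bin/sh
# Hypothetical connection details; replace them with your own.
JDBC_URL="jdbc:mysql://localhost:3306/demo_db"
DB_USER="root"

# Print an import command (relational table -> HDFS directory).
# $1 = table name, $2 = HDFS target directory
print_import_cmd() {
  echo sqoop import --connect "$JDBC_URL" --username "$DB_USER" -P \
    --table "$1" --target-dir "$2" -m 1
}

# Print an export command (HDFS directory -> relational table).
# $1 = table name, $2 = HDFS directory holding the data to export
print_export_cmd() {
  echo sqoop export --connect "$JDBC_URL" --username "$DB_USER" -P \
    --table "$1" --export-dir "$2" -m 1
}

print_import_cmd demo_table /sqoop/demo_table
print_export_cmd demo_table /sqoop/demo_table
```

Removing the leading `echo` from either function would run the command for real. `-P` makes Sqoop prompt for the password instead of taking it on the command line, and `-m 1` asks for a single map task.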

Sqoop works by translating each command into a MapReduce job that performs the data migration, as shown in the following figure:

2. Installation

Version selection: Sqoop comes in two versions, Sqoop 1 and Sqoop 2. Sqoop 2 is not officially recommended because it is incompatible with Sqoop 1 and not feature complete, so Sqoop 1 is preferred.

2.1 Download and decompress the file

Download the required version of Sqoop; this guide uses the CDH build. Download address: archive.cloudera.com/cdh5/cdh/5/

# Download the archive, then extract it
tar -zxvf sqoop-1.4.6-cdh5.15.2.tar.gz

2.2 Configuring Environment Variables

# vim /etc/profile

Add environment variables:

export SQOOP_HOME=/usr/app/sqoop-1.4.6-cdh5.15.2
export PATH=$SQOOP_HOME/bin:$PATH
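If you would rather script this step than edit the file by hand, the following is a minimal sketch of an idempotent append. The helper name `append_once` is hypothetical, and the demo writes to a temporary file rather than /etc/profile so it can be tried safely.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already present.
# $1 = line to add, $2 = file to modify
append_once() {
  grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Demo against a scratch file; the path is created by mktemp.
demo_profile="$(mktemp)"
append_once 'export SQOOP_HOME=/usr/app/sqoop-1.4.6-cdh5.15.2' "$demo_profile"
append_once 'export PATH=$SQOOP_HOME/bin:$PATH' "$demo_profile"
# Calling it again with the same line changes nothing.
append_once 'export SQOOP_HOME=/usr/app/sqoop-1.4.6-cdh5.15.2' "$demo_profile"
cat "$demo_profile"
```

To apply it for real, pass /etc/profile as the second argument while running as root. The single quotes keep `$SQOOP_HOME` and `$PATH` unexpanded so they are resolved when the profile is sourced.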

Make configured environment variables take effect immediately:

# source /etc/profile

2.3 Modifying The Configuration

Go to the conf/ directory under the installation directory and copy the Sqoop environment configuration template sqoop-env-template.sh:

# cp sqoop-env-template.sh sqoop-env.sh

Edit sqoop-env.sh as follows (HADOOP_COMMON_HOME and HADOOP_MAPRED_HOME are mandatory; the others are optional):

# Set Hadoop-specific environment variables here.
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/usr/app/hadoop-server-cdh5.15.2

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/usr/app/hadoop-server-cdh5.15.2

#set the path to where bin/hbase is available
export HBASE_HOME=/usr/app/hbase-1.2.0-cdh5.15.2

#Set the path to where bin/hive is available
export HIVE_HOME=/usr/app/hive-1.1.0-cdh5.15.2

#Set the path for where zookeper config dir is
export ZOOCFGDIR=/usr/app/zookeeper-3.4.13/conf

2.4 Copying database drivers

Copy the MySQL driver JAR into the lib directory of the Sqoop installation. The driver can be downloaded from dev.mysql.com/downloads/c… . A copy is also available under the resources directory of this repository if you need it.
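The copy step might be wrapped in a small helper like the sketch below. The function name `install_driver` is hypothetical, and the JAR file name depends on the driver version you downloaded, so it is left as a placeholder in the usage comment.

```shell
#!/bin/sh
# Copy a JDBC driver JAR into Sqoop's lib directory and confirm it landed there.
# $1 = path to the driver JAR, $2 = Sqoop installation directory
install_driver() {
  [ -f "$1" ] || { echo "driver JAR not found: $1" >&2; return 1; }
  [ -d "$2/lib" ] || { echo "no lib directory under: $2" >&2; return 1; }
  cp "$1" "$2/lib/" && ls -l "$2/lib/$(basename "$1")"
}

# Usage (file name is a placeholder for whichever version you downloaded):
#   install_driver ./mysql-connector-java-<version>.jar "$SQOOP_HOME"
```

Checking for the lib directory first catches the common mistake of pointing at the wrong installation path before anything is copied.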

2.5 Verification

Because Sqoop's bin directory has been added to the PATH environment variable, you can verify the configuration with the following command:

# sqoop version

If the version information is displayed, the configuration is successful.
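If the command is not found at all, the PATH setting is the first thing to check. Here is a minimal sketch of such a check; the helper name `check_on_path` is hypothetical.

```shell
#!/bin/sh
# Report whether a command can be found on PATH.
# Returns 0 when found, 1 otherwise.
check_on_path() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at $(command -v "$1")"
  else
    echo "$1 not found on PATH; re-check SQOOP_HOME and run 'source /etc/profile'" >&2
    return 1
  fi
}

# Only ask Sqoop for its version once we know the shell can find it.
if check_on_path sqoop; then
  sqoop version
fi
```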

You may also see warnings about HCatalog and Accumulo. We do not use either of them, so the warnings can be ignored. At startup, Sqoop checks whether this software is configured in the environment variables. To remove the warnings, edit bin/configure-sqoop and comment out the unnecessary checks:

# Check: If we can't find our dependencies, give up here.
if [ ! -d "${HADOOP_COMMON_HOME}" ]; then
  echo "Error: $HADOOP_COMMON_HOME does not exist!"
  echo 'Please set $HADOOP_COMMON_HOME to the root of your Hadoop installation.'
  exit 1
fi
if [ ! -d "${HADOOP_MAPRED_HOME}" ]; then
  echo "Error: $HADOOP_MAPRED_HOME does not exist!"
  echo 'Please set $HADOOP_MAPRED_HOME to the root of your Hadoop MapReduce installation.'
  exit 1
fi

## Moved to be a runtime check in sqoop.
if [ ! -d "${HBASE_HOME}" ]; then
  echo "Warning: $HBASE_HOME does not exist! HBase imports will fail."
  echo 'Please set $HBASE_HOME to the root of your HBase installation.'
fi

## Moved to be a runtime check in sqoop.
if [ ! -d "${HCAT_HOME}" ]; then
  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
fi

if [ ! -d "${ACCUMULO_HOME}" ]; then
  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
fi
if [ ! -d "${ZOOKEEPER_HOME}" ]; then
  echo "Warning: $ZOOKEEPER_HOME does not exist! Accumulo imports will fail."
  echo 'Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.'
fi
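Commenting out a check by hand is simple enough, but it can also be scripted. The sketch below assumes each check has the shape shown above: an `if [ ! -d "${VAR}" ]; then` line followed by a closing `fi` on its own line. The function name `comment_out_check` is hypothetical. It prints the modified script to standard output and leaves the original file untouched.

```shell
#!/bin/sh
# Print a copy of a script with the directory check for one variable commented out.
# $1 = variable name (for example HCAT_HOME), $2 = path to the script
comment_out_check() {
  awk -v var="$1" '
    # The line that opens the check for this variable starts a block.
    index($0, "-d \"${" var "}\"") { inblock = 1 }
    # Inside a block, prefix every line with "# " until the closing fi.
    inblock {
      print "# " $0
      if ($0 ~ /^[ \t]*fi[ \t]*$/) inblock = 0
      next
    }
    # Everything else passes through unchanged.
    { print }
  ' "$2"
}

# Usage: write the result to a new file, review it, then replace the original.
#   comment_out_check HCAT_HOME bin/configure-sqoop > configure-sqoop.new
```

Writing to a separate file first makes it easy to compare the result with the original before overwriting it. The same call with ACCUMULO_HOME or ZOOKEEPER_HOME would handle the remaining Accumulo-related warnings.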

For more articles in the big data series, see the GitHub open source project: Getting Started with Big Data.