
1. Run the following command to import data from MySQL into Hive:

    sqoop import --connect jdbc:mysql://xx.xx.xx.xx/cc --username uuu --password xxxxxxxx --query 'select id,`name`,slogan,min_price,max_price from item where $CONDITIONS' --hive-table hhhttt --hive-import --hive-overwrite --hive-drop-import-delims --delete-target-dir --target-dir /hive/2019/10/17/ -m 1

Some parameters:

--hive-overwrite    Overwrites any data that already exists in the Hive table.

--hive-drop-import-delims    Removes Hive's default delimiters from the imported data.

--delete-target-dir --target-dir    --target-dir specifies a temporary directory for the import, and --delete-target-dir deletes it first if it is left over from a previous run. After the job completes, the data is loaded from the temporary directory into the Hive warehouse (default /user/hive/warehouse) and the temporary directory is cleaned up.
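
A quick sanity check after the import is to compare row counts between Hive and MySQL; this sketch reuses the placeholder host, credentials, and table names from the command above:

    hive -e 'select count(*) from hhhttt;'
    mysql -h xx.xx.xx.xx -u uuu -p -e 'select count(*) from cc.item;'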



2. If a column in MySQL is NULL/"":

sqoop import --connect jdbc:mysql://xx.xxx.xxx.xx/xx --username xxx --password xxxxxxxxxxxxxxxxxxxxx --query 'select id,`name`,slogan,min_price,max_price from item where $CONDITIONS' --null-string 'emp' --null-non-string '-1' --target-dir /temp/hivetest/ --hive-table hivetest.item20191017 --hive-import --hive-overwrite --hive-drop-import-delims -m 1

--null-string    For string-type columns: when the value is NULL, it is replaced with the specified string (emp in this example).

--null-non-string    For non-string columns: when the value is NULL, it is replaced with the specified value (-1 in this example).
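
If you would rather have Hive see real NULLs instead of placeholder values, a common approach for plain text-format tables (where Hive's default NULL marker is \N) is to pass that marker for both flags; the doubled backslash survives Sqoop's own parsing:

    sqoop import --connect jdbc:mysql://xx.xxx.xxx.xx/xx --username xxx --password xxxxxxxxxxxxxxxxxxxxx --query 'select id,`name`,slogan,min_price,max_price from item where $CONDITIONS' --null-string '\\N' --null-non-string '\\N' --target-dir /temp/hivetest/ --hive-table hivetest.item20191017 --hive-import --hive-overwrite --hive-drop-import-delims -m 1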

3. Dynamic partition table usage:

3.1 Creating a Dynamically Partitioned Table

          drop table if exists hivetest.item_p;

          create table hivetest.item_p(
                    id int,
                    name string,
                    slogan string,
                    min_price string,
                    max_price string)
          partitioned by (modify_date string)
          location 'hdfs://192.168.0.136:9000/user/hive/warehouse/hivetest/item_p';
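
To double-check that the table picked up the intended location and partition column, standard HiveQL works:

          describe formatted hivetest.item_p;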

3.2 Importing data into a dynamically partitioned table

          sqoop import --connect jdbc:mysql://xx.xxx.xxx.xx/xx --username xxx --password ccc --query 'select id,`name`,slogan,min_price,max_price,date(modify_time) modify_date from item where $CONDITIONS' --null-string 'empStr' --null-non-string '-1' --delete-target-dir --target-dir /temp/hivetest/ --hive-table hivetest.item_p --hive-import --hive-overwrite --hive-drop-import-delims -m 1

After the task completes, the partitioned table's file directory looks like the figure below:
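
The same layout can also be listed directly; Hive creates one modify_date=<value> subdirectory per partition (path taken from the location clause above):

    hdfs dfs -ls /user/hive/warehouse/hivetest/item_p/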

3.3 Querying Dynamic or Static Partitioned Tables

     select * from item_p where modify_date='2019-10-15';
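
To list the partition values that actually exist before filtering on one:

     show partitions hivetest.item_p;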

3.4 Static and Dynamic Partitions

3.4.1 Using Static Partitions

Creating a static partition

Specify these parameters on the Sqoop import:

    --hive-partition-key ods_date
    --hive-partition-value 2019-10-24
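
Put together, a static-partition import might look like the sketch below. It assumes a table hivetest.item_sp partitioned by (ods_date string); host, credentials, and the rest are the placeholders used earlier. Note that the partition column itself is not selected, since its value is fixed by --hive-partition-value:

    sqoop import --connect jdbc:mysql://xx.xxx.xxx.xx/xx --username xxx --password ccc --query 'select id,`name`,slogan,min_price,max_price from item where $CONDITIONS' --hive-partition-key ods_date --hive-partition-value 2019-10-24 --delete-target-dir --target-dir /temp/hivetest/ --hive-table hivetest.item_sp --hive-import --hive-drop-import-delims -m 1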

3.4.2 Dynamic Partitioning (pitfalls encountered)

Enable Hive dynamic partitioning

Add the following configuration to hive-site.xml:

    <property>
        <name>hive.exec.dynamic.partition</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.exec.dynamic.partition.mode</name>
        <value>nonstrict</value>
    </property>
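
Alternatively, these are ordinary Hive settings and can be enabled per session instead of in hive-site.xml:

    set hive.exec.dynamic.partition=true;
    set hive.exec.dynamic.partition.mode=nonstrict;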

Create the table with the partition declared, for example: create table t(c1 int, c2 string) partitioned by (<column name> <type>);

Run the Sqoop import (see 3.2 for an example). Note the partition column: it must be the last column in the --query SQL, and the columns must be in the same order as in the Hive table.

Pit:

The author had no problem using dynamic partitioning with Hive (3.1.2) in a local development environment.

In the production environment (CDH 6.3), however, the same dynamic-partition import keeps throwing the exception:

           Need to specify partition columns because the destination table is partitioned

Enabling or disabling dynamic partitioning in production does not change the exception message...

If anyone has hit this before, advice would be very welcome.
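
One workaround worth trying (a common pattern, untested by the author in that CDH 6.3 environment; the staging table name is hypothetical): let Sqoop import into a non-partitioned staging table, then move the rows into the partitioned table with dynamic partitioning in HiveQL:

    -- assumes hivetest.item_stage has the same columns, with modify_date last
    set hive.exec.dynamic.partition=true;
    set hive.exec.dynamic.partition.mode=nonstrict;
    insert overwrite table hivetest.item_p partition(modify_date)
    select id, name, slogan, min_price, max_price, modify_date from hivetest.item_stage;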

4. Sqoop incremental import to Hive (Hive does not support Sqoop's lastmodified mode)

      Note: be careful with the --hive-overwrite parameter => it clears all existing data in the table.

Example 1: sqoop import --connect jdbc:mysql://xxx.ccc.vvv.bbb:3306/item --username bbb --password mmmmmm --table item --incremental append --check-column modify_time --last-value '2019-10-25 17:40:00' --null-string /n --null-non-string 0 --hive-import --hive-drop-import-delims --target-dir /hive/ods/item/ --hive-table ods.item -m 1

Example 2: sqoop import --connect jdbc:mysql://xxx.ccc.vvv.bbb:3306/item --username bbb --password mmmmmm --query 'select id,version,name,images,state,slogan,start_time,end_time,favorites,category_code,min_price,max_price,fk_merchant_id,solds,actual_solds,recommend,fk_goods_id,create_time,modify_time from item where $CONDITIONS' --incremental append --check-column modify_time --last-value '2019-10-25 17:40:00' --null-string /n --null-non-string 0 --hive-import --hive-drop-import-delims --target-dir /hive/ods/item/ --hive-table ods.item -m 1
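
Instead of editing --last-value by hand before every run, Sqoop's saved jobs can track it automatically; the job name incr_item below is illustrative. After each successful run, the metastore records the new high-water mark:

    sqoop job --create incr_item -- import --connect jdbc:mysql://xxx.ccc.vvv.bbb:3306/item --username bbb --table item --incremental append --check-column modify_time --last-value '2019-10-25 17:40:00' --hive-import --hive-drop-import-delims --target-dir /hive/ods/item/ --hive-table ods.item -m 1
    # saved jobs do not store the password by default; Sqoop prompts at run time (or use --password-file)
    sqoop job --exec incr_item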


If anything here is inaccurate, corrections and criticism are welcome.