Recently, I saw her answer some of the most perplexing energy questions on blank Woman, such as why can I do so many things at once and have so much energy? Follow one truth at work:

Simplify complex things, standardize simple things, streamline standard things, streamline things as tools, automate instrumental things, and outsource things that can’t be automated.

In the process of data development, some processes can be instrumentalized to improve work efficiency and free up more time and energy to improve themselves.

tools

In the daily data development process, we often need to write table building sentences according to the data model. It takes several minutes to write table building sentences every time, and it is easy to make some low-level mistakes. Therefore, WE plan to make an Excel template, write table fields, table partitions and table names in it, and generate table building sentences automatically through the program. The effect is shown below:

To make the template

Table name, Chinese description of the table, field name, data type, field description, is not null, unique primary key, table partition, etc. (can be adjusted according to the actual situation)

The template consists of three parts:

  • Lines 1-3 are the table name, the Chinese description of the table, and the template column description
  • 4-19 Basic information about the fields in the behavior creation table
  • 20 Behavior partition field

Realize the principle of

Obtain the data model file in the specified directory. The convention is an Excel file ending with the data model.xls or data model.xlsx file name

Loop through each template file, according to the template composition specification, parse the file, splicing SQL statements, generate table construction sentence file

Generate results

CREATE EXTERNAL TABLE ods_cbonddescription (
    object_id string  COMMENT 'object ID',
    b_info_fullname string  COMMENT 'Name of Bond',
    s_info_name string  COMMENT 'Bond for short',
    b_info_issuer string  COMMENT 'Issuer',
    b_issue_announcement string  COMMENT 'Date of Publication',
    b_issue_firstissue string  COMMENT 'Date of Issue',
    b_issue_lastissue string  COMMENT 'Issue End date',
    b_issue_amountplan bigint  COMMENT 'Total Planned Issuance (100 million Yuan)',
    b_issue_amountact bigint  COMMENT 'Total Actual Issuance (100 million Yuan)',
    b_info_issueprice bigint  COMMENT 'Issue price',
    b_info_par bigint  COMMENT 'value',
    b_info_term_year int  COMMENT 'Bond Term (year)',
    b_info_term_day int  COMMENT 'Bond Maturity (days)',
    b_info_paymentdate int  COMMENT 'Date of payment',
    b_info_paymenttype int  COMMENT 'Method of interest',
    s_info_exchmarket string  COMMENT 'Exchange'
)
COMMENT 'Basic Bond Information' 
PARTITIONED BY(dt string) 
ROW FORMAT DELIMITED '\t' 
STORED AS ORC 
LOCATION 'hdfs://host:8020/dw/ods/ods_cbonddescription';

Copy the code

conclusion

If we use what kind of template in our work, we can communicate with each other how to form a certain tool script to improve our work efficiency.

Finally, follow the public account HelloTech and reply “602” to get the template and code.