Recently I saw Blank Woman answer a perennial question about energy: how does she juggle so many things at once and still have so much energy? Her answer was to follow one principle at work:
Complex things should be simplified, simple things should be standardized, standardized things should be turned into processes, processes should be turned into tools, tools should be automated, and whatever cannot be automated should be outsourced.
In data development, some workflows can be turned into tools. That improves efficiency and frees up time and energy for improving yourself.
Tool
In day-to-day data development, I often need to write a CREATE TABLE statement from a data model. Writing one by hand takes a few minutes each time, and it is easy to make careless mistakes. So I decided to build an Excel template that holds the table name, table fields and table partitions, and let a program generate the CREATE TABLE statement automatically. The generated output is shown later in this post.
Making the template
The template mainly contains the table name, the table's Chinese description, field names, data types, field descriptions, a not-null flag, a unique primary key flag, table partitions, and so on (adjust it to your actual situation).
The template consists of three parts:
- Rows 1-3 hold the table name, the table's Chinese description, and the template's column headers
- Rows 4-19 hold the basic information for the table's fields
- The row after the fields holds the partition field
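To make the layout concrete, here is a minimal sketch of how the rows of one template sheet could be turned into a small in-memory model. It works on rows that have already been read into Python lists (reading the sheet itself would need an Excel library, which is left out here). The names `Column`, `TableModel` and `parse_template`, the cell positions (value in the second cell of rows 1 and 2; name, type, description in the first three cells of a field row) and the sample values are all assumptions for illustration, not the original code.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Column:
    name: str
    data_type: str
    comment: str = ""


@dataclass
class TableModel:
    name: str
    description: str
    columns: List[Column] = field(default_factory=list)
    partitions: List[Column] = field(default_factory=list)


def _cell(row: Sequence, index: int) -> str:
    """Return one cell as a stripped string, or '' if it is missing or empty."""
    if index >= len(row) or row[index] is None:
        return ""
    return str(row[index]).strip()


def _columns(rows: Sequence[Sequence]) -> List[Column]:
    """Convert field rows to Column objects, skipping rows without a field name."""
    result = []
    for row in rows:
        name = _cell(row, 0)
        if name:
            result.append(Column(name, _cell(row, 1), _cell(row, 2)))
    return result


def parse_template(rows: Sequence[Sequence],
                   first_field_row: int = 4,
                   last_field_row: int = 19) -> TableModel:
    """Parse one sheet, using 1-based row numbers as in the template description."""
    return TableModel(
        name=_cell(rows[0], 1),         # row 1: table name (assumed in 2nd cell)
        description=_cell(rows[1], 1),  # row 2: table description (assumed in 2nd cell)
        columns=_columns(rows[first_field_row - 1:last_field_row]),
        partitions=_columns(rows[last_field_row:]),  # rows after the field block
    )


# Hypothetical sample sheet: two fields, blank padding up to row 19, one partition row.
rows = [
    ["table name", "ods_cbonddescription"],
    ["table description", "bond description"],
    ["field name", "data type", "field description"],
    ["object_id", "string", "object ID"],
    ["b_info_term_day", "int", "bond maturity (day)"],
]
rows += [[None, None, None]] * 14
rows.append(["dt", "string", "partition date"])

model = parse_template(rows)
print(model.name, [c.name for c in model.columns], [p.name for p in model.partitions])
```

Skipping blank rows means a table with fewer than 16 fields can leave the unused field rows empty without changing the template.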
How it works
Get the data model files in the specified directory. By convention, these are Excel files whose names end with “data model.xls” or “data model.xlsx”.
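The file lookup could look roughly like this, using only the standard library. The function name `find_model_files` and the constant `MODEL_SUFFIXES` are made up for this sketch, and skipping files that start with `~$` (the lock files Excel creates while a workbook is open) is my own addition rather than part of the convention above.

```python
from pathlib import Path
from typing import List

# File name endings agreed on by convention for data model workbooks.
MODEL_SUFFIXES = ("data model.xls", "data model.xlsx")


def find_model_files(directory: str) -> List[Path]:
    """Return the data model workbooks directly inside `directory`, sorted by path."""
    return sorted(
        path
        for path in Path(directory).iterdir()
        if path.is_file()
        and path.name.endswith(MODEL_SUFFIXES)  # str.endswith accepts a tuple
        and not path.name.startswith("~$")      # skip Excel lock files
    )
```

Calling `find_model_files("some/directory")` returns a list of `Path` objects that can then be processed one by one.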
Loop through each template file, parse it according to the template's layout, splice the SQL statement together, and write the resulting CREATE TABLE statement to a file.
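The splicing step might be sketched as follows. This is not the original script: `build_create_table`, `column_sql`, the `(name, type, comment)` tuple shape and the `location_root` parameter are assumptions for illustration, and the fixed clauses (tab-delimited row format, ORC storage, an HDFS location ending in the table name) simply mirror the generated statement shown in this post.

```python
from typing import Iterable, Tuple

ColumnSpec = Tuple[str, str, str]  # (name, data type, comment)


def quote(text: str) -> str:
    """Escape backslashes and single quotes for a single-quoted SQL string literal."""
    return text.replace("\\", "\\\\").replace("'", "\\'")


def column_sql(name: str, data_type: str, comment: str = "") -> str:
    """Render one column definition, adding COMMENT only when a comment is given."""
    sql = f"{name} {data_type}"
    if comment:
        sql += f" COMMENT '{quote(comment)}'"
    return sql


def build_create_table(name: str,
                       columns: Iterable[ColumnSpec],
                       partitions: Iterable[ColumnSpec],
                       location_root: str,
                       description: str = "") -> str:
    """Splice a Hive-style CREATE EXTERNAL TABLE statement from the parsed parts."""
    cols = ",\n  ".join(column_sql(*c) for c in columns)
    parts = ", ".join(column_sql(*p) for p in partitions)
    lines = [f"CREATE EXTERNAL TABLE {name} (", f"  {cols}", ")"]
    if description:
        lines.append(f"COMMENT '{quote(description)}'")
    if parts:
        lines.append(f"PARTITIONED BY ({parts})")
    lines += [
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'",
        "STORED AS ORC",
        f"LOCATION '{location_root.rstrip('/')}/{name}';",
    ]
    return "\n".join(lines)


sql = build_create_table(
    name="ods_cbonddescription",
    columns=[
        ("object_id", "string", "object ID"),
        ("b_info_term_day", "int", "bond maturity (day)"),
    ],
    partitions=[("dt", "string", "")],
    location_root="hdfs://host:8020/dw/ods",
)
print(sql)
```

Writing the result to a file is then one more line, for example `Path("ods_cbonddescription.sql").write_text(sql, encoding="utf-8")` with `pathlib.Path`; the output file name here is just an example.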
Generated result
CREATE EXTERNAL TABLE ods_cbonddescription (
  object_id string COMMENT 'object ID',
  b_info_fullname string COMMENT 'bond name',
  s_info_name string COMMENT 'issue date',
  b_issuer string COMMENT 'issue date',
  b_issuer_announcement string COMMENT 'issue date',
  b_issue_firstissue string COMMENT 'issue date',
  b_issue_lastissue string COMMENT 'issue date',
  b_issue_amountplan bigint COMMENT '',
  b_info_issueprice bigint COMMENT '',
  b_info_par bigint COMMENT '',
  b_info_term_year int COMMENT '',
  b_info_term_day int COMMENT 'bond maturity (day)',
  b_info_paymentdate int COMMENT 'paymentdate',
  b_info_paymenttype int COMMENT '',
  s_info_exchmarket string COMMENT ''
)
PARTITIONED BY (dt string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS ORC
LOCATION 'hdfs://host:8020/dw/ods/ods_cbonddescription';
Conclusion
If you use templates of any kind in your own work, let's exchange ideas on how to turn them into tool scripts that improve our efficiency.
Finally, follow the official account and reply “601” to get the template and code.