Requirement scenario analysis

In the day-to-day operation of a data platform, the number of data tables keeps growing as more business data is ingested and more data applications are built, and it can become very large. Data managers therefore often want to use metadata analysis to understand how heavily each table is used, so that they can optimize the data model. This article describes how to use MaxCompute metadata to identify popular (hot) and unpopular (cold) tables.

MaxCompute Information_Schema provides Tables, which holds the metadata of all tables, and tasks_history, which holds job details including the tables each job read. By summarizing how many times each table is accessed by jobs, you can see how often different tables are used.

The detailed steps are as follows:
Hot tables: obtain the details of the input_tables column in tasks_history, then count how many times each table is used within a given partition time range.
Cold tables: join Tables with the per-table job counts derived from input_tables in tasks_history and sort in ascending order, to obtain the number of times each table is used in the specified period.

1. Obtain the details of the input_tables column in tasks_history, as shown below:
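The original query is not reproduced here, so the following is only a minimal sketch of what such a lookup might look like in MaxCompute SQL. It assumes that the Information_Schema package is available to the project and that tasks_history is partitioned by a ds date string such as 20190902. The columns selected besides input_tables, and the LIMIT, are illustrative choices.

```sql
-- Inspect the raw input_tables values for one day of job history.
-- Assumption: ds is the date partition column of tasks_history (yyyymmdd).
SELECT  inst_id        -- job instance ID
       ,input_tables   -- tables read by the job
       ,output_tables  -- tables written by the job
       ,start_time
       ,end_time
FROM    information_schema.tasks_history
WHERE   ds = '20190902'
LIMIT   100;
```

Restricting the query to a single ds partition keeps the scan small while you check what the column values look like.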
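The query result shows that the input_tables column in tasks_history stores values in the form ["lightning.customer","lightning.orders_delta"], so the column has to be split on commas before individual tables can be counted.
Note: the time partition range used in the examples is the statistics window and can be adjusted as required, for example ds>='20190902' and ds<='20190905'.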
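One possible way to do the split is MaxCompute's built-in TRANS_ARRAY function, which expands a delimited string into one row per element. The sketch below assumes that input_tables holds a bracketed, comma-separated list of double-quoted project.table names. It strips the brackets and quotes with nested REPLACE calls before splitting. The alias input_table is a name chosen here for illustration.

```sql
-- Expand input_tables into one row per (inst_id, input_table) pair.
-- The first argument (1) marks inst_id as the key column that is
-- repeated on every output row; ',' is the separator to split on.
SELECT  TRANS_ARRAY(
            1, ',', inst_id,
            REPLACE(REPLACE(REPLACE(input_tables, '[', ''), ']', ''), '"', '')
        ) AS (inst_id, input_table)
FROM    information_schema.tasks_history
WHERE   ds = '20190902'  -- adjust the partition as required
LIMIT   100;
```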
The processing results are shown as follows:
2. Hot table statistics. Write the SQL as follows:
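A hedged sketch of such a hot-table query: it splits input_tables into one row per table with TRANS_ARRAY, then counts the distinct job instances that read each table within the ds window. The aliases input_table and table_read_num, the empty-value filter, and the LIMIT of 1000 are assumptions made for this example.

```sql
-- Count, per input table, the distinct job instances that read it
-- in the chosen ds window, most-read tables first.
SELECT  input_table
       ,COUNT(DISTINCT inst_id) AS table_read_num
FROM    (
            -- TRANS_ARRAY cannot be combined with GROUP BY in the same
            -- SELECT, so the split is done in a subquery.
            SELECT  TRANS_ARRAY(
                        1, ',', inst_id,
                        REPLACE(REPLACE(REPLACE(input_tables, '[', ''), ']', ''), '"', '')
                    ) AS (inst_id, input_table)
            FROM    information_schema.tasks_history
            WHERE   ds >= '20190902' AND ds <= '20190905'
        ) t
WHERE   input_table IS NOT NULL
AND     input_table <> ''   -- skip jobs that read no tables
GROUP BY input_table
ORDER BY table_read_num DESC
LIMIT   1000;
```

Counting distinct inst_id values rather than rows means a job that lists the same table more than once is counted once.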
The results are shown below:
3. Cold table statistics. Write SQL that joins Tables with the per-table job counts derived from input_tables in tasks_history and sorts the result in ascending order, so that it gives the number of times each table is used in the specified period, least-used first.
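A minimal sketch of the cold-table query under the same assumptions as before: input_tables is a bracketed, comma-separated list of quoted project.table names, and ds is the partition column. Tables that no job read in the window have no match in the join and get a count of 0. The aliases and the LIMIT are illustrative, and 'your_project_name.' is a placeholder prefix.

```sql
-- List every table in the project with its read count in the ds
-- window, least-read tables first (unread tables show 0).
SELECT  t1.table_schema
       ,t1.table_name
       ,COALESCE(t2.table_read_num, 0) AS read_num
FROM    information_schema.tables t1
LEFT JOIN (
            SELECT  input_table
                   ,COUNT(DISTINCT inst_id) AS table_read_num
            FROM    (
                        SELECT  TRANS_ARRAY(
                                    1, ',', inst_id,
                                    REPLACE(REPLACE(REPLACE(input_tables, '[', ''), ']', ''), '"', '')
                                ) AS (inst_id, input_table)
                        FROM    information_schema.tasks_history
                        WHERE   ds >= '20190902' AND ds <= '20190905'
                    ) t
            GROUP BY input_table
        ) t2
-- input_table values carry a project prefix, so add it to table_name.
ON      CONCAT('your_project_name.', t1.table_name) = t2.input_table
ORDER BY read_num ASC
LIMIT   1000;
```

The LEFT JOIN from Tables is what makes never-read tables appear. An inner join would drop them.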
The result is as follows:
Sorting all tables by the number of times they are used gives a usage ranking for every table, which supports more rational management of the data tables. Note: “your_project_name.” in the SQL is the table-name prefix; replace it with the project name that matches your own data.
This article is original content of the Yunqi Community (Alibaba Cloud) and may not be reproduced without permission.