In SQL, there is a class of functions called aggregate functions, such as sum(), avg(), max(), etc. These functions can regularly aggregate multiple rows...
Hive + Tableau: Calculating Retention Rate and Retention Count. Background: We need to calculate the app's monthly retention data for the last 2...
When designing the data model for a data warehouse, the following requirements are often encountered: (1) the amount of data is relatively large; (2) some...
However, it is difficult to build a perfect database if we are limited to using Hive without considering performance issues, so Hive performance tuning is...
Introduction: The customer built its data warehouse and analysis system on a Hadoop cluster in an IDC or public cloud environment, purchased the AliCloud...
Hive achieves parallel processing by dividing a query into one or more MapReduce jobs. Each job may have multiple mapper and reducer tasks, at least...
Duplicates when importing data by auto-increment ID. Incremental imports: Sqoop provides an incremental import mode which can be used to...
Business requirements: HDFS and Hive data cleanup. ETL describes the process of extracting, transforming, and loading data from the source end to the...
Summary: Flink 1.13.0 makes the use of streaming applications as simple and natural as a normal application, and allows users to better understand the performance...
Description: DataWorks Migration Assistant provides the ability to quickly migrate tasks from open source scheduling engines Oozie, Azkaban, and Airflow to DataWorks. This article focuses...
Summary: DataWorks provides the ability to quickly migrate tasks from open source scheduling engines Oozie, Azkaban, and Airflow to DataWorks. This article focuses on how...
Introduction: Di Xingxing, head of Autohome's real-time computing platform, shared at the Shanghai Meetup on April 17 the integrated architecture practice...
Optimizing shuffle compression in Hive reduces the amount of data stored on disk and improves query speed by reducing I/O. Enable compression for a...