1. The table engine inherits from MergeTree and can use AggregatingMergeTree to aggregate incremental data statistics. If you want to combine and reduce the number...
Good Future is an education technology company founded in 2003 under the Xueersi brand. The Xueersi Peiyou and Xueersi Online School names you hear today are derivatives...
1. Flink environment setup 1.1 Flink version list: https://archive.apache.org/dist/flink/ 1.2 Select the latest version, 1.12.2 1.3 Unzip and check the installation
This document describes the hazards, symptoms, and causes of data skew in Spark, along with its solutions. For distributed big data systems such as...
The residential area is a key piece of information in the rental business, since it reflects the location and quality of the housing. For tenants, the ability to browse accurate...
Resilient Distributed Dataset (RDD): The Resilient Distributed Dataset (RDD) is the most basic abstraction in Spark. It represents an immutable, partitioned collection of elements that...
Because unified standards are not enforced and platform tooling is lacking, most business and technical staff overlook the importance of Hive...
You'll need to install Elasticsearch, plus elasticsearch-head to view your data visually. JNA not found. Native methods will be disabled....
How to use PAI + MaxCompute to cover the full AARRR funnel of the user growth model: acquisition, activation, retention, revenue, and referral. The author of...
1. Business background: Alibaba's e-commerce search and recommendation real-time data warehouse serves the real-time data warehouse scenarios of Alibaba Group businesses such as Taobao, Taobao Special Edition, Ele....
Spark SQL is another outstanding module in the Spark stack. By introducing SQL support, it greatly lowers the barrier to entry for developers and learners. It enables developers...
Last year, Alibaba Cloud released Cloud Toolkit, a local IDE plug-in. On the IntelliJ IDEA platform alone, more than 150,000 developers have downloaded it and experienced the one-click deployment brought...
Hello, I'm Fan Donglai. Today we are going to talk about a fairly basic but important topic: MapReduce. The reason why MapReduce is fundamental...
In the early days of big data technology application, we used Sqoop as a data synchronization tool to meet the daily development requirements of data...
As we all know, besides its powerful data processing capabilities, the other thing that makes Spark powerful is its distributed architecture. As an...
Depending on how quickly they respond to user behavior, recommendation systems can be roughly divided into offline-training and online-training systems. Offline training recommendation system...
What is Apache Flink? In an era of rapidly growing data volumes, large amounts of business data are generated in various business scenarios. How to effectively...
Apache Ranger is a...
In recent years, domestic road traffic infrastructure and related facilities have been changing rapidly. The vast number of users have...
The real-time requirements placed on computing frameworks are steadily increasing. Spark is a fourth-layer computing framework in the overall big data technology stack. Spark...