Provides in-memory storage, through its own Block Manager, for RDDs that user programs need to cache. RDDs are cached directly within the Executor process, so tasks...
This series of blogs summarizes and shares examples drawn from real business environments, and provides practical guidance on Spark business applications. Stay tuned for this...
OPPO encountered many classic big data problems during the evolution of its offline big data computing platform, such as shuffle failures, the small-file problem, metadata...
The Kafka version used in this article is kafka_2.12-2.2.0, so the second integration approach is used. In the sample code, kafkaParams encapsulates the Kafka consumer properties,...
Spark programming. Author introduction. The big data era and the third information wave: information technology provides technical support for the era of...
A Spark Streaming application reading data from Kafka is a common scenario. Reading continuous data from Kafka has many advantages, such as good performance and speed....
This technical column is a summary and distillation of the author's (Qin Kaixin's) day-to-day work, extracting cases from real business environments to summarize...
Because the article is relatively old, it covers the principles of HashShuffle. It is still a very good explanation of those principles. Sort-Based Shuffle...
A Resilient Distributed Dataset (RDD) is a distributed memory abstraction that represents a read-only, partitioned collection of records that can only be created from other RDD...
"Machine learning is a science of artificial intelligence. The main research object of this field is artificial intelligence, especially how to improve the performance...
Spark builds on the Resilient Distributed Dataset (RDD) to solve the problem. The RDD is Spark's abstraction for distributed computing. Some operations in Spark can trigger the...
Spark tuning is a common task. In production, a variety of problems are often encountered, with causes that arise before a job runs, while it runs, or from non-standard usage. Allocate...
AI Front Line introduction: At the Spark+AI Summit yesterday, Matei Zaharia, a principal author of Spark and Mesos and chief technologist at Databricks, announced the launch...