Apache Kylin is an open-source distributed analytics engine that provides a SQL query interface and multi-dimensional analysis (OLAP) capabilities on top of Hadoop to support...
Hologres (Interactive Analytics) is a one-stop real-time data warehouse developed in-house by Alibaba Cloud. This cloud-native system combines real-time service...
With large data sets, exact deduplication and fast query response can be challenging. We know that the most commonly used processing method for exact deduplication...
Spark is undoubtedly a powerful processing engine and a distributed cluster computing framework built for fast processing. Unfortunately, Spark also falls short in several areas. If...
2) Standalone: build a resource scheduling cluster based on Master+Slaves; Spark tasks are submitted to the Master to run. Spark is a scheduling system of...
Compressing shuffle data in Hive reduces the amount of data stored on disk and improves query speed by reducing I/O. Enable compression for a...
Start the shell with $SPARK_HOME/bin/spark-shell. Once in, you can see that sc and spark have been initialized.
Against the backdrop of digital and intelligent transformation, data, as a core means of production for enterprises, is expected to deliver greater value. From...
Recently there was a requirement: compute PV and UV statistics in real time, display the results by date, hour, PV, and UV, aggregate by day, and restart the statistics the next day. Of course,...
Recently, many people have asked the editor how to go about learning big data. Many beginners just getting started in the direction of big data development ideas,...
SparkContext: this article uses Spark source code version 2.3.4. SparkContext notes: let's look at a comment in the Spark source code. When entering SparkContext, you can...
This article covers the other Spark machine learning API, called Spark ML, which is the recommended solution for developing big data applications using data pipelines.
Big data and traditional BI are products of different stages of social development. Relative to traditional BI, big data both inherits from it and also has...
As anyone who has done ETL work involving data cleansing knows, row-to-column and column-to-row transformation is a common data preparation requirement. There are different implementations...
Background: Spark Structured Streaming is used to process streaming data in project development. The processing flow is as follows: Message Middleware (Source) -> Spark Structured...