Abstract: ClickHouse, as an OLAP database, has very limited transaction support. This article introduces how to update and delete ClickHouse data with ReplacingMergeTree.
Abstract: How to test the function, performance, reliability and other aspects of tens of thousands of data lakes in the laboratory has also become a...
1. How to handle configuration files and table data in the production environment. Configuration files, or configuration tables, are usually stored in the online...
Before Spark, mature computing systems such as MapReduce already existed and provided high-level APIs (Map/Reduce) for distributed computing by running computations on clusters and providing...
// contains compares objects using equals. TreeSet elements also implement the Comparable interface and override the compareTo method. 1. With generics, there is no longer a need to be fault-tolerant,...
Given the mapping function f, map(f) counts the...
For spark-submit, whether the job is submitted in client mode or cluster mode, the implementations all inherit from SparkApplication. For client submission, the subclass is JavaMainApplication,...
After deploying the scheduled pumping plan, I found that the VPN would sometimes disconnect, causing the pumping to fail, so I modified it, combining...
How quickly to process, understand, and respond to data is critical for many event-driven applications. In analysis and data processing for these scenarios, calculating precise...
With the rapid development of big data and cloud computing and the emergence of related technology hotspots, major vendors in the industry have formulated corresponding strategies...
Pulsar is a messaging middleware that Yahoo! open-sourced in 2016 and that became an Apache top-level project in 2018. In my previous articles I have written about many...
Java UDFs support multiple data types: BIGINT, STRING, DOUBLE, BOOLEAN, ARRAY, MAP, STRUCT, etc., as well as Writable parameters. Java UDFs use complex data types...
Term Frequency-Inverse Document Frequency (TF-IDF) is a commonly used weighting technique for information retrieval and data mining. TF-IDF is a statistical method used...
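As a rough illustration of the weighting the teaser above refers to, here is a minimal Python sketch. It assumes one common textbook variant (term frequency as count divided by document length, inverse document frequency as the natural log of total documents over documents containing the term); the excerpt does not say which variant the article uses, and the function name `tf_idf` and the sample documents are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    Assumed variant (not confirmed by the article):
      tf  = term count / number of terms in the document
      idf = ln(number of documents / number of documents containing the term)
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical, already-tokenized sample documents.
docs = [
    ["spark", "runs", "fast"],
    ["spark", "streams", "data"],
    ["hive", "stores", "data"],
]
scores = tf_idf(docs)
```

With this variant, a term that occurs in every document gets a score of zero, and a term unique to one document scores higher than a term shared with others.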
To sum up, groupArray() builds arrays from grouped values, and arraySort() sorts these values, using one array as the sorting key for the others. All...
Real-time data warehouse construction practice. Although real-time computing has only become popular in recent years, some companies already had a demand for it in the early stage...
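The group-then-sort pattern summarized in the teaser above can be mimicked outside ClickHouse. The following Python sketch is only an analogy under stated assumptions: `group_array` and `sort_by_key` are hypothetical helper names, not ClickHouse functions, and the sample rows are invented.

```python
from collections import defaultdict


def group_array(rows, key, column):
    """Collect `column` values per `key` in row order (analogous to grouping into arrays)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[column])
    return dict(groups)


def sort_by_key(values, keys):
    """Reorder `values` using the parallel list `keys` as the sorting key."""
    return [value for _, value in sorted(zip(keys, values), key=lambda pair: pair[0])]


# Invented sample data: events arrive out of timestamp order.
rows = [
    {"user": "a", "ts": 3, "event": "logout"},
    {"user": "a", "ts": 1, "event": "login"},
    {"user": "b", "ts": 2, "event": "login"},
    {"user": "a", "ts": 2, "event": "click"},
]

events = group_array(rows, "user", "event")
times = group_array(rows, "user", "ts")
# Sort each user's event array by that user's timestamp array.
ordered = {user: sort_by_key(events[user], times[user]) for user in events}
```

Here the timestamp array acts purely as the sorting key, so each user's events come out in time order even though the input rows were not.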
After deploying Standalone Hadoop in the previous article, I tried to deploy a Hadoop cluster. A Hadoop cluster requires at least three machines because the...
Abstract: In traditional big data clusters, user data is stored in HDFS in plain text. Cluster maintenance personnel or malicious attackers can bypass HDFS permission...
ClickHouse is a column-oriented database management system (DBMS), open-sourced by Russia's Yandex in 2016, for online analytical processing (OLAP) queries that generate reports of analytical data...
Since 2017, Ali HBase has been moving to the public cloud, and we plan to gradually offer Ali's internal high-availability technology to external customers. At present,...
1. One-click control to capture "fugitives"; 2. Intelligent empowerment to complete the "impossible" task; 3. Science and technology to assist...