Preface
You can follow my WeChat public account by long-pressing the QR code; articles are synced there after they are updated on Nuggets (Juejin).
The table of contents on the right side on PC really makes it easy to find things, and every time I write I also pay attention to whether that panel looks neat 🤣. Being able to jump straight to the corresponding content from it is genuinely convenient, so I wanted to build a directory of my own; whenever I write something new, I will add it under the matching heading here.
The directory is organized by technical framework. If I have already written an article for a topic, it is listed below; if not, that part is left blank for now.
1. Distributed storage HDFS
① HDFS Basic Concepts
Content summary: the concept of blocks and replicas, the rack-aware storage policy, the three components (NameNode, DataNode, SecondaryNameNode), metadata, the heartbeat mechanism, and the load balancing mechanism
② HDFS read and write processes and some important policies
Content summary: the HDFS read and write processes, Hadoop HA (high availability), federation, and using HAR and SequenceFile to store small files; a small client-side sketch follows
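The article itself walks through the read and write pipelines in detail; just to give a quick taste here, a minimal Java client that writes a file to HDFS and reads it back might look like the sketch below (the NameNode address and the file path are placeholders I made up, not values from the article):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;
import java.nio.charset.StandardCharsets;

public class HdfsReadWriteDemo {
    public static void main(String[] args) throws Exception {
        // The NameNode address and paths below are placeholders for your own cluster.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        // Write: the client asks the NameNode where to put the blocks, then streams
        // the data through the DataNode pipeline (replicas are written hop by hop).
        Path file = new Path("/demo/hello.txt");
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        }

        // Read: the client gets the block locations from the NameNode and reads
        // each block from the nearest DataNode.
        try (FSDataInputStream in = fs.open(file)) {
            byte[] buf = new byte[128];
            int n = in.read(buf);
            System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8));
        }
        fs.close();
    }
}
```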
③ HDFS basics summary and architecture evolution
Content summary: a recap of and some additions to the previous two articles
2. Distributed computing MapReduce
① Introduction to MapReduce
Content summary: Mapper and Reducer code, the shuffle, secondary sort, and how to identify and mitigate data skew; a word-count skeleton follows
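The article pastes the full code, so this is only a reminder of the basic shape: a minimal word-count Mapper and Reducer (the class names are my own, not the article's).

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

// Mapper: one input line in, (word, 1) pairs out. The framework shuffles and
// sorts these pairs by key before they reach the Reducer.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}

// Reducer: all counts for one word arrive together; just sum them up.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```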
3. Resource scheduling Yarn
Big data (4) — Resource scheduling framework Yarn
Content summary: this one is all theory, covering Yarn application scenarios, core components, the application scheduling process, and typical Yarn applications
4. Distributed coordination Zookeeper
The ZooKeeper articles were written following the routine of my Java series and are not specifically about big data operations; they may be supplemented accordingly later.
① Basic concepts of Zookeeper
Content summary: an introduction to ZooKeeper and its features, the session mechanism, the data composition and node types of ZNodes, and the ZK watch (listener) mechanism
② Zookeeper implements distributed locks
Content summary: the characteristics of a lock, using zkClient, implementing a distributed lock with a non-duplicable node plus the watch mechanism, and implementing a distributed lock with sequential nodes (smallest number wins) plus watch; a small sketch follows
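The article builds the lock by hand on top of zkClient; as a rough illustration of the same "sequential node + smallest number + watch" idea, here is a sketch that leans on Apache Curator's InterProcessMutex instead (the connection string and lock path are placeholders):

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ZkLockDemo {
    public static void main(String[] args) throws Exception {
        // The connection string is a placeholder for your own ZooKeeper ensemble.
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // InterProcessMutex creates ephemeral sequential nodes under the lock path;
        // the client holding the smallest sequence number owns the lock, and the
        // others watch the node just in front of theirs (the "smallest number + watch" idea).
        InterProcessMutex lock = new InterProcessMutex(client, "/locks/demo");
        lock.acquire();
        try {
            System.out.println("doing work inside the lock");
        } finally {
            lock.release();
            client.close();
        }
    }
}
```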
③ Setting up a Zookeeper cluster and its leader election
Content summary: building a ZooKeeper cluster in pseudo-cluster form, cluster connection and monitoring, an explanation of the Paxos algorithm, and the ZooKeeper leader election mechanism
④ Distributed queues with Zookeeper
Content summary: an introduction to the ZAB protocol, data synchronization, discarding transactions, leader crash recovery, and the implementation logic and code of a ZooKeeper distributed queue
⑤ Configuration center application of Zookeeper
Content summary: Configuration center introduction, data structure, code implementation
⑥ Zookeeper master election and an overview of the official website
Content summary: master election and its ZooKeeper-based implementation, plus a walk through the official website on your own
5. Hadoop source code and optimization
Preamble: two articles on RPC basics
High concurrency from zero (7) – introduction, protocol and framework of RPC
Content summary: what RPC is, its three processes, why we need it, its features and application scenarios, the RPC flow, protocol definition, and its framework
High concurrency from zero (8) – simple implementation of RPC framework
Content summary: RPC process and task breakdown with code implementation, plus process optimization; for the optimization part it is recommended to jump straight to the overall diagram
Hadoop source code chapter – NameNode startup process analysis
Content summary: the NameNode is an RPC server, the NameNode is an RPC server, the NameNode is an RPC server
Hadoop source code – DataNode initialization and registration process
Content summary: an analysis of the DataNode startup process, again verifying that it is an RPC client, plus the principle behind the Hadoop HA high availability solution
NameNode metadata management and double buffering
Content summary: as the title says
6. Hive
7. HBase
① Synchronizing data from MySQL to HBase
Content summary: as the title says, with a few details to watch out for
8. Flume
9. Sqoop
10. Azkaban
11. Impala
12. Spark
① Learn Spark from scratch
Content summary: the four features of Spark, the basic architecture, installation, and task submission
② This article walks you through the basic concepts of Spark RDD
Content summary: the basics of RDDs, including cache, checkpoint, the DAG, and stage division; a tiny word-count sketch follows
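The article explains the concepts step by step; purely as a reminder of what working with an RDD looks like, here is a tiny word count written against the Java API (the app name, master setting, and input path are placeholders of mine):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

import java.util.Arrays;

public class WordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // "input.txt" is a placeholder path.
        JavaRDD<String> lines = sc.textFile("input.txt");

        // Each transformation builds up the DAG lazily; the collect() action triggers execution.
        JavaPairRDD<String, Integer> counts = lines
                .flatMap(line -> Arrays.asList(line.split(" ")).iterator())
                .mapToPair(word -> new Tuple2<>(word, 1))
                .reduceByKey(Integer::sum);

        counts.collect().forEach(t -> System.out.println(t._1() + ": " + t._2()));
        sc.stop();
    }
}
```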
③ Some small questions about Spark basics
Content summary: adds some knowledge not covered in the previous two articles, such as broadcast variables, task scheduling, and serialization issues
④ This article will help you understand all aspects of Spark Core tuning
Content summary: based on Meituan's earlier Spark tuning article, covering the ten principles of Spark development, the Spark execution process, memory model tuning, and data skew handling
⑤ The fault tolerance mechanism of Spark Streaming
Content summary: as the name implies, Executor and Driver fault tolerance
⑥ Complete your first Spark Streaming program
Content summary: as the name implies, a description of the execution flow, an explanation of blockInterval and batchInterval, and a supplementary note on understanding setMaster; a minimal skeleton follows
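The article walks through its own program; below is only a minimal skeleton of what such a first Spark Streaming job tends to look like, with the host, port, and the 5-second batch interval being illustrative choices of mine rather than the article's:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class FirstStreamingApp {
    public static void main(String[] args) throws InterruptedException {
        // local[2]: at least one thread for the receiver and one for processing.
        SparkConf conf = new SparkConf().setAppName("FirstStreamingApp").setMaster("local[2]");

        // The batch interval (5 seconds here) decides how often a new micro-batch is produced;
        // spark.streaming.blockInterval (200 ms by default) decides how received data is cut into blocks.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        // Host and port are placeholders; test locally with `nc -lk 9999`.
        JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", 9999);
        JavaDStream<Long> counts = lines.count();
        counts.print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```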
13. Kafka
① An introduction to Kafka
Content summary: Kafka's basic components and concepts (partition, producer, consumer, message, replica, consumer group, controller, and the relationship between Kafka and ZooKeeper), the segmented log storage mechanism, and Kafka's three-tier network model
② Kafka cluster deployment practice and related operations and maintenance
Content summary: this article is not about concepts; it covers cluster parameters, the important parameters for setting up a cluster, simple cluster operations, and some clients
③ Kafka producer principles and important parameters
Content summary: producer principles (ProducerRecord, Partitioner, the buffer, and the Sender thread), producer code, and some tuning parameters; a minimal producer sketch follows
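The article covers the principles and parameters in depth; as a small reminder of how those pieces show up in code, here is a hedged producer sketch (the broker address, topic name, and the specific parameter values are placeholders, not recommendations from the article):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // "localhost:9092" and "demo-topic" are placeholders.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        // A few of the tuning knobs the article discusses (values are illustrative):
        props.put("acks", "all");               // wait for all in-sync replicas
        props.put("retries", "3");              // retry transient failures
        props.put("batch.size", "16384");       // per-partition batch size in bytes
        props.put("linger.ms", "5");            // wait up to 5 ms to fill a batch
        props.put("buffer.memory", "33554432"); // total accumulator (RecordAccumulator) memory

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("demo-topic", "key", "hello kafka");
            // send() is asynchronous: the record goes into the accumulator and the
            // Sender thread ships it; the callback reports the result.
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();
                } else {
                    System.out.printf("sent to %s-%d@%d%n",
                            metadata.topic(), metadata.partition(), metadata.offset());
                }
            });
        }
    }
}
```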
④ A Kafka producer case study and consumer principles
Content summary: a small producer implementation case, consumer principles (offset, Coordinator), consumer code, and core parameters; a minimal consumer sketch follows
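For symmetry with the producer sketch above, here is a matching minimal consumer sketch; the broker address, group id, and topic are again placeholders, and manual offset committing is just one possible choice, shown here to make the offset handling explicit:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholders: broker address, group id, and topic name.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "demo-group");      // consumers with the same group id share the partitions
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("enable.auto.commit", "false"); // commit offsets manually below

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
                consumer.commitSync(); // the committed offset is what the group Coordinator tracks
            }
        }
    }
}
```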
⑤ A summary of Kafka's overall flow and source code preparation
Content summary: the LEO & HW update principle, a walkthrough of Kafka's overall running flow, and setting up a source code reading environment
14. Kafka source code
Kafka source code warm-up – Java NIO
Content summary: the differences between traditional IO and NIO, an introduction to NIO (Buffer, Channel, Selector, Pipe), blocking and non-blocking network communication code demos, mainly as preparation for reading the Kafka source code; a small non-blocking sketch follows
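Since this one is pure groundwork, a compact example helps; here is a sketch of a non-blocking echo server built from the Buffer/Channel/Selector pieces mentioned above (the port number is arbitrary):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class NioEchoServer {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(9000)); // port is a placeholder
        server.configureBlocking(false);          // non-blocking mode is required for a Selector
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select();                    // block until at least one channel is ready
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buffer = ByteBuffer.allocate(1024);
                    int read = client.read(buffer);
                    if (read == -1) {
                        client.close();
                    } else {
                        buffer.flip();            // switch the buffer from writing to reading
                        client.write(buffer);     // echo the bytes back
                    }
                }
            }
        }
    }
}
```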
Kafka source code – you will definitely get the KafkaProducer initialization and metadata retrieval process
Content summary: an analysis, starting from a Java example, of the KafkaProducer initialization and send flow, metadata management, and how waitOnMetadata works
Kafka accumulator – probably the most detailed RecordAccumulator walkthrough you have ever seen
Content summary: the RecordAccumulator is the producer's accumulating buffer
15. Flink
① A basic introduction to Flink
Content summary: Flink's four major features with example notes, and installation and job submission in different modes
② Flink operator operations
Content summary: Flink shell usage, data sources, and examples of common operators
③ A walk through the various states of Flink
Content summary: Flink state code samples, following the official examples
④ Flink's checkpoint mechanism
Content summary: adding checkpointing to the previous program, the checkpoint mechanism, and how to use it; a minimal configuration sketch follows
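The article applies this to its own program; below is only a minimal sketch of what turning checkpointing on looks like in the Java API, with all interval and timeout values being illustrative rather than the article's recommendations:

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Take a checkpoint every 5 seconds with exactly-once semantics (values are illustrative).
        env.enableCheckpointing(5000, CheckpointingMode.EXACTLY_ONCE);

        CheckpointConfig config = env.getCheckpointConfig();
        config.setMinPauseBetweenCheckpoints(500); // at least 500 ms between two checkpoints
        config.setCheckpointTimeout(60000);        // a checkpoint must finish within 1 minute
        // Keep the last completed checkpoint when the job is cancelled, so it can be restored from.
        config.enableExternalizedCheckpoints(
                CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        // ... define your sources, transformations, and sinks here, then:
        // env.execute("checkpoint demo");
    }
}
```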
16. ELK
Quickly set up your ElasticSearch and Kibana
Content summary: building ElasticSearch and Kibana locally, with some simple ES operations
Finally
Even if the road ahead is still full of thorns, we have no reason to stop running.
This flag is a test for both you and me. In the very first HDFS article I said that, although these read like study notes, there will definitely be a beginning and an end, and I will describe the knowledge points in the clearest language I can. With this list as proof, I believe I can keep my word.
I am now also running my own Knowledge Planet group. It is free, but that does not mean there is nothing to gain from it. Students interested in big data are welcome to follow it.