MIT 6.824 Distributed Systems is a well-known course that teaches the design principles of distributed systems through lectures and labs. See the course schedule for the lectures and experiments.

Preface

Why am I taking this course? A close friend introduced me to it when I mentioned my interest in distributed systems, but for various reasons I did not start it at the time. Only recently did I begin studying the design of distributed caching systems again. If you have read my previous articles, you may know that I am interested in Redis, which led me to wonder how Redis does distributed caching, so I started looking things up. Later I found that etcd is also very good in this area, and while learning etcd I came across the Raft protocol. I then discovered that this course covers the Raft paper and has related labs, that the Spring 2020 session has official lecture videos (which enthusiastic netizens have uploaded to Bilibili), and that the labs are completed in Go. So, with the dual goals of studying distributed system design and practicing Go, I began this course.

In fact, etcd and Redis serve completely different purposes: etcd is mainly used for distributed locks and core cluster configuration, where the key requirement is high availability, while Redis is an in-memory database whose purpose is distributed caching and data storage.

Preparation

To take this course, I first read the course home page and then followed the curriculum schedule. The schedule states that you should read the paper before attending the lecture (or watching the video); otherwise you will not know what the professor is talking about.

The class steps are: read the paper -> watch the video -> do the experiment.

An overview of MapReduce

Through studying papers, course videos and experiments, I have a preliminary understanding of MapReduce. Here I summarize my understanding.

MapReduce is essentially a programming model, together with an associated implementation, for processing large data sets. The model hides the details of parallel computation, fault tolerance, data distribution, and load balancing, providing an abstraction for big-data computation.

MapReduce programming model:

  • Map: receives a set of input key/value pairs and produces a set of intermediate key/value pairs.

  • Reduce: takes the intermediate key/value pairs produced by Map and merges all values that share the same key.

The entire process is abstracted as follows:
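In Go, the two functions above can be sketched for the word-count application; this is a minimal illustration, assuming the `KeyValue` pair type used by the lab skeleton, with the function bodies my own simplification:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode"
)

// KeyValue is the intermediate pair type used by the lab skeleton.
type KeyValue struct {
	Key   string
	Value string
}

// Map splits its input into words and emits a ("word", "1") pair for each.
func Map(filename string, contents string) []KeyValue {
	words := strings.FieldsFunc(contents, func(r rune) bool {
		return !unicode.IsLetter(r)
	})
	kva := []KeyValue{}
	for _, w := range words {
		kva = append(kva, KeyValue{Key: w, Value: "1"})
	}
	return kva
}

// Reduce merges all values for one key; for word count this is just a count.
func Reduce(key string, values []string) string {
	return strconv.Itoa(len(values))
}

func main() {
	kva := Map("input.txt", "hello world hello")
	fmt.Println(len(kva)) // 3 intermediate pairs
	fmt.Println(Reduce("hello", []string{"1", "1"})) // "2"
}
```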

In distributed systems, there are many issues to consider beyond the program itself, such as concurrency and fault tolerance. For distributed MapReduce, the classic execution-overview flow chart is:

As the figure shows, the Map and Reduce programs are distributed across multiple machines, each taking a shard of the data to process, so the data can be handled in parallel. How data is distributed and how the programs are managed is coordinated by the Master and the Workers. The process is roughly as follows:

  • The system starts one or more Masters, and every machine that will perform tasks starts a Worker. The Master’s main responsibility is to assign tasks to Workers: it can pick an idle Worker and push a task to it, or a Worker can actively request a task from the Master.

  • A Worker assigned a Map task reads its data shard, produces a set of intermediate key/value pairs, and writes them to local files. The output of each map task is split into R parts (R being the number of Reduce tasks created by the Master), using a user-defined partitioning function such as hash(key) mod R to decide which file a key goes to.

  • A Worker assigned a Reduce task fetches the intermediate data via remote calls. Once the data is loaded, it sorts it, walks through it merging the values with the same key, and finally writes out the results.

  • When all Map and Reduce tasks are done, the whole MapReduce job is finished. The Workers’ output is usually spread over many small files, and the final result comes from merging these files.
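The hash(key) mod R partitioning mentioned above can be sketched like this; `ihash` mirrors the helper shipped with the lab skeleton, while the file-name convention `mr-<mapID>-<reduceID>` is one common choice, not something the skeleton mandates:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// ihash maps a key to a nonnegative integer so that every Map worker
// sends a given key to the same Reduce task.
func ihash(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() & 0x7fffffff)
}

func main() {
	nReduce := 10
	for _, key := range []string{"hello", "world"} {
		// One possible intermediate file naming scheme: mr-<mapID>-<reduceID>.
		fmt.Printf("key %q goes to intermediate file mr-0-%d\n", key, ihash(key)%nReduce)
	}
}
```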

The above is my summary of my understanding of the MapReduce paper. The paper also covers other topics such as locality, task granularity, combining and sorting, and performance, but since these did not leave a deep impression on me during the lab, I will not discuss them here.

In addition, fault tolerance deserves special attention: how should we handle a Master being interrupted or a Worker crashing? One solution mentioned in the paper is to write results to a temporary file and only move it to the final output file once the task has completed.

The road to completing the lab

Unable to start

After reading the MapReduce paper, I watched the first two lecture videos, understood most of them, and excitedly set out to start the lab. After pulling down the code, though, I found I could not get started. I spent a whole night agonizing over the lab description and the code. I only knew that Lab 1 was to implement a distributed MapReduce, but I did not know what to actually do, because the code already had the Map and Reduce functions.

Going over the material again

After this first failed attempt, I spent a few more nights going over everything: I reread the paper, rewatched the videos (the second pass left a much deeper impression), and read the lab description over and over. The description mentions that after every change to the program you should run the test script test-mr.sh, which contains a set of test cases; once all of them pass, the lab is done. So I went to the test script, combined it with the problem description, and finally understood what needed to be done.

test-mr.sh contains five test cases: word-count MapReduce, index MapReduce, parallel Map, parallel Reduce, and program crash. For example, the word-count check first runs mrsequential.go to produce a reference file named mr-correct-wc.txt, then starts mrmaster and mrworker and merges their results into an mr-wc-all file. If the two files are identical, the case passes. So the job is to look at what mrsequential.go does, and write a distributed program that produces the same result.
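Stripped of the build steps, the word-count check in test-mr.sh amounts to a sort-and-compare; this is a simplified sketch from my reading of the script (the fake reduce outputs here stand in for the real mrworker output):

```shell
# Reference output that mrsequential.go would produce:
printf 'hello 2\nworld 1\n' > mr-correct-wc.txt
# The distributed run writes one mr-out-* file per reduce task;
# fake two reduce outputs here for illustration:
printf 'world 1\n' > mr-out-0
printf 'hello 2\n' > mr-out-1
# Merge and sort the distributed output, then compare with the reference:
sort mr-out-* | grep . > mr-wc-all
if cmp -s mr-wc-all mr-correct-wc.txt; then
  echo 'PASS'
else
  echo 'FAIL'
fi
rm -f mr-correct-wc.txt mr-out-0 mr-out-1 mr-wc-all
```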

Passing these five test cases completes the lab. The Map and Reduce functions are already implemented; what remains is to implement the master and worker described in the paper:

1. How to assign tasks? How are Map and Reduce jobs allocated? (Use Case 1, 2)

2. How to implement parallel processing? (Use Case 3, 4)

3. How does the Master detect that a Worker has died? How is Worker failure recovered from, and what happens to its in-progress tasks? (Use Case 5)

4. How to deal with the results after the task is completed?

5. The communication between Worker and Master is through RPC. How to maintain the state between them?
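For question 5, the state carried between Worker and Master can be as small as two RPC message pairs. A sketch of what they might look like; the field and type names here are my own design, not the lab skeleton's (note that Go's RPC machinery only serializes exported, capitalized fields):

```go
package main

import "fmt"

// Task types, matching the design described later.
const (
	MAP_TASK    = 0
	REDUCE_TASK = 1
)

// RequestTaskArgs is sent by an idle Worker asking for work.
type RequestTaskArgs struct {
	WorkerId int
}

// RequestTaskReply carries the assigned task back to the Worker.
type RequestTaskReply struct {
	TaskType      int    // MAP_TASK or REDUCE_TASK
	TaskId        int
	InputFileName string // input split, used by map tasks
	NReduce       int    // number of reduce partitions
	NMap          int    // number of map tasks (the reduce side needs it)
}

// ReportTaskArgs tells the Master a task has finished.
type ReportTaskArgs struct {
	TaskType int
	TaskId   int
}

func main() {
	reply := RequestTaskReply{TaskType: MAP_TASK, TaskId: 3, InputFileName: "pg-grimm.txt", NReduce: 10}
	fmt.Printf("assigned map task %d on %s\n", reply.TaskId, reply.InputFileName)
}
```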

Once you understand what the requirements are, it’s time to design and code.

The system design

All programs are designed to be nothing more than data structures and algorithms, and for this experiment, that’s true.

The data structure

Define the Master and Task data structures as follows:

```go
type Master struct {
	nReduce             int
	nMap                int
	mapTasks            []Task
	reduceTasks         []Task
	state               int // MASTER_INIT; MAP_FINISHED; REDUCE_FINISHED
	mapTaskFinishNum    int
	reduceTaskFinishNum int
}

type Task struct {
	State          int // TASK_INIT; TASK_PROCESSING; TASK_DONE
	InputFileName  string
	Id             int
	OutputFileName string
	TaskType       int // MAP_TASK; REDUCE_TASK
	NReduce        int
	NMap           int
	StartTime      int64
}
```

Implementation flowchart

Based on my understanding of the paper and the lab description, the two structures Master and Task are designed to implement the following flow:

1. When the Master starts, its state is INIT, and it initializes the map tasks from the startup arguments.

2. A Worker starts, requests a task from the Master, and then processes it (map or reduce).

3. When processing finishes, the Worker notifies the Master to mark the task as done. On each completion, the Master checks whether all Map/Reduce tasks are finished and updates its own state according to the progress.

4. When all tasks are complete, the Master’s state is REDUCE_FINISHED.
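The Worker side of this flow is a simple request-process-report loop. A sketch of its shape; since a runnable example cannot include the real RPC plumbing, `requestTask` here hands out tasks from a local queue standing in for the Master, and the function names are my own:

```go
package main

import "fmt"

// Task is a simplified version of the structure defined above.
type Task struct {
	Id       int
	TaskType int // 0 = map, 1 = reduce
}

// requestTask stands in for the Worker's RPC call to the Master;
// here it just pops tasks from a local queue so the loop can run
// on its own. The second return value is false when no work is left.
func requestTask(queue *[]Task) (Task, bool) {
	if len(*queue) == 0 {
		return Task{}, false
	}
	t := (*queue)[0]
	*queue = (*queue)[1:]
	return t, true
}

// workerLoop is the skeleton of step 2: keep asking for a task,
// process it, report it done, until the Master has nothing left.
// It returns how many tasks it processed.
func workerLoop(queue *[]Task) int {
	done := 0
	for {
		task, ok := requestTask(queue)
		if !ok {
			return done // no work left: the Worker can exit
		}
		_ = task // ... run Map or Reduce on the task here ...
		done++   // stands in for the "report finished" RPC
	}
}

func main() {
	queue := []Task{{Id: 0, TaskType: 0}, {Id: 1, TaskType: 0}, {Id: 2, TaskType: 1}}
	fmt.Println(workerLoop(&queue)) // 3
}
```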

Crash handling

As for handling Worker crashes, the lab hints point out that the Master cannot reliably distinguish a Worker that has crashed from one that is merely slow, so a timeout (for example, 10 seconds) is needed: if a task has exceeded the timeout, it is considered unfinished and will be reassigned. The implementation is that when the Master assigns a task, it records a start time; on each subsequent assignment request, it checks the in-progress tasks, and any task that is unfinished and has timed out is reassigned to another Worker.

ALL PASS

The moment all the test cases passed, there was a small thrill, akin to passing a lab question in college.

Q&A

Share some problems encountered in learning:

1. What’s the use of learning this?

If you are interested in distributed systems and want to strengthen your understanding of distributed systems through practice, then this course will help. If you are not interested, then this article is of no use to you.

2. How do you start learning?

Start from the course homepage and follow the course schedule: first read the paper, then watch the video to get the general idea, then begin the lab, returning to the paper and videos to deepen your understanding.

3. After watching the videos, how do I run the lab program? How do I write the first line of code?

Read the lab hints. For example, the first hint says that one way to get the code running is to modify the worker function in mr/worker.go to send an RPC request to the Master asking for a task.

4. Papers, courses and titles are all in English. What should I do if I can’t understand them?

Push through bravely, translating whatever you do not understand. Of course, you can also read Chinese versions; there are plenty of resources online, and the course videos have Chinese subtitles made by enthusiastic netizens.

That said, like many others I still recommend sticking with the English where you can. I am not trying to sell anything: for software development, reading English is a necessary skill, because when you troubleshoot, the materials are mostly in English, and reading the primary sources is best. This article is my own digested sharing of the material; there may be much in the paper and the course that I missed but you will see.

5. Is there a code link?

Talk is cheap, show me the code. However, the course emphasizes not looking at other people’s implementations as much as possible, and some implementations put on GitHub have been taken down at MIT’s request, so I will not share the full code; feel free to contact me privately if you want to discuss it.

Conclusion

After the first two lectures and the MapReduce lab, I have only the most superficial understanding of distributed systems, nothing close to mastery. This was just the simplest experiment; the more important lectures and labs are still ahead, and there is a long way to go.

If you’re learning it too, I hope this article has been helpful. Students who are interested are welcome to study and discuss together.

This is an original article; my writing and learning are limited, so if anything in it is wrong, I hope you will let me know.

If this article was helpful to you, please click “like”, thanks ^_^

For more content, please follow my personal WeChat public account.