0x00 Preface

In 2014, Doug Cutting, the father of Hadoop, gave a lecture at Tsinghua University. At the time, three of us classmates rode our bicycles to the lecture and took careful notes. Looking back through those notes now stirs up a lot of feelings, so I have tidied them up a little to share.

0x01 Lecture Notes

The lecture lasted about an hour and started at 2:30. Doug Cutting presented about 7 slides in the first half hour, and the second half hour was interactive.

Doug Cutting talked about his open source work, Lucene, Hadoop and so on across those 7 or so slides, each with a single title and a picture in the body.

PPT One: Means For Change: Hardware

Moore’s law: the pace at which processors, storage and other hardware keep improving. This is the hardware foundation.

PPT Two: Fuel For Change: Data

There is a logic here that leads to the importance of Open Source.

First, software is eating the industry. This produces all kinds of data, in very large volumes and of very high value. Tools are needed to process this data, which leads to the next slide: Open Source.

PPT Three: Seeds For Change: Open Source

He did not say especially much about the benefits of open source software: broadly, it is convenient and open, and people use it because it is useful.

One of his reasons for going into open source was that, while working on Lucene, he realized that business was not for him, so he gave it away.

This slide also mentions three important components, which arguably make up the whole computer industry.

The three are Hardware, Data and Software.

PPT Four: New DataStyle: Hadoop

This slide briefly introduces Hadoop. On the subject of GFS: much of Hadoop’s thinking is based on GFS. Google published a paper describing the theory, and everybody was interested, but unless you were Google it was not easy to put to use. That is where Hadoop comes in: open source is convenient and easy to get hold of, and it has the natural advantage of being close to the people.

Doug Cutting mentioned that he went to Yahoo because Yahoo had a lot of data to handle and a lot of hardware to work with, which was a good fit for him.

PPT Five: Style Catches On: Ecosystem

Hive, Pig, Spark, etc.

PPT Six: The Economist: Enterprise Data Hub

He briefly talked about his work at Cloudera and the importance of the Enterprise Data Hub. I remember him saying that he was lucky to be in the right place at the right time. He mentioned that this is the tool of the future.

PPT Seven: The Data multi-tool

He ended by talking about some of the things Hadoop does, and gave an example, which was the picture on the slide: a cell phone. A mobile phone can do many things, such as taking pictures, although its camera is not as good as a professional one. But one thing is for sure: you spend more time taking pictures with your phone than with your camera. Why? Because your phone is always with you, you can use it whenever you want, and besides taking pictures you can also share them. In short, it is already there and it is convenient.

Hadoop is similar. There are a lot of computing frameworks out there, such as Spark and Storm, and there is no need to deny the existence of the others. But Hadoop is familiar and widely used. When you need to compute something, you probably already have a Hadoop cluster; Spark may be better for some calculations, but Hadoop can do them too and is easy to use.

That makes me think of operating systems. Windows is not necessarily the best, but everyone is used to it, and that is enough. When a new operating system comes along, unless it makes me feel that I no longer want to use Windows, Windows is good enough and I do not have to switch.

0x02 Live Q&A

Finally, there was time for questions, and several questions were recorded.

1. Security.

Doug Cutting: Technical Solution + Social Solution

Security does feel like a real problem, and it has gotten worse in recent years.

2. Relational databases and NoSQL

This is not a new question. Doug Cutting said that each has its uses.

3. The existence of Spark and Storm

The question: Spark, for example, computes in memory, while Hadoop today is built on HDFS. Will Hadoop learn from Spark?

Doug Cutting: It is an ecosystem. Each component has a role to play. I am happy to see Spark. Also, this is open source software. It is not as if one company controls Hadoop, another controls Spark, and the two companies are competing. Because it is open source, the ultimate goal is for it to be used by everyone.

Right now, Hadoop and Spark really are fighting side by side in one ecosystem: the relationship is not competition but symbiosis.

4. What is Big Data?

Doug Cutting: Not the size, it’s the style.

So Big Data is an idea, the embodiment of a way of processing. Can I take that to mean that the amount of data is not what matters, and what matters is the method of processing?

5. Cloudera and Hortonworks

Doug Cutting gave a few polite remarks in reply and then said: Happy competition.

0xFF Summary

Flipping through my old notes, I realize three years have already passed, which feels really fast. I was still a student then, and a lot changes in three years. I used to know nothing about big data and did not know whether I would go down this road, but now I have been working in this industry for nearly two years.

Thanks to people like Doug Cutting and to the open source community for creating millions of jobs.