0x00 Foreword

I haven’t written anything properly for quite a while. After a long stretch of struggling and thinking, I have finally calmed down enough to plan what to study next and to start a new round of blog writing.

0x01 Timeline of my blog

From 2009 to 2013

In 2009, I opened my first blog. I used NetEase Blog to play at being literary and artistic for about three or four years. I wrote more than ten articles on hydrology and some small poems. (It looks funny now.)

At the beginning of 2015

At the beginning of 2015, I registered my second blog, and my first technical one, on the CSDN blog platform. I used it mainly to record my own learning notes, so it holds a lot of notes on installing and using the tools of the big data ecosystem. Looking back, this blog reads as a series of pitfalls I stepped into. Around the same time, I bought a domain name and set up a personal blog page on GitHub, with its content synchronized with CSDN.

From 2015 to 2016

From the beginning of 2015 to the end of 2016, I wrote nearly a hundred technical blog posts (installation notes plus pitfalls). Looking back now, it is not that I didn’t want to write in-depth articles, but that I didn’t yet understand things deeply enough. Back then, just installing Hadoop successfully felt like a big step, and I wasn’t thinking very deeply about anything.

At the end of 2016

In November 2016, after a month off, I was in a relaxed state and thinking about how to write more interesting articles. Just at that time, Jianshu appeared. To satisfy my wish to write quietly in a corner, I registered my second technical blog on Jianshu, a platform with a literary and artistic style. In the same month, I wrote several light-hearted blog posts on design patterns titled “How Programmers Manage their Harem.”

At the beginning of 2017

At the beginning of 2017, I joined the Goose Factory and continued down the path of data development. There I came into contact with computation over billions of records and with data modeling for complex businesses. To meet the requirements of the job, I studied several classic books on data warehouse theory in depth: reading at night, summarizing on weekends, and practicing during the day.

After four months of accumulation, I completed my second relatively mature article series, Data Is King. These articles focus on data warehouse practice in big data scenarios. They draw on both books and hands-on work, yet are relatively easy to understand. What I want to write is substantive, practical content, rather than obscure theory or posts that are nothing but code. The series was fairly well received, and I got a lot of positive feedback from friends and colleagues.

By the end of the first quarter of 2017, I had almost finished my Data Is King series, leaving a few articles I still wanted to write for when inspiration returns. However, I felt that I lacked technical depth, so I read some of the Spark source code and compiled some source code reading notes. And that brings us to today.

0x02 What do I want to write?

What do I want to write next? In fact, there are a lot of things I want to write about, such as distributed algorithms, big data algorithms, and a machine learning series. Of course, these would all be notes taken while studying and working, so my grasp of these topics is still limited.

In addition, I want to write an interesting series whose name is not yet decided. This series will be a summary of my data-related work:

  • I’m going to start with data harvesting (a very important step; I’ll write a little about crawlers first, because crawlers allow you to crawl as much data as you want);
  • Then data cleaning (using frameworks like MapReduce and Spark) and data exploration (PageRank and LPA implementations, and of course NLP-related things like inverted indexes and Chinese word segmentation);
  • Then I’ll cover some data management (this is what I’m good at: dimensional modeling and an OLAP series);
  • Then I will write something about machine learning and deep learning (I have almost forgotten this part, so I will try my best to pick it up again and deepen my understanding, to prepare for applying it in my future work);
  • And of course data visualization (how to present data elegantly with Gephi and D3.js);
  • There are also some other interesting things to mention (such as graph database applications and some new technology designs).

I hope to write an interesting series that is practice-oriented but includes some theoretical explanation, and that helps me and my peers sort out every aspect of working with data.

The articles will be short, each covering one small topic, with my own text and my own illustrations.