Preface

The Year of the Dog has arrived and I am a year older, now in the second semester of my second year of graduate school. The next few months of 2018 will bring internships and job hunting, and it is time to learn to make a living in the outside world. With that in mind, I did some data analysis and machine learning on data from 2 million Zhihu users. This is the first article on this data set. In the following posts, I will look at it from different angles and explore some of the interesting information hidden in it.

Where the idea came from

I had previously seen many people online analyze data from millions of Zhihu users. I found this very interesting, because it is the kind of work I usually do and I hope to find a job in this field. So I found a Zhihu crawler on GitHub and crawled data for two million Zhihu users (the data was crawled last month, so it only reflects that point in time and is for reference and entertainment only), in order to try out some data analysis and machine learning.

I have come across three such articles so far, and all of them inspired me a lot. They are listed below. The crawler comes from the GitHub of the author of the first article. The second article mainly analyzes where programmers from China's famous universities go to work, and it includes a section on the proportion of graduates from prestigious universities at the BAT companies (Baidu, Alibaba, Tencent). That gave me the idea of looking at how the employees of the major domestic Internet companies are distributed across schools, and so this article came about.

  1. Data analysis of millions of Zhihu users
  2. Where do programmers like to work
  3. Big Data Report: Analysis of Millions of Zhihu users

Data summary

  1. There are 2 million records in total. After deleting those with an empty school or company, just over 80,000 remain; after deleting those not at a major Internet company, nearly 10,000 remain.
  2. Analysis tools: Python + pandas + BDP Personal Edition
  3. Analysis angles: company statistics, school statistics, position statistics, etc.
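The two filtering steps in item 1 can be sketched as follows. This is a minimal illustration, not the crawler's or the author's actual code: the record keys `school` and `company` and the whitelist `MAJOR_COMPANIES` are hypothetical, and the sketch uses plain Python instead of pandas so that it runs anywhere.

```python
# Sketch of the two filtering steps, standard library only.
# ASSUMPTIONS: each record is a dict with hypothetical keys "school" and
# "company"; MAJOR_COMPANIES is an illustrative whitelist, not the author's list.

MAJOR_COMPANIES = {"Baidu", "Alibaba", "Tencent", "Huawei", "NetEase"}


def filter_users(records, major_companies=MAJOR_COMPANIES):
    """Return (users with school and company filled in, users at major companies)."""
    # Step 1: drop users whose school or company is missing or blank.
    with_school_and_company = [
        r for r in records
        if (r.get("school") or "").strip() and (r.get("company") or "").strip()
    ]
    # Step 2: keep only users whose company is on the whitelist.
    at_major = [
        r for r in with_school_and_company
        if r["company"].strip() in major_companies
    ]
    return with_school_and_company, at_major


if __name__ == "__main__":
    sample = [
        {"school": "Zhejiang University", "company": "Alibaba"},
        {"school": "", "company": "Tencent"},
        {"school": "Wuhan University", "company": None},
        {"school": "Peking University", "company": "Some Startup"},
    ]
    step1, step2 = filter_users(sample)
    print(len(step1), len(step2))  # 2 1
```

With pandas, the same two steps would roughly correspond to `df.dropna(subset=["school", "company"])` followed by `df[df["company"].isin(MAJOR_COMPANIES)]`, although `dropna` alone does not catch blank strings.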

Overall data overview

First are the statistics on the Internet companies and major cities selected in this analysis (shown in the figure below), presented as word clouds (the larger the word, the higher its frequency). The picture makes it easy to see that the usual names, BAT, Huawei, NetEase and so on, are all in the filtered data, and that these programmer users are concentrated in Beijing, Shanghai, Hangzhou, Shenzhen and similar cities (home to the BAT headquarters).
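A word cloud simply maps each word's frequency to a font size. The sketch below shows one way to compute those weights. The linear scaling and the `min_size`/`max_size` defaults are assumptions for illustration; how BDP Personal Edition actually scales its word clouds is not covered here.

```python
from collections import Counter

# Sketch: turn a list of values (e.g. company or city names) into word-cloud sizes.
# ASSUMPTION: font size grows linearly with frequency between min_size and
# max_size; this is not necessarily how BDP scales its word clouds.


def word_cloud_sizes(values, min_size=12, max_size=72):
    counts = Counter(v for v in values if v)  # skip empty or None entries
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    sizes = {}
    for word, n in counts.most_common():
        # If every word has the same count, give them all the maximum size.
        frac = 1.0 if span == 0 else (n - lo) / span
        sizes[word] = round(min_size + frac * (max_size - min_size))
    return sizes


# Toy input, not real data from the crawl:
print(word_cloud_sizes(["Beijing", "Beijing", "Shanghai", "Hangzhou", "Beijing", "Shanghai"]))
# {'Beijing': 72, 'Shanghai': 42, 'Hangzhou': 12}
```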

Next are the position statistics, drawn from users at the selected Internet companies. The figure shows how active programmers in different positions at Internet companies are on Zhihu. Unsurprisingly, product managers and front-end developers make up the largest groups, which suggests that these people are more active on Zhihu.

Moving on to statistics by school, let's start with how the programmers at all the Internet companies I selected are distributed across schools. Many of them come from Beijing University of Posts and Telecommunications, Huazhong University of Science and Technology, Zhejiang University, Wuhan University, Tsinghua University, Peking University, Nanjing University, Shanghai Jiao Tong University, Xidian University, Harbin Institute of Technology and so on, which suggests that programmers from these schools focus on code but also often take a stroll on Zhihu ~.

Breakdown by Internet company

The pictures above only give a ballpark view of the data. Next is the main distribution of programmers ("program apes") by school at each major Internet company. Dear readers, if, like me, you are facing internships and job hunting, you can use it to see where more of your alumni work (the order of the companies below is my own arrangement).
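A per-company school ranking like this amounts to grouping records by company and counting schools within each group. Here is a minimal sketch, assuming records with hypothetical `company` and `school` keys that have already been filtered to the major Internet companies. It is an illustration, not the code behind the charts.

```python
from collections import Counter, defaultdict

# Sketch: for each company, rank schools by how many users list both.
# ASSUMPTION: records are dicts with hypothetical "company" and "school" keys,
# already filtered to the major Internet companies.


def top_schools_by_company(records, top_n=10):
    per_company = defaultdict(Counter)
    for r in records:
        per_company[r["company"]][r["school"]] += 1
    # most_common sorts by count, highest first; ties keep first-seen order.
    return {company: counter.most_common(top_n)
            for company, counter in per_company.items()}


# Toy input, not real data from the crawl:
recs = [
    {"company": "Tencent", "school": "HUST"},
    {"company": "Tencent", "school": "Wuhan University"},
    {"company": "Tencent", "school": "HUST"},
    {"company": "Baidu", "school": "BUPT"},
]
print(top_schools_by_company(recs, top_n=1))
# {'Tencent': [('HUST', 2)], 'Baidu': [('BUPT', 1)]}
```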