Zhihu is the leading UGC platform in China, and it is so popular that many other general-interest platforms get overlooked. So what are those other platforms?
Take Jianshu, for example. It is a general-interest platform similar to Zhihu, but because it has no meme like “people in the United States, just off the plane”, it has gradually been forgotten.
Who are the top users on Jianshu? How many big Vs have tens of thousands of followers and likes? Which articles are read the most? Which columns are most popular among users?
I. Data acquisition
Needless to say, the data is crawled with Python. Once we know which data we want, code can reach almost any of it.
Because of the platform’s data protection and access limits, only about 900 of the accounts a single user follows (and likewise of that user’s fans) and only the first 1,900 or so articles can be accessed. After 2-3 layers of crawling, a total of 261,277 user records were obtained. Each record includes the user name, home page URL, whether the user is a signed author, number of fans, number of likes, number of accounts followed, number of articles, total word count and so on.
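The layered crawl described above can be sketched as a breadth-first walk over the follow graph. This is only a minimal sketch, not the crawler actually used: `fetch_neighbors` is a hypothetical callable standing in for the HTTP requests and page parsing, and the toy graph replaces the real site. The 900 cap and the depth limit are taken from the numbers in the text.

```python
from collections import deque

MAX_NEIGHBORS = 900  # per-user cap on visible follow/fan lists, as reported in the text
MAX_DEPTH = 3        # the text mentions 2-3 layers of crawling


def crawl_users(seed, fetch_neighbors, max_depth=MAX_DEPTH, cap=MAX_NEIGHBORS):
    """Breadth-first walk over the follow graph; returns the set of user ids seen.

    fetch_neighbors is a hypothetical callable: user id -> list of user ids
    (accounts the user follows plus the user's fans). A real crawler would
    wrap the network requests and parsing inside it.
    """
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        user, depth = queue.popleft()
        if depth >= max_depth:
            continue  # users on the last layer are recorded but not expanded
        for neighbor in fetch_neighbors(user)[:cap]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen


# Toy in-memory graph standing in for the real site (made-up ids).
toy_graph = {"a": ["b", "c"], "b": ["d"], "c": ["a", "d"], "d": ["e"], "e": []}
users = crawl_users("a", lambda u: toy_graph.get(u, []), max_depth=2)
print(sorted(users))
```

Tracking visited ids in a set keeps each user from being fetched twice, which matters when popular accounts appear in many follow lists.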
Meanwhile, 1,916 articles were collected. Sorted in descending order by number of likes, the first article has 17,076 likes and the last has 488. So the most popular articles on Jianshu may already be in this set (they’re not).
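Ranking the articles by likes is a one-line sort in Python. The records below are made-up placeholders: only the two like counts 17,076 and 488 come from the text, and the field names are assumptions.

```python
def rank_by_likes(articles):
    """Return the articles sorted by like count, highest first."""
    return sorted(articles, key=lambda a: a["likes"], reverse=True)


# Placeholder records; titles and the middle value are invented for illustration.
articles = [
    {"title": "Article A", "likes": 488},
    {"title": "Article B", "likes": 17076},
    {"title": "Article C", "likes": 2300},
]

ranked = rank_by_likes(articles)
print(ranked[0]["likes"], ranked[-1]["likes"])
```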
II. BI analysis
In general, once the data has been processed in Python, it’s time to visualize it.
When it comes to data visualization, a hundred flowers are in bloom: a dazzling array of third-party front-end libraries has emerged, including Highcharts, ECharts, Chart.js, d3.js and so on. However varied they are, they have one thing in common: they require solid coding skills. And not all of them are free for every use; Highcharts, for example, requires a paid license for commercial use.
So what’s the solution for people like us who don’t know how to code?
That brings us to today’s topic: BI, or business intelligence. Search for BI on Baidu and the results are all over the place, which can be confusing. In fact, few products do BI really well, but there are still some excellent ones in China and abroad.
The leading foreign product is Tableau. Its $15.7 billion acquisition shows how strong it is, but it is a poor fit for users in China:
- It is built around data query tools, and its real-time data analysis is still lacking
- It is very expensive (tycoons can skip this point), and it is sold through agents, so after-sales service is very poor
- It has no back-end data warehouse of its own. It claims to be in-memory BI, but in practice it places very high demands on hardware. To analyze more than ten million rows, the data must first be processed with other ETL tools before front-end analysis
- It does not support complex Chinese-style tables
Therefore, I chose FineBI, a domestic BI product for enterprise-level data analysis. Most importantly, its personal version is free.
The advantages are as follows:
- Automatic modeling that is simple to use, with very flexible models
- Rich visualization and front-end analysis, including multidimensional operations such as drilling, slicing and pivoting
- Built-in ETL, real-time data analysis, and rapid processing of big data
III. Data visualization
As mentioned above, FineBI is enterprise-level data analysis software that is free for individuals. It also supports many kinds of data sources and several connection modes, so data processing is painless.
Once it is installed and activated, I import the data crawled with Python into FineBI and start having fun with the analysis.
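One simple way to hand the crawled records to a BI tool is to write them to a CSV file first. The sketch below assumes that a CSV file is an acceptable import format for the tool, and the column names are hypothetical labels for the fields listed in the data acquisition section. The sample row is a placeholder, not real data.

```python
import csv
import io

# Hypothetical column names for the fields mentioned in the text.
FIELDS = ["user_name", "home_url", "is_signed_author", "fans", "likes",
          "following", "articles", "total_words"]


def users_to_csv(users, out):
    """Write user records (dicts keyed by FIELDS) to a file-like object as CSV."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(users)


# Placeholder record for illustration only.
sample = [{
    "user_name": "example_user",
    "home_url": "https://example.invalid/u/1",
    "is_signed_author": False,
    "fans": 42,
    "likes": 7,
    "following": 15,
    "articles": 3,
    "total_words": 1200,
}]

buffer = io.StringIO()
users_to_csv(sample, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead of memory, pass a file opened with `open(path, "w", newline="", encoding="utf-8")`; the `newline=""` argument is what the `csv` module expects for files.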
1. Analysis of signed authors
Since Jianshu is a “self-media” platform, the goal of many writers there is to become a signed author. Of these 260,000+ quality users, a total of 126 have the “signed author” label on their home pages.
That is a very small proportion (126 of 261,277, or about 0.05%), which also shows indirectly how strict Jianshu’s requirements for signed authors are.
A total of 69 authors each contributed five or more of the popular articles, which shows that writing is not easy.
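Counting how many authors have five or more popular articles can be done with `collections.Counter`. This is a sketch with made-up author ids; the threshold of five comes from the text, everything else is illustrative.

```python
from collections import Counter


def prolific_authors(article_authors, min_articles=5):
    """Map each author with at least min_articles articles to their article count.

    article_authors holds one author id per popular article.
    """
    counts = Counter(article_authors)
    return {author: n for author, n in counts.items() if n >= min_articles}


# Made-up author ids, one entry per popular article.
authors = ["u1"] * 6 + ["u2"] * 5 + ["u3"] * 2 + ["u4"]
result = prolific_authors(authors)
print(len(result), result)
```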
2. Users’ fans
This is a pyramid-style chart. Among the 260,000+ users, only 5 have more than 100,000 fans, which makes them rarer than one in ten thousand. The counts for the other tiers can be read from the chart, so I will not go through them. It is worth noting that users with 10-100 fans make up the largest share, 40.38%, rather than users with 0 or 1 fans, which further suggests that the data collected this time is of relatively high quality.
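The tiers behind a pyramid chart like this can be computed by bucketing fan counts and taking each bucket’s share. The bucket edges below are an assumption (powers of ten), since the exact tiers used in the chart are not stated, and the sample values are invented.

```python
from collections import Counter

# Hypothetical tiers: (label, lower bound inclusive, upper bound exclusive).
BUCKETS = [
    ("0-9", 0, 10),
    ("10-99", 10, 100),
    ("100-999", 100, 1_000),
    ("1,000-9,999", 1_000, 10_000),
    ("10,000-99,999", 10_000, 100_000),
    ("100,000+", 100_000, float("inf")),
]


def fan_distribution(fan_counts):
    """Return the percentage of users in each tier, rounded to two decimals."""
    counts = Counter()
    for fans in fan_counts:
        for label, low, high in BUCKETS:
            if low <= fans < high:
                counts[label] += 1
                break
    total = len(fan_counts) or 1  # avoid dividing by zero on empty input
    return {label: round(100 * counts[label] / total, 2) for label, _, _ in BUCKETS}


# Invented fan counts for illustration.
sample = [0, 3, 15, 42, 99, 250, 1200, 150000]
shares = fan_distribution(sample)
print(shares)
```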
3. 24-hour analysis of popular articles
The most articles were published at 11 o’clock, which surprised me. As a nobody who likes to publish in the evening, I had assumed the evening was the best time to write, and 11 o’clock is already close to lunchtime. Is it that people concentrate on writing in the morning (as the saying goes, the day’s plan is made in the morning), finish the day’s writing early, and then relax? Also, articles posted at every hour of the day have become popular.
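A 24-hour breakdown like this comes from counting articles by the hour of their publication time. The sketch below assumes timestamps are stored as strings in the `YYYY-MM-DD HH:MM:SS` format, which is a guess about the crawled data; the sample timestamps are invented.

```python
from collections import Counter
from datetime import datetime


def posts_per_hour(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Return a 24-element list: number of articles published in each hour."""
    hours = Counter(datetime.strptime(ts, fmt).hour for ts in timestamps)
    return [hours.get(h, 0) for h in range(24)]


# Invented publication times for illustration.
sample = ["2020-01-01 11:05:00", "2020-01-02 11:40:12", "2020-01-02 21:15:30"]
per_hour = posts_per_hour(sample)
busiest = max(range(24), key=lambda h: per_hour[h])
print(busiest, per_hour[busiest])
```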
4. Reading, liking and commenting
The popularity of an article is directly reflected in its number of likes and comments, and the chart confirms this.
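One way to put a number on this relationship is the Pearson correlation coefficient between reads and likes, or reads and comments. This is a sketch of that technique on invented figures, not the calculation behind the chart.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented per-article figures for illustration.
reads = [1000, 2000, 3000, 4000]
likes = [10, 20, 30, 40]
comments = [5, 3, 8, 9]

r_likes = pearson(reads, likes)
r_comments = pearson(reads, comments)
print(round(r_likes, 3), round(r_comments, 3))
```

A value close to 1 means the two measures rise together; a value near 0 means there is little linear relationship.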
Original article: FanRuan Software, www.toutiao.com/a6782840504…