Two days ago I stumbled on a popular legal-awareness post about 520, Qixi (Chinese Valentine's Day), Valentine's Day and similar occasions! I bet plenty of you lovely people received some love from your other half on Qixi a few days ago!

But plenty of us never even get the chance to send a red envelope…

I'm one of those people: no partner, and no chance to show any affection! I don't know why I'm still single! Ha ha, so I want to find out: with so many single people out there, why are you all single? Time for a crawler!

So tell me, with so many single people around, have you ever worked out why someone as excellent as you is still single?

I. Requirements Background

Today I came across an interesting Weibo topic: #Top 3 reasons post-90s are single#

Around Qixi (Chinese Valentine's Day), a sample survey was released on how the country's post-90s generation views love and marriage. The results show that first-tier cities still have the highest proportion of singles. The top 3 reasons post-90s give for being single: a small social circle, a busy job, and an over-idealized fantasy of love.


None of these three reasons seems quite right to me. Isn't the real reason for being single just being poor? Sob…

II. Function Description

Curious how that survey result came about? We happen to have just learned how to crawl Weibo topics, so today let's analyze why so many outstanding people are still single!

III. Technical Approach

  1. Simulate logging in to Weibo
  2. Crawl the topic
  3. Save the file
  4. Clean the data
  5. Analyze the data

IV. Simulated Login

Simulated login was already covered in the earlier post on crawling the #Jay Chou# super topic, so I won't repeat the explanation here. The same login code is reused as-is!

V. Crawling the Topic

1. Find the URL that loads the topic data: %9B%A0TOP3%23%26t%3D0&isnewpage=1&extparam=pos%3D41%26c_type%3D31%26realpos%3D40%26flag%3D0%26filter_type%3Drealtimehot%26cate%3D0%26display_time%3D1565179797&luicode=10000011&lfid=106003type%3D25%26t%3D3%26disable_hot%3D1%26filter_type%3Drealtimehot&page_type=searchall
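Most of that URL is percent-encoded, which makes it hard to read at a glance. A minimal sketch of how to decode it with the standard library follows. The fragment is a shortened piece copied from the URL above (the full URL is not reproduced), and the variable names are just for illustration.

```python
from urllib.parse import parse_qs, unquote

# Shortened fragment of the query string shown above (illustrative only).
fragment = (
    "isnewpage=1"
    "&extparam=pos%3D41%26c_type%3D31%26realpos%3D40%26flag%3D0"
    "&luicode=10000011&page_type=searchall"
)

# parse_qs splits on "&" and percent-decodes every value once.
params = parse_qs(fragment)
print(params["page_type"])   # ['searchall']

# extparam is itself an encoded query string, so it can be parsed again.
nested = parse_qs(params["extparam"][0])
print(nested)                # {'pos': ['41'], 'c_type': ['31'], 'realpos': ['40'], 'flag': ['0']}

# %23 is simply the "#" that wraps a Weibo topic name.
print(unquote("TOP3%23"))    # TOP3#
```

Decoding the URL this way makes it easier to spot which parameters change from one request to the next.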

2. Simulate the request in code

We're still using the Requests library to fetch the data. This time we add a timeout parameter to the request so that a stalled request cannot hang the crawler.
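Here is a rough sketch of what such a request helper could look like. It is not the original code: the function name `fetch_page`, its parameters, and the 10-second default are my own assumptions, and the session would normally be the logged-in one from the login step.

```python
def fetch_page(url, params=None, session=None, timeout=10):
    """Fetch one page of topic data; return the decoded JSON, or None on failure.

    Hypothetical helper: the name, the parameters and the default timeout
    are illustrative assumptions, not the original author's code.
    """
    if session is None:
        # Third-party library, imported lazily so a logged-in session
        # (or a test double) can be passed in instead.
        import requests
        session = requests.Session()
    try:
        # timeout makes a stalled connection raise instead of waiting forever
        resp = session.get(url, params=params, timeout=timeout)
        resp.raise_for_status()
        return resp.json()
    except (OSError, ValueError) as exc:
        # requests' exceptions derive from OSError; invalid JSON raises ValueError
        print(f"request failed: {exc}")
        return None
```

Returning None on failure lets the calling loop skip a bad page and carry on instead of crashing halfway through a crawl.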

3. Extract the post content

To extract the content of a post, you first need to understand the format of the data the request returns.

Once we understand the data format, we can write code to extract the content we want.
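As an illustration only, here is a sketch of the extraction step. The data format is not reproduced in this post, so the layout below is an assumption: a JSON payload with a `data.cards` list, where each card either carries an `mblog` object with a `text` field or wraps such cards in a `card_group` list. Adjust the keys to whatever the real response contains.

```python
def extract_texts(payload):
    """Collect the raw text of every post found in one response payload.

    Assumed layout (not confirmed by this post):
    {"data": {"cards": [{"mblog": {"text": ...}}, {"card_group": [...]}]}}
    """
    texts = []
    cards = (payload or {}).get("data", {}).get("cards", [])
    for card in cards:
        # A card may hold a post itself or nest several posts in "card_group".
        for item in [card] + card.get("card_group", []):
            mblog = item.get("mblog")
            if mblog and "text" in mblog:
                texts.append(mblog["text"])
    return texts


sample = {
    "data": {
        "cards": [
            {"mblog": {"text": "<a>#topic#</a>too busy at work"}},
            {"card_group": [{"mblog": {"text": "small social circle"}}]},
            {"card_type": 11},  # a card without any post is skipped
        ]
    }
}
print(extract_texts(sample))
```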

We now have the post content, but it still contains a lot of HTML tags. Let's use the re module to strip the tags, along with the topic name at the start of each post!
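A minimal sketch of that cleaning step, assuming the tags are ordinary HTML tags and the topic appears as `#...#` at the start of the text. The patterns and the function name `clean_text` are my own, not the original code.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")        # any HTML tag, e.g. <a href="..."> or <br />
TOPIC_RE = re.compile(r"^\s*#[^#]+#")  # a leading #topic name#


def clean_text(raw):
    """Strip HTML tags and the leading topic name from one post."""
    text = TAG_RE.sub("", raw)     # remove the tags first, so the topic is at the start
    text = TOPIC_RE.sub("", text)  # then drop the topic prefix
    return text.strip()


print(clean_text('<a href="/search">#TOP3#</a>too busy at work<br />'))
# too busy at work
```

The tags go first on purpose: the topic name is usually wrapped in a link, so it only sits at the very start of the string once the tags are gone.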

4. Save the file

After the posts are extracted, we save them!
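Saving can be as simple as appending one post per line to a text file. A sketch under that assumption follows; the function name and the file name in the usage line are hypothetical.

```python
def save_texts(texts, path):
    """Append posts to a UTF-8 text file, one post per line."""
    # Mode "a" appends, so every crawled page adds to the same file.
    with open(path, "a", encoding="utf-8") as f:
        for text in texts:
            if text:  # skip posts that are empty after cleaning
                # keep each post on a single line
                f.write(text.replace("\n", " ") + "\n")
```

For example, `save_texts(["too busy at work"], "single_reasons.txt")` adds one line to the file and leaves earlier pages in place.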

VI. Batch Crawling

Batch crawling involves pagination. Last time, when we crawled the Jay Chou super topic, the paging mechanism was:

Weibo super-topic pagination: every post has an id, and pages are ordered by time, so the newer the post, the larger its id. If you pass a since_id in the request, the server loads the posts in that topic whose id is smaller than since_id. You then take the smallest id from that batch and pass it in as the next since_id. That is how paging works.
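The since_id mechanism described above can be sketched as a small loop. This is only an illustration: `fetch` is a hypothetical callable that takes a since_id (None for the first page) and returns a list of `(post_id, text)` tuples.

```python
def crawl_by_since_id(fetch, max_pages=5):
    """Page backwards through a topic using since_id.

    `fetch(since_id)` is a hypothetical helper returning (post_id, text)
    tuples for posts whose id is smaller than since_id.
    """
    since_id = None  # the first request carries no since_id
    collected = []
    for _ in range(max_pages):
        posts = fetch(since_id)
        if not posts:  # nothing older is left
            break
        collected.extend(text for _, text in posts)
        # the smallest id on this page becomes the next since_id
        since_id = min(post_id for post_id, _ in posts)
    return collected


# Tiny in-memory stand-in for the server: ids 10..1, three posts per page.
POSTS = [(i, f"post {i}") for i in range(10, 0, -1)]

def fake_fetch(since_id):
    older = [p for p in POSTS if since_id is None or p[0] < since_id]
    return older[:3]

print(len(crawl_by_since_id(fake_fetch)))  # 10
```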

Is the pagination mechanism for this topic the same? Let's compare the URL of the first request with that of the second.

It turns out that ordinary topics are paginated with a page parameter. It seems Weibo uses different pagination mechanisms for different kinds of topics!

We've seen this pattern many times: write a for loop and pass the loop variable i in as the page number.
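A rough sketch of that loop follows. `fetch_texts` is a hypothetical callable that takes a page number and returns the cleaned posts on that page. The page range and the delay between requests are my own assumptions.

```python
import time


def crawl_pages(fetch_texts, first_page=1, last_page=10, delay=1.0):
    """Call fetch_texts(page) for each page number and gather the results."""
    all_texts = []
    for page in range(first_page, last_page + 1):
        texts = fetch_texts(page)
        if not texts:  # an empty page means we have run out of posts
            break
        all_texts.extend(texts)
        time.sleep(delay)  # pause between requests to go easy on the server
    return all_texts
```

Stopping at the first empty page means `last_page` only has to be an upper bound, not the exact page count.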

VII. Data Analysis

For the data analysis we use the Pyecharts library, a very handy visualization library!

First read the data, then use the Jieba library for word segmentation and data cleaning, and finally display the result with Pyecharts!
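Here is one way those three steps might fit together. It is a sketch, not the original code. The stop-word list is a tiny illustrative sample, the word cloud is my own guess at a suitable chart, and the Pyecharts call assumes version 1.0 or later. Jieba and Pyecharts are imported lazily, so the counting part can be tried without them.

```python
from collections import Counter

# Tiny illustrative stop-word list; a real run would use a much fuller one.
STOPWORDS = {"的", "了", "是", "我", "你", "单身"}


def count_words(lines, cut=None, stopwords=STOPWORDS, top_n=50):
    """Segment each line and return the top_n (word, count) pairs."""
    if cut is None:
        import jieba  # third-party Chinese word segmentation library
        cut = jieba.lcut
    counter = Counter()
    for line in lines:
        for word in cut(line):
            word = word.strip()
            # drop single characters, punctuation and stop words
            if len(word) > 1 and word not in stopwords:
                counter[word] += 1
    return counter.most_common(top_n)


def render_wordcloud(pairs, path="single_reasons.html"):
    """Render (word, count) pairs as a word cloud; assumes pyecharts >= 1.0."""
    from pyecharts.charts import WordCloud
    return WordCloud().add("", pairs, word_size_range=[20, 100]).render(path)
```

With the posts saved one per line, usage would be roughly `render_wordcloud(count_words(open("single_reasons.txt", encoding="utf-8")))`, where the file name is again hypothetical.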

The survey said the top three reasons for staying single are a small social circle, a busy job, and an over-idealized fantasy of love. Our own data analysis seems to agree! Ha ha, no wonder I'm single too: I'm busy every day writing tutorials for you! So why are you single?!