This is the 28th day of my participation in the Gwen Challenge.

Simple crawler

If you've been learning Python for this long, you really ought to crawl something to show for it.

My requirements are modest, though, so I went digging through the vast web for a page that is simple, clearly structured, and familiar to you and me alike ~

Ta-da ~ the page we'll crawl is as follows:

Yes, today let's use Python 3 to do a simple crawl of the hot search list on the Baidu home page ~

Baidu address: www.baidu.com/

Approach

Of course, besides knowing a little Python, you also need to understand the structure of web pages.

After all, a crawler simply automates what we do in a browser: depending on how you configure it, it fetches a page and extracts data from it, or even simulates a user clicking a button and triggering events on the page.

So, let's analyze the Baidu home page ~

As shown in the picture, open the developer console with F12 in your browser, then locate the "Chinese positive energy" entry.

You can see that the entries all live in li tags whose CSS class is hotsearch-item odd or hotsearch-item even. So once we find these elements, we can extract the text inside each tag with getText, sort the results, print them out, and our crawling task is complete ~
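The selection step above can be sketched on a small static snippet first (the HTML fragment below is a made-up stand-in for Baidu's real markup, and this assumes BeautifulSoup is installed):

```python
from bs4 import BeautifulSoup

# A made-up fragment that mimics the structure of Baidu's hot search list
html = """
<ul>
  <li class="hotsearch-item odd"><a>1 First topic</a></li>
  <li class="hotsearch-item even"><a>2 Second topic</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
# A CSS selector matches both the "odd" and "even" variants in one go
items = soup.select("li.hotsearch-item")
for item in items:
    print(item.getText(strip=True))
```

Using `select` with a CSS selector is an alternative to `findAll` here: matching on the shared `hotsearch-item` class picks up both the odd and even rows without listing each full class string.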

The code is as follows:

# -*- coding: utf-8 -*-
# @Time : 2020/10/9 15:44
# @Author : ryzeyang

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36"
}
response = requests.get("https://www.baidu.com/", headers=headers)
# Make sure the Chinese text decodes correctly
response.encoding = 'utf-8'
# Parse the page (name the parser explicitly to avoid a warning)
bsObj = BeautifulSoup(response.text, "html.parser")
# Get the time from the response headers
resDate = response.headers.get('Date')
print(resDate)
# Find the hot search entries: the class is either "hotsearch-item odd" or "hotsearch-item even"
nameList = bsObj.findAll("li", {"class": {"hotsearch-item odd", "hotsearch-item even"}})
# Collect the text of each hot search entry
tests = []
for name in nameList:
    tests.append(name.getText())
# Sort by the leading rank number
tests.sort()
for news in tests:
    # Insert a colon between the rank digit and the title
    news = news[0:1] + ":" + news[1:]
    print(news)
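The slicing trick in that last loop just splits off the leading rank digit and rejoins it with a colon; here is what it does on a sample entry (the topic text is made up):

```python
# Each scraped entry starts with its rank digit, e.g. "3Some hot topic"
news = "3Some hot topic"
# news[0:1] is the rank digit, news[1:] is the title
formatted = news[0:1] + ":" + news[1:]
print(formatted)  # 3:Some hot topic
```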


The printed result looks like this:

Mission accomplished ~

Finally

Questions and discussion are always welcome ~

If you think this article is good, please give it a thumbs-up 😝

Here's to this unexpected meeting of ours! ~

Welcome to leave a message! Thanks for your support! ヾ(≧▽≦*)o go!!

I'm 4ye. See you next time!! 😆