This is the third day of my participation in the November writing challenge. Check out the event details: the last writing challenge of 2021.
After two days of learning web crawler basics, today is a first small test: scraping the comments on NetEase Cloud Music.
1. Locate the target
First I looked for my favorite song, "Golden Age", but the original is not there. NetEase Cloud Music really does not have the original, only a lot of covers!!
You can see that all the comments are wrapped in an element with id="auto-id-0FLvTEg8ZLVKFZST".
2. Download the web page
Just download the web page and use BeautifulSoup to extract the comments.
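import requests

def get_url(url):
    # Send a browser-like User-Agent so the site does not reject the request
    headers = {
        # The user-agent string is cut off in the source; substitute your own browser's full string
        'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, '
    }
    res = requests.get(url, headers=headers)
    return res

def main():
    url = input("please input the url: ")
    res = get_url(url)
    # Save the raw HTML to a file so it can be searched for the comments
    with open("res.txt", "w", encoding='utf-8') as file:
        file.write(res.text)

if __name__ == "__main__":
    main()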
Run it and enter the URL of the song's page when prompted. The output is as follows:
Searching the saved file for the comments finds nothing! The comments are not in this file, which means they are loaded from another file!
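The check above can be sketched in a few lines. This is only an illustration: find_marker and the sample HTML are made up for the example, and in practice you would pass in the contents of res.txt instead.

```python
def find_marker(text, marker, context=40):
    """Return a short snippet of text around each occurrence of marker."""
    snippets = []
    start = text.find(marker)
    while start != -1:
        lo = max(0, start - context)
        hi = min(len(text), start + len(marker) + context)
        snippets.append(text[lo:hi])
        # Continue searching after the current match
        start = text.find(marker, start + len(marker))
    return snippets

# Hypothetical stand-in for the downloaded page: a shell that loads its content via scripts
sample_html = '<html><body><div id="app"></div><script src="core.js"></script></body></html>'

hits = find_marker(sample_html, "auto-id-0FLvTEg8ZLVKFZST")
print(len(hits))  # an empty result means the comments are not in the downloaded HTML
```

An empty list is the signal to stop parsing this file and go looking for the request that actually carries the comments.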
3. Throttle the network speed and locate the target file
The connection is so fast that the entire page loads in an instant.
If we open the Network tab in the browser's developer tools and refresh, we can see the many resource files that together make up the web page:
We're going to dig through this pile of files to find the one that contains the comments. Obviously we could go through them one by one, but that is tedious. Instead, we can make the browser load the page slowly and pause loading as soon as the target shows up.
In addition, the nesting order of the tags can be written as a selector like this:
data = soup.select('#main > div > div.mtop.firstMod.clearfix > div.centerBox > ul.newsList > li > a')
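To see what a selector like that does, here is a minimal sketch run against a tiny hand-written page. The sample HTML, its link text, and the href values are invented for the example; only the selector string comes from the line above. It assumes the beautifulsoup4 package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical page whose tag nesting matches the selector
sample_html = """
<div id="main">
  <div>
    <div class="mtop firstMod clearfix">
      <div class="centerBox">
        <ul class="newsList">
          <li><a href="/a">First</a></li>
          <li><a href="/b">Second</a></li>
        </ul>
      </div>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
# Each ">" means "direct child", so the path must match the nesting exactly
data = soup.select('#main > div > div.mtop.firstMod.clearfix > div.centerBox > ul.newsList > li > a')
titles = [a.get_text() for a in data]
links = [a["href"] for a in data]
print(titles, links)
```

If any level of the nesting differs on the real page, select returns an empty list, so it is worth printing len(data) before going further.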
That plan fell through: my browser is not cooperating! I'll update Chrome when I get back tonight.
The comments are data, so we can look directly at the files under the XHR and Doc filters. Also, when we inspect the target file, we find that it is fetched with a POST request. Remember what we said about POST requests? We need to submit certain data to the server in order to get what we want. I will update this tomorrow!
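As a reminder of what a POST request looks like in code, here is a minimal sketch. The URL and the form fields (song_id, page) are placeholders I made up, since the real endpoint and parameters have not been worked out yet. It uses the standard library's urllib so the request can be built and inspected without actually sending anything.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_post_request(url, form_data, user_agent):
    """Build (but do not send) a form-encoded POST request."""
    body = urlencode(form_data).encode("utf-8")
    return Request(url, data=body, headers={"User-Agent": user_agent}, method="POST")

# Placeholder URL and form fields, not the real comment API
req = build_post_request(
    "https://example.com/api/comments",
    {"song_id": "12345", "page": "1"},
    "Mozilla/5.0",
)
print(req.get_method())  # the HTTP method that would be used
print(req.data)          # the form data that would be submitted to the server
```

With the requests library used earlier in this post, the sending step would be requests.post(url, data=form_data, headers=headers).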