TuChong_Spider

(A beginner's crawler project; experts, please go easy on me)

I came across this app by chance on TikTok and found it full of high-quality phone wallpapers and pictures. For a crawler beginner, that is a very pleasing find. Lots of pictures of European and American girls too, heh heh...

A crawler for the free (shared) image library on Tuchong. It captures the site's Ajax requests to get each image ID and then saves the images.

Target site: https://stock.tuchong.com

Crawl results

Operating environment:

  • Python 3.5+
  • Windows 10
  • VSCode

How to use

Download the project source code

https://github.com/cexll/tuchong_Spider.git

Install dependencies

$ pip install -r requirements.txt

Run the project

$ python spider.py
Enter what you want to search for: girl
Getting picture ID.....
ImageID exists
Parsing HTML image URL...
Ready to download...
//p3a.pstatp.com/weili/l/199813*************89.jpg
Download success
----------------------
Splicing URL to access the page
Parsing HTML image URL...
Ready to download...
//p3a.pstatp.com/weili/l/189***********417.jpg
Download success
----------------------
Splicing URL to access the page
Parsing HTML image URL...
Ready to download...
//p3a.pstatp.com/weili/l/1**************25.png
Splicing URL to access the page
Parsing HTML image URL...
Ready to download...
//p3a.pstatp.com/weili/l/2***********62820.jpg
Splicing URL to access the page
Parsing HTML image URL...
Ready to download...
//p3a.pstatp.com/weili/l/************2.jpg
Splicing URL to access the page...

I have masked the image links above so that I don't get into trouble...

Approach (lots of screenshots ahead; readers on mobile data, take note)

To crawl a web page, step one is to open the page (just kidding...).

Once it is open, first see how the site lets you search for images and download them.

Mmm, what lovely scenery... wait, what was I doing here again?

Back to the subject

Open the developer tools (F12, or right-click and inspect) and refresh the page.

Then look for anything useful... A careful search seems to turn up nothing. What now?

I really can't find it. What should I do?

Keep scrolling down...

A "spy" shows up in the middle of the screenshot. This is what we need, but when we open it, something is off: why is there no link for downloading the pictures?

Hey, don't panic. First open one picture on the website and see how its page is structured...

Here we find that the page URL contains an imageID= parameter, which looks like the same thing we saw in the "spy" request. Open that request again and it does indeed match.

So the overall structure is clear. Let's open the spy link and see what it contains.

My trypophobia is acting up... Look closely, though, and each entry starts with an imageID, so now we have a plan.
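Pulling the IDs out of that response can be sketched roughly as follows. This is only a minimal sketch: the real shape of the JSON is not shown in this article, so the function simply walks whatever structure it is given and collects any value stored under a key that looks like an image ID. The key spellings and the sample payload are assumptions made up for illustration.

```python
import json


def collect_image_ids(node, keys=("imageId", "imageID", "image_id")):
    """Recursively gather values stored under any of the given key names.

    The key names are guesses; check the real response in the Network panel.
    """
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key in keys and isinstance(value, (str, int)):
                found.append(str(value))
            else:
                found.extend(collect_image_ids(value, keys))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_image_ids(item, keys))
    return found


# Made-up payload, NOT the site's real response format.
sample = json.loads(
    '{"data": {"hits": [{"imageId": "1001", "title": "a"}, {"imageId": "1002"}]}}'
)
ids = collect_image_ids(sample)
print(ids)
```

Walking the whole structure is slower than indexing a known path, but it keeps working if the nesting turns out to differ from what you expected.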

Request the spy link for each page of results to collect the imageIDs, then join https://stock.tuchong.com/free/image/? + imageID to reach each picture's page. Every picture works the same way.

Code
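The URL-joining step, and the "parsing HTML image URL" step seen in the run output, might look something like the sketch below. It is not the project's actual code. The base URL comes from this article; the query key spelling (`imageID`) follows the text above and should be checked against the browser's address bar. The regular expression is an assumption based on the `//p3a.pstatp.com/weili/l/...` links in the run output, and the sample HTML is invented.

```python
import re
from urllib.parse import urlencode

DETAIL_BASE = "https://stock.tuchong.com/free/image/"

# Assumed pattern, modelled on the links printed in the run output.
IMG_RE = re.compile(
    r"(?:https?:)?//[\w.-]+/weili/l/[\w-]+\.(?:jpe?g|png)", re.IGNORECASE
)


def build_detail_url(image_id, param="imageID"):
    """Join the base URL and the image ID into a detail-page URL."""
    return DETAIL_BASE + "?" + urlencode({param: image_id})


def find_image_url(html):
    """Return the first matching image URL in the HTML, or None."""
    match = IMG_RE.search(html)
    if match is None:
        return None
    url = match.group(0)
    # Protocol-relative links ("//host/...") need a scheme before downloading.
    return "https:" + url if url.startswith("//") else url


print(build_detail_url("1001"))
print(find_image_url('<img src="//p3a.pstatp.com/weili/l/123456.jpg">'))
```

A regular expression is used here only to keep the sketch dependency-free; an HTML parser would be the sturdier choice if the page layout is known.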

The complete code can be found at the project address: github.com/cexll/tucho…

Conclusion

Crawling any website follows much the same approach: first work out by hand, the way a person would, where the data lives, and only then fetch it in code. Don't start by writing code.

Observant readers will have noticed that everything we crawl here comes from the free image library. As for why I don't crawl the preferred and high-end image libraries: first, I only just noticed that they work differently; second, that is how the site makes its money, and what would I do if I published a crawler for it and got caught...

If you are up to it, work it out yourself. When I last looked, the imageID there is stored in the HTML, a bit like Toutiao (Today's Headlines).
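Once the manual investigation has told you where the image URL lives, the last step in code is saving the file. Below is a minimal sketch using only the standard library. The function names and the injectable `fetch` argument are hypothetical choices for illustration, not taken from the project; passing the fetcher in makes it easy to try the saving logic without touching the network.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_bytes(url, timeout=10):
    """Default fetcher: a plain GET with the standard library."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def save_image(url, out_dir, fetch=fetch_bytes):
    """Download one image and save it under the file name taken from its URL."""
    os.makedirs(out_dir, exist_ok=True)
    # Fall back to a generic name if the URL path has no file name.
    name = os.path.basename(urlparse(url).path) or "image.bin"
    path = os.path.join(out_dir, name)
    with open(path, "wb") as fh:
        fh.write(fetch(url))
    return path
```

For example, `save_image(url, "images")` writes the file into an `images` folder and returns its path. A real crawler would likely also send browser-like headers and pause between requests.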

Project address: github.com/cexll/tucho…