Basic idea:

First use developer tools to find the list of tags to extract data from:

! [](https://p6-tt-ipv6.byteimg.com/large/pgc-image/bd2bcc5f762547ce8b954deae1193f9b)

Use xpath to locate the list of data to extract

! [](https://p26-tt.byteimg.com/large/pgc-image/0025bc4a2eeb4349a7dbd3098dea1c12)

And then extract the corresponding data one by one:

! [](https://p9-tt-ipv6.byteimg.com/large/pgc-image/ee47b055daf141dc9e971c420d66c084)

Save data to CSV:

! [](https://p1-tt-ipv6.byteimg.com/large/pgc-image/5fe2f65117d946909db951fe6a538436)

Use developer Tools to find the next button TAB:

! [](https://p6-tt-ipv6.byteimg.com/large/pgc-image/85e34c5409104fc599652944a5a1d633)

Extract this tag object using xpath and return:

! [](https://p1-tt-ipv6.byteimg.com/large/pgc-image/b8c2e5240e2d4a198f0f459cb6cc04ce)

Call the click event and loop through the above process:

! [](https://p6-tt-ipv6.byteimg.com/large/pgc-image/ed409f438f104b0ead1c8f4903cb9b14)

Final effect:

! [](https://p1-tt-ipv6.byteimg.com/large/pgc-image/24d7a011458d40eb857577df9a13aef3)

Code:

from selenium import webdriver import time import re class Douyu(object): def __init__(self): # at the beginning of the url of the self. Start_url = "https://www.douyu.com/directory/all" # instantiate a Chrome object self. The driver = webdriver. Chrome (#) Self.start_csv = True def __del__(self): self.driver. Quit () def get_content(self): Sleep (2) item = {} # Get the next TAB next_page = Self. Driver. Find_element_by_xpath (" / / span [text () = 'next'] /.." Get_attribute ("aria-disabled") is_next_URL = next_page.get_attribute("aria-disabled" Self.driver. find_elements_by_xpath("//ul[@class=' layout-cover-list ']//li") item["user-id"] = li.find_element_by_xpath(".//div[@class='DyListCover-userName']").text item["img"] = li.find_element_by_xpath(".//div[@class='DyListCover-imgWrap']//img").get_attribute("src") item['class-name'] = li.find_element_by_xpath(".//span[@class='DyListCover-zone']").text item["click-hot"] = li.find_element_by_xpath(".//span[@class='DyListCover-hot']").text item["click-hot"] = Re.sub (r'\n', ",item['click-hot']) # Save data self.save_csv(item) # Return the tag if there are next page and next page click events, return next_page,is_next_url def save_csv(self,item): STR =','. Join ([I for I in item.values()]) with open('./douyu.csv','a',encoding=' utF-8 ') as f: if self.start_csv: F.write (" userid,image, class ") self.start_csv = False # print("save success") Self.driver. get(self.start_url) while True: def run(self): Is_next = self.get_content() if is_next! Next_page.click () if __name__=='__main__': douyu_spider = Douyu() douyu_spider. Run ()Copy the code

This article reprinted text, copyright belongs to the author, such as infringement contact xiaobian to delete

Original address: blog.csdn.net/qq_46456049…

The source code is available here

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

Selenium is used to realize automatic page turning and crawling a fish data! You have everything you want!

Basic idea:

Selenium is used to realize automatic page turning and crawling a fish data! You have everything you want!

Basic idea:

Related Posts

A binary tree is non-recursive traversal

I studied at Lebyte on the eleventh day

How do I pass additional arguments to a callback function