Basic idea:

First use developer tools to find the list of tags to extract data from:

! [](https://p6-tt-ipv6.byteimg.com/large/pgc-image/bd2bcc5f762547ce8b954deae1193f9b)

Use xpath to locate the list of data to extract

! [](https://p26-tt.byteimg.com/large/pgc-image/0025bc4a2eeb4349a7dbd3098dea1c12)

And then extract the corresponding data one by one:

! [](https://p9-tt-ipv6.byteimg.com/large/pgc-image/ee47b055daf141dc9e971c420d66c084)

Save data to CSV:

! [](https://p1-tt-ipv6.byteimg.com/large/pgc-image/5fe2f65117d946909db951fe6a538436)

Use developer Tools to find the next button TAB:

! [](https://p6-tt-ipv6.byteimg.com/large/pgc-image/85e34c5409104fc599652944a5a1d633)

Extract this tag object using xpath and return:

! [](https://p1-tt-ipv6.byteimg.com/large/pgc-image/b8c2e5240e2d4a198f0f459cb6cc04ce)

Call the click event and loop through the above process:

! [](https://p6-tt-ipv6.byteimg.com/large/pgc-image/ed409f438f104b0ead1c8f4903cb9b14)

Final effect:

! [](https://p1-tt-ipv6.byteimg.com/large/pgc-image/24d7a011458d40eb857577df9a13aef3)

Code:

from selenium import webdriver import time import re class Douyu(object): def __init__(self): # at the beginning of the url of the self. Start_url = "https://www.douyu.com/directory/all" # instantiate a Chrome object self. The driver = webdriver. Chrome (#) Self.start_csv = True def __del__(self): self.driver. Quit () def get_content(self): Sleep (2) item = {} # Get the next TAB next_page = Self. Driver. Find_element_by_xpath (" / / span [text () = 'next'] /.." Get_attribute ("aria-disabled") is_next_URL = next_page.get_attribute("aria-disabled" Self.driver. find_elements_by_xpath("//ul[@class=' layout-cover-list ']//li") item["user-id"] = li.find_element_by_xpath(".//div[@class='DyListCover-userName']").text item["img"] = li.find_element_by_xpath(".//div[@class='DyListCover-imgWrap']//img").get_attribute("src") item['class-name'] = li.find_element_by_xpath(".//span[@class='DyListCover-zone']").text item["click-hot"] = li.find_element_by_xpath(".//span[@class='DyListCover-hot']").text item["click-hot"] = Re.sub (r'\n', ",item['click-hot']) # Save data self.save_csv(item) # Return the tag if there are next page and next page click events, return next_page,is_next_url def save_csv(self,item): STR =','. Join ([I for I in item.values()]) with open('./douyu.csv','a',encoding=' utF-8 ') as f: if self.start_csv: F.write (" userid,image, class ") self.start_csv = False # print("save success") Self.driver. get(self.start_url) while True: def run(self): Is_next = self.get_content() if is_next! Next_page.click () if __name__=='__main__': douyu_spider = Douyu() douyu_spider. Run ()Copy the code

This article reprinted text, copyright belongs to the author, such as infringement contact xiaobian to delete

Original address: blog.csdn.net/qq_46456049…

The source code is available here