Scraping a novel site with Python: a whole web novel, saved on my computer!

It feels like ages since I last updated this blog. I'm preparing for the adult college entrance exam: I graduated from a technical secondary school, and without a college diploma I may be at a bit of a disadvantage, so I've been spending some time studying. But every time I open a book, I end up dozing off ("chatting with the Duke of Zhou"). Does anyone know how to make studying more interesting? I'd appreciate any tips…

Enough small talk. Today's target is the Xinbiquge (新笔趣阁) novel site, and we start from its search page: www.xbiquge.la/modules/art…

Today we'll scrape some novels. Speaking of reading: I never get sleepy when reading novels, ha ha.

The novel search is sent as a POST request, not a GET. Open the browser's developer tools and search for a novel you like, for example: Guest

![](https://p1-tt-ipv6.byteimg.com/large/pgc-image/db241f07e4854f80bdcd5c4ab47a9cce)

The Form Data in the lower-right corner is the data we have to send with the request. With it, we can send the search request and parse the results:

```python
import parsel
import requests


def get_url(headers, keyword):
    """Search the site and return the URL of the book the user picks."""
    url = 'http://www.xbiquge.la/modules/article/waps.php'
    data = {'searchkey': keyword}  # form field from the Form Data panel
    res = requests.post(url, data=data, headers=headers)
    res.encoding = 'utf-8'
    search = parsel.Selector(res.text)
    n = 0
    href = []
    # Skip the header row, then read each search result
    for each in search.xpath('//div[@id="content"]/form/table/tr')[1:]:
        href.append(each.xpath('./td/a/@href').get())  # book URL
        title = each.xpath('./td/a/text()').get()       # book title
        author = each.xpath('./td[3]/text()').get()     # author name
        n += 1
        print(str(n) + ":" + title, author)
        if n == 4:  # list at most four books
            break
    if not href:
        print(f"{keyword} not found, please re-type!!")
        return None
    while True:
        choice = input("Please select the book you want to download: ")
        if choice.isdecimal() and 1 <= int(choice) <= len(href):
            return href[int(choice) - 1]
        print("Please re-enter!")


def main():
    keyword = input("Enter a book title or author: ")
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36',
    }
    url = get_url(headers, keyword)
    print(url)  # URL of the chosen book's page


if __name__ == "__main__":
    main()
```
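To see what `requests.post(url, data=data, headers=headers)` actually submits, here is a minimal sketch using only the standard library (an assumption: the article itself uses `requests`). It only builds the request and does not send it; `build_search_request` is a hypothetical helper name and the shortened User-Agent is illustrative. The form data travels URL-encoded in the request body, which is what makes this a POST rather than a GET with a query string.

```python
from urllib.parse import urlencode
from urllib.request import Request

SEARCH_URL = "http://www.xbiquge.la/modules/article/waps.php"


def build_search_request(keyword: str) -> Request:
    """Build (but do not send) the POST request the search form submits."""
    # "searchkey" is the form field shown in the Form Data panel;
    # non-ASCII keywords are percent-encoded as UTF-8
    body = urlencode({"searchkey": keyword}).encode("utf-8")
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    # Supplying a body makes urllib treat the request as a POST
    return Request(SEARCH_URL, data=body, headers=headers)


req = build_search_request("Guest")
print(req.get_method(), req.full_url)  # POST http://www.xbiquge.la/modules/article/waps.php
print(req.data)                        # b'searchkey=Guest'
```

Sending it would be one more call (`urllib.request.urlopen(req)`), but the point here is only the shape of the request.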

The results are as follows:

![](https://p6-tt-ipv6.byteimg.com/large/pgc-image/acbdabdd4b4b459e80f9b95c94aac002)

![](https://p9-tt-ipv6.byteimg.com/large/pgc-image/ad8bb30b222c4b88a1e409832d9f1e8b)

![](https://p1-tt-ipv6.byteimg.com/large/pgc-image/083fe7db784e465fbd877bf8a23018f6)
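The XPath `//div[@id="content"]/form/table/tr` walks the rows of the search results table. As a hedged illustration for readers without `parsel`, the same row-by-row extraction can be sketched with the standard library's `html.parser` (a swapped-in technique, not the article's). The HTML below is a hypothetical, simplified stand-in for the real page, whose markup may differ, and `SearchResultParser` and the sample URLs are illustrative only. Rows without `<td>` cells (the header) are skipped, which plays the role of the `[1:]` slice.

```python
from html.parser import HTMLParser

# Hypothetical, simplified stand-in for the search results page: a header row
# of <th> cells, then one row per book (link in column 1, author in column 3).
SAMPLE = """
<div id="content"><form><table>
<tr><th>Title</th><th>Latest chapter</th><th>Author</th></tr>
<tr><td><a href="http://www.xbiquge.la/1/1/">Book One</a></td><td>Ch 9</td><td>Author A</td></tr>
<tr><td><a href="http://www.xbiquge.la/2/2/">Book Two</a></td><td>Ch 3</td><td>Author B</td></tr>
</table></form></div>
"""


class SearchResultParser(HTMLParser):
    """Collect (title, href, author) tuples from table rows that contain <td> cells."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._cells = None  # text of each <td> in the current row
        self._href = None   # first link found in the current row
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells, self._href = [], None
        elif tag == "td" and self._cells is not None:
            self._cells.append("")
            self._in_td = True
        elif tag == "a" and self._in_td and self._href is None:
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._cells:
            # Header rows have only <th> cells, so self._cells is empty there
            author = self._cells[2].strip() if len(self._cells) > 2 else ""
            self.results.append((self._cells[0].strip(), self._href, author))
            self._cells = None

    def handle_data(self, data):
        if self._in_td:
            self._cells[-1] += data


parser = SearchResultParser()
parser.feed(SAMPLE)
for n, (title, href, author) in enumerate(parser.results[:4], start=1):
    print(f"{n}:{title}", author)  # same "1:Title Author" format as the article
```

On the real site, `parsel` with XPath is far more concise; this sketch just makes explicit what "take column 1's link and column 3's text from each row" means.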

About `if n == 4`: you can search by book title or by author, and at most four books are listed, so try to type the full book title. You can of course change the 4 in `if n == 4` to the number of books you want to display; if you do, the selection check that follows may need adjusting as well.
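An `if`/`elif` chain that maps menu numbers to `href[0]` through `href[3]` has to grow whenever the 4 changes, and it raises `IndexError` when fewer than four books were found. Here is a minimal sketch of a generalized alternative, using a hypothetical `pick_book` helper and placeholder URLs; its `max_results` parameter stands in for the 4 in `if n == 4`.

```python
def pick_book(hrefs, choice, max_results=4):
    """Map a 1-based menu choice to a book URL, or return None if it is invalid.

    max_results plays the role of the 4 in `if n == 4`, so listing more books
    needs no extra if/elif branches.
    """
    shown = hrefs[:max_results]  # only the first max_results books were listed
    try:
        index = int(choice) - 1  # menu numbers start at 1, list indices at 0
    except ValueError:
        return None  # the input was not a number
    return shown[index] if 0 <= index < len(shown) else None


urls = ["url-1", "url-2", "url-3", "url-4", "url-5"]  # placeholder URLs
print(pick_book(urls, "2"))                 # url-2
print(pick_book(urls, "5"))                 # None: only four books are listed
print(pick_book(urls, "5", max_results=5))  # url-5
```

In the script, a `while True` loop would keep prompting until `pick_book` returns something other than `None`.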

Once we have the book's URL, the rest is simple. Here is the full code:

```python
import os
import re

import parsel
import requests


def get_url(headers, keyword):
    """Search the site and return the URL of the book the user picks."""
    url = 'http://www.xbiquge.la/modules/article/waps.php'
    data = {'searchkey': keyword}
    res = requests.post(url, data=data, headers=headers)
    res.encoding = 'utf-8'
    search = parsel.Selector(res.text)
    n = 0
    href = []
    # Skip the header row, then read each search result
    for each in search.xpath('//div[@id="content"]/form/table/tr')[1:]:
        href.append(each.xpath('./td/a/@href').get())  # book URL
        title = each.xpath('./td/a/text()').get()       # book title
        author = each.xpath('./td[3]/text()').get()     # author name
        n += 1
        print(str(n) + ":" + title, author)
        if n == 4:  # list at most four books
            break
    if not href:
        print(f"{keyword} not found, please re-type!!")
        return None
    while True:
        choice = input("Please select the book you want to download: ")
        if choice.isdecimal() and 1 <= int(choice) <= len(href):
            return href[int(choice) - 1]
        print("Please re-enter!")


def get_list(url, headers):
    """Download every chapter of the book at `url` into one .txt file."""
    res = requests.get(url, headers=headers)
    res.encoding = "utf-8"
    list_page = parsel.Selector(res.text)
    book_name = list_page.xpath('//div[@id="info"]/h1/text()').get()  # book title
    print("Start downloading:", book_name)
    os.makedirs("./images", exist_ok=True)  # output folder used by the original code
    for lis in list_page.xpath('//div[@id="list"]/dl/dd'):
        chap_url = 'http://www.xbiquge.la' + lis.xpath('./a/@href').get()  # chapter page URL
        chap = lis.xpath('./a/text()').get()  # chapter title
        print(chap)
        content = requests.get(chap_url, headers=headers)
        content.encoding = "utf-8"
        content_page = parsel.Selector(content.text)
        con_text = ""
        for ac in content_page.xpath('//div[@id="content"]/text()').getall():
            # re.sub returns a new string: keep it, with non-breaking spaces replaced
            con_text += re.sub("\xa0", " ", ac)
        with open("./images/" + book_name + ".txt", "a+", encoding="utf-8") as f:
            f.write(chap + "\n")
            f.write(con_text + "\n")


def main():
    keyword = input("Enter a book title or author: ")
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36',
    }
    url = get_url(headers, keyword)
    if url:
        get_list(url, headers)


if __name__ == "__main__":
    main()
```
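Two details in `get_list` are easy to get wrong: the chapter links in the table of contents are relative, and `re.sub` returns a new string instead of modifying its argument, so its result must be kept. A small sketch of both steps, using hypothetical helper names (`chapter_url`, `clean_chapter`) and a made-up chapter path; only the base URL comes from the article.

```python
import re
from urllib.parse import urljoin

BASE = "http://www.xbiquge.la"


def chapter_url(href):
    """Turn a chapter link from the table of contents into a full URL."""
    # urljoin resolves relative links and leaves absolute ones untouched
    return urljoin(BASE, href)


def clean_chapter(fragments):
    """Join a chapter's text nodes, replacing non-breaking spaces with spaces."""
    # re.sub returns a new string, so its result is what gets joined
    return "".join(re.sub("\xa0", " ", frag) for frag in fragments)


print(chapter_url("/10/10489/4535761.html"))
# http://www.xbiquge.la/10/10489/4535761.html
print(repr(clean_chapter(["\xa0\xa0First line", "\r\n", "\xa0\xa0Second line"])))
# '  First line\r\n  Second line'
```

Since the pattern is a single literal character, `frag.replace("\xa0", " ")` would do the same job as `re.sub`; `urljoin` is mainly a safeguard in case a link is already absolute, where plain string concatenation would produce a broken URL.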

![](https://p3-tt-ipv6.byteimg.com/large/pgc-image/832cf066ca3a40fcb0206e44f6f92fe1)

![](https://p26-tt.byteimg.com/large/pgc-image/20d58d2c3d5849818bcc4906bb6e5b32)

Done! Feel free to share your thoughts. Now I'm off to read!
