I don’t know what to say
As a web-crawler newbie, my first small task was, of course, to scrape our cute new friend, New Biquge (xbiquge.la).
It's not that difficult,
but let me share a bit of my coding journey.
I hope it gives you some ideas or help.
Of course, if any expert can point out mistakes or things that could be improved,
that would be even better.
I'm waiting for you ~
Project demo
Before I talk about the project,
let me show you how it works.
Otherwise you might read for a long time only to find it isn't what you wanted, and that would be frustrating.
< — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — line — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — >
Honestly, I think it was a bit silly to write the code this way: it doesn't download a whole novel in one go.
Instead it saves the novel chapter by chapter, and I even felt pleased with myself. Sigh.
Explanation of code ideas
Modules to be used by the project
import os
from time import sleep
import requests
from lxml import etree
Install the third-party modules with pip/pip3: pip install requests lxml (os and time are part of the standard library and need no installation).
The "all novels" page of New Biquge: the links to every novel are on this page.
url = 'www.xbiquge.la/xiaoshuodaq…'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}
# Request the page
all_book_r = requests.get(url, headers=headers)
# Parse the HTML
all_book_html = etree.HTML(all_book_r.content.decode('utf-8'))

Press F12 (top right of the keyboard), or right-click the page and choose Inspect, to open the browser's developer tools (the panel on the right). Then click the element-selection button in the top-left corner of the developer tools and click any novel on the page. The HTML view jumps to the element we selected. We can see that it is an a tag inside a ul list, and that this ul list holds the links to all the novels, so the rest is easy ~
We use Chrome's XPath plugin to work out the expression that selects the links to all the novels on New Biquge. If you don't have the plugin, you can download and install it yourself (strongly recommended, it is especially useful). Link: pan.baidu.com/s/1_HzBzOp-… Extraction code: sb7p
For XPath usage, there is an expert's blog post you can read if anything is unclear.
We pass the expression that correctly selects the information we need to the xpath method. The xpath method collects every match from the HTML page we fetched into a list, which is stored under whatever variable name you choose.
# List of all novel links
all_book_url = all_book_html.xpath('//div[@class="novellist"]/ul/li/a/@href')
# List of all novel titles
all_book_title = all_book_html.xpath('//div[@class="novellist"]/ul/li/a/text()')
print(all_book_url)
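To see what these two lists look like, here is a minimal sketch run against a made-up page fragment. It assumes the list markup matches the XPath above. To run without any installs, it swaps in the standard library's xml.etree.ElementTree for lxml; ElementTree understands only a subset of XPath and needs well-formed markup. The sample titles and links are invented.

```python
import xml.etree.ElementTree as ET

# Invented sample markup shaped like the novel list described above.
SAMPLE_HTML = """
<html><body>
  <div class="novellist">
    <ul>
      <li><a href="http://example.com/book/1/">Book One</a></li>
      <li><a href="http://example.com/book/2/">Book Two</a></li>
    </ul>
  </div>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE_HTML)

# ElementTree cannot select attributes or text nodes directly,
# so select the <a> elements and read href/text from each one.
anchors = root.findall(".//div[@class='novellist']/ul/li/a")
all_book_url = [a.get("href") for a in anchors]
all_book_title = [a.text for a in anchors]

print(all_book_url)
print(all_book_title)
```

Both lists come from the same a tags, so the title at a given position belongs to the link at the same position.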
Set a counter num and compare titles once per loop iteration until the novel is found. The value of num is then used as the index into the list above to get the link of the novel the user wants to download, and that link is requested to get the novel's data.
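The counter-and-index lookup described here can be tried on its own, without any network access. In this sketch the helper name find_book_url and the sample data are made up for illustration; zip pairs each title with the link at the same position, which does the same job as the num counter.

```python
def find_book_url(all_book_title, all_book_url, find_book):
    """Return the link whose title matches find_book, or None if absent."""
    # Titles and links come from the same <a> tags, so position i in one
    # list corresponds to position i in the other.
    for title, link in zip(all_book_title, all_book_url):
        if title == find_book:
            return link
    return None


# Invented sample data standing in for the scraped lists.
titles = ["Book One", "Book Two", "Book Three"]
links = ["/book/1/", "/book/2/", "/book/3/"]

print(find_book_url(titles, links, "Book Two"))
print(find_book_url(titles, links, "Missing Book"))
```

Returning None for a missing title plays the role of the "not found" branch in the loop.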
# Enter the title of the novel you are looking for
find_book = input('Enter the title of the novel you want to download: ')
num = 0
for book_title in all_book_title:
    if find_book == book_title:
        print('Found:', book_title)
        book_url = all_book_url[num]
        # Request the novel's page
        book_r = requests.get(book_url, headers=headers)
        # Parse it
        book_html = etree.HTML(book_r.content.decode('utf-8'))
        # List of chapter links for this novel
        book_url = book_html.xpath('//div[@id="list"]/dl/dd/a/@href')
        # List of chapter titles for this novel
        chapter_title = book_html.xpath('//div[@id="list"]/dl/dd/a/text()')
    # Add 1 on every loop iteration
    num += 1

Create a folder for each novel to hold its chapters. Otherwise everything ends up mixed together, which is confusing and makes it hard to find the chapter I want to read. (It wouldn't be a problem if the whole novel were packed into one file, but this is how I thought of it.)
Check whether the path exists: os.path.exists returns True if it does and False if it does not.
judge = os.path.exists('../novel/%s' % str(book_title))
# Create the folder if it does not exist
if not judge:
    os.makedirs('../novel/%s' % str(book_title))

Next, loop over the chapter links in the range the user entered and extract each chapter's text with XPath.
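Here is a minimal sketch of the folder step, assuming Python 3.2 or later: os.makedirs accepts exist_ok=True, which folds the existence check and the creation into one call. The helper name ensure_book_dir is hypothetical, and a temporary directory stands in for the real download folder.

```python
import os
import tempfile


def ensure_book_dir(base_dir, book_title):
    """Create base_dir/book_title if needed and return its path."""
    book_dir = os.path.join(base_dir, str(book_title))
    # exist_ok=True makes this a no-op when the folder already exists.
    os.makedirs(book_dir, exist_ok=True)
    return book_dir


with tempfile.TemporaryDirectory() as tmp:
    first = ensure_book_dir(tmp, "Book One")
    second = ensure_book_dir(tmp, "Book One")  # calling twice is safe
    created = os.path.isdir(first)

print(created, first == second)
```

os.path.join is used here instead of string formatting so the path separator suits whichever operating system runs the script.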
# Tell the user how many chapters the novel has
print('<------ Please enter chapter numbers (this novel has %s chapters in total) ------>' % len(chapter_title))
# Chapter to start from
download_book_start = int(input('Start from chapter: '))
# Chapter to end at
download_book_end = int(input('End at chapter: '))
# Number of chapters downloaded so far
chapter_num = 0
# download_book_start - 1 because list indices start from 0
for book_content_url in book_url[download_book_start - 1:download_book_end]:
    # Pause between requests
    sleep(2)
    new_book_content_url = 'http://www.xbiquge.la' + book_content_url
    book_content_r = requests.get(new_book_content_url, headers=headers)
    book_content_html = etree.HTML(book_content_r.content.decode('utf-8'))
    # List of text fragments that make up the chapter
    book_content = book_content_html.xpath('//div[@class="box_con"]/div[@id="content"]/text()')
    # Join the fragments into the all_content string and write it to a text file
    with open('../novel/%s/%s.txt' % (str(book_title), chapter_title[download_book_start + chapter_num - 1]), 'w', encoding='utf-8') as write_content:
        all_content = ''
        for content in book_content:
            all_content += content
        write_content.write(all_content)
    print(chapter_title[download_book_start + chapter_num - 1], 'downloaded')
    chapter_num += 1
print('All downloads complete')
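The slicing and joining logic can be checked without touching the site. In this sketch the helper names chapter_urls and join_content are invented and the sample paths are made up. urllib.parse.urljoin is swapped in for plain string concatenation as one way to build the full chapter address.

```python
from urllib.parse import urljoin

BASE_URL = "http://www.xbiquge.la"


def chapter_urls(book_url, start, end):
    """Return full URLs for chapters start..end (1-based, inclusive)."""
    if not 1 <= start <= end <= len(book_url):
        raise ValueError("chapter range is outside the chapter list")
    # start - 1 because list indices begin at 0; the slice end is
    # exclusive, so `end` itself already includes the last chapter.
    return [urljoin(BASE_URL, path) for path in book_url[start - 1:end]]


def join_content(book_content):
    """Join the text fragments returned by xpath into one string."""
    return "".join(book_content)


# Invented relative links standing in for the scraped chapter list.
sample_links = ["/1/1/100.html", "/1/1/101.html", "/1/1/102.html"]

print(chapter_urls(sample_links, 2, 3))
print(join_content(["First line.\n", "Second line.\n"]))
```

The range check is an addition for the sketch: without it, a start of 0 or an end past the last chapter would quietly produce a wrong or shorter slice rather than an error.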
Emmm, that's probably about it. I think I covered things in reasonable detail. This is my first write-up as a newbie, so if anything is lacking, please point it out (gently, no flaming) and I will improve it over time.
Complete code
import os
from time import sleep

import requests
from lxml import etree

url = 'www.xbiquge.la/xiaoshuodaq…'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}

# Request and parse the page that lists every novel
all_book_r = requests.get(url, headers=headers)
all_book_html = etree.HTML(all_book_r.content.decode('utf-8'))
# Links and titles of all novels, in matching order
all_book_url = all_book_html.xpath('//div[@class="novellist"]/ul/li/a/@href')
all_book_title = all_book_html.xpath('//div[@class="novellist"]/ul/li/a/text()')
print(all_book_url)

find_book = input('Enter the title of the novel you want to download: ')
num = 0
for book_title in all_book_title:
    if find_book == book_title:
        print('Found:', book_title)
        book_url = all_book_url[num]
        # Request and parse the novel's chapter list
        book_r = requests.get(book_url, headers=headers)
        book_html = etree.HTML(book_r.content.decode('utf-8'))
        book_url = book_html.xpath('//div[@id="list"]/dl/dd/a/@href')
        chapter_title = book_html.xpath('//div[@id="list"]/dl/dd/a/text()')
        # Create a folder for this novel if it does not exist yet
        judge = os.path.exists('../novel/%s' % str(book_title))
        if not judge:
            os.makedirs('../novel/%s' % str(book_title))
        print('<------ Please enter chapter numbers (this novel has %s chapters in total) ------>' % len(chapter_title))
        download_book_start = int(input('Start from chapter: '))
        download_book_end = int(input('End at chapter: '))
        chapter_num = 0
        # download_book_start - 1 because list indices start from 0
        for book_content_url in book_url[download_book_start - 1:download_book_end]:
            sleep(2)
            new_book_content_url = 'http://www.xbiquge.la' + book_content_url
            book_content_r = requests.get(new_book_content_url, headers=headers)
            book_content_html = etree.HTML(book_content_r.content.decode('utf-8'))
            book_content = book_content_html.xpath('//div[@class="box_con"]/div[@id="content"]/text()')
            # Write the chapter text to its own file
            with open('../novel/%s/%s.txt' % (str(book_title), chapter_title[download_book_start + chapter_num - 1]), 'w', encoding='utf-8') as write_content:
                all_content = ''
                for content in book_content:
                    all_content += content
                write_content.write(all_content)
            print(chapter_title[download_book_start + chapter_num - 1], 'downloaded')
            chapter_num += 1
        print('All downloads complete')
        break
    elif num + 1 == len(all_book_title):
        # Reached the last title without a match
        print('Novel not found')
    num += 1
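The author remarks earlier that saving one file per chapter felt clumsy. One possible variation, not part of the original post, is to write every chapter into a single text file. The function name save_novel and the sample chapters below are hypothetical, and the download step is left out.

```python
import os
import tempfile


def save_novel(path, chapters):
    """Write (title, text) pairs into one UTF-8 text file, in order."""
    with open(path, "w", encoding="utf-8") as novel_file:
        for title, text in chapters:
            # Put each chapter title on its own line above the text.
            novel_file.write(title + "\n")
            novel_file.write(text + "\n\n")


# Invented sample chapters standing in for downloaded content.
sample_chapters = [
    ("Chapter 1", "It began on a quiet night."),
    ("Chapter 2", "The next morning came early."),
]

with tempfile.TemporaryDirectory() as tmp:
    novel_path = os.path.join(tmp, "novel.txt")
    save_novel(novel_path, sample_chapters)
    with open(novel_path, encoding="utf-8") as f:
        saved_text = f.read()

print(saved_text)
```

In the real script, the pairs would come from chapter_title and the joined all_content string for each chapter.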
This article is reprinted from SCDN