1. Cause
One Saturday night, still working overtime and tired in body and mind, I got up, poured myself some milk, and opened Bilibili to see what was new. I used to be a devoted anime otaku, but by then I had not watched any anime for months. Browsing idly, I noticed one of the new series, “In short very 🍋”. I finished several episodes in one go, found that was not enough, and wanted to keep going, so I opened Bilibili Comics and searched, and unexpectedly found it there.
I seldom read manga, and suddenly paying 60 yuan for one was more than I wanted to part with (really, because I'm poor), so I searched the major sites for copies online. Clicking through page after page on a computer was not comfortable, though. If I could convert the manga to EPUB, I could read it in my phone's built-in books app. So I searched GitHub for an open-source program for downloading comics. After some digging, I found a CLI program written in Rust called Mikack-CLI. As far as I could tell, it supports ONE Comics, but that site's domain name had changed. I found an issue about the domain change in the author's other library, mikack. Since the CLI is built on that library, I asked the author about it and got a reply.
From the author's answer, the fix seemed likely to take some time, but my urge to read the manga could not wait.
2. Do it yourself
Analysis
I did a little analysis with my meager knowledge:
- I don't have any professional crawler experience, so professional crawler libraries are ruled out.
- The solution that came to mind was: “Can I open a browser and let it save a chapter or a page by itself?” (Maybe that's what non-professionals think 😂)
- Since I would be driving a browser and saving files, NodeJS is the preferred option.
- Puppeteer is a Node library for controlling headless Chrome. You can script it to do whatever you would do in the browser yourself, which is perfect for me.
The site I scrape is COCO Comics (formerly ONE Comics). First, a preliminary analysis of the site:
- Each work has a corresponding ID
- The work's page gives the URLs of all its chapters
- Comic pages load their images lazily
Approach
- Use Puppeteer to obtain all the chapters of a work
- Simulate a human reader by letting the browser scroll down slowly
- At the same time, listen for the browser's response events, filter out non-image content, and save the images to files
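The scrolling and saving steps above can be sketched roughly as follows. This is a minimal sketch, not the project's actual code: the function names (`extensionFor`, `scrollToBottom`, `saveChapterImages`), the scroll step, the delay, and the `executablePath` argument are all assumptions made for illustration. It needs puppeteer-core installed and a local Chrome.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Map a Content-Type header to a file extension; returns null for non-images.
function extensionFor(contentType) {
  const match = /^image\/([a-z0-9.+-]+)/i.exec(contentType || '');
  if (!match) return null;
  const subtype = match[1].toLowerCase();
  return subtype === 'jpeg' ? 'jpg' : subtype;
}

// Scroll down a little at a time so lazily loaded images get requested.
// The step and delay values are guesses; tune them for the site and network.
async function scrollToBottom(page, step = 400, delayMs = 500) {
  let previousY = -1;
  for (;;) {
    const currentY = await page.evaluate((distance) => {
      window.scrollBy(0, distance);
      return window.scrollY;
    }, step);
    if (currentY === previousY) break; // position stopped changing: bottom reached
    previousY = currentY;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}

// Open one chapter, save every image response to outputDir, and return
// the number of image responses seen. Files are numbered in arrival order,
// which is not guaranteed to match page order; a real program would also
// filter by URL to skip icons and ads.
async function saveChapterImages(chapterUrl, outputDir, executablePath) {
  const puppeteer = require('puppeteer-core'); // loaded lazily; install it first
  fs.mkdirSync(outputDir, { recursive: true });
  const browser = await puppeteer.launch({ executablePath });
  try {
    const page = await browser.newPage();
    const pending = [];
    let count = 0;
    page.on('response', (response) => {
      if (response.request().resourceType() !== 'image') return;
      const ext = extensionFor(response.headers()['content-type']);
      if (!ext) return;
      const name = `${String(++count).padStart(3, '0')}.${ext}`;
      pending.push(
        response
          .buffer()
          .then((data) => fs.promises.writeFile(path.join(outputDir, name), data))
          .catch(() => {}) // ignore responses whose body is unavailable
      );
    });
    await page.goto(chapterUrl, { waitUntil: 'networkidle2' });
    await scrollToBottom(page);
    await Promise.all(pending);
    return count;
  } finally {
    await browser.close();
  }
}

// Example call (placeholder URL and Chrome path):
// saveChapterImages('https://example.com/chapter/1', './chapter-1', '/usr/bin/google-chrome');
```

Waiting between scroll steps is what makes this approach slow, but it gives each lazily loaded image time to be requested before the page moves on.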
Development
Inspired by the Mikack-CLI project, I decided to build this program as a command-line tool, using the commander.js and Inquirer packages. Development was not too difficult. Since this was my first time using Puppeteer, I had to pay a little attention to how page objects are manipulated. Driving a browser also involves a lot of asynchronous operations, so the code is full of async/await statements. Here are some screenshots:
- help
- search
- download
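A command-line layer of this kind could be wired up with commander.js roughly like this. It is only a sketch under assumptions: the `search` and `download` command names follow the screenshots listed above, but their arguments, the `buildProgram` function, and the `handlers` object are hypothetical and not taken from the project.

```javascript
// Build the CLI. `handlers` supplies the async functions that do the real
// work (hypothetical; in a real program they would drive Puppeteer).
function buildProgram(handlers) {
  const { Command } = require('commander'); // loaded lazily; install it first
  const program = new Command();

  program.name('manga').description('Search for and download comics');

  program
    .command('search <keyword>')
    .description('search for a work by name')
    .action(async (keyword) => {
      await handlers.search(keyword);
    });

  program
    .command('download <id>')
    .description('download the chapters of the work with this ID')
    .action(async (id) => {
      await handlers.download(id);
    });

  return program;
}

// Example usage with placeholder handlers:
// buildProgram({
//   search: async (keyword) => console.log('searching for', keyword),
//   download: async (id) => console.log('downloading', id),
// }).parseAsync(process.argv);
```

Because the handlers are asynchronous, the sketch uses `parseAsync` rather than `parse`, so the process waits for the browser work to finish. commander generates the `-h` help output on its own.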
Problems
- This program has a fatal drawback: it saves too slowly, and I have no way to solve that at present. Because of the network and lazy loading, I can only imitate a human reader and let the browser scroll down bit by bit, to make sure every image is loaded and saved.
- I successfully used epub-gen to convert the images to EPUB, but the result is not satisfactory. Here are two pictures for comparison.
  The first was produced with epub-gen, the second with the Epub-Manga-Creator project on GitHub. According to its author, it follows the specification of the Digital Comic Association (デジタルコミック協議会). I am not an expert in this area, so I have not managed to port it yet. Next I will study the specification and the project's source code to get output like the second picture.
- So far, the program has only achieved my goal of downloading comics. I have not tried long continuous downloads, so I don't know whether it will crash. At least it did not during development. I will keep updating and optimizing it and exploring new solutions.
Next steps
- Digital Comic Association (デジタルコミック協議会) EPUB document specification
- Turn the program into a background service with an API, web management, scheduled crawling, and so on
Usage and development
If you want to try the program or develop it further, clone it locally. The code has plenty of comments and the logic is not hard to follow. Note that it uses puppeteer-core, which drives a locally installed Chrome, so check that Chrome is installed before running it. You can also switch to puppeteer, which comes with Chromium.
git clone https://github.com/XavierXuV5/manga
yarn install
yarn link
manga -h
3. Closing thoughts
In my spare time I like to write small tools for fun. I am not an expert in this field and have not studied it in depth, so much of what I write is not professional. It just turns my ideas into something that works. Put simply: if you have an idea, go for it.
I enjoyed the process and learned something along the way, and that is enough. If you have a better idea, feel free to discuss it with me. After all, one person's ideas are limited.