1. Getting started

I have recently been learning web crawling in Java, which is how I came across jsoup. This is my second small project: after scraping pictures of beautiful women, I am now scraping novels from Biquge.

2. Page analysis

First, open Biquge and pick a novel you like. Press F12 to open the browser developer tools, and you can see where the novel title and the address of each chapter sit in the page.

Next, look at what a chapter page contains.

There we can get the chapter title, the chapter content, and the address of the next chapter.
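The next-chapter link in a page may be written as a relative address, which is why the crawler code reads it with jsoup's `abs:href` rather than plain `href`. The sketch below only illustrates that kind of resolution using the standard library; `LinkResolver` is a name made up for this example, and it is not how jsoup implements it internally.

```java
import java.net.URI;

// Stdlib-only sketch of resolving a link against the page it was found on,
// similar in spirit to what "abs:href" returns in jsoup.
final class LinkResolver {

    // pageUrl: address the page was loaded from; href: link text as written in the page.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }
}
```

For instance, with a made-up chapter page `https://www.xbiquge.la/1/1688/100.html`, resolving the href `101.html` gives `https://www.xbiquge.la/1/1688/101.html`.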

3. Code implementation

After that rigorous analysis, we know the general layout of the pages, so we can move on to the code.

Add the dependencies

    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpclient</artifactId>
        <version>4.5.6</version>
    </dependency>
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.10.2</version>
    </dependency>

The spider class:

    import java.io.File;
    import java.io.FileWriter;
    import java.io.PrintWriter;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    public class ArticleSpider {

        /** Directory the novel is saved to. */
        private String path;

        /** Entry point: url is the novel's chapter index page. */
        public void start(String url) {
            try {
                Document document = Jsoup.connect(url).get();
                // Address of the first chapter (attr() reads the first matched link)
                String listUrl = document.select("#list>dl>dd>a").attr("abs:href");
                // Novel title, used as the folder name
                String fileName = document.select("h1").text();
                path = "D:/novel/" + fileName;
                each(listUrl);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        /** Saves one chapter, then moves on to the next one. */
        private void each(String url) {
            try {
                Document document = Jsoup.connect(url).get();
                // Chapter body: break lines at spaces and after each Chinese full stop
                Element elm = document.getElementById("content");
                String content = elm.text().replace(" ", "\n").replace("。", "。\n");
                // Chapter title
                String title = document.getElementsByTag("h1").text();
                // Address of the next chapter
                String next = document.getElementsByClass("bottem1").get(0).child(3).attr("abs:href");
                File file = createFile(title);
                mergeBook(file, content);
                // Keep going while the next link points to an html chapter page
                if (next.contains("html")) {
                    System.out.println("Resting 5 seconds before the next chapter");
                    Thread.sleep(5000);
                    each(next);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        /** Creates the chapter file (and its parent folder) if missing. */
        public File createFile(String fileName) {
            File file = new File(path + "/" + fileName + ".txt");
            try {
                // Create the parent directory if needed
                File fileParent = file.getParentFile();
                if (!fileParent.exists()) {
                    fileParent.mkdirs();
                }
                // Create the file
                if (!file.exists()) {
                    file.createNewFile();
                }
            } catch (Exception e) {
                file = null;
                System.err.println("Error creating file");
                e.printStackTrace();
            }
            return file;
        }

        /** Appends the chapter text to the file. */
        public void mergeBook(File file, String content) {
            try (PrintWriter writer = new PrintWriter(new FileWriter(file, true))) {
                writer.println(content);
                writer.println("\n");
            } catch (Exception e) {
                System.err.println("Error writing file");
                e.printStackTrace();
            }
        }
    }
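One thing to watch in `createFile`: the chapter title goes straight into the file name, and the save path is on a Windows drive, so a title containing a character such as `?` or `:` would probably fail to create the file. A minimal sketch of one way to guard against that is below. `FileNames`, `sanitize`, and the `untitled` fallback are names invented for this example, and the character list covers only the common forbidden characters, not every Windows naming rule.

```java
// Sketch: make a chapter title safe to use as a Windows file name.
final class FileNames {

    static String sanitize(String title) {
        // Replace the characters Windows forbids in file names: \ / : * ? " < > |
        String cleaned = title.replaceAll("[\\\\/:*?\"<>|]", "_").trim();
        // Fall back to a placeholder name if nothing usable is left
        return cleaned.isEmpty() ? "untitled" : cleaned;
    }
}
```

It could be applied as `createFile(FileNames.sanitize(title))` before the file is created.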

There is also an entity class for storing a chapter's title and content

    import lombok.Data;

    // @Data comes from Lombok, which must be on the classpath
    @Data
    public class NovelAttribute {
        // Title
        private String title;
        // Content
        private String content;
        private String url;

        public NovelAttribute(String title, String content) {
            this.title = title;
            this.content = content;
        }
    }
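If you would rather not pull in Lombok, the same class can be written by hand. The sketch below shows roughly what the annotation saves you from typing (getters, setters, and a `toString`); `equals` and `hashCode` are left out for brevity, and `PlainNovelAttribute` is just a name chosen here so it does not clash with the class above.

```java
// Sketch: hand-written version of the entity class, without Lombok.
final class PlainNovelAttribute {
    private String title;
    private String content;
    private String url;

    PlainNovelAttribute(String title, String content) {
        this.title = title;
        this.content = content;
    }

    String getTitle() { return title; }
    void setTitle(String title) { this.title = title; }

    String getContent() { return content; }
    void setContent(String content) { this.content = content; }

    String getUrl() { return url; }
    void setUrl(String url) { this.url = url; }

    @Override
    public String toString() {
        // Content is skipped because a whole chapter is too long to print
        return "PlainNovelAttribute(title=" + title + ", url=" + url + ")";
    }
}
```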

Finally, start the crawler

    public static void main(String[] args) {
        ArticleSpider articleSpider = new ArticleSpider();
        // Pass in the address of your favorite novel
        articleSpider.start("https://www.xbiquge.la/1/1688/");
    }
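Once `start` is called, `each` calls itself once per chapter, so the call stack keeps growing for as long as the novel goes on. For a very long novel, a plain loop may be the safer shape. Here is a rough sketch of that idea, not the code used above: `ChapterWalker`, `Chapter`, and the `maxChapters` cap are all invented for the example, and the `fetch` function stands in for the jsoup download-and-parse step so the loop can be tried without touching the network.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Sketch: follow "next chapter" links with a loop instead of recursion.
final class ChapterWalker {

    // Hypothetical holder for what one chapter page yields.
    static final class Chapter {
        final String title;
        final String next; // absolute address of the next page

        Chapter(String title, String next) {
            this.title = title;
            this.next = next;
        }
    }

    // Returns the titles visited, in order. Uses the same stop test as the
    // spider: continue only while the address contains "html".
    static List<String> walk(String firstUrl, Function<String, Chapter> fetch, int maxChapters) {
        List<String> titles = new ArrayList<>();
        String url = firstUrl;
        while (url != null && url.contains("html") && titles.size() < maxChapters) {
            Chapter chapter = fetch.apply(url);
            if (chapter == null) {
                break; // page missing or could not be parsed
            }
            titles.add(chapter.title);
            // Saving the chapter and the 5-second pause would go here
            url = chapter.next;
        }
        return titles;
    }
}
```

In a test, `fetch` can simply be a lookup in a `Map` of fake pages; in the real crawler it would wrap the parsing done in `each`.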

4. Result

5. Summary

Those are the steps for scraping a novel, and they are quite simple. When I have time, I will look into scraping music next. If this article helped you, please give it a like <^_^>