Lecture 1: Text retrieval and database retrieval


Introduction (important!!)

Hello, I am Jack. This step-by-step tutorial is a companion for learning and developing a news recommendation system. Its route follows the course documents for the news recommendation system, which I will of course provide, so you can follow along step by step. I hope everybody can learn it 😊 ~ My energy is limited, so please forgive any imperfections or omissions

Overall process

Links to Supporting materials

Be sure to read the documentation first and then follow the tutorial below!! Each lecture corresponds to the task in one document. PS: Lecture 5 corresponds to "Information Retrieval and Recommendation Handout 06 Introduction to Recommendation Systems V6.2 (main).pdf"

Document links: download.csdn.net/download/m0…

Source link: download.csdn.net/download/m0…

Who this is for

Students with some Java and front-end background who want to complete the news recommendation project (assuming your course offers it). Experts, please move along ~

Knowledge involved

Back-end knowledge:

  • Basics: Java SE, Java Web
  • Advanced: Spring Boot, MyBatis

Front-end knowledge:

  • Basics: HTML, CSS, JavaScript, jQuery, Ajax
  • Advanced: Webpack (understand), Vue (able to use), less (able to use)

Required:

  • Lectures 1 to 4
    • Back end: Java SE, Java Web
    • Front end: HTML, CSS, JavaScript, jQuery, Ajax; Vue is optional
  • From lecture 5 onward
    • Back end: Spring Boot, MyBatis
    • Front end: Vue, scss/sass or less

Other:

  • Front-end/back-end separated interaction model
  • Browser cross-origin principles and solutions

Some of you may not know Vue yet, while others already know it well, so this tutorial offers several front-end versions: a jQuery version, a basic Vue version and an advanced Vue version. The back end likewise moves from basic Java Web (the first four lectures) to Spring Boot (the last lecture)

I. Environment recommendations:

1. My suggested environment (summary):

1. Back-end environment (recommended)

  • Java environment: JDK 1.8; the latest 1.8 update release is fine

    • Historical versions: www.oracle.com/java/techno… (You need to log in to Oracle to download; you can register an account yourself.)
  • Database environment: any MySQL 5.7 release. I strongly recommend the 64-bit MySQL bundled with the PhpStudy integrated environment, which is simple to use; the official MySQL build also works. Installing the database on a solid-state disk is recommended, but it is not necessary if your only solid-state disk is a small system disk

    • PhpStudy: public.xp.cn/upgrades/ph…

    • Official MySQL 5.7, installer version: downloads.mysql.com/archives/in…

      Compressed (zip) version: downloads.mysql.com/archives/co…

  • Development environment: IntelliJ IDEA 2021 or later is recommended, preferably one of the most recent releases and at least 2019. The Toolbox App is recommended for managing JetBrains products

    www.jetbrains.com/zh-cn/idea/…

  • Database software: Navicat 15 (recommended), DataGrip 2021 (memory intensive) or the database plugin built into IDEA (fallback)

2. Front-end environment (recommended)

  • Node.js environment: a reasonably recent version, needed to install the Vue scaffolding

  • Development environment: VS Code, or WebStorm 2021.2 or later. Why the .2? Because from 2021.2 onward WebStorm automatically refreshes the browser when a page is updated, equivalent to Live Server in VS Code. If you do not want to install more software, you can use IDEA, which has all the features of WebStorm

  • Browser plugins: a plugin that can format JSON (such as FeHelper) and the official Vue plugin (for debugging Vue projects)

3. Other

  • API debugging: Postman or ApiPost (made in China). The latter is recommended: it has a Chinese interface, offers functionality similar to Postman, and also has account synchronization

  • Music: NetEase Cloud Music. Listening to music while tapping out code makes for happy programming ~

My environment

1. Back-end environment

  • Java environment: JDK 1.8.0_192
  • Database environment: MySQL 5.7.26 from the PHPStudy integrated environment (5.7.26 inserts data somewhat faster than 8.0)
  • Development environment: IntelliJ IDEA 2021.2
  • Database software: Navicat 15 + the database plugin built into IDEA

2. Front-end environment

  • Node.js environment: v16.5.0
  • Development environment: WebStorm 2021.2, occasionally VS Code
  • Browser plugins: CSDN plugin, official Vue plugin

3. Machine

  • Processor: 8th-generation Intel Core i7
  • Memory: 16 GB DDR4 2400 MHz
  • Disk: the system disk is solid-state and the data disk is mechanical. The database is installed on the mechanical disk in order to test insertion performance under a lower performance ceiling

3. News data sets

The news data comes from 18 channels of the Sohu news website (domestic, international, sports, society, entertainment and others), collected from June to July 2012. Each record provides a URL and text information, and there are about 1.29 million records in total.

4. News data set files

  • news_origin_data.dat (1.43 GB): the raw news data
  • news_handled_data.dat (1.43 GB): the news data after processing (provided here only for reference; later code generates this file)

Ali cloud disk: www.aliyundrive.com/s/4T4DJXLpH…

5. XML format of news data set data

<doc>
  <url>The page URL</url>
  <docno>Page ID</docno>
  <contenttitle>The page title</contenttitle>
  <content>The page content</content>
</doc>

II. Preparation:

1. Create the database news_demo in MySQL

create database `news_demo`;

2. Create the data table article

No. | Column name | Data type    | Description
1   | id          | bigint       | Auto-increment primary key
2   | url         | varchar(500) | Link to the web page
3   | docno       | varchar(50)  | Document number
4   | title       | varchar(50)  | Article title
5   | content     | text         | Article content

Note: table and column names are generally all lowercase, with multiple words separated by underscores. For the detailed rules, see the Java coding standard "Alibaba Java Development Manual".

CREATE TABLE `article` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'Auto-grow primary key',
  `url` varchar(500) DEFAULT NULL COMMENT 'Link site',
  `docno` varchar(50) DEFAULT NULL COMMENT 'Document number',
  `title` varchar(50) DEFAULT NULL COMMENT 'Article Title',
  `content` text COMMENT 'Article content',
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Note: MyISAM is recommended as the table engine here; compared with InnoDB it can greatly improve the speed of this bulk news insertion

3. Create the project in IDEA

  1. Create a new plain Java project. Select the project SDK and, optionally, MySQL support (this gives IDEA MySQL syntax support and shows the database tool window by default)

  2. Name the project and select its location

  3. Create the project package com.example.demo in the root directory. Create a new lib directory as the project's external library directory and copy in the MySQL database driver JAR mysql-connector-java-5.1.47.jar (for MySQL 8.0, use mysql-connector-java-8.0.21.jar), then right-click the lib directory -> Add as Library

  4. Under the project package, create a utils package to hold utility classes and add a new DatabaseUtil class. The configuration file is stored in the resources directory under the project package and must be named db.properties, because that is what my code expects

package com.example.demo.utils;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.sql.*;
import java.util.Objects;
import java.util.Properties;

/**
 * Database connection utility class
 */
public class DatabaseUtil {
  /**
   * Get a database connection
   *
   * @return database connection
   */
  public static Connection getConnection() throws IOException {

    // Get the absolute path of the project package (the parent of this class's directory)
    String absolutePath = Objects.requireNonNull(DatabaseUtil.class.getResource("")).getPath();
    absolutePath = new File(absolutePath).getParent();
    // The absolute path of the configuration file
    String filepath = absolutePath + "/resources/db.properties";
    // Read the configuration file
    FileInputStream in = new FileInputStream(filepath);

    // Parse the configuration file
    Properties properties = new Properties();
    properties.load(in);
    in.close();

    // Get the values of the database connection fields in the configuration file
    String driver = properties.getProperty("driver");
    String url = properties.getProperty("url");
    String username = properties.getProperty("username");
    String password = properties.getProperty("password");

    // Get a database connection
    Connection connection = null;
    try {
      Class.forName(driver);
      connection = DriverManager.getConnection(url, username, password);
    } catch (Exception e) {
      e.printStackTrace();
    }
    return connection;
  }

  /**
   * Close the database resources
   *
   * @param resultSet  result set
   * @param statement  SQL statement executor
   * @param connection database connection
   */
  public static void close(ResultSet resultSet, Statement statement, Connection connection) {
    if (resultSet != null) {
      try {
        resultSet.close();
      } catch (SQLException e) {
        e.printStackTrace();
      }
    }
    if (statement != null) {
      try {
        statement.close();
      } catch (SQLException e) {
        e.printStackTrace();
      }
    }
    if (connection != null) {
      try {
        connection.close();
      } catch (SQLException e) {
        e.printStackTrace();
      }
    }
  }

  public static void main(String[] args) throws IOException {
    // Test the database connection
    System.out.print(DatabaseUtil.getConnection());
  }
}

Create a resources package under the project package to hold resources, and in it create the db.properties connection configuration file so that the database connection is easy to configure

Version 5.7 (selected here)

driver=com.mysql.jdbc.Driver
url=jdbc:mysql://localhost:3306/news_demo?characterEncoding=utf8&useSSL=false&serverTimezone=UTC
username=root
password=root

Version 8.0

driver=com.mysql.cj.jdbc.Driver
url=jdbc:mysql://localhost:3306/news_demo?characterEncoding=utf8&useSSL=false&serverTimezone=UTC
username=root
password=root
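As a side note, loading such a configuration can be sketched with try-with-resources, so the stream is closed even if parsing fails. This is a minimal sketch rather than the tutorial's DatabaseUtil: the class name PropertiesSketch and the method loadFrom are hypothetical, and the properties are read from an in-memory string so the example runs without a file on disk.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration; not part of the tutorial's source.
class PropertiesSketch {
  // Parse key=value pairs from any Reader; try-with-resources closes the reader.
  static Properties loadFrom(Reader source) {
    Properties properties = new Properties();
    try (Reader reader = source) {
      properties.load(reader);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return properties;
  }

  public static void main(String[] args) {
    // Same keys as db.properties, kept in memory for the demo
    String config = "driver=com.mysql.jdbc.Driver\n"
        + "url=jdbc:mysql://localhost:3306/news_demo?characterEncoding=utf8\n"
        + "username=root\n"
        + "password=root\n";
    Properties properties = loadFrom(new StringReader(config));
    System.out.println(properties.getProperty("driver"));
    System.out.println(properties.getProperty("url"));
  }
}
```

In a real project the Reader would wrap the db.properties file instead of a string.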

Run DatabaseUtil's main method to test the connection. On success it prints the connection object, for example:

com.mysql.jdbc.JDBC4Connection@5ce65a89

Create a new main package for the main code, and under it create the news import class HandleImportNews. Type psvm to quickly generate the main method, then add some code that prints how long the job takes. The main business logic is encapsulated in the handleImportNews method

package com.example.demo.main;

/**
 * Import the news data into the database
 */
public class HandleImportNews {
  public static void main(String[] args) throws Exception {
    // Record the start time
    long startTime = System.currentTimeMillis();
    System.out.println("Start the clock...");

    // The logic is written in the handleImportNews method
    handleImportNews();

    // Print the elapsed time
    System.out.println("Insertion completed, took " + (System.currentTimeMillis() - startTime) + " ms");
  }
}

At this point, the preparation is complete

III. Writing the news data import

1. Write the business logic

  1. Define the method handleImportNews, in which we write our main logic
public static void handleImportNews() throws Exception {
  // Main business logic
}
  2. Get a buffered reader for the file
// Pass in the news file path and encoding to get a buffered reader
BufferedReader br = getBufferReader("F:\\news_data\\news_origin_data.dat", "gbk");

Wrap the BufferedReader construction in a getBufferReader method. We will use it again later, so put it in a utility class: create a FileUtil class under the utils package and move the method there

public static BufferedReader getBufferReader(String filePath, String encoding) throws IOException {
  return new BufferedReader(
    new InputStreamReader(
      new FileInputStream(filePath), encoding
    )
  );
}

Also add a closeBufferReader method to release the reader

public static void closeBufferReader(BufferedReader br) {
  if (br != null) {
    try {
      br.close();
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}
  3. Get the database connection and a PreparedStatement
// Get a database connection
Connection connection = DatabaseUtil.getConnection();
PreparedStatement ps = connection.prepareStatement("");

Note: a PreparedStatement is created here, but we do not use its precompilation (the ? placeholders). The reason is discussed below, although personally I prefer the placeholder form

  4. Define the prefix and suffix of the insert statement
// Define the prefix and suffix of the database insert statement
String sqlPrefix = "insert into `article`(url, docno, title, content) values";
StringBuilder sqlSuffix = new StringBuilder();

Question: What are the insert prefix and suffix, and why define them?

Answer: A multi-row insert statement has the form insert into table(column1, column2…) values (value1, value2…), (value1, value2…), (value1, value2…); The prefix is insert into table(column_1, column_2…) values, and the suffix is the sequence of (value_1, value_2… value_n) groups. We concatenate the SQL statement by hand mainly to improve insertion speed. It is a bit more trouble, but the speed is excellent ~
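The prefix and suffix idea can be sketched in isolation as follows. This is a minimal illustration that assumes the values have already been escaped; the class MultiRowInsertSketch and its methods valueGroup and buildInsert are hypothetical names, not the tutorial's code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the tutorial's source.
class MultiRowInsertSketch {
  // Build one "('v1', 'v2', ...)" group. Values are assumed to be escaped already.
  static String valueGroup(String... values) {
    StringBuilder group = new StringBuilder("(");
    for (int i = 0; i < values.length; i++) {
      if (i > 0) {
        group.append(", ");
      }
      group.append('\'').append(values[i]).append('\'');
    }
    return group.append(')').toString();
  }

  // Join the prefix with the comma-separated value groups (the suffix).
  static String buildInsert(String prefix, List<String> groups) {
    return prefix + " " + String.join(", ", groups);
  }

  public static void main(String[] args) {
    String prefix = "insert into `article`(url, docno, title, content) values";
    List<String> groups = Arrays.asList(
        valueGroup("u1", "d1", "t1", "c1"),
        valueGroup("u2", "d2", "t2", "c2"));
    // Prints one statement that inserts two rows
    System.out.println(buildInsert(prefix, groups));
  }
}
```

Using String.join avoids having to trim a trailing comma afterwards, which the tutorial's own code does with substring.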

Three common ways to speed up inserts in MySQL are listed below. The third is the most important and most useful, and it is the one we implement throughout this tutorial, so please keep it in mind:

  1. Increase bulk_insert_buffer_size in the MySQL configuration. The default value is 8M; changing it to 100M is suggested

    bulk_insert_buffer_size=100M

  2. Rewrite all INSERT statements as INSERT DELAYED

    INSERT DELAYED differs in that the result is returned immediately and the insert is processed in the background. Note that INSERT DELAYED is deprecated: MySQL 5.7 accepts the keyword but ignores it and performs an ordinary insert, so this option only helps on older versions.

  3. Insert multiple rows with one statement: insert into tablename(column1, column2…) values ('xxx','xxx'), ('yyy','yyy'), ('zzz','zzz')…;

More ways to improve insertion speed can be found online

Question: Which is faster, a manually concatenated SQL statement or the precompiled form with PreparedStatement's ? placeholders?

Answer: Manual concatenation sends many rows in one statement: insert into table(column1, column2…) values (value1, value2…), (value1, value2…), (value1, value2…); The PreparedStatement form, insert into table(column1, column2…) values (?, ?), inserts one row per execution. This tutorial uses the former for speed. However, the former requires manual concatenation, which can make the code less readable.
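For readers who prefer placeholders, one possible middle ground (not what this tutorial does, and not benchmarked here) is to generate a multi-row statement whose values are all ? placeholders and then bind them through a PreparedStatement. The sketch below only builds the SQL text; PlaceholderInsertSketch and buildPlaceholderInsert are hypothetical names.

```java
import java.util.Collections;

// Hypothetical helper for illustration; not part of the tutorial's source.
class PlaceholderInsertSketch {
  // Build "prefix (?, ?), (?, ?), ..." with the given number of columns and rows.
  static String buildPlaceholderInsert(String prefix, int columns, int rows) {
    String group = "(" + String.join(", ", Collections.nCopies(columns, "?")) + ")";
    return prefix + " " + String.join(", ", Collections.nCopies(rows, group));
  }

  public static void main(String[] args) {
    String prefix = "insert into `article`(url, docno, title, content) values";
    // Two rows of four placeholders each; values would be bound with setString
    System.out.println(buildPlaceholderInsert(prefix, 4, 2));
  }
}
```

With this form the driver handles escaping, at the cost of binding columns × rows parameters for every batch.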

  5. Read the data line by line and process it every 6 lines

    5.1 Define variables

   // Line number
   int lineNumber = 0;
   // Stores the lines of one news item
   String[] news = new String[6];
   // Holds the line just read
   String line = br.readLine();

Question: Why does the String array have size 6?

Answer: In the XML structure of the source data, every six lines make up one article. We store one article in the array, and whenever the line number modulo 6 is 0 we have a complete article to process. Each part can then be read by its index:

<!--news[0]:--><doc>
<!--news[1]:--><url>The page URL</url>
<!--news[2]:--><docno>Page ID</docno>
<!--news[3]:--><contenttitle>The page title</contenttitle>
<!--news[4]:--><content>The page content</content>
<!--news[5]:--></doc>
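The six-line grouping can be tried without the real data file. The sketch below is a hypothetical, self-contained illustration (RecordGroupingSketch and group are not names from the tutorial) that collects every six consecutive lines into one record using the same modulo trick.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the tutorial's source.
class RecordGroupingSketch {
  // Collect every 6 consecutive lines into one String[6] record.
  static List<String[]> group(List<String> lines) {
    List<String[]> records = new ArrayList<>();
    String[] news = new String[6];
    int lineNumber = 0;
    for (String line : lines) {
      // Slot index is the line number modulo 6; the counter is incremented afterwards
      news[lineNumber++ % 6] = line;
      if (lineNumber % 6 == 0) {
        // Copy the array so the next record does not overwrite this one
        records.add(news.clone());
      }
    }
    return records;
  }

  public static void main(String[] args) {
    List<String> lines = new ArrayList<>();
    for (int i = 0; i < 12; i++) {
      lines.add("line" + i);
    }
    List<String[]> records = group(lines);
    System.out.println(records.size());     // number of complete records
    System.out.println(records.get(1)[3]);  // fourth line of the second record
  }
}
```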

5.2 Loop over the articles and insert the accumulated rows into the database in batches. Note that lineNumber counts lines, not articles: the check lineNumber % 1000 == 0 sits inside the lineNumber % 6 == 0 branch, so a flush can only happen when the line number is a multiple of 3000 (every 500 articles), and only if that article passed the empty check

   // Scan the news line by line
   while (line != null) {
     // Store the current line in its slot of the temporary array
     news[lineNumber++ % 6] = line;

     // Every six lines form one news item
     if (lineNumber % 6 == 0) {
       // Get the news title and content
       String newsTitle = getContent(news[3], "contenttitle");
       String newsContent = getContent(news[4], "content");
       // The news item qualifies only if neither the title nor the content is empty
       if (newsTitle.length() > 0 && newsContent.length() > 0) {
         // Get the news link and document number
         String newsUrl = getContent(news[1], "url");
         String newsDocno = getContent(news[2], "docno");
         // Generate and append the suffix
         sqlSuffix.append(generateSQLSuffix(newsUrl, newsDocno, newsTitle, newsContent));
         // Flush when the line number is also a multiple of 1000
         if (lineNumber % 1000 == 0) {
           // Insert into the database
           insertNews(ps, sqlPrefix, sqlSuffix);
           // Reinitialize the suffix
           sqlSuffix = new StringBuilder();
         }
       }
     }
     // Read the next line
     line = br.readLine();
   }
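Because the flush check in the loop is tied to the line number, an alternative is to count accepted articles separately. The sketch below is only an illustration under that assumption, not the code that produced the timings in this tutorial; BatchFlushSketch, countFlushes and batchSize are hypothetical names. It simulates the bookkeeping and reports how many inserts would be sent.

```java
// Hypothetical helper for illustration; not part of the tutorial's source.
class BatchFlushSketch {
  // Count how many flushes happen for a number of accepted articles,
  // including the final partial batch after the loop.
  static int countFlushes(int acceptedArticles, int batchSize) {
    int pending = 0;
    int flushes = 0;
    for (int i = 0; i < acceptedArticles; i++) {
      pending++;                 // one more row appended to the suffix
      if (pending == batchSize) {
        flushes++;               // the real code would call insertNews(...) here
        pending = 0;             // and start a fresh suffix
      }
    }
    if (pending > 0) {
      flushes++;                 // leftover rows after the loop
    }
    return flushes;
  }

  public static void main(String[] args) {
    System.out.println(countFlushes(2500, 1000)); // two full batches plus one partial
  }
}
```

With a dedicated counter, every batch holds exactly batchSize rows except possibly the last one.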

Question: Why is the code below

// Get the news link and document number
String newsUrl = getContent(news[1], "url");
String newsDocno = getContent(news[2], "docno");

placed inside the if rather than before it?

Answer: If the if condition fails, that is, the news title or content is empty, extracting newsUrl and newsDocno beforehand would be wasted work

Question: Can the batch size of 1000 be changed?

Answer: Of course. This value affects the total insertion time, so try several values and choose one that suits your machine. In general it should be neither too large nor too small; if you make it large, bulk_insert_buffer_size in the MySQL configuration should be increased as well

Question: How should the code news[lineNumber++ % 6] = line; be understood?

Answer: It is simply news[lineNumber % 6] = line; followed by lineNumber++; combined into one statement

5.3 Finishing touches. After the loop ends, any rows still left in the suffix must also be inserted, and then the resources are released

// Insert the remaining rows
if (sqlSuffix.length() > 0) {
  insertNews(ps, sqlPrefix, sqlSuffix);
}
// Release resources
DatabaseUtil.close(null, ps, connection);
FileUtil.closeBufferReader(br);
  6. Helper functions used in the code:

getContent: gets the content inside an XML tag, trims the spaces at both ends, converts full-width characters to half-width, and escapes the single quote ' and the backslash \ (to avoid conflicts with the single quotes in the insert statement. With ? placeholders in a PreparedStatement this escaping would be automatic, but we want fast inserts and do not use the placeholder form, so we do it by hand)

For example, a title of hello\' has to be written as insert into article(title) values('hello\\\''). Put this method in a new NewsUtil class

public static String getContent(String text, String payload) {
  int payloadLength = payload.length();
  // Skip "<payload>" at the start and "</payload>" at the end
  String content = text.substring(payloadLength + 2, text.length() - payloadLength - 3).trim();
  return StringUtil.toDbcCase(content) // Convert full-width characters to half-width, and U+E40C to a space
    .replace("\\", "\\\\") // Escape backslashes
    .replace("'", "\\'"); // Escape single quotes
}

Here payload is just the name of the XML tag. I did not know what to call it at the time, so I picked this name. Try to use readable English names when coding
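The extraction and escaping steps can be tried separately from the rest of the project. The following is a minimal sketch with hypothetical names (TagContentSketch, extract, escape); it leaves out the full-width conversion and assumes each line is exactly <tag>…</tag> with nothing before or after the tags.

```java
// Hypothetical helper for illustration; not part of the tutorial's source.
class TagContentSketch {
  // Strip "<tag>" (tag length + 2 chars) and "</tag>" (tag length + 3 chars), then trim.
  static String extract(String text, String tag) {
    int tagLength = tag.length();
    return text.substring(tagLength + 2, text.length() - tagLength - 3).trim();
  }

  // Escape backslashes first, then single quotes, for use inside a quoted SQL literal.
  static String escape(String value) {
    return value.replace("\\", "\\\\").replace("'", "\\'");
  }

  public static void main(String[] args) {
    String line = "<contenttitle> It's a test </contenttitle>";
    String title = extract(line, "contenttitle");
    System.out.println(title);          // the trimmed tag content
    System.out.println(escape(title));  // the same text with the quote escaped
  }
}
```

The order of the two replace calls matters: escaping the quote first would add a backslash that the second call would then double.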

About the StringUtil.toDbcCase method:

The characters in the original news data are all full-width, including English letters, digits and punctuation, whereas in everyday use these characters are half-width. The difference is easy to see: one full-width character occupies the width of two half-width characters. Underneath, the difference is the number of bytes: in a double-byte encoding such as GBK, a full-width character takes two bytes and a half-width character takes one

Full-width digits:

１２３４５６７８９０

Half-width digits:

1234567890

So we need a conversion. I have defined a conversion function in StringUtil, so create a StringUtil class under the utils package

public class StringUtil {
  /**
   * Convert full-width characters to half-width
   *
   * @param input input string
   * @return half-width string
   */
  public static String toDbcCase(String input) {
    char[] c = input.toCharArray();
    for (int i = 0; i < c.length; i++) {
      if (c[i] == '\u3000' || c[i] == '\ue40c') {
        // The full-width space and the stray U+E40C character become a normal space
        c[i] = ' ';
      } else if (c[i] > '\uFF00' && c[i] < '\uFF5F') {
        // Full-width ASCII variants sit 65248 (0xFEE0) above their half-width forms
        c[i] = (char) (c[i] - 65248);
      }
    }
    return new String(c);
  }
}

Here \u3000 is the full-width space. What is \ue40c? It is a character  that appears in the news data; we do not want it to show up on the page, so we convert it to a space as well
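A quick way to check the conversion is to feed it a few full-width characters written as Unicode escapes. The sketch below is a standalone copy of the idea under a hypothetical name (FullWidthSketch.toHalfWidth); it handles only the full-width space and the full-width ASCII range, not the extra U+E40C case.

```java
// Hypothetical helper for illustration; not part of the tutorial's source.
class FullWidthSketch {
  // Map the full-width space (U+3000) and full-width ASCII (U+FF01 to U+FF5E) to half-width.
  static String toHalfWidth(String input) {
    char[] c = input.toCharArray();
    for (int i = 0; i < c.length; i++) {
      if (c[i] == '\u3000') {
        c[i] = ' ';
      } else if (c[i] >= '\uFF01' && c[i] <= '\uFF5E') {
        // 0xFEE0 is 65248, the offset between the two ranges
        c[i] = (char) (c[i] - 0xFEE0);
      }
    }
    return new String(c);
  }

  public static void main(String[] args) {
    // Full-width N, B, A, a full-width space, then full-width 1, 2, 3
    String fullWidth = "\uFF2E\uFF22\uFF21\u3000\uFF11\uFF12\uFF13";
    System.out.println(toHalfWidth(fullWidth)); // prints "NBA 123"
  }
}
```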

generateSQLSuffix: generates one (value1, value2...) group of the insert statement; each group represents one news article

public static String generateSQLSuffix(String newsUrl, String newsDocno, String newsTitle, String newsContent) {
  return "('" + newsUrl + "', '" + newsDocno + "', '" + newsTitle + "', '" + newsContent + "'),";
}

insertNews: inserts the accumulated rows. It goes through the batch API (addBatch and executeBatch) instead of executeUpdate, but it is not really batch processing: each call sends just one long multi-row insert statement

public static void insertNews(PreparedStatement ps, String sqlPrefix, StringBuilder sqlSuffix) throws SQLException {
  // Drop the trailing comma from the suffix before sending the statement
  ps.addBatch(sqlPrefix + sqlSuffix.substring(0, sqlSuffix.length() - 1));
  ps.executeBatch();
}

At this point, the logic is written

2. Complete code

package com.example.demo.main;

import com.example.demo.utils.DatabaseUtil;
import com.example.demo.utils.FileUtil;
import com.example.demo.utils.NewsUtil;

import java.io.*;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/**
 * Import the news data into the database
 */
public class HandleImportNews {
  // Insert the accumulated rows as one multi-row statement
  public static void insertNews(PreparedStatement ps, String sqlPrefix, StringBuilder sqlSuffix) throws SQLException {
    // Drop the trailing comma from the suffix before sending the statement
    ps.addBatch(sqlPrefix + sqlSuffix.substring(0, sqlSuffix.length() - 1));
    ps.executeBatch();
  }

  // Generate one (value1, value2...) group for a news item
  public static String generateSQLSuffix(String newsUrl, String newsDocno, String newsTitle, String newsContent) {
    return "('" + newsUrl + "', '" + newsDocno + "', '" + newsTitle + "', '" + newsContent + "'),";
  }

  public static void handleImportNews() throws Exception {
    // Get a buffered reader for the file
    BufferedReader br = FileUtil.getBufferReader("F:\\news_data\\news_origin_data.dat", "gbk");

    // Get a database connection
    Connection connection = DatabaseUtil.getConnection();
    PreparedStatement ps = connection.prepareStatement("");

    // Define the prefix and suffix of the database insert statement
    String sqlPrefix = "insert into article(url, docno, title, content) values";
    StringBuilder sqlSuffix = new StringBuilder();

    // Line number
    int lineNumber = 0;
    // Stores the lines of one news item
    String[] news = new String[6];
    // Holds the line just read
    String line = br.readLine();
    // Scan the news line by line
    while (line != null) {
      // Store the current line in its slot of the temporary array
      news[lineNumber++ % 6] = line;

      // Every six lines form one news item
      if (lineNumber % 6 == 0) {
        String newsTitle = NewsUtil.getContent(news[3], "contenttitle");
        String newsContent = NewsUtil.getContent(news[4], "content");
        // Keep the item only if neither the title nor the content is empty
        if (newsTitle.length() > 0 && newsContent.length() > 0) {
          String newsUrl = NewsUtil.getContent(news[1], "url");
          String newsDocno = NewsUtil.getContent(news[2], "docno");
          sqlSuffix.append(generateSQLSuffix(newsUrl, newsDocno, newsTitle, newsContent));
          // Flush when the line number is also a multiple of 1000
          if (lineNumber % 1000 == 0) {
            insertNews(ps, sqlPrefix, sqlSuffix);
            sqlSuffix = new StringBuilder();
          }
        }
      }
      // Read the next line
      line = br.readLine();
    }
    // Insert the remaining rows
    if (sqlSuffix.length() > 0) {
      insertNews(ps, sqlPrefix, sqlSuffix);
    }
    // Release resources
    DatabaseUtil.close(null, ps, connection);
    FileUtil.closeBufferReader(br);
  }

  public static void main(String[] args) throws Exception {
    long startTime = System.currentTimeMillis();
    System.out.println("Start the clock...");
    handleImportNews();
    System.out.println("Insertion completed, took " + (System.currentTimeMillis() - startTime) + " ms");
  }
}

Run the main method

3. Running result

Start the clock...
Insertion completed, took 54731 ms

In total, 1,298,155 rows were inserted in about 54 seconds, writing at around 40 megabits per second. Inserting over a million rows in about a minute is a very good speed. Across my many tests the average was around one minute, though performance will vary between machines.

IV. Retrieval experiment (searching for related news by keyword)

Under the main package, create a new SearchNewsByKeyWordTest class and write the basic logic in its main method. The logic is simple, all basic operations: the database search uses a fuzzy (LIKE) query, and the file search scans the file and checks one news item at a time

The main method writes the rough logic

public class SearchNewsByKeyWordTest {
  public static void main(String[] args) throws Exception {
    // 0. Define the keyword
    String keyword = "NBA";
    System.out.println("The keyword is: " + keyword);

    // 1. Test searching in the database
    System.out.println("1. Start SQL query:");
    long startTime = System.currentTimeMillis();
    testSearchBySQL(keyword);
    System.out.println("Took " + (System.currentTimeMillis() - startTime) + " ms");

    // 2. Test searching in the file
    System.out.println("2. Start file traversal query:");
    startTime = System.currentTimeMillis();
    testSearchByFile(keyword);
    System.out.println("Took " + (System.currentTimeMillis() - startTime) + " ms");
  }
}

1. Test searching in the database

Simply execute a fuzzy (LIKE) query

public static void testSearchBySQL(String keyword) throws Exception {
  Connection connection = DatabaseUtil.getConnection();
  String sql = "select count(*) from article where title like '%" + keyword + "%' or content like '%" + keyword + "%'";
  PreparedStatement ps = connection.prepareStatement(sql);
  ResultSet resultSet = ps.executeQuery();
  while (resultSet.next()) {
    System.out.println("Total " + resultSet.getInt(1) + " results");
  }
  DatabaseUtil.close(resultSet, ps, connection);
}
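Since the keyword is concatenated straight into the SQL text here, a keyword containing a quote or a % would change the query. One possible alternative, shown only as a sketch, binds the pattern through ? placeholders. LikeQuerySketch, likePattern and countMatches are hypothetical names, and the escaping assumes MySQL's default backslash escape character for LIKE.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper for illustration; not part of the tutorial's source.
class LikeQuerySketch {
  // Escape the LIKE wildcards % and _ (and the backslash itself), then wrap in %...%.
  static String likePattern(String keyword) {
    String escaped = keyword
        .replace("\\", "\\\\")
        .replace("%", "\\%")
        .replace("_", "\\_");
    return "%" + escaped + "%";
  }

  // Count matching articles with bound parameters instead of string concatenation.
  static int countMatches(Connection connection, String keyword) throws SQLException {
    String sql = "select count(*) from article where title like ? or content like ?";
    try (PreparedStatement ps = connection.prepareStatement(sql)) {
      String pattern = likePattern(keyword);
      ps.setString(1, pattern);
      ps.setString(2, pattern);
      try (ResultSet resultSet = ps.executeQuery()) {
        return resultSet.next() ? resultSet.getInt(1) : 0;
      }
    }
  }

  public static void main(String[] args) {
    // Only the pattern building is shown here; countMatches needs a live connection
    System.out.println(likePattern("NBA"));
    System.out.println(likePattern("100%"));
  }
}
```

Given a connection from DatabaseUtil.getConnection(), countMatches(connection, "NBA") would run the same count as the query in this step.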

2. Test searching in the file

public static void testSearchByFile(String keyword) throws IOException {
  // Get a buffered reader for the file
  BufferedReader br = FileUtil.getBufferReader("F:\\news_data\\news_origin_data.dat", "gbk");

  int lineNumber = 0, matchingNumber = 0;
  String title = "", content = "";
  String line = br.readLine();
  while (line != null) {
    lineNumber++;
    // The 4th line of each record is the title
    if (lineNumber % 6 == 4) {
      title = line;
    }
    // The 5th line of each record is the content
    if (lineNumber % 6 == 5) {
      content = line;
    }
    // Check once every 6 lines
    if (lineNumber % 6 == 0) {
      // Get the processed title and content
      String newsTitle = NewsUtil.getContent(title, "contenttitle");
      String newsContent = NewsUtil.getContent(content, "content");
      // Skip items whose title or content is empty
      if (!newsTitle.equals("") && !newsContent.equals("")) {
        // Join the title and content and search them together
        String searchText = newsTitle + "" + newsContent;
        // If the keyword is found, increment the counter
        if (searchText.contains(keyword)) {
          matchingNumber++;
        }
      }
    }
    line = br.readLine();
  }
  System.out.println("Total " + matchingNumber + " results");
  FileUtil.closeBufferReader(br);
}

3. Test results

The keyword is: NBA
1. Start SQL query:
Total 7130 results
Took 6502 ms
2. Start file traversal query:
Total 6977 results
Took 15685 ms

Why do the two result counts differ? Note that the keyword here is English, and what is special about English? Letter case. With the table's default case-insensitive collation, a LIKE search for NBA in the database also matches nba, Nba and so on, while String.contains in Java is case-sensitive. How do we fix this? Convert both the text and the keyword to upper case before matching. The fixed code is below, with the key changes marked

// Check once every 6 lines
if (lineNumber % 6 == 0) {
    // Get the processed title and content
    String newsTitle = NewsUtil.getContent(title, "contenttitle");
    String newsContent = NewsUtil.getContent(content, "content");
    // Skip items whose title or content is empty
    if (!newsTitle.equals("") && !newsContent.equals("")) {
        // Join the title and content and search them together
        String searchText = (newsTitle + "" + newsContent).toUpperCase(); // <-- change here
        // If the keyword is found, increment the counter
        if (searchText.contains(keyword.toUpperCase())) { // <-- change here
            matchingNumber++;
        }
    }
}

Test it again separately:

2. Start file traversal query:
Total 7130 results
Took 23252 ms

That’s fine now
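The upper-casing approach can be wrapped in a small helper and checked on its own. This is a minimal sketch with hypothetical names (CaseInsensitiveSketch, containsIgnoreCase); it passes Locale.ROOT so the result does not depend on the machine's default locale, which is an addition of this sketch rather than something the tutorial's code does.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of the tutorial's source.
class CaseInsensitiveSketch {
  // Compare in upper case so that NBA, nba and Nba all match.
  static boolean containsIgnoreCase(String text, String keyword) {
    return text.toUpperCase(Locale.ROOT).contains(keyword.toUpperCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    System.out.println(containsIgnoreCase("Tonight: nba finals", "NBA")); // true
    System.out.println("Tonight: nba finals".contains("NBA"));            // false
  }
}
```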


V. Directory structure

VI. Notes on using the source code

I deleted the IDEA configuration directory .idea and the .iml file from my project, so when you open my code you need to configure the JDK yourself (IDEA will also prompt you to set one up; just use the one you have installed). You also need to right-click the lib directory and add it as a library, and set the project's language level. I recommend following the tutorial steps and typing the code yourself ~