Lecture 1: Text Retrieval and Database Retrieval
Introduction (important!!)
Hello, I'm Jack. This is a step-by-step tutorial series on building a news recommendation system, meant to support learning and developing one. The tutorial follows the handouts that accompany the news recommendation system course; I will provide the documents, and you can follow along step by step. I hope everyone learns something 😊 ~ Since my time and energy are limited, please forgive any imperfections or omissions.
Overall process
Links to supporting materials
Be sure to read the documentation first, then work through the tutorial below! Each lecture corresponds to one task in the documentation. PS: Lecture 5 corresponds to the handout "Information Retrieval and Recommendation Handout 06 Introduction to recommendation Systems V6.2 (main).pdf".
Document links: download.csdn.net/download/m0…
Source link: download.csdn.net/download/m0…
Intended audience
Readers with some Java and front-end background who want to complete the news recommendation project (assuming you have already started it). Experts can skip this ~
Knowledge involved
Back-end knowledge:
- Basics: Java SE, Java Web
- Advanced: Spring Boot, MyBatis
Front-end knowledge:
- Basics: HTML, CSS, JavaScript, jQuery, Ajax
- Advanced: Webpack (understand it), Vue (be able to use it), Less (be able to use it)
Required:
- Lectures 1-4:
  - Back end: Java SE, Java Web
  - Front end: HTML, CSS, JavaScript, jQuery, Ajax; Vue is optional
- From Lecture 5 on:
  - Back end: Spring Boot, MyBatis
  - Front end: Vue, SCSS/Sass or Less
Other:
- The interaction model of front-end/back-end separation
- Browser cross-origin principles and solutions
Since some of you may not know Vue while others already know it well, this tutorial offers several front-end variants: a jQuery version, a basic Vue version, and an advanced Vue version. The back end likewise moves from basic Java Web (the first four lectures) to Spring Boot (the last lecture).
I. Environment suggestions
1. My suggested environment (summary):
1. Back-end environment (recommended)
- Java environment: JDK 1.8; the latest 1.8 update is fine
  - Older versions: www.oracle.com/java/techno… (downloading requires an Oracle account; you can register one yourself)
- Database environment: any MySQL 5.7 version. I strongly recommend the 64-bit MySQL bundled with the phpStudy integrated environment, which is simple to use; the official MySQL release also works. Installing the database on an SSD is recommended, but not necessary if your SSD is a small system disk.
  - phpStudy: public.xp.cn/upgrades/ph…
  - MySQL 5.7 official installer: downloads.mysql.com/archives/in…
  - MySQL 5.7 ZIP archive: downloads.mysql.com/archives/co…
- Development environment: IntelliJ IDEA 2021 or later is recommended, ideally one of the latest few versions and at least 2019. I recommend managing JetBrains IDEs with the Toolbox App: www.jetbrains.com/zh-cn/idea/…
- Database software: Navicat 15 (recommended), DataGrip 2021 (memory-hungry), or IDEA's built-in database plugin (fallback)
2. Front-end environment (recommended)
- Node.js environment: a fairly recent version is needed to install the Vue CLI scaffolding
- Development environment: VS Code, or WebStorm 2021.2 or later. Why 2021.2? Starting with WebStorm 2021.2, editing a page automatically refreshes the browser, similar to VS Code's Live Server. If you don't want to install more software, you can use IDEA, which has all of WebStorm's features.
- Browser plugins: a plugin that formats JSON (such as FeHelper) and the official Vue plugin (for debugging Vue projects)
3. Other
- Interface debugging: Postman or ApiPost (a Chinese product). I recommend the latter: its interface is in Chinese, its features are similar to Postman's, and it also supports account synchronization.
- Listening to music: NetEase Cloud Music. Listening to music while coding makes programming happy ~
2. My environment
1. Back-end environment
- Java environment: JDK 1.8.0_192
- Database environment: MySQL 5.7.26 from the phpStudy integrated environment (MySQL 8.0 inserts data faster than 5.7.26)
- Development environment: IntelliJ IDEA 2021.2
- Database software: Navicat 15 + IDEA's built-in database plugin
2. Front-end environment
- Node.js environment: v16.5.0
- Development environment: WebStorm 2021.2, occasionally VS Code
- Browser plugins: CSDN plugin, official Vue plugin
3. Machine
- Processor: 8th-generation Intel Core i7
- Memory: 16 GB DDR4 2400 MHz
- Disk: the system disk is an SSD and the data disk is an HDD. The database is on the HDD so that insertion performance is tested under a lower performance ceiling.
3. News dataset
The news data comes from 18 channels of the Sohu News website (including domestic, international, sports, society, and entertainment) between June and July 2012. It provides URLs and body text, about 1.29 million records in total.
4. News dataset files
news_origin_data.dat (1.43 GB): the raw news data. news_handled_data.dat (1.43 GB): the processed news data (provided here only for reference; later code generates this file).
Aliyun Drive: www.aliyundrive.com/s/4T4DJXLpH…
5. XML format of each record in the news dataset
<doc>
<url>The page URL</url>
<docno>Page ID</docno>
<contenttitle>The page title</contenttitle>
<content>The page content</content>
</doc>
II. Preparation
1. Create the database news_demo in MySQL
create database `news_demo`;
2. Create the data table article
No. | Column name | Data type | Description |
---|---|---|---|
1 | id | bigint | Auto-increment primary key |
2 | url | varchar(500) | Web page URL |
3 | docno | varchar(50) | Document number |
4 | title | varchar(50) | Article title |
5 | content | text | Article content |
Note: table and field names are generally all lowercase, with multiple words joined by underscores. For detailed requirements, see the Java coding standard "Alibaba Java Development Manual".
CREATE TABLE `article` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment primary key',
  `url` varchar(500) DEFAULT NULL COMMENT 'Page URL',
  `docno` varchar(50) DEFAULT NULL COMMENT 'Document number',
  `title` varchar(50) DEFAULT NULL COMMENT 'Article title',
  `content` text COMMENT 'Article content',
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Note: MyISAM is recommended as the table engine here; compared with InnoDB it greatly speeds up inserting the news.
3. Create the project in IDEA
1. Create a new plain Java project, select the project SDK, and optionally add MySQL support (IDEA then uses MySQL syntax and shows the database tools by default).
2. Name the project and choose its location.
3. Create the project package com.example.demo in the root directory, create a lib directory as the project's external library directory, and put the MySQL JDBC driver JAR mysql-connector-java-5.1.47.jar into it (for MySQL 8.0, use mysql-connector-java-8.0.21.jar instead). Then right-click the lib directory -> Add as Library.
4. Under the project package, create a utils package for utility classes and create the DatabaseUtil class in it. The configuration file must be stored in the resources directory under the project package and must be named db.properties, because that is the path my code reads.
package com.example.demo.utils;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.sql.*;
import java.util.Objects;
import java.util.Properties;

/**
 * Database connection utility class
 */
public class DatabaseUtil {
    /**
     * Get a database connection
     *
     * @return database connection
     */
    public static Connection getConnection() throws IOException {
        // Absolute path of this class's package directory (.../com/example/demo/utils)
        String absolutePath = Objects.requireNonNull(DatabaseUtil.class.getResource("")).getPath();
        // Go up one level to the project package (.../com/example/demo)
        absolutePath = new File(absolutePath).getParent();
        // Absolute path of the configuration file
        String filepath = absolutePath + "/resources/db.properties";
        // Read the configuration file
        FileInputStream in = new FileInputStream(filepath);
        // Parse the configuration file
        Properties properties = new Properties();
        properties.load(in);
        in.close();
        // Get the connection settings from the configuration file
        String driver = properties.getProperty("driver");
        String url = properties.getProperty("url");
        String username = properties.getProperty("username");
        String password = properties.getProperty("password");
        // Get a database connection
        Connection connection = null;
        try {
            Class.forName(driver);
            connection = DriverManager.getConnection(url, username, password);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

    /**
     * Close database resources
     *
     * @param resultSet  result set
     * @param statement  SQL statement executor
     * @param connection connection
     */
    public static void close(ResultSet resultSet, Statement statement, Connection connection) {
        if (resultSet != null) {
            try {
                resultSet.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
        if (statement != null) {
            try {
                statement.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
        if (connection != null) {
            try {
                connection.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Test the database connection
        System.out.print(DatabaseUtil.getConnection());
    }
}
Create a resources package under the project package to hold resources, and create the database connection configuration file db.properties in it so the connection settings are easy to change.
Version 5.7 (used here):
driver=com.mysql.jdbc.Driver
url=jdbc:mysql://localhost:3306/news_demo?characterEncoding=utf8&useSSL=false&serverTimezone=UTC
username=root
password=root
Version 8.0:
driver=com.mysql.cj.jdbc.Driver
url=jdbc:mysql://localhost:3306/news_demo?characterEncoding=utf8&useSSL=false&serverTimezone=UTC
username=root
password=root
Run DatabaseUtil's main method to test the connection. On success it prints a connection object such as:
com.mysql.jdbc.JDBC4Connection@5ce65a89
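DatabaseUtil locates db.properties by walking up from the directory of its own .class file. A common alternative, sketched here under my own assumptions, is to read the file as a classpath resource with ClassLoader.getResourceAsStream, which keeps working if the classes are later packaged in a JAR. DbConfig, fromStream, and fromClasspath are hypothetical names; the sketch only parses the four keys and does not open a connection, so it can be tried without MySQL running.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

/** Hypothetical holder for the four keys defined in db.properties. */
final class DbConfig {
    final String driver;
    final String url;
    final String username;
    final String password;

    private DbConfig(Properties p) {
        this.driver = p.getProperty("driver");
        this.url = p.getProperty("url");
        this.username = p.getProperty("username");
        this.password = p.getProperty("password");
    }

    /** Parses the properties from a stream and closes the stream afterwards. */
    static DbConfig fromStream(InputStream in) {
        try (InputStream stream = in) {
            Properties p = new Properties();
            p.load(stream);
            return new DbConfig(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Loads a resource such as "com/example/demo/resources/db.properties" from the classpath. */
    static DbConfig fromClasspath(String resourcePath) {
        InputStream in = DbConfig.class.getClassLoader().getResourceAsStream(resourcePath);
        if (in == null) {
            throw new IllegalStateException("Not found on classpath: " + resourcePath);
        }
        return fromStream(in);
    }

    public static void main(String[] args) {
        String text = "driver=com.mysql.jdbc.Driver\n"
                + "url=jdbc:mysql://localhost:3306/news_demo?characterEncoding=utf8&useSSL=false&serverTimezone=UTC\n"
                + "username=root\n"
                + "password=root\n";
        DbConfig config = fromStream(new ByteArrayInputStream(text.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println(config.driver + " -> " + config.url);
    }
}
```

With a helper like this, getConnection could pass config.url, config.username, and config.password to DriverManager.getConnection instead of building a file path by hand.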
Create a main package for the main code, and in it create the news import class HandleImportNews. Type psvm to generate the main method quickly, then add some code that prints how long our business logic takes. The main logic is encapsulated in the handleImportNews method.
package com.example.demo.main;

/**
 * Process news data import into database
 */
public class HandleImportNews {
    public static void main(String[] args) throws Exception {
        // Record the start time
        long startTime = System.currentTimeMillis();
        System.out.println("Start the clock...");
        // The main logic goes in handleImportNews(), written in the next step
        handleImportNews();
        // Print the elapsed time
        System.out.println("Insertion completed, time consuming " + (System.currentTimeMillis() - startTime) + "ms");
    }
}
At this point, the preparation is done.
III. Writing the news import
1. Write the business logic
- Define the method handleImportNews, in which we write the main logic:
public static void handleImportNews() throws Exception {
    // Main business logic
}
- Get a buffered reader for the file
// Pass in the news file path and encoding to get the file stream
BufferedReader br = getBufferReader("F:\\news_data\\news_origin_data.dat", "gbk");
The BufferedReader construction is wrapped in a getBufferReader method. We will use it again later, so it belongs in a utility class: create a FileUtil class under the utils package and move the method there.
public static BufferedReader getBufferReader(String filePath, String encoding) throws IOException {
    return new BufferedReader(
            new InputStreamReader(
                    new FileInputStream(filePath), encoding
            )
    );
}
Also add a closeBufferReader method to release the reader:
public static void closeBufferReader(BufferedReader br) {
    if (br != null) {
        try {
            br.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
- Get the database connection and a PreparedStatement
// Get a database connection
Connection connection = DatabaseUtil.getConnection();
PreparedStatement ps = connection.prepareStatement("");
Note: although a PreparedStatement is used here, we do not use its precompilation (placeholder) feature; the reason is explained below. Personally, I do prefer PreparedStatement.
- Define the prefix and suffix of the INSERT statement
// Define the prefix and suffix of the database insert statement
String sqlPrefix = "insert into `article`(url, docno, title, content) values";
StringBuilder sqlSuffix = new StringBuilder();
Question: What are the prefix and suffix of the INSERT statement, and why define them?
Answer: A multi-row insert has the form insert into table(column1, column2, ...) values(value1, value2, ...), (value1, value2, ...), (value1, value2, ...), ...; The prefix is insert into table(column_1, column_2, ...) values, and the suffix is the list of (value_1, value_2, ..., value_n) groups. We concatenate the SQL statement by hand mainly to speed up insertion: it is a bit more trouble, but the speed is excellent ~
There are three common ways to speed up inserts. The third is the most important and most useful, and we will use it throughout this tutorial, so please keep it in mind:
1. Increase bulk_insert_buffer_size in the MySQL configuration. The default is 8M; 100M is recommended:
bulk_insert_buffer_size=100M
2. Rewrite all INSERT statements as INSERT DELAYED. An INSERT DELAYED returns immediately and the insert is processed in the background. (Note: this only ever worked for some engines such as MyISAM, and since MySQL 5.7 the server accepts but ignores the DELAYED keyword.)
3. Insert many rows with a single statement: insert into tablename(column1, column2, ...) values('xxx','xxx'), ('yyy','yyy'), ('zzz','zzz'), ...;
You can look up more ways to speed up inserts online.
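To make the prefix/suffix idea concrete, here is a minimal sketch of method 3: one fixed prefix followed by comma-separated value groups, with the trailing comma dropped at the end. MultiRowInsertDemo and buildInsert are hypothetical names, and the values are assumed to be escaped already (in this tutorial, getContent does that).

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: one multi-row INSERT = fixed prefix + comma-separated value groups. */
class MultiRowInsertDemo {
    static final String SQL_PREFIX = "insert into `article`(url, docno, title, content) values";

    /** Values are assumed to be escaped already; no escaping happens here. */
    static String buildInsert(List<String[]> rows) {
        if (rows.isEmpty()) {
            throw new IllegalArgumentException("at least one row is required");
        }
        StringBuilder suffix = new StringBuilder();
        for (String[] row : rows) {
            // Each group looks like ('v1', 'v2', 'v3', 'v4'),
            suffix.append("('").append(String.join("', '", row)).append("'),");
        }
        // Drop the trailing comma left by the last group
        return SQL_PREFIX + suffix.substring(0, suffix.length() - 1);
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[]{"http://a", "d1", "t1", "c1"},
                new String[]{"http://b", "d2", "t2", "c2"});
        System.out.println(buildInsert(rows));
    }
}
```

The server then parses and executes one statement for the whole batch instead of one per row, which is where the speed-up comes from.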
Question: Which is faster, manually concatenating the SQL statement or using PreparedStatement's precompiled ? placeholders?
Answer: Manual concatenation produces a single statement, insert into table(column1, column2, ...) values(value1, value2, ...), (value1, value2, ...), (value1, value2, ...), ...;, that inserts many rows at once. The placeholder form insert into table(column1, column2, ...) values(?, ?, ...) inserts one row per execution, so it is generally slower than a single multi-row statement. However, the former requires manual concatenation and escaping, which makes the code less readable.
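The two approaches are not strictly either/or. As a hedged sketch (not what this tutorial uses), a single PreparedStatement can carry several rows by repeating the placeholder group, so the driver handles escaping while the statement stays multi-row. PlaceholderInsert, buildSql, and bind are hypothetical names, and whether this beats the manual version on your machine would need measuring.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

/** Sketch: a multi-row INSERT that still uses ? placeholders. */
class PlaceholderInsert {
    /** e.g. columns = 2, rows = 2 gives "<prefix>(?, ?), (?, ?)". */
    static String buildSql(String prefix, int columns, int rows) {
        if (columns <= 0 || rows <= 0) {
            throw new IllegalArgumentException("columns and rows must be positive");
        }
        String group = "(" + String.join(", ", Collections.nCopies(columns, "?")) + ")";
        return prefix + String.join(", ", Collections.nCopies(rows, group));
    }

    /** Binds all values in row order; the driver then takes care of quoting and escaping. */
    static void bind(PreparedStatement ps, List<String[]> rows) throws SQLException {
        int index = 1; // JDBC parameter indexes start at 1
        for (String[] row : rows) {
            for (String value : row) {
                ps.setString(index++, value);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(buildSql("insert into article(url, docno, title, content) values", 4, 2));
    }
}
```

Usage would be connection.prepareStatement(buildSql(prefix, 4, batch.size())), then bind(ps, batch) and ps.executeUpdate().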
- Read the data line by line and process one record every 6 lines
5.1 Define variables
// Line number
int lineNumber = 0;
// Holds the lines of the current news article
String[] news = new String[6];
// Holds the current line
String line = br.readLine();
Question: Why is the array String[6] of size 6?
Answer: In the XML structure of the source news data, every six lines form one article. We use an array to store one article; when the line number modulo 6 is 0, we have a complete article and process it, taking each part by its index:
<!--news[0]:--><doc>
<!--news[1]:--><url>The page URL</url>
<!--news[2]:--><docno>Page ID</docno>
<!--news[3]:--><contenttitle>The page title</contenttitle>
<!--news[4]:--><content>The page content</content>
<!--news[5]:--></doc>
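The six-line grouping can be tried on its own. This hypothetical SixLineGrouping helper collects each complete record into its own array (a copy, because the tutorial reuses a single array) and assumes, as the tutorial does, that every record spans exactly six lines; an incomplete trailing group is dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: group consecutive lines into 6-line records, indexed as shown above. */
class SixLineGrouping {
    static final int LINES_PER_DOC = 6;

    static List<String[]> group(List<String> lines) {
        List<String[]> docs = new ArrayList<>();
        String[] news = new String[LINES_PER_DOC];
        int lineNumber = 0;
        for (String line : lines) {
            news[lineNumber++ % LINES_PER_DOC] = line;
            // A full record is available whenever lineNumber is a multiple of 6
            if (lineNumber % LINES_PER_DOC == 0) {
                docs.add(news.clone()); // copy, since the array is reused for the next record
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "<doc>", "<url>u1</url>", "<docno>d1</docno>",
                "<contenttitle>t1</contenttitle>", "<content>c1</content>", "</doc>");
        for (String[] doc : group(lines)) {
            System.out.println(doc[3]); // prints <contenttitle>t1</contenttitle>
        }
    }
}
```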
5.2 Loop through the articles and insert them into the database in batches
// Scan the news line by line
while (line != null) {
    // Store the current line in the temporary array for this article
    news[lineNumber++ % 6] = line;
    // Every six lines form one news article
    if (lineNumber % 6 == 0) {
        // Get the news title and content
        String newsTitle = getContent(news[3], "contenttitle");
        String newsContent = getContent(news[4], "content");
        // The article is kept only if neither the title nor the content is empty
        if (newsTitle.length() > 0 && newsContent.length() > 0) {
            // Get the news link and document number
            String newsUrl = getContent(news[1], "url");
            String newsDocno = getContent(news[2], "docno");
            // Generate this article's value group and append it to the suffix
            sqlSuffix.append(generateSQLSuffix(newsUrl, newsDocno, newsTitle, newsContent));
            // Flush when lineNumber is a multiple of 1000; combined with the % 6 check above,
            // this fires every 3000 lines, i.e. about every 500 articles
            if (lineNumber % 1000 == 0) {
                // Insert into the database
                insertNews(ps, sqlPrefix, sqlSuffix);
                // Reset the suffix
                sqlSuffix = new StringBuilder();
            }
        }
    }
    // Read the next line
    line = br.readLine();
}
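Because the check lineNumber % 1000 == 0 only runs when lineNumber % 6 == 0, the flush fires every 3000 lines (about 500 articles, fewer if some were skipped as empty). If you want the batch size to mean a number of articles, one option is a separate counter. The hypothetical ArticleBatcher below sketches that idea, with a callback standing in for insertNews (in real code the callback would have to wrap insertNews's SQLException, since Consumer cannot throw checked exceptions).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sketch: flush after a fixed number of accepted articles, independent of line numbers. */
class ArticleBatcher {
    private final int batchSize;
    private final Consumer<StringBuilder> flusher;
    private StringBuilder suffix = new StringBuilder();
    private int pending = 0;

    ArticleBatcher(int batchSize, Consumer<StringBuilder> flusher) {
        this.batchSize = batchSize;
        this.flusher = flusher;
    }

    /** Append one "(...)," value group and flush once batchSize groups are pending. */
    void add(String valueGroup) {
        suffix.append(valueGroup);
        if (++pending == batchSize) {
            flush();
        }
    }

    /** Flush whatever is pending; call once more after the read loop ends. */
    void flush() {
        if (pending > 0) {
            flusher.accept(suffix);
            suffix = new StringBuilder();
            pending = 0;
        }
    }

    public static void main(String[] args) {
        List<Integer> batchLengths = new ArrayList<>();
        ArticleBatcher batcher = new ArticleBatcher(1000, sb -> batchLengths.add(sb.length()));
        for (int i = 0; i < 2500; i++) {
            batcher.add("(x),");
        }
        batcher.flush();
        System.out.println(batchLengths); // [4000, 4000, 2000]
    }
}
```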
Question: Why are the following lines
// Get the news link and document number
String newsUrl = getContent(news[1], "url");
String newsDocno = getContent(news[2], "docno");
placed inside the if rather than before it?
Answer: If the news title or content is empty, the if condition fails, and computing newsUrl and newsDocno beforehand would be wasted work.
Question: Can the batch size of 1000 be changed?
Answer: Of course. This value affects the total insertion time, so try several values and pick one that suits your machine. It should be neither too large nor too small; if you make it larger, also increase bulk_insert_buffer_size in the MySQL configuration, and keep each statement below the server's max_allowed_packet limit.
Question: How should we understand news[lineNumber++ % 6] = line;?
Answer: It is simply news[lineNumber % 6] = line; and lineNumber++; combined into one statement.
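A tiny check of that equivalence: the post-increment uses the old value of lineNumber as the index and only then increments it. PostIncrementDemo is just an illustrative class name.

```java
import java.util.Arrays;

/** Shows that news[lineNumber++ % 6] = line equals the two-statement form. */
class PostIncrementDemo {
    /** Combined form: the index uses the old value, then lineNumber grows. */
    static String[] combined(String[] lines) {
        String[] news = new String[6];
        int lineNumber = 0;
        for (String line : lines) {
            news[lineNumber++ % 6] = line;
        }
        return news;
    }

    /** The same thing written as two separate statements. */
    static String[] separate(String[] lines) {
        String[] news = new String[6];
        int lineNumber = 0;
        for (String line : lines) {
            news[lineNumber % 6] = line;
            lineNumber++;
        }
        return news;
    }

    public static void main(String[] args) {
        String[] lines = {"a", "b", "c", "d", "e", "f", "g"};
        System.out.println(Arrays.toString(combined(lines))); // [g, b, c, d, e, f]
        System.out.println(Arrays.equals(combined(lines), separate(lines))); // true
    }
}
```

The seventh line wraps around to index 0, which is why the array only ever needs six slots.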
5.3 Finishing touches: after the loop ends, insert the remaining articles that did not fill a complete batch, then release the resources
// Insert the remaining rows of the last, incomplete batch
if (sqlSuffix.length() > 0) {
    insertNews(ps, sqlPrefix, sqlSuffix);
}
// Release resources
DatabaseUtil.close(null, ps, connection);
FileUtil.closeBufferReader(br);
- Functions used in the code:
1. getContent: gets the content inside an XML tag, trims leading and trailing whitespace, converts full-width characters to half-width, and escapes the single quote ' and the backslash \ (to avoid conflicts with the single quotes that delimit values in the INSERT statement). With PreparedStatement's ? placeholders this escaping would happen automatically, but since we want fast inserts we can't use placeholders, so we do it by hand.
For example, a title such as hello\' must appear in the INSERT as insert into article(title) values('hello\\\''): the backslash becomes \\ and the quote becomes \'. Then move this method into a NewsUtil class under the utils package.
public static String getContent(String text, String payload) {
    int payloadLength = payload.length();
    // Strip "<payload>" (length + 2) at the start and "</payload>" (length + 3) at the end
    String content = text.substring(payloadLength + 2, text.length() - payloadLength - 3).trim();
    return StringUtil.toDbcCase(content) // Convert full-width symbols to half-width and special spaces to ' '
            .replace("\\", "\\\\") // Escape backslashes first
            .replace("'", "\\'"); // Then escape single quotes
}
Here, payload simply means the XML tag name. I couldn't think of a better name at the time, so I chose this one; when coding, try to use readable English names.
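To see the tag stripping and escaping in isolation, here is a standalone copy of getContent. For brevity it leaves out the StringUtil.toDbcCase step, which only matters for full-width input; NewsUtilSketch is a hypothetical class name and the sample inputs are made up.

```java
/** Standalone copy of the tutorial's getContent, minus the full-width conversion. */
class NewsUtilSketch {
    static String getContent(String text, String payload) {
        int payloadLength = payload.length();
        // "<payload>" is payloadLength + 2 chars long, "</payload>" is payloadLength + 3
        String content = text.substring(payloadLength + 2, text.length() - payloadLength - 3).trim();
        return content
                .replace("\\", "\\\\") // escape backslashes first
                .replace("'", "\\'");  // then single quotes
    }

    public static void main(String[] args) {
        System.out.println(getContent("<url>http://example.com/a</url>", "url"));
        System.out.println(getContent("<contenttitle> It's C:\\temp </contenttitle>", "contenttitle"));
    }
}
```

The order of the two replace calls matters: escaping quotes first would produce \' and the following backslash pass would turn it into \\', which ends the SQL string early.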
About the Stringutil. toDbcCase method:
Due to the original news character is all Angle character, including English alphanumeric punctuation, but in our daily life, these characters use half Angle, on the whole Angle of half Angle difference, actually it is easy to reflect from the visual effect, a whole Angle position of the characters can put two and a half Angle character, but ultimately, is to take up the difference between the number of bytes, the Angle of two bytes, The half corner takes one byte
Full Angle number:
1234567890
Half Angle of digital
1234567890
So, we’re going to convert, and we have a conversion function here, and I’ve defined it in StringUtil, so I’m going to create StringUtil under the utils package
public class StringUtil {
    /**
     * Convert full-width characters to half-width
     *
     * @param input input string
     * @return half-width string
     */
    public static String toDbcCase(String input) {
        char[] c = input.toCharArray();
        for (int i = 0; i < c.length; i++) {
            if (c[i] == '\u3000' || c[i] == '\ue40c') {
                c[i] = ' ';
            } else if (c[i] > '\uFF00' && c[i] < '\uFF5F') {
                c[i] = (char) (c[i] - 65248);
            }
        }
        return new String(c);
    }
}
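Here \u3000 is the full-width (ideographic) space. What is \ue40c? It is a private-use character that appears in the news data, and since we don't want it shown on the page, we convert it to a space as well. Characters from \uFF01 to \uFF5E are the full-width forms of the ASCII characters ! through ~, so subtracting 65248 (0xFEE0) maps each of them to its half-width counterpart.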
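The magic number 65248 is just the distance between the two blocks: U+FF01 minus U+0021. This illustrative FullWidthDemo reuses the same conversion with the offset spelled out and applies it to a few full-width characters.

```java
/** Shows where 65248 comes from and applies the tutorial's conversion. */
class FullWidthDemo {
    // Full-width ASCII variants start at U+FF01; their half-width twins start at U+0021.
    static final int OFFSET = 0xFF01 - 0x0021; // 0xFEE0 == 65248

    static String toDbcCase(String input) {
        char[] c = input.toCharArray();
        for (int i = 0; i < c.length; i++) {
            if (c[i] == '\u3000' || c[i] == '\ue40c') {
                c[i] = ' '; // full-width space and the stray private-use character
            } else if (c[i] > '\uFF00' && c[i] < '\uFF5F') {
                c[i] = (char) (c[i] - OFFSET); // U+FF01..U+FF5E -> U+0021..U+007E
            }
        }
        return new String(c);
    }

    public static void main(String[] args) {
        System.out.println(OFFSET); // 65248
        System.out.println(toDbcCase("\uFF11\uFF12\uFF13\u3000\uFF2E\uFF22\uFF21")); // 123 NBA
    }
}
```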
2. generateSQLSuffix: generates one value group (value1, value2, ...) of the INSERT; each group represents one news article
public static String generateSQLSuffix(String newsUrl, String newsDocno, String newsTitle, String newsContent) {
    return "('" + newsUrl + "', '" + newsDocno + "', '" + newsTitle + "', '" + newsContent + "'),";
}
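3. insertNews: this uses the batch API (addBatch/executeBatch), but it does not really batch anything: each call adds exactly one long multi-row statement and executes it immediately. The substring call drops the trailing comma left by the last value group. (The JDBC documentation says addBatch(String) should not be called on a PreparedStatement; MySQL Connector/J tolerates it, which is why this works here.)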
public static void insertNews(PreparedStatement ps, String sqlPrefix, StringBuilder sqlSuffix) throws SQLException {
ps.addBatch(sqlPrefix + sqlSuffix.substring(0, sqlSuffix.length() - 1));
ps.executeBatch();
}
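If you prefer to stay within the JDBC specification, the same single-statement insert can go through a plain Statement obtained from connection.createStatement(), using executeUpdate, which for a multi-row INSERT returns the number of inserted rows. This hypothetical InsertNewsSketch also guards against an empty suffix, where substring(0, -1) would throw.

```java
import java.sql.SQLException;
import java.sql.Statement;

/** Sketch: execute the prefix + suffix INSERT through a plain Statement. */
class InsertNewsSketch {
    /** Join prefix and suffix, dropping the trailing comma; null when nothing is pending. */
    static String toSql(String sqlPrefix, StringBuilder sqlSuffix) {
        if (sqlSuffix.length() == 0) {
            return null;
        }
        return sqlPrefix + sqlSuffix.substring(0, sqlSuffix.length() - 1);
    }

    /** Returns the number of inserted rows (0 if the suffix was empty). */
    static int insertNews(Statement statement, String sqlPrefix, StringBuilder sqlSuffix) throws SQLException {
        String sql = toSql(sqlPrefix, sqlSuffix);
        return sql == null ? 0 : statement.executeUpdate(sql);
    }

    public static void main(String[] args) {
        System.out.println(toSql("insert into article(url) values", new StringBuilder("('a'),('b'),")));
    }
}
```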
At this point, the logic is complete.
2. Complete code
package com.example.demo.main;

import com.example.demo.utils.DatabaseUtil;
import com.example.demo.utils.FileUtil;
import com.example.demo.utils.NewsUtil;

import java.io.*;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/**
 * Process news data import into database
 */
public class HandleImportNews {
    // Insert news: drop the trailing comma and execute the whole multi-row INSERT at once
    public static void insertNews(PreparedStatement ps, String sqlPrefix, StringBuilder sqlSuffix) throws SQLException {
        ps.addBatch(sqlPrefix + sqlSuffix.substring(0, sqlSuffix.length() - 1));
        ps.executeBatch();
    }

    // Generate one value group of the INSERT
    public static String generateSQLSuffix(String newsUrl, String newsDocno, String newsTitle, String newsContent) {
        return "('" + newsUrl + "', '" + newsDocno + "', '" + newsTitle + "', '" + newsContent + "'),";
    }

    public static void handleImportNews() throws Exception {
        // Get the file stream
        BufferedReader br = FileUtil.getBufferReader("F:\\news_data\\news_origin_data.dat", "gbk");
        // Get a database connection
        Connection connection = DatabaseUtil.getConnection();
        PreparedStatement ps = connection.prepareStatement("");
        // Define the prefix and suffix of the database insert statement
        String sqlPrefix = "insert into article(url, docno, title, content) values";
        StringBuilder sqlSuffix = new StringBuilder();
        // Line number
        int lineNumber = 0;
        // Holds the lines of the current news article
        String[] news = new String[6];
        // Holds the current line
        String line = br.readLine();
        // Scan the news line by line
        while (line != null) {
            // Store the line in the temporary array
            news[lineNumber++ % 6] = line;
            // Process one article every six lines
            if (lineNumber % 6 == 0) {
                String newsTitle = NewsUtil.getContent(news[3], "contenttitle");
                String newsContent = NewsUtil.getContent(news[4], "content");
                // Keep the article only if neither the title nor the content is empty
                if (newsTitle.length() > 0 && newsContent.length() > 0) {
                    String newsUrl = NewsUtil.getContent(news[1], "url");
                    String newsDocno = NewsUtil.getContent(news[2], "docno");
                    sqlSuffix.append(generateSQLSuffix(newsUrl, newsDocno, newsTitle, newsContent));
                    // Flush every 3000 lines (lineNumber % 1000 == 0 together with % 6 == 0)
                    if (lineNumber % 1000 == 0) {
                        insertNews(ps, sqlPrefix, sqlSuffix);
                        sqlSuffix = new StringBuilder();
                    }
                }
            }
            // Read the next line
            line = br.readLine();
        }
        // Insert the remaining rows of the last, incomplete batch
        if (sqlSuffix.length() > 0) {
            insertNews(ps, sqlPrefix, sqlSuffix);
        }
        // Release resources
        DatabaseUtil.close(null, ps, connection);
        FileUtil.closeBufferReader(br);
    }

    public static void main(String[] args) throws Exception {
        long startTime = System.currentTimeMillis();
        System.out.println("Start the clock...");
        handleImportNews();
        System.out.println("Insertion completed, time consuming " + (System.currentTimeMillis() - startTime) + "ms");
    }
}
Run the main method.
3. Running result
Start the clock...
Insertion completed, time consuming 54731ms
In total, 1,298,155 records were inserted in about 54 seconds, writing roughly 40 MB per second. Inserting more than a million records in about a minute is quite fast. Across my many tests the average was around one minute, though performance varies from machine to machine. This speed is quite acceptable.
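As a quick sanity check on those figures, the row rate works out to roughly 23,700 rows per second. The helper below is only an illustration of the arithmetic; the class and method names are mine.

```java
/** Illustrative throughput calculation for the import timing reported above. */
class ThroughputCalc {
    /** Rows per second from a row count and an elapsed time in milliseconds. */
    static double rowsPerSecond(long rows, long elapsedMillis) {
        return rows * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) {
        // Figures reported above: 1,298,155 rows in 54,731 ms
        System.out.printf("%.0f rows/s%n", rowsPerSecond(1_298_155L, 54_731L)); // 23719 rows/s
    }
}
```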
IV. Retrieval experiment (searching for related news by keyword)
Create a SearchNewsByKeyWordTest class under the main package and write the basic logic in its main method. The logic is simple and uses only basic operations: the database search uses a fuzzy (LIKE) query, and the file search scans the articles one at a time and checks each one.
The main method outlines the logic:
public class SearchNewsByKeyWordTest {
    public static void main(String[] args) throws Exception {
        // 0. Define the keyword
        String keyword = "NBA";
        System.out.println("The keyword is: " + keyword);
        // 1. Test searching in the database
        System.out.println("1. Start the SQL query:");
        long startTime = System.currentTimeMillis();
        testSearchBySQL(keyword);
        System.out.println("Time taken: " + (System.currentTimeMillis() - startTime) + " ms");
        // 2. Test searching in the file
        System.out.println("2. Start the file traversal query:");
        startTime = System.currentTimeMillis();
        testSearchByFile(keyword);
        System.out.println("Time taken: " + (System.currentTimeMillis() - startTime) + " ms");
    }
}
1. Test searching in the database
This simply executes a fuzzy (LIKE) query:
public static void testSearchBySQL(String keyword) throws Exception {
    Connection connection = DatabaseUtil.getConnection();
    String sql = "select count(*) from article where title like '%" + keyword + "%' or content like '%" + keyword + "%'";
    PreparedStatement ps = connection.prepareStatement(sql);
    ResultSet resultSet = ps.executeQuery();
    while (resultSet.next()) {
        System.out.println("Total " + resultSet.getInt(1) + " results");
    }
    DatabaseUtil.close(resultSet, ps, connection);
}
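Concatenating the keyword into the SQL works for this test, but a keyword containing a single quote would break the statement and opens the door to SQL injection. A hedged alternative is to bind the pattern as a parameter. LikeSearchSketch and containsPattern are hypothetical names; the sketch escapes % and _ with MySQL's default LIKE escape character (the backslash) so they match literally. A '%...%' pattern still cannot use an ordinary index, so the query scans the table either way.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Sketch: the same count query with the keyword bound as a parameter. */
class LikeSearchSketch {
    static final String COUNT_SQL =
            "select count(*) from article where title like ? or content like ?";

    /** Escape LIKE wildcards, then wrap the keyword in % for a "contains" match. */
    static String containsPattern(String keyword) {
        String escaped = keyword
                .replace("\\", "\\\\")
                .replace("%", "\\%")
                .replace("_", "\\_");
        return "%" + escaped + "%";
    }

    /** Counts matching articles; the driver quotes the bound values itself. */
    static int count(Connection connection, String keyword) throws SQLException {
        String pattern = containsPattern(keyword);
        try (PreparedStatement ps = connection.prepareStatement(COUNT_SQL)) {
            ps.setString(1, pattern);
            ps.setString(2, pattern);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getInt(1) : 0;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(containsPattern("NBA")); // %NBA%
    }
}
```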
2. Test searching in the file
public static void testSearchByFile(String keyword) throws IOException {
    // Get the file stream
    BufferedReader br = FileUtil.getBufferReader("F:\\news_data\\news_origin_data.dat", "gbk");
    int lineNumber = 0, matchingNumber = 0;
    String title = "", content = "";
    String line = br.readLine();
    while (line != null) {
        lineNumber++;
        // Read the title
        if (lineNumber % 6 == 4) {
            title = line;
        }
        // Read the content
        if (lineNumber % 6 == 5) {
            content = line;
        }
        // Check once every 6 lines
        if (lineNumber % 6 == 0) {
            // Get the processed title and content
            String newsTitle = NewsUtil.getContent(title, "contenttitle");
            String newsContent = NewsUtil.getContent(content, "content");
            // Check whether the title and content are empty
            if (!newsTitle.equals("") && !newsContent.equals("")) {
                // Join the title and content together for the check
                String searchText = newsTitle + "" + newsContent;
                // If the keyword is contained, increment the counter
                if (searchText.contains(keyword)) {
                    matchingNumber++;
                }
            }
        }
        line = br.readLine();
    }
    System.out.println("Total " + matchingNumber + " results");
    FileUtil.closeBufferReader(br);
}
3. Test results
The keyword is: NBA
1. Start the SQL query:
Total 7130 results
Time taken: 6502 ms
2. Start the file traversal query:
Total 6977 results
Time taken: 15685 ms
Why are the counts different? Again, the keyword is English, and a key feature of English is letter case. The database search for NBA also matches nba, Nba, and so on, because the default collation of the utf8 character set (utf8_general_ci) is case-insensitive, whereas Java's String.contains is case-sensitive. The fix is to convert both the text being searched and the keyword to upper case before matching. Here is the fixed code, with the key changes marked:
// Check once every 6 lines
if (lineNumber % 6 == 0) {
    // Get the processed title and content
    String newsTitle = NewsUtil.getContent(title, "contenttitle");
    String newsContent = NewsUtil.getContent(content, "content");
    // Check whether the title and content are empty
    if (!newsTitle.equals("") && !newsContent.equals("")) {
        // Join the title and content together for the check
        String searchText = (newsTitle + "" + newsContent).toUpperCase(); // <-- change here
        // If the keyword is contained, increment the counter
        if (searchText.contains(keyword.toUpperCase())) { // <-- change here
            matchingNumber++;
        }
    }
}
Test it again separately:
2. Start the file traversal query:
Total 7130 results
Time taken: 23252 ms
Now the two counts match.
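The upper-casing trick can be factored into a small helper. This hypothetical sketch uses toUpperCase(Locale.ROOT) so the result does not depend on the machine's default locale (for example, Turkish dotted and dotless i rules); in the search loop you would also upper-case the keyword once before the loop rather than once per article.

```java
import java.util.Locale;

/** Sketch: locale-independent case-insensitive substring test. */
class CaseInsensitiveMatch {
    static boolean containsIgnoreCase(String text, String keyword) {
        return text.toUpperCase(Locale.ROOT).contains(keyword.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(containsIgnoreCase("nba playoffs tonight", "NBA")); // true
        System.out.println(containsIgnoreCase("CBA league", "NBA")); // false
    }
}
```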
V. Directory structure
VI. Notes on using the reference source code
Because I deleted the IDEA configuration (the .idea directory and .iml files) from my project, you will need to configure the JDK yourself when you open my code; IDEA will also prompt you to set one up. You also need to right-click the lib directory and add it as a library, and set the project's language level. I recommend simply following the tutorial steps and typing the code yourself ~