Solr full update / incremental update (database import configuration)

1. Full update

Follow the configuration described in "Solr connects to the database.md"; after it is complete:

Remember: for a full import, use clean=true and commit=true.
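For reference, a full import can also be triggered directly over HTTP; a minimal example, assuming Solr runs on 127.0.0.1:8983 and the core is named corename:

    http://127.0.0.1:8983/solr/corename/dataimport?command=full-import&clean=true&commit=true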

2. Incremental updates

1. Understand the required attributes, the requirements on the database table design, and the roles of dataimporter.properties and data-config.xml

- transformer: format conversion; for example, HTMLStripTransformer strips HTML tags before the content is indexed
- query: the query that selects the matching records from the database table (used by the full import)
- deltaQuery: the query that returns the primary key IDs of records changed since the last import
- deltaImportQuery: the query that fetches the full record for each ID returned by deltaQuery
- deletedPkQuery: the query that returns the primary key IDs of deleted records

Note that deltaQuery and deletedPkQuery should return only the ID field.

2. Precautions for configuring the database

1. If only add and modify operations are involved, the table only needs one extra timestamp field whose default value is the current system time (CURRENT_TIMESTAMP).
2. If delete operations are involved, add one extra int field, isdelete, whose values 0 and 1 mark whether the record has been deleted.
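As a sketch of what this means for the t_user table used in the examples below (the ISDELETE column name and the Oracle syntax are assumptions; adapt them to your own schema):

    CREATE TABLE t_user (
        ID             NUMBER PRIMARY KEY,
        NAME           VARCHAR2(100),
        AGE            NUMBER,
        -- timestamp of the last change; defaults to the current system time.
        -- Note: the application (or a trigger) must refresh it on every UPDATE.
        LASTUPDATETIME TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        -- 0 = not deleted, 1 = logically deleted
        ISDELETE       NUMBER(1) DEFAULT 0
    );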

3. dataimporter.properties

This configuration file is important: it records the time of the last import, and that timestamp is what lets the delta queries find the records that have been added, modified, or deleted since then.
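After each import, DIH updates this file itself; its contents look roughly like this (the entity name and timestamps are illustrative):

    #Wed Sep 11 10:15:00 CST 2019
    last_index_time=2019-09-11 10\:15\:00
    userEntity.last_index_time=2019-09-11 10\:15\:00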

4. An incremental update only adds a few attributes on top of the full-update configuration. The data-config.xml is as follows:

The original configuration:

    <?xml version="1.0" encoding="UTF-8"?>
    <dataConfig>
    	<dataSource
    	type="JdbcDataSource"
    	driver="oracle.jdbc.OracleDriver"
    	url="jdbc:oracle:thin:@localhost:1521:ORCL"
    	user="duke"
    	password="duke" />
    	<document>
    		<entity name="userEntity" query="select ID,NAME,AGE from t_user">
    			<!-- map the database columns to index fields -->
    			<field column="ID" name="user_id"/>
    			<field column="NAME" name="user_name"/>
    			<field column="AGE" name="user_age"/>
    		</entity>
    	</document>
    </dataConfig>

Incremental update configuration:

    <?xml version="1.0" encoding="UTF-8"?>
    <dataConfig>
    	<dataSource
    	type="JdbcDataSource"
    	driver="oracle.jdbc.OracleDriver"
    	url="jdbc:oracle:thin:@localhost:1521:ORCL"
    	user="duke"
    	password="duke" />
    	<document>
    		<entity name="userEntity" query="select ID,NAME,AGE from t_user"
    		deltaImportQuery="select ID,NAME,AGE from t_user t where t.ID='${dataimporter.delta.ID}'"
    		deltaQuery="select ID from t_user t where t.LASTUPDATETIME > '${dataimporter.last_index_time}'"
    		deletedPkQuery="select ID from t_user where ISDELETE=1">
    			<field column="ID" name="user_id"/>
    			<field column="NAME" name="user_name"/>
    			<field column="AGE" name="user_age"/>
    			<field column="LASTUPDATETIME" name="user_lastUpdateTime"/>
    		</entity>
    	</document>
    </dataConfig>

    <!-- Explanation of the entity attributes, using a product table as an example: -->
    <!-- pk="ID"            the primary key of the entity -->
    <!-- dataSource="mydb"  the dataSource to use -->
    <!-- name="myinfo"      the entity name -->
    <!-- query="select * from myinfo WHERE isdelete=0"
         queries the records that have not been deleted
         (this query is used only for the first full import, not for incremental imports) -->
    <!-- deltaQuery="select productId from product where modifyTime > '${dataimporter.last_index_time}'"
         returns the IDs of all records changed since the last index time, whether by a modify, add, or delete operation
         (this query is used only for incremental imports and should return only the ID column) -->
    <!-- deletedPkQuery="select productId from product where isdelete=-1"
         returns the IDs of deleted records; Solr uses them to delete the corresponding documents from the index
         (this query is used only for incremental imports and should return only the ID column) -->
    <!-- deltaImportQuery="select * from product where productId = '${dih.delta.productId}'"
         fetches the full record for each ID returned by deltaQuery and updates the index with it, whether that
         means deleting, adding, or modifying the document
         (this query is used only for incremental imports and may return multiple columns, usually all of them) -->

5. Trigger an incremental update manually, either from the admin console or directly from the browser

Enter the address directly in the browser: http://127.0.0.1:8983/solr/#/corename/dataimport//dataimport (the dataimport page of the admin UI). Remember: for an incremental update, use clean=false and commit=true.
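The same request can also be sent straight to the DataImportHandler endpoint instead of going through the admin UI; for example (assuming the core is named corename):

    http://127.0.0.1:8983/solr/corename/dataimport?command=delta-import&clean=false&commit=true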

Second, common automatic update methods for Solr

Method 1: Scheduled incremental updates with the dataimport scheduler

1. Download the jar packages apache-solr-dataimportscheduler.jar, solr-dataimporthandler-8.2.0.jar, and solr-dataimporthandler-extras-8.2.0.jar, and place them in the WEB-INF\lib directory of the Solr web application.

Download link: pan.baidu.com/s/13p_Wxqtx… Extraction code: 6A9Y

2. Modify the web.xml file to register the scheduler listener by adding:

   <listener>
          <listener-class>
                org.apache.solr.handler.dataimport.scheduler.ApplicationListener
          </listener-class>
    </listener>

3. Create dataimport.properties. In the solrHome directory (where the Solr data is stored), create a conf folder and put a dataimport.properties file in it.

    #################################################
    #                                               #
    #       dataimport scheduler properties         #
    #                                               #
    #################################################

    #  to sync or not to sync
    #  1 - active; anything else - inactive
    syncEnabled=1

    #  which cores to schedule
    #  in a multi-core environment you can decide which cores you want synchronized
    #  leave empty or comment it out if using a single-core deployment
    #  here the custom core is named "active"
    syncCores=active

    #  solr server name or IP address
    #  [defaults to localhost if empty]
    server=localhost

    #  solr server port
    #  [defaults to 80 if empty]
    port=8089

    #  application name/context
    #  [defaults to current ServletContextListener's context (app) name]
    webapp=solr

    #  URL params [mandatory]
    #  remainder of URL
    params=/dataimport?command=delta-import&clean=false&commit=true

    #  schedule interval
    #  number of minutes between two runs
    #  [defaults to 30 if empty]
    #  here set to run every minute
    interval=1

    #  interval of the periodic full rebuild, in minutes; 7200 minutes = 5 days
    reBuildIndexInterval=7200

    #  parameters of the full-import request
    reBuildIndexParams=/select?qt=/dataimport&command=full-import&clean=true&commit=true

    #  base time for the rebuild schedule, counted from service startup
    #  first actual execution time = reBuildIndexBeginTime + reBuildIndexInterval*60*1000
    reBuildIndexBeginTime=03:10:00

Finally, restart Solr, add a record to the database, wait a minute, and query again; the scheduler above is configured to run once a minute (interval=1).

4. The contents of the data-config.xml file are as follows:

 <?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
	<dataSource 
	type="JdbcDataSource" 
	driver="oracle.jdbc.OracleDriver" 
	url="jdbc:oracle:thin:@localhost:1521:ORCL" 
	user="duke" 
	password="duke" /> 
	<document>
		<entity name="userEntity" query="select ID,NAME,AGE from t_user"
		deltaImportQuery ="select ID,NAME,AGE from t_user t where t.ID='${dataimporter.delta.ID}'"
		deltaQuery = "select ID from t_user t where t.LASTUPDATETIME > '${dataimporter.last_index_time}'"
		deletedPkQuery = "select ID from t_user where ISDELETE=1">
			<field column="ID" name="user_id"/> 
			<field column="NAME" name="user_name"/> 
			<field column="AGE" name="user_age"/> 
			<field column="LASTUPDATETIME" name="user_lastUpdateTime"/> 
		</entity>
	</document>
</dataConfig>

			<!-- Explanation of the entity attributes, using a product table as an example: -->
			<!-- pk="ID"            the primary key of the entity -->
			<!-- dataSource="mydb"  the dataSource to use -->
			<!-- name="myinfo"      the entity name -->
			<!-- query="select * from myinfo WHERE isdelete=0"
			     queries the records that have not been deleted
			     (this query is used only for the first full import, not for incremental imports) -->
			<!-- deltaQuery="select productId from product where modifyTime > '${dataimporter.last_index_time}'"
			     returns the IDs of all records changed since the last index time, whether by a modify, add, or delete operation
			     (this query is used only for incremental imports and should return only the ID column) -->
			<!-- deletedPkQuery="select productId from product where isdelete=-1"
			     returns the IDs of deleted records; Solr uses them to delete the corresponding documents from the index
			     (this query is used only for incremental imports and should return only the ID column) -->
			<!-- deltaImportQuery="select * from product where productId = '${dih.delta.productId}'"
			     fetches the full record for each ID returned by deltaQuery and updates the index with it, whether that
			     means deleting, adding, or modifying the document
			     (this query is used only for incremental imports and may return multiple columns, usually all of them) -->

Method 2: Perform incremental updates using scheduled tasks in Windows

This method is used for Solr deployed under Tomcat.

1. Search for “Scheduled Tasks” in Windows Start

2. Create a new scheduled task

3. Configure the task's general settings, triggers, and actions

4. Start a scheduled task

The task's action uses curl to simulate the update request:

curl http://localhost:8089/solr/active/dataimport?command=delta-import^&clean=false^&commit=true
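If you prefer, the scheduled task's action can point at a small batch file instead of calling curl directly; a minimal sketch (the file name is up to you, and it assumes curl.exe is on the PATH and the core is named active):

    @echo off
    REM delta-import.bat - trigger a Solr delta import on the "active" core
    REM quoting the URL avoids having to escape & with ^ in cmd.exe
    curl "http://localhost:8089/solr/active/dataimport?command=delta-import&clean=false&commit=true"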

Note: you need to download curl for Windows and add it to the PATH environment variable.

Link: pan.baidu.com/s/1CArPu0vd… Extraction code: 09KS (64-bit system, 32-bit can be downloaded from the official website)

Method 3: In Linux, use crontab to implement incremental updates

This method is used for Solr deployed under Tomcat.

1. Edit the crontab configuration file (crontab -e, or vim /etc/crontab) and add the following entry:

crontab -e
*/5 * * * * curl "http://localhost:8089/solr/active/dataimport?command=delta-import&clean=false&commit=true"

Note: this runs an incremental import every 5 minutes (adjust the interval to fit your actual business needs).
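If you also want a periodic full rebuild (similar to reBuildIndexParams in Method 1), a second crontab line can be added; a sketch, assuming the same host, port, and core, with the rebuild at 03:10 every day:

    # delta import every 5 minutes
    */5 * * * * curl "http://localhost:8089/solr/active/dataimport?command=delta-import&clean=false&commit=true"
    # full rebuild every day at 03:10
    10 3 * * * curl "http://localhost:8089/solr/active/dataimport?command=full-import&clean=true&commit=true"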