1. What is Solr

Apache Solr is an open-source search server written in Java and built mainly on HTTP and Apache Lucene. Apache Lucene is an efficient, Java-based full-text retrieval library.

2. Why use Solr

  • The company's back-office historical order query implements fuzzy matching with LIKE '%something%', which performs poorly.
  • Log content needs fast keyword-based retrieval.
  • Other cases where database fuzzy queries need an optimized alternative.
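To make the performance argument concrete, here is a minimal Python sketch. A LIKE '%something%' predicate has to inspect every row on every query, while an inverted index (the kind of structure Lucene maintains) maps each term directly to the rows that contain it. The sample orders, the whitespace tokenizer, and the function names are all hypothetical; real Lucene analyzers and index structures are far richer.

```python
from collections import defaultdict

# Hypothetical sample data standing in for an order table.
ORDERS = {
    1: "Shanghai to Beijing flight ticket",
    2: "Beijing hotel booking",
    3: "Shanghai train ticket",
}

def like_scan(rows, needle):
    """Emulates LIKE '%needle%': every row is inspected on every query."""
    needle = needle.lower()
    return sorted(rid for rid, text in rows.items() if needle in text.lower())

def build_inverted_index(rows):
    """Maps each lowercase, whitespace-separated term to the ids of rows containing it."""
    index = defaultdict(set)
    for rid, text in rows.items():
        for term in text.lower().split():
            index[term].add(rid)
    return index

def index_lookup(index, term):
    """Looks the term up directly instead of scanning all rows."""
    return sorted(index.get(term.lower(), set()))

INDEX = build_inverted_index(ORDERS)
print(like_scan(ORDERS, "ticket"))    # scan over all rows
print(index_lookup(INDEX, "ticket"))  # single dictionary lookup
```

Both calls find the same rows; the difference is that the index is built once at write time, so query cost no longer grows with the number of rows scanned.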

3. Solr features

  • Advanced full text search capability
  • High capacity
  • Standards-based open interfaces (XML, JSON, HTTP): documents are added to a search collection by sending XML over HTTP, and queries are also sent over HTTP and answered with an XML or JSON response
  • Provides a full-featured management interface that allows you to easily control your Solr instances
  • Easy to monitor
  • High stability and fault tolerance
  • Easy to configure without sacrificing flexibility or adaptability
  • Near-real-time indexing, so updated content becomes searchable almost immediately
  • Extensible plug-in architecture: New functionality can be easily added to Solr servers as plug-ins
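As an illustration of the HTTP interface, the following Python sketch builds a query URL for Solr's standard /select handler and asks for a JSON response. The host and port (localhost:8983 is Solr's default port), the search term, and the function name are assumptions made for the example; PolicyCore is the Core used later in this article.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_select_url(base_url, core, query, rows=10, wt="json"):
    """Builds a GET URL for the /select handler of one Core.

    q    - the query string (field:value syntax)
    rows - maximum number of documents to return
    wt   - response format, for example json or xml
    """
    params = {"q": query, "rows": rows, "wt": wt}
    return f"{base_url}/{core}/select?{urlencode(params)}"

# Hypothetical host and search term.
SELECT_URL = build_select_url("http://localhost:8983/solr", "PolicyCore", "PolicyName:discount")
print(SELECT_URL)
```

Any HTTP client can then issue a GET request to this URL; no Solr-specific library is required, which is what makes the interface easy to use from any language.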

4. How to use Solr

4.1. Web Management UI

The URL is: http://139.198.13.12:7000/solr/admin.html. Note: for Solr 5.5, be sure to include admin.html in the URL. Without it, the request returns 404 (page not found).

4.2. Installation and configuration of the Solr server

4.2.1. Install the Solr service: the installed version is 5.5.4.

4.2.2. Create a Core

To use Solr, you need to create a Core, which is similar to a database instance. Each Core has a folder in the Solr Home directory with the same name as the Core.

4.2.3. Configure the Core

Taking the PolicyCore used by the Demo on the Solr server as an example, modify the following configuration files:

solrconfig.xml and managed-schema are copied from the files of the same name in [{Solr Home path}/configsets/basic_configs/conf]. data-config.xml is obtained as follows: decompress the Solr server installation file solr-5.5.4.tgz to get the folder solr-5.5.4, then copy the db-data-config.xml file from solr-5.5.4/example/example-DIH/solr/db/conf to {Solr Home path}/configsets/basic_configs/conf and rename it to data-config.xml.

Add the following content to the solrconfig.xml configuration file:

<lib dir="../contrib/extraction/lib" regex=".*\.jar" />
<lib dir="../dist/" regex="solr-cell-\d.*\.jar" />
<lib dir="../contrib/clustering/lib/" regex=".*\.jar" />
<lib dir="../dist/" regex="solr-clustering-\d.*\.jar" />
<lib dir="../contrib/langid/lib/" regex=".*\.jar" />
<lib dir="../dist/" regex="solr-langid-\d.*\.jar" />
<lib dir="../contrib/velocity/lib" regex=".*\.jar" />
<lib dir="../dist/" regex="solr-velocity-\d.*\.jar" />
<lib dir="../dist/" regex="solr-dataimporthandler-\d.*\.jar" />

Then, after the dataDir element (${solr.data.dir:}), add:

<requestHandler name="/dataimport" class="solr.DataImportHandler">
    <lst name="defaults">
        <str name="config">data-config.xml</str>
    </lst>
</requestHandler>

Please see the following figure for the positions added above:

Modify the managed-schema file: add the following contents to the node:

<fieldType name="textPolicy_ik" class="solr.TextField">
    <analyzer type="index" useSmart="false" class="org.wltea.analyzer.lucene.IKAnalyzer" />
    <analyzer type="query" useSmart="true" class="org.wltea.analyzer.lucene.IKAnalyzer" />
</fieldType>

Comment out the following configuration:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />

Then add the following configuration to it:

<field name="PolicyID" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="PolicyGroupID" type="long" indexed="true" stored="true" />
<field name="PolicyOperatorID" type="long" indexed="true" stored="true" />
<field name="PolicyOperatorName" type="textPolicy_ik" indexed="true" stored="true" omitNorms="true" />
<field name="PolicyCode" type="textPolicy_ik" indexed="true" stored="true" omitNorms="true" />
<field name="PolicyName" type="textPolicy_ik" indexed="true" stored="true" omitNorms="true" />
<field name="PolicyType" type="string" indexed="true" stored="true" />
<field name="TicketType" type="int" indexed="true" stored="true" />
<field name="FlightType" type="int" indexed="true" stored="true" />
<field name="DepartureDate" type="tdate" indexed="true" stored="true" default="NOW+8HOUR" />
<field name="ArrivalDate" type="tdate" indexed="true" stored="true" default="NOW+8HOUR" />
<field name="ReturnDepartureDate" type="tdate" indexed="true" stored="true" default="NOW+8HOUR" />
<field name="ReturnArrivalDate" type="tdate" indexed="true" stored="true" default="NOW+8HOUR" />
<field name="DepartureCityCodes" type="textPolicy_ik" indexed="true" stored="true" omitNorms="true" />
<field name="TransitCityCodes" type="textPolicy_ik" indexed="true" stored="true" omitNorms="true" />
<field name="ArrivalCityCodes" type="textPolicy_ik" indexed="true" stored="true" omitNorms="true" />
<field name="OutTicketType" type="int" indexed="true" stored="true" />
<field name="OutTicketStart" type="tdate" indexed="true" stored="true" default="NOW+8HOUR" />
<field name="OutTicketEnd" type="tdate" indexed="true" stored="true" default="NOW+8HOUR" />
<field name="OutTicketPreDays" type="int" indexed="true" stored="true" />
<field name="Remark" type="textPolicy_ik" indexed="true" stored="true" omitNorms="true" />
<field name="Status" type="int" indexed="true" stored="true" />
<field name="SolrUpdatedTime" type="tdate" indexed="true" stored="true" default="NOW+8HOUR" />
<uniqueKey>PolicyID</uniqueKey>

Attribute Description:

  • name: the field name.
  • type: the field type. Values must match the type, otherwise an error is reported. If the field needs word segmentation, use a field type with a tokenizing analyzer, such as textPolicy_ik. For dates, tdate is recommended because it speeds up range queries.
  • indexed: whether the field is indexed.
  • stored: whether the field value is stored.
  • required: whether the field is mandatory.
  • multiValued: whether the field can hold more than one value. If so, the values are stored as an array.
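To show how the required and multiValued attributes constrain a document, here is a small Python sketch that reads two of the field definitions above and checks a document against them before it is sent to Solr. This is a hypothetical client-side check, not Solr's own validation logic; the `<fields>` wrapper element and the function names exist only to make the example self-contained.

```python
import xml.etree.ElementTree as ET

# Two field definitions copied from the managed-schema above,
# wrapped in a hypothetical <fields> root so the snippet parses on its own.
SCHEMA_FIELDS = """
<fields>
  <field name="PolicyID" type="string" indexed="true" stored="true" required="true" multiValued="false" />
  <field name="PolicyName" type="textPolicy_ik" indexed="true" stored="true" omitNorms="true" />
</fields>
"""

def load_fields(xml_text):
    """Returns {field name: attribute dict} for every <field> element."""
    root = ET.fromstring(xml_text)
    return {f.get("name"): dict(f.attrib) for f in root.iter("field")}

def check_document(doc, fields):
    """Lists violations of the required and multiValued attributes."""
    problems = []
    # A field marked required="true" must be present in the document.
    for name, attrs in fields.items():
        if attrs.get("required") == "true" and name not in doc:
            problems.append(f"missing required field {name}")
    # A list value is only allowed when the field is marked multiValued="true".
    for name, value in doc.items():
        if name in fields and isinstance(value, list):
            if fields[name].get("multiValued") != "true":
                problems.append(f"field {name} is not multiValued")
    return problems

FIELDS = load_fields(SCHEMA_FIELDS)
print(check_document({"PolicyID": "P1", "PolicyName": "Spring sale"}, FIELDS))
print(check_document({"PolicyName": ["a", "b"]}, FIELDS))
```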

Modify the data-config.xml file: comment out the default dataConfig and add the following configuration after the commented-out content:

<dataConfig>
    <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://{server IP address}:{port, if it is the default port 1433 …};…" />
    <document name="Info">
        <entity name="Policy" dataSource="SolrDB" transformer="ClobTransformer" pk="PolicyID"
                query="SELECT [PolicyID], [PolicyGroupID], [PolicyOperatorID], [PolicyOperatorName], [PolicyCode], [PolicyName], [PolicyType], [TicketType], [FlightType],
                       DATEADD(HOUR, 8, CAST([DepartureDate] AS DATETIME)) [DepartureDate],
                       DATEADD(HOUR, 8, CAST([ArrivalDate] AS DATETIME)) [ArrivalDate],
                       DATEADD(HOUR, 8, CAST([ReturnDepartureDate] AS DATETIME)) [ReturnDepartureDate],
                       DATEADD(HOUR, 8, CAST([ReturnArrivalDate] AS DATETIME)) [ReturnArrivalDate],
                       [DepartureCityCodes], [TransitCityCodes], [ArrivalCityCodes], [OutTicketType], [OutTicketStart], [OutTicketEnd], [OutTicketPreDays], [Remark], [Status],
                       DATEADD(HOUR, 8, CAST([SolrUpdatedTime] AS DATETIME)) [SolrUpdatedTime]
                       FROM [Policy]"
                deltaImportQuery="SELECT [PolicyID], [PolicyGroupID], [PolicyOperatorID], [PolicyOperatorName], [PolicyCode], [PolicyName], [PolicyType], [TicketType], [FlightType],
                       DATEADD(HOUR, 8, CAST([DepartureDate] AS DATETIME)) [DepartureDate],
                       DATEADD(HOUR, 8, CAST([ArrivalDate] AS DATETIME)) [ArrivalDate],
                       DATEADD(HOUR, 8, CAST([ReturnDepartureDate] AS DATETIME)) [ReturnDepartureDate],
                       DATEADD(HOUR, 8, CAST([ReturnArrivalDate] AS DATETIME)) [ReturnArrivalDate],
                       [DepartureCityCodes], [TransitCityCodes], [ArrivalCityCodes], [OutTicketType], [OutTicketStart], [OutTicketEnd], [OutTicketPreDays], [Remark], [Status],
                       DATEADD(HOUR, 8, CAST([SolrUpdatedTime] AS DATETIME)) [SolrUpdatedTime]
                       FROM [Policy] WHERE PolicyID = '${dataimporter.delta.PolicyID}'"
                deltaQuery="SELECT [PolicyID] FROM [Policy] WHERE [SolrUpdatedTime] > '${dataimporter.last_index_time}'">
            <field column="PolicyID" name="PolicyID"/>
            <field column="PolicyGroupID" name="PolicyGroupID"/>
            <field column="PolicyOperatorID" name="PolicyOperatorID"/>
            <field column="PolicyOperatorName" name="PolicyOperatorName"/>
            <field column="PolicyCode" name="PolicyCode"/>
            <field column="PolicyName" name="PolicyName"/>
            <field column="PolicyType" name="PolicyType"/>
            <field column="TicketType" name="TicketType"/>
            <field column="FlightType" name="FlightType"/>
            <field column="DepartureDate" name="DepartureDate"/>
            <field column="ArrivalDate" name="ArrivalDate"/>
            <field column="ReturnDepartureDate" name="ReturnDepartureDate"/>
            <field column="ReturnArrivalDate" name="ReturnArrivalDate"/>
            <field column="DepartureCityCodes" name="DepartureCityCodes"/>
            <field column="TransitCityCodes" name="TransitCityCodes"/>
            <field column="ArrivalCityCodes" name="ArrivalCityCodes"/>
            <field column="OutTicketType" name="OutTicketType"/>
            <field column="OutTicketStart" name="OutTicketStart"/>
            <field column="OutTicketEnd" name="OutTicketEnd"/>
            <field column="OutTicketPreDays" name="OutTicketPreDays"/>
            <field column="Remark" name="Remark"/>
            <field column="Status" name="Status"/>
            <field column="SolrUpdatedTime" name="SolrUpdatedTime"/>
        </entity>
    </document>
</dataConfig>

Attribute Description:

  • query: retrieves the matching records from the database table. It is used for full imports.
  • deltaImportQuery: the second step of an incremental import. It takes each ID returned by deltaQuery and fetches the complete record for it. The index is then updated from that data, which may mean deleting, adding, or modifying documents. This query is used only for incremental imports and can return multiple fields, generally all columns.
  • deltaQuery: the first step of an incremental import. It returns the IDs of all records that have changed since the last import and therefore need to be re-indexed. The change may come from a modify, add, or delete operation. This query is used only for incremental imports and returns only the ID value.
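The two-step incremental import can be sketched in Python as follows. This only mimics what the DataImportHandler does with deltaQuery and deltaImportQuery; it is not DIH's actual code. An in-memory SQLite database stands in for SQL Server, the Policy table is reduced to three columns, and the sample rows and the function name are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the SolrDB SQL Server database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Policy (PolicyID TEXT PRIMARY KEY, PolicyName TEXT, SolrUpdatedTime TEXT)"
)
conn.executemany(
    "INSERT INTO Policy VALUES (?, ?, ?)",
    [
        ("P1", "Spring sale", "2017-12-01 08:00:00"),
        ("P2", "Group fare", "2017-12-18 09:30:00"),
        ("P3", "Student fare", "2017-12-18 10:15:00"),
    ],
)

def delta_import(conn, last_index_time):
    """Returns the full rows changed since last_index_time, as index documents."""
    # Step 1 (deltaQuery): fetch only the ids of rows changed since the last import.
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT PolicyID FROM Policy WHERE SolrUpdatedTime > ? ORDER BY PolicyID",
            (last_index_time,),
        )
    ]
    # Step 2 (deltaImportQuery): fetch the complete row for each changed id.
    docs = []
    for policy_id in ids:
        row = conn.execute(
            "SELECT PolicyID, PolicyName, SolrUpdatedTime FROM Policy WHERE PolicyID = ?",
            (policy_id,),
        ).fetchone()
        docs.append({"PolicyID": row[0], "PolicyName": row[1], "SolrUpdatedTime": row[2]})
    return docs

changed = delta_import(conn, "2017-12-18 00:00:00")
print([doc["PolicyID"] for doc in changed])
```

Only the rows whose SolrUpdatedTime is later than the last import time are fetched in full, which is why the timestamp column must be kept up to date on every change.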

Add a field (SolrUpdatedTime) and a trigger to the Policy table of the SolrDB database:

USE [SolrDB]
GO
CREATE TRIGGER [dbo].[TR_Solr_UPDATE_Policy] ON [dbo].[Policy]
FOR UPDATE, INSERT
AS
BEGIN
    IF UPDATE(PolicyID) OR UPDATE(PolicyGroupID) OR UPDATE(PolicyOperatorID) OR UPDATE(PolicyOperatorName)
        OR UPDATE(PolicyCode) OR UPDATE(PolicyName) OR UPDATE(PolicyType) OR UPDATE(TicketType)
        OR UPDATE(FlightType) OR UPDATE(DepartureDate) OR UPDATE(ArrivalDate) OR UPDATE(ReturnDepartureDate)
        OR UPDATE(ReturnArrivalDate) OR UPDATE(DepartureCityCodes) OR UPDATE(TransitCityCodes)
        OR UPDATE(ArrivalCityCodes) OR UPDATE(OutTicketType) OR UPDATE(OutTicketStart) OR UPDATE(OutTicketEnd)
        OR UPDATE(OutTicketPreDays) OR UPDATE(Remark) OR UPDATE(Status)
    BEGIN
        UPDATE dbo.Policy SET SolrUpdatedTime = GETDATE()
        FROM dbo.Policy p, inserted i
        WHERE p.PolicyID = i.PolicyID
    END
END
GO

4.4. SolrNet

SolrNet is an open-source .NET client for Solr.

4.5. Periodically import full and incremental data from the database to Solr

Scheduled incremental import is available for Solr through apache-solr-dataimportscheduler, but version 1.0 no longer works on Solr 5.5 unless its source code is modified. We therefore took the following approach:

First, we developed RESTful services for job scheduling that perform the periodic incremental import and the periodic full import.

Then, we configured the relevant settings in our self-developed [Job Centralized Management Platform], as shown in the figure below.

In this way, our JobServer periodically requests the full and incremental import URLs via HTTP GET, HTTP POST, or HTTP HEAD, which gives us scheduled full and incremental data import. To see how to implement full and incremental imports with SolrNet, refer to the FullDataImport() and DeltaDataImport() examples in the Demo code, respectively.
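The import URLs that a scheduler requests can be sketched in Python as below. The /dataimport path matches the requestHandler configured in solrconfig.xml, and command, clean, and commit are DataImportHandler request parameters. The host and port and the function name are assumptions for the example, and this is not the Demo's FullDataImport() or DeltaDataImport() code.

```python
from urllib.parse import urlencode

def build_import_url(base_url, core, command, clean):
    """Builds the URL that triggers a DataImportHandler run for one Core.

    command - "full-import" or "delta-import"
    clean   - whether the existing index is cleared before importing
    """
    params = {
        "command": command,
        "clean": "true" if clean else "false",
        "commit": "true",  # make the imported documents searchable when the run ends
    }
    return f"{base_url}/{core}/dataimport?{urlencode(params)}"

# Hypothetical host; PolicyCore is the Core configured above.
BASE_URL = "http://localhost:8983/solr"
FULL_IMPORT_URL = build_import_url(BASE_URL, "PolicyCore", "full-import", clean=True)
DELTA_IMPORT_URL = build_import_url(BASE_URL, "PolicyCore", "delta-import", clean=False)

print(FULL_IMPORT_URL)
print(DELTA_IMPORT_URL)
```

A full import clears the index first so that rows deleted from the database also disappear from Solr, while a delta import keeps the existing documents and only applies the changes.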

4.6. Near-real-time data import, deletion, and query

Use SolrNet's CRUD API. See Add(), Delete(), and Query() in the Demo for examples. Near-real-time import is closer to real time than periodic incremental import. In practice, it is best to update the database and Solr together through a message queue.
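For readers who do not use .NET, the following Python sketch shows the JSON commands that Solr's /update handler accepts for adding and deleting a document. It is a stand-in for illustration, not SolrNet's API and not the Demo's Add() or Delete() code. The sample document and function names are hypothetical; the delete command identifies the document by the value of its uniqueKey, which is PolicyID in this schema.

```python
import json

def add_command(doc):
    """JSON body that adds (or replaces) one document and commits the change."""
    return json.dumps({"add": {"doc": doc}, "commit": {}}, sort_keys=True)

def delete_command(policy_id):
    """JSON body that deletes the document whose uniqueKey equals policy_id."""
    return json.dumps({"delete": {"id": policy_id}, "commit": {}}, sort_keys=True)

# Hypothetical document using field names from the managed-schema above.
ADD_BODY = add_command({"PolicyID": "P1", "PolicyName": "Spring sale", "Status": 1})
DELETE_BODY = delete_command("P1")

print(ADD_BODY)
print(DELETE_BODY)
```

Each body would be sent with an HTTP POST and a Content-Type of application/json to the update handler of the Core. Committing on every request keeps the example simple; under heavy write load, batching documents and committing less often is usually preferable.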

5. Demo download and more information

  • SolrDemo can be downloaded at github.com/das2017/Sol…
  • Solr website: lucene.apache.org/solr/
  • Lucene official website: lucene.apache.org/
  • SolrNet official website: github.com/mausch/Solr…

The topics covered in this series are listed below. If you are interested, please stay tuned:

  • Introduction: Three key points of architecture practice for small and medium-sized R&D teams
  • Cache Redis: Quick start and application of Redis
  • Message queue RabbitMQ: How to make good use of the message queue RabbitMQ?
  • Centralized log ELK: Centralized logging with ELK in the architecture practice of small and medium-sized R&D teams
  • Task scheduling Job: Task scheduling Job in the architecture practice of small and medium-sized R&D teams
  • Metrics: What does app monitoring do?
  • Microservices framework MSA: The .NET stack microservice architecture practice you have been looking for
  • Solr
  • Distributed coordinator ZooKeeper
  • Small tools: Dapper.NET/EmitMapper/AutoMapper/Autofac/NuGet
  • Release tool Jenkins
  • Overall architecture design: How to do e-commerce enterprise overall architecture?
  • Single project architecture design
  • Uniform application layering: How to standardize all application layering in a company?
  • Debugging tool WinDbg
  • Single sign-on (SSO)
  • Enterprise Payment Gateway
  • “Article

About the authors

Yang Li has years of experience in the research and development of Internet application systems. She previously worked at Gooda Group and is now the system architect of Zhongqing E-Travel, where she is mainly responsible for the architecture design of the business systems of the company's R&D center, as well as for building up and teaching new technologies. Her current focus is on open source software, software architecture, microservices, and big data.

Zhang Huiqing, an IT veteran of more than 10 years, has served successively as architect at Ctrip, chief architect of Gooda Group, and CTO of Zhongqing E-Travel, and led the upgrading and transformation of the technical architecture of the latter two companies. Current focus: architecture and engineering efficiency, the matching and integration of technology and business, and technology value and innovation.

Thanks to Yuta Tian guang for reviewing this article.