Introduction to Solr

Introducing Solr

What is Solr

Solr is a top-level open source project of the Apache Software Foundation. It is a full-text search server written in Java and built on Lucene.

Solr provides a richer query language than Lucene, is extensible and configurable, and optimizes Lucene's performance.

How does Solr implement full text retrieval?

Indexing process: a Solr client (a browser or a Java program) sends a POST request to the Solr server. The request body is an XML document containing field information, which Solr uses to maintain the index (adding, deleting, and updating documents).

Search process: a Solr client (a browser or a Java program) sends a GET request to the Solr server, and the server returns an XML document.

Solr itself has no view-rendering capability; presenting the results is left to the client.
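The two request shapes described above can be sketched in plain Java without any Solr library. This is only an illustration: the class and method names are hypothetical, and the core URL http://localhost:8080/solr/collection1 is an assumption based on the Tomcat setup used later in this article.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the payloads a client would send, but sends nothing.
final class SolrRequestSketch {

    /** Builds the XML body of an "add" request: one <doc> with one <field> per entry. */
    static String buildAddXml(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            xml.append("<field name=\"").append(escape(e.getKey())).append("\">")
               .append(escape(e.getValue()))
               .append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    /** Builds the URL of a GET search request against the /select handler. */
    static String buildSelectUrl(String coreUrl, String query) {
        return coreUrl + "/select?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8) + "&wt=xml";
    }

    /** Escapes the characters that are special in XML text and attribute values. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "c001");
        fields.put("name", "solr test");
        System.out.println(buildAddXml(fields));
        System.out.println(buildSelectUrl("http://localhost:8080/solr/collection1", "name:solr"));
    }
}
```

The XML body would be POSTed to the /update handler and the URL fetched with GET; in both cases the response is data only, which the client then renders.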

Difference between Solr and Lucene

Lucene is a full-text search engine toolkit. It is distributed as a JAR package and cannot run on its own or provide services to external clients.

Solr is a full-text search server that runs in a servlet container and provides search and indexing services on its own. Building full-text search features with Solr is faster and more convenient than building them directly on Lucene.

Solr installation configuration

Download

Solr and Lucene are released together. At the time of writing, the latest version is 5.2.1.

Download address: http://archive.apache.org/dist/lucene/solr/

Environment

JDK: 1.7 or later
Solr: 4.10.3
MySQL: 5.x
Web server: Tomcat 7

Initialize the database script

Solr installation configuration

Install Tomcat
Copy solr.war from solr-4.10.3\example\webapps to Tomcat's webapps directory
Decompress the war package, then delete the war file
Add Solr's extension jars. Copy the jars from solr-4.10.3\example\lib\ext to the WEB-INF\lib directory of the Solr web application in Tomcat
Add log4j.properties. Copy log4j.properties from solr-4.10.3\example\resources to apache-tomcat-7.0.57\webapps\solr\WEB-INF\classes, creating the classes directory if it does not exist
Specify the SolrHome directory in web.xml

SolrCore installation

SolrCore and SolrHome

SolrHome is the home directory of the Solr service. A SolrHome directory can contain multiple SolrCore directories, and each SolrCore directory contains the configuration files and data files required by one Solr instance.

Each SolrCore can provide search and indexing services independently. Multiple SolrCores are unrelated to one another.

Directory structures of SolrHome and SolrCore

SolrHome: solr-4.10.3\example\solr
SolrCore: solr-4.10.3\example\solr\collection1, which contains configuration files, index files, and log information

Installing a SolrCore

To install a SolrCore, you first need a SolrHome. Copy the files under the example SolrHome (solr-4.10.3\example\solr) to the SolrHome directory specified in web.xml.

SolrCore configuration

The solrconfig.xml file in the SolrCore's conf directory configures how the SolrCore runs.

This file has three main tags to configure: lib, dataDir, and requestHandler.

lib tag

If a SolrCore needs extension dependency packages, the lib tag specifies where those packages are located.

solr.install.dir indicates the SolrCore installation directory.

Copy the contrib and dist directories in example to

Modifying the lib tags
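The lib tags shown later in this article use values such as ${solr.install.dir:../..}, where the text after the colon is the fallback used when the property is not set. The sketch below imitates that substitution in plain Java so the rule is easy to see; it is not Solr's own code, the class name is hypothetical, and throwing when neither a property nor a default exists is this sketch's choice.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that resolves ${name:default} placeholders against a property map.
final class PropertyPlaceholderSketch {

    // group(1) = property name, group(2) = default value (null when no ":" is present)
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    static String resolve(String text, Map<String, String> props) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String value = props.containsKey(name) ? props.get(name) : m.group(2);
            if (value == null) {
                throw new IllegalArgumentException("No value or default for property " + name);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String dir = "${solr.install.dir:../..}/contrib/dataimporthandler/lib";
        System.out.println(resolve(dir, Map.of()));
        System.out.println(resolve(dir, Map.of("solr.install.dir", "/opt/solr")));
    }
}
```

With no property set, the path is resolved relative to the default ../..; once solr.install.dir is defined, that value takes precedence.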

dataDir tag

Each SolrCore has its own index directory, which by default is the data directory under the SolrCore.

The data directory contains the index directory and the tlog (transaction log) directory. If you do not want to use the default location, you can change it in solrconfig.xml as follows:

requestHandler tag

The requestHandler tag defines request handlers, which are the access points for indexing and searching. Documents in the index are added, modified, and deleted through /update.

The index is searched through /select:

<requestHandler name="/select" class="solr.SearchHandler">
    <!-- Default parameter values; they can be overridden in the request URL -->
    <lst name="defaults">
        <str name="echoParams">explicit</str>
        <int name="rows">10</int><!-- Number of rows to return -->
        <str name="wt">json</str><!-- Response format -->
        <str name="df">text</str><!-- Default search field -->
    </lst>
</requestHandler>
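The way handler defaults interact with request parameters can be pictured as a simple map merge: the defaults apply first, and any parameter present in the request replaces the default of the same name. The following is a minimal sketch of that rule, not Solr's implementation; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of "defaults" in a requestHandler: request parameters win.
final class HandlerDefaultsSketch {

    static Map<String, String> effectiveParams(Map<String, String> defaults,
                                               Map<String, String> request) {
        Map<String, String> merged = new LinkedHashMap<>(defaults);
        merged.putAll(request); // a parameter in the request overrides the default
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("echoParams", "explicit");
        defaults.put("rows", "10");
        defaults.put("wt", "json");
        defaults.put("df", "text");

        Map<String, String> request = new LinkedHashMap<>();
        request.put("q", "*:*");
        request.put("rows", "5");

        System.out.println(effectiveParams(defaults, request));
    }
}
```

Here a request for /select?q=*:*&rows=5 would return at most 5 rows while still using the default wt and df values.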

Solr interface

Dashboard

Shows the Solr instance's uptime, version, system resources, JVM information, and so on.

Logging

Shows Solr's runtime log messages.

Cloud

Cloud refers to SolrCloud (cluster mode). This menu is displayed only when Solr runs in SolrCloud mode.

Core Admin

The SolrCore management interface, where you can add SolrCore instances.

Java Properties

Shows the properties of the JVM runtime environment that Solr is running in.

Thread Dump

Shows the threads currently active in the Solr server, along with their stack traces.

Core Selector

Select a SolrCore to perform detailed operations on it, as follows:

Analysis

This interface lets you test how the index-time and query-time analyzers process text. Note: in Solr, analyzers are bound to field types.

Dataimport

You can define a data import handler to import data from a relational database into the Solr index. It is not configured by default and must be set up manually.

Document

By default, Solr updates documents based on the id field (the unique key). If no document with the given id exists, Solr adds the document; if one already exists, Solr updates it.
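The add-or-update rule can be modelled with a map keyed by id. This is only a toy model of the behaviour described above, with hypothetical names; a real index replaces the whole stored document in the same way this map replaces its value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of an index keyed by the unique key field "id".
final class UniqueKeyUpsertSketch {

    private final Map<String, Map<String, Object>> docsById = new LinkedHashMap<>();

    /** Adds the document, replacing any existing one with the same id. Returns true on replace. */
    boolean add(Map<String, Object> doc) {
        Object id = doc.get("id");
        if (id == null) {
            throw new IllegalArgumentException("Document is missing the unique key field 'id'");
        }
        return docsById.put(id.toString(), new LinkedHashMap<>(doc)) != null;
    }

    Map<String, Object> get(String id) {
        return docsById.get(id);
    }

    int size() {
        return docsById.size();
    }

    public static void main(String[] args) {
        UniqueKeyUpsertSketch index = new UniqueKeyUpsertSketch();
        System.out.println(index.add(Map.of("id", "c001", "name", "first")));  // added
        System.out.println(index.add(Map.of("id", "c001", "name", "second"))); // updated
        System.out.println(index.size() + " document(s): " + index.get("c001"));
    }
}
```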

This menu lets you add, update, and delete documents in the index. The interface is as follows:

overwrite="true": when indexing, Solr replaces a document that already exists with the one in the XML
commitWithin="1000": when indexing, Solr commits the document within 1000 milliseconds (1 second). To make testing easier, you can also commit the document immediately by adding “” after it.

Query

Use /select to search the index. The q query parameter is required.

Configuring multiple SolrCores

Configuring multiple SolrCores has the following benefits:
1. When running SolrCloud, multiple SolrCores must be configured.
2. Different business modules can use different SolrCores for their search and indexing services.

Adding a SolrCore

Step 1: Copy collection1 in SolrHome to collection2
Step 2: Modify core.properties in the new SolrCore directory

This completes the configuration of the new SolrCore.
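To check the result of these steps, one could list the SolrCores found in a SolrHome. The sketch below assumes that each core directory is recognised by its core.properties file and that the core name comes from the name property, falling back to the directory name; both are assumptions of this sketch, and the class, its methods, and the temporary directory layout are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: scans a SolrHome for directories that contain core.properties.
final class CoreDiscoverySketch {

    static List<String> listCoreNames(Path solrHome) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> dirs =
                     Files.newDirectoryStream(solrHome, p -> Files.isDirectory(p))) {
            for (Path dir : dirs) {
                Path propsFile = dir.resolve("core.properties");
                if (!Files.isRegularFile(propsFile)) {
                    continue; // not a core directory
                }
                Properties props = new Properties();
                try (InputStream in = Files.newInputStream(propsFile)) {
                    props.load(in);
                }
                names.add(props.getProperty("name", dir.getFileName().toString()));
            }
        }
        Collections.sort(names);
        return names;
    }

    /** Builds a throwaway SolrHome with two cores and lists them (demonstration only). */
    static List<String> demo() {
        try {
            Path home = Files.createTempDirectory("solrhome");
            for (String core : new String[] {"collection1", "collection2"}) {
                Path dir = Files.createDirectories(home.resolve(core));
                Files.write(dir.resolve("core.properties"),
                        ("name=" + core + "\n").getBytes(StandardCharsets.UTF_8));
            }
            Files.createDirectories(home.resolve("not-a-core")); // no core.properties: skipped
            return listCoreNames(home);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

If collection2 still reports the name collection1, the core.properties copied in Step 1 has not been edited yet.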

Basic use of Solr

schema.xml

The schema.xml file mainly configures the SolrCore's data definitions, including fields and field types. In Solr, fields and field types must be defined before they are used.

Field

Defines a field

name: the field name
type: the field type
indexed: whether the field is indexed
stored: whether the field is stored
required: whether the field is required
multiValued: whether the field holds multiple values. For example, if a product has several images and one field stores all of them, multiValued must be set to true.

dynamicField

Defines a dynamic field

name: the naming pattern of the dynamic field
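A dynamic field's naming pattern is a name with a * wildcard at its start or end. The matching rule can be sketched as follows; the patterns *_s and attr_* are illustrative assumptions rather than entries taken from this article's schema, and the class name is hypothetical.

```java
// Hypothetical matcher for dynamic field name patterns with a leading or trailing "*".
final class DynamicFieldSketch {

    static boolean matches(String pattern, String fieldName) {
        if (pattern.startsWith("*")) {
            return fieldName.endsWith(pattern.substring(1));      // e.g. *_s
        }
        if (pattern.endsWith("*")) {
            return fieldName.startsWith(pattern.substring(0, pattern.length() - 1)); // e.g. attr_*
        }
        return pattern.equals(fieldName);                         // no wildcard: exact name
    }

    public static void main(String[] args) {
        System.out.println(matches("*_s", "color_s"));
        System.out.println(matches("*_s", "color_i"));
        System.out.println(matches("attr_*", "attr_size"));
    }
}
```

Any field name sent at index time that is not declared explicitly but matches such a pattern takes the type and attributes of the dynamic field.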

uniqueKey

Specifies the unique key

The id is the name of a field already defined in a field tag, and that field has required set to true.

A schema.xml file can define only one unique key.

copyField

Defines a copy field

source: the name of the field to copy from
dest: the name of the destination field

When several source fields are copied into it, the destination field specified by dest must have multiValued set to true.
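The effect of copy fields on a document can be simulated with ordinary collections: every value of each source field is appended to the destination field, which is why the destination ends up holding several values. This is a simplified model with hypothetical names; the field names are the ones used later in this article.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of copyField: values of "source" are appended to "dest".
final class CopyFieldSketch {

    /** Each rule is an entry of (source field name, destination field name). */
    static Map<String, List<String>> applyCopyFields(Map<String, List<String>> doc,
                                                     List<Map.Entry<String, String>> rules) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        doc.forEach((name, values) -> out.put(name, new ArrayList<>(values)));
        for (Map.Entry<String, String> rule : rules) {
            List<String> sourceValues = doc.get(rule.getKey());
            if (sourceValues == null) {
                continue; // the document has no value for this source field
            }
            out.computeIfAbsent(rule.getValue(), k -> new ArrayList<>()).addAll(sourceValues);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("product_name", List.of("Desk lamp"));
        doc.put("product_description", List.of("Adjustable LED lamp"));

        List<Map.Entry<String, String>> rules = List.of(
                Map.entry("product_name", "product_keywords"),
                Map.entry("product_description", "product_keywords"));

        System.out.println(applyCopyFields(doc, rules).get("product_keywords"));
    }
}
```

Searching the single product_keywords field then covers both the name and the description.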

FieldType

Defines a field type

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

name: the name of the field type
class: the Solr class that implements the field type
analyzer: the analyzer; its type attribute selects index-time or query-time analysis
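An analyzer is a pipeline: a tokenizer splits the text, and each filter then transforms or drops tokens in the configured order. The sketch below mimics the index-time chain of text_general (tokenize, remove stop words ignoring case, lowercase) with plain Java. It is a rough approximation: the real StandardTokenizer follows Unicode segmentation rules rather than a simple split, and the stop-word set here is an assumed sample, not the contents of stopwords.txt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Rough imitation of a tokenizer followed by a stop filter and a lowercase filter.
final class AnalyzerChainSketch {

    /** stopWords is expected to contain lowercase entries. */
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        // "Tokenizer": split on anything that is not a letter or digit
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue;
            }
            String lower = raw.toLowerCase(Locale.ROOT);
            if (stopWords.contains(lower)) {
                continue; // "StopFilter" with ignoreCase="true"
            }
            tokens.add(lower); // "LowerCaseFilter"
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Quick brown Fox", Set.of("the", "a", "an")));
    }
}
```

The Analysis screen in the admin interface shows the same idea for the real chain, one stage per row.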

Chinese word segmentation

Use IKAnalyzer for Chinese word segmentation

Step 1: Copy the IKAnalyzer jar package to the following directory
Step 2: Copy the configuration files of IKAnalyzer's extended dictionary to a directory
Step 3: Configure a fieldType
Step 4: Configure a field that uses Chinese word segmentation
Step 5: Restart Tomcat

Configuring business fields

Requirement

Index the data of the products table from the Jingdong (JD.com) example, which requires defining the corresponding fields.

Configuration analysis

The columns pid, name, catalog, catalog_name, price, description, and picture need to be added to the index.

FieldType: a fieldType for Chinese word segmentation has already been configured, so no new one is needed.

Field:

pid: because pid is the unique key of the products table and Solr's schema.xml already defines id as its unique key, there is no need to define a separate pid field.

name:

<!-- Product name -->
<field name="product_name" type="text_ik" indexed="true" stored="true"/>

catalog, catalog_name:

<!-- Category ID -->
<field name="product_catalog" type="string" indexed="true" stored="true"/>
<!-- Category name -->
<field name="product_catalog_name" type="string" indexed="true" stored="false"/>

price:

<!-- Product price -->
<field name="product_price" type="float" indexed="true" stored="true"/>

description:

<!-- Product description -->
<field name="product_description" type="text_ik" indexed="true" stored="false"/>

picture:

<!-- Product picture -->
<field name="product_picture" type="string" indexed="false" stored="true"/>

<!-- Target field of the copy fields -->
<field name="product_keywords" type="text_ik" indexed="true" stored="true" multiValued="true"/>
<!-- Add the product name to the target field -->
<copyField source="product_name" dest="product_keywords"/>
<!-- Add the product description to the target field -->
<copyField source="product_description" dest="product_keywords"/>

Dataimport

This plug-in imports the results of a specified SQL query from the database into the Solr index.

Step 1: Add the jar packages. Copy the dataimport jar package (solr-4.10.3\dist\solr-dataimporthandler-extras-4.10.3.jar) to

Modify the solrconfig.xml file and add a lib tag:
<lib dir="${solr.install.dir:../..}/contrib/dataimporthandler/lib" regex=".*\.jar" />

MySQL driver package: copy the MySQL driver jar to:

Modify the solrconfig.xml file and add a lib tag:
<lib dir="${solr.install.dir:../..}/contrib/db/lib" regex=".*\.jar" />

Step 2: Configure the requestHandler. In solrconfig.xml, add a requestHandler for dataimport:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
        <str name="config">data-config.xml</str>
    </lst>
</requestHandler>

Step 3: Create data-config.xml. In the same directory as solrconfig.xml, create data-config.xml:

<dataConfig>
    <dataSource type="JdbcDataSource"
        driver="com.mysql.jdbc.Driver"
        url="jdbc:mysql://localhost:3306/taotao"
        user="root"
        password="root"/>

    <document>
        <entity name="products" query="select pid,name,catalog,catalog_name,price,description,picture from products">
            <field column="pid" name="id" />
            <field column="name" name="product_name" />
            <field column="catalog" name="product_catalog" />
            <field column="catalog_name" name="product_catalog_name" />
            <field column="price" name="product_price" />
            <field column="description" name="product_description" />
            <field column="picture" name="product_picture" />
        </entity>
    </document>
</dataConfig>
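Each field element in data-config.xml maps one column of the SQL result to one Solr field. The renaming step on its own can be sketched like this, as an illustration of the mapping only and not of how the DataImportHandler is implemented; the class name and the sample row values are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration of the column-to-field renaming declared in data-config.xml.
final class ColumnMappingSketch {

    static Map<String, Object> toSolrFields(Map<String, Object> row,
                                            Map<String, String> columnToField) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (Map.Entry<String, String> mapping : columnToField.entrySet()) {
            String column = mapping.getKey();
            if (row.containsKey(column)) {
                doc.put(mapping.getValue(), row.get(column));
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> columnToField = new LinkedHashMap<>();
        columnToField.put("pid", "id");
        columnToField.put("name", "product_name");
        columnToField.put("price", "product_price");

        Map<String, Object> row = new LinkedHashMap<>();
        row.put("pid", 1);
        row.put("name", "Desk lamp");
        row.put("price", 9.9f);

        System.out.println(toSolrFields(row, columnToField));
    }
}
```

Because pid is mapped to id, every imported row gets its unique key from the table's primary key.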

Step 4: Restart Tomcat

Using SolrJ

What is SolrJ

SolrJ is the Java client for the Solr server.

Environment preparation

JDK, an IDE, Tomcat, SolrJ

Setting up the project

SolrJ's dependencies and core packages
Solr's extension service packages

Index maintenance with SolrJ

Adding/updating documents

In Solr, the index always has a unique key. If a document with the given id already exists, the operation is an update; if not, it is an add.

    // Imports needed by the test class:
    //   org.apache.solr.client.solrj.impl.HttpSolrServer
    //   org.apache.solr.common.SolrInputDocument
    //   org.junit.Test
    @Test
    public void insertAndUpdateIndex() throws Exception {
        // Create the HttpSolrServer
        HttpSolrServer server = new HttpSolrServer("http://localhost:8080/solr");
        // Create the document object
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "c001");
        doc.addField("name", "solr test111");
        // Add the document to the index
        server.add(doc);
        // Commit
        server.commit();
    }

Deleting documents

Delete by ID

Delete by query

    @Test
    public void deleteIndex() throws Exception {
        // Create the HttpSolrServer
        HttpSolrServer server = new HttpSolrServer("http://localhost:8080/solr");

        // Delete the document with the specified ID
        // server.deleteById("c001");

        // Delete by query
        server.deleteByQuery("id:c001");

        // Delete all documents (uncomment only if you really want to empty the index)
        // server.deleteByQuery("*:*");

        // Commit
        server.commit();
    }

Querying the index

A simple query

    // Additional imports: org.apache.solr.client.solrj.SolrQuery,
    //   org.apache.solr.client.solrj.response.QueryResponse,
    //   org.apache.solr.common.SolrDocument, org.apache.solr.common.SolrDocumentList
    @Test
    public void search01() throws Exception {
        // Create the HttpSolrServer
        HttpSolrServer server = new HttpSolrServer("http://localhost:8080/solr");
        // Create the SolrQuery
        SolrQuery query = new SolrQuery();
        // Set the query condition
        query.setQuery("product_name:minion");
        // Execute the query and get the response
        QueryResponse response = server.query(query);
        // Get the matching results
        SolrDocumentList list = response.getResults();
        // Total number of matching results
        long count = list.getNumFound();
        System.out.println("Total number of matching results: " + count);
        for (SolrDocument doc : list) {
            System.out.println(doc.get("id"));
            System.out.println(doc.get("product_name"));
            System.out.println(doc.get("product_catalog"));
            System.out.println(doc.get("product_price"));
            System.out.println(doc.get("product_picture"));
            System.out.println("=====================");
        }
    }

Complex queries

Solr’s query syntax

1. q - the query string; required. To match all documents, use *:*.

2. fq - (filter query) restricts the results of q to documents that also match the filter. Example: product_price:[1 TO 10]

3. sort - the sort order, in the format sort=<field name>+<desc|asc>[,<field name>+<desc|asc>]… Example: sort=product_price+asc

4. start - for paging: the offset of the first record to return, starting from 0.

5. rows - the maximum number of records to return; used together with start for paging. In practice you know the current page number and the page size, and compute the start offset from them.

6. fl - the fields to return, separated by commas or spaces.

7. df - the default search field.

8. wt - (writer type) the output format, which can be xml, json, php, or phps.

9. hl - whether to highlight results; you also set the field to highlight and the prefix and suffix that wrap each match.
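The paging arithmetic from item 5 and the way the parameters combine into a request can be sketched in a few lines. The class and method names are hypothetical, pages are assumed to be numbered from 1, and the parameter values are the ones used in this article's examples.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: computes "start" from a page number and builds a query string.
final class PagingSketch {

    /** Pages are numbered from 1; start is the zero-based offset of the first row. */
    static int startFor(int page, int rows) {
        if (page < 1 || rows < 1) {
            throw new IllegalArgumentException("page and rows must be positive");
        }
        return (page - 1) * rows;
    }

    /** URL-encodes each parameter value and joins the pairs with "&". */
    static String toQueryString(Map<String, String> params) {
        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            joined.add(p.getKey() + "=" + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        int rows = 10;
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("fq", "product_price:[1 TO 10]");
        params.put("sort", "product_price asc");
        params.put("start", String.valueOf(startFor(3, rows))); // third page
        params.put("rows", String.valueOf(rows));
        System.out.println("/select?" + toQueryString(params));
    }
}
```

Note that the + in the sort format above is simply the URL-encoded form of a space, which is why the raw value here is written as "product_price asc".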

Code

    // Additional imports: java.util.List, java.util.Map,
    //   org.apache.solr.client.solrj.SolrQuery.ORDER
    @Test
    public void search02() throws Exception {
        // Create the HttpSolrServer
        HttpSolrServer server = new HttpSolrServer("http://localhost:8080/solr");
        // Create the SolrQuery
        SolrQuery query = new SolrQuery();

        // Set the query condition
        query.setQuery("product_name:minion");
        // Equivalent: query.set("q", "product_name:minion");

        // Set the filter query
        // query.addFilterQuery(fq) can be used to add further filters
        query.setFilterQueries("product_price:[1 TO 10]");

        // Set the sort order
        query.setSort("product_price", ORDER.asc);
        // Set the paging information (these are the default values)
        query.setStart(0);
        query.setRows(10);

        // Set the fields to return
        query.setFields("id,product_name,product_catalog,product_price,product_picture");

        // Set the default search field
        query.set("df", "product_keywords");

        // Set the highlighting options
        query.setHighlight(true);
        query.addHighlightField("product_name");
        query.setHighlightSimplePre("<em>");
        query.setHighlightSimplePost("</em>");

        // Execute the query and get the response
        QueryResponse response = server.query(query);
        // Get the matching results
        SolrDocumentList list = response.getResults();
        // Total number of matching results
        long count = list.getNumFound();
        System.out.println("Total number of matching results: " + count);

        // Get the highlighting information, keyed by document id and then by field name
        Map<String, Map<String, List<String>>> highlighting = response
                .getHighlighting();
        for (SolrDocument doc : list) {
            System.out.println(doc.get("id"));

            List<String> list2 = highlighting.get(doc.get("id")).get(
                    "product_name");
            if (list2 != null) {
                System.out.println("Highlighted product name: " + list2.get(0));
            } else {
                System.out.println(doc.get("product_name"));
            }

            System.out.println(doc.get("product_catalog"));
            System.out.println(doc.get("product_price"));
            System.out.println(doc.get("product_picture"));
            System.out.println("=====================");
        }
    }
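The highlighting result is a nested map: document id, then field name, then a list of snippets. Looking up a snippet with a fallback to the stored value can be isolated into a small null-safe helper, sketched here with plain collections; the class and method names are hypothetical and no SolrJ types are involved.

```java
import java.util.List;
import java.util.Map;

// Null-safe lookup in a highlighting map of the shape id -> field -> snippets.
final class HighlightLookupSketch {

    static String displayValue(Map<String, Map<String, List<String>>> highlighting,
                               String docId, String field, String storedValue) {
        Map<String, List<String>> perDoc = highlighting.get(docId);
        if (perDoc == null) {
            return storedValue; // no highlighting entry for this document
        }
        List<String> snippets = perDoc.get(field);
        if (snippets == null || snippets.isEmpty()) {
            return storedValue; // the field produced no highlighted snippet
        }
        return snippets.get(0);
    }

    public static void main(String[] args) {
        Map<String, Map<String, List<String>>> highlighting =
                Map.of("c001", Map.of("product_name", List.of("<em>minion</em> toy")));
        System.out.println(displayValue(highlighting, "c001", "product_name", "minion toy"));
        System.out.println(displayValue(highlighting, "c002", "product_name", "desk lamp"));
    }
}
```

Checking both levels avoids a NullPointerException when a document or field has no highlighting entry.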
