FlexSearch v0.7.0

The new version is finally available. FlexSearch v0.7.0 is a modern re-implementation, developed from the ground up. It brings improvements in every area, covering a large number of enhancements collected over the past three years.

The new version is largely compatible with the old one, but it may require some migration steps in your code.

In the project’s own benchmarks, FlexSearch outperforms every other search library in raw search speed, and it offers flexible search capabilities such as multi-field search, phonetic transformations, and partial matching.

Depending on the options used, it also provides the most memory-efficient index. FlexSearch introduces a new scoring algorithm called “contextual index”, which is based on a pre-scored lexical dictionary architecture and, according to the project, performs queries up to a million times faster than other libraries. FlexSearch also provides a non-blocking asynchronous processing model as well as web workers, which perform index updates and queries in parallel through dedicated balanced threads.

Supported platforms:

Browser
Node.js

Get the latest stable version (recommended)

Build	File	CDN
flexsearch.bundle.js	Download	Rawcdn.githack.com/nextapps-de…
flexsearch.light.js	Download	Rawcdn.githack.com/nextapps-de…
flexsearch.compact.js	Download	Rawcdn.githack.com/nextapps-de…
flexsearch.es5.js *	Download	Rawcdn.githack.com/nextapps-de…
ES6 Modules	Download	The /dist/module/ folder of this GitHub repository

* The bundle “flexsearch.es5.js” includes polyfills for ECMAScript 5 support.

Get the latest version (NPM):

npm install flexsearch

Context search

Note: This feature is disabled by default because it increases memory usage. See the section “Score context” below for more information and for how to enable it.

FlexSearch introduces a new scoring mechanism called contextual search, invented by Thomas Wilkerling, the author of the library. Contextual search takes queries to a whole new level, but also requires some additional memory (depending on the depth). The basic idea is to limit the relevance of a term to its context, instead of calculating relevance over the whole distance of the corresponding document. In this way, contextual search also improves the results of relevance-based queries on large amounts of text data.

Load the library

There are three types of indexes:

Index is a flat, high-performance index which stores ID-content pairs.
Worker / WorkerIndex is also a flat index which stores ID-content pairs, but runs in the background as a dedicated worker thread.
Document is a multi-field index which can store complex JSON documents (and can also make use of worker indexes).

Most likely you need just one of them, depending on your scenario.

ES6 Modules (Browser):

import Index from "./index.js";
import Document from "./document.js";
import WorkerIndex from "./worker/index.js";

const index = new Index(options);
const document = new Document(options);
const worker = new WorkerIndex(options);

Bundle (Browser):

<html>
<head>
    <script src="js/flexsearch.bundle.js"></script>
</head>
...

Or via CDN:

<script src="https://cdn.jsdelivr.net/gh/nextapps-de/[email protected]/dist/flexsearch.bundle.js"></script>

AMD:

var FlexSearch = require("./flexsearch.js");

Load one of the builds from the folder dist within your HTML as a script and use it as follows:

var index = new FlexSearch.Index(options);
var document = new FlexSearch.Document(options);
var worker = new FlexSearch.Worker(options);

Node.js

npm install flexsearch

In your code include as follows:

const { Index, Document, Worker } = require("flexsearch");

const index = new Index(options);
const document = new Document(options);
const worker = new Worker(options);

Basic usage and variations

index.add(id, text);
index.search(text);
index.search(text, limit);
index.search(text, options);
index.search(text, limit, options);
index.search(options);

document.add(doc);
document.add(id, doc);
document.search(text);
document.search(text, limit);
document.search(text, options);
document.search(text, limit, options);
document.search(options);

worker.add(id, text);
worker.search(text);
worker.search(text, limit);
worker.search(text, options);
worker.search(text, limit, options);
worker.search(text, limit, options, callback);
worker.search(options);

The worker inherits from the type Index, not from the type Document. A WorkerIndex therefore basically works like a standard FlexSearch Index. Worker support in documents needs to be enabled by passing the option { worker: true } during creation.

Each method invoked on the Worker index is treated as an asynchronous method. You’ll get a Promise, or you can provide a callback function as the last argument.

API overview

Global methods:

FlexSearch.registerCharset(name, charset)
FlexSearch.registerLanguage(name, language)

Index methods:

Index.add(id, string) *
Index.append(id, string) *
Index.update(id, string) *
Index.remove(id) *
Index.search(string, <limit>, <options>) *
Index.search(options) *
async Index.export(handler)
async Index.import(key, data)

WorkerIndex methods:

async Index.add(id, string)
async Index.append(id, string)
async Index.update(id, string)
async Index.remove(id)
async Index.search(string, <limit>, <options>)
async Index.search(options)
async Index.export(handler) (WIP)
async Index.import(key, data) (WIP)

Document methods:

Document.add(<id>, document) *
Document.append(<id>, document) *
Document.update(<id>, document) *
Document.remove(id || document) *
Document.search(string, <limit>, <options>) *
Document.search(options) *
async Document.export(handler)
async Document.import(key, data)

* For each of these methods there is an asynchronous equivalent:

Async version:

async .addAsync( ... )
async .appendAsync( ... )
async .updateAsync( ... )
async .removeAsync( ... )
async .searchAsync( ... )

Async methods return a Promise; alternatively, you can pass a callback function as the last parameter.

The methods export and import are always asynchronous, as is every method called on a worker-based index.

Options

FlexSearch is highly customizable. Using the right options can noticeably improve your results as well as memory usage and query time.

Index options

Option	Values	Description	Default
preset	"memory" "performance" "match" "score" "default"	The configuration profile, used as a shortcut or as a base for your custom settings.	"default"
tokenize	"strict" "forward" "reverse" "full"	The indexing mode (tokenizer). Choose one of the built-ins or pass a custom tokenizer function.	"strict"
cache	Boolean Number	Enables/disables the cache and/or sets the capacity of cached entries. When passing a number as a limit, the cache automatically balances stored entries by their popularity. Note: when just using "true", the cache has no limit and grows unbounded.	false
resolution	Number	Sets the scoring resolution.	9
context	Boolean Context Options	Enables/disables contextual indexing. When passing "true", the default context options are used.	false
optimize	Boolean	When enabled, uses a memory-optimized stack flow for the index.	true
boost	function(arr, str, int) => float	A custom boost function used when indexing contents. The function has the signature function(words[], term, index) => float. It receives three parameters: an array of all words, the current term, and the index at which the term is placed in the word array. You can apply your own calculation, e.g. based on the occurrences of a term, and return a factor (<1 lowers relevance, >1 raises relevance). Note: this feature is currently limited to the tokenizer "strict".	null
charset	Charset Payload String (key)	Provide a custom charset payload or pass one of the keys of the built-in charsets.	"latin"
language	Language Payload String (key)	Provide a custom language payload or pass the language shorthand flag (ISO-3166) of a built-in language.	null
encode	false "default" "simple" "balance" "advanced" "extra" function(str) => [words]	The encoding type. Choose one of the built-ins or pass a custom encoding function.	"default"
stemmer	false String Function		false
filter	false String Function		false
matcher	false String Function		false
worker	Boolean	Enables/disables worker support and sets the count of running worker threads.	false
document	Document Descriptor	Includes the definitions for the document index and storage.
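
The “boost” row above only gives the function signature. As a minimal sketch of what such a function could look like, the example below raises the relevance of terms that occur more than once in the content. The name boostRepeatedTerms and the factor 1.5 are assumptions made for illustration, not part of FlexSearch.

```javascript
// Hypothetical boost function following the documented signature
// function(words[], term, index) => float.
function boostRepeatedTerms(words, term, index) {
    // Count how often the current term occurs in the word array.
    let occurrences = 0;
    for (const word of words) {
        if (word === term) occurrences++;
    }
    // A factor > 1 raises relevance, 1 leaves it unchanged.
    return occurrences > 1 ? 1.5 : 1;
}

const words = ["red", "fox", "red", "barn"];
console.log(boostRepeatedTerms(words, "red", 0)); // 1.5
console.log(boostRepeatedTerms(words, "fox", 1)); // 1
```

Such a function would presumably be passed as the boost option together with the tokenizer “strict”, following the note in the table.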

Context options

Option	Values	Description	Default
resolution	Number	Sets the scoring resolution for the context.	1
depth	false Number	Enables/disables contextual indexing and sets the contextual distance of relevance. Depth is the maximum number of words/tokens a term may be away to still be considered relevant.	1
bidirectional	Boolean		true

Document options

Option	Values	Default
id	String	"id"
tag	false String	"tag"
index	String Array<String> Array<Object>
store	Boolean String Array<String>	false

Charset options

Option	Values	Description	Default
split	false RegExp String	The rule used to split the content when using a built-in (non-custom) tokenizer, e.g. "forward". Pass a string/char or a regular expression (default: /\W+/).	`/[\W_]+/`
rtl	Boolean	Enables right-to-left encoding.	false
encode	function(str) => [words]	The custom encoding function.	/lang/latin/default.js

Search options

Option	Values	Description	Default
limit	number	Sets the limit of results.	100
offset	number	Applies an offset (skips items).	0
suggest	Boolean	Enables suggestions in results.	false

Document search options

In addition to the index search options above, the following options are available:

Option	Values	Description	Default
index	String Array<String> Array<Object>
tag	String Array<String>
enrich	Boolean	Enriches the IDs in the results with the corresponding documents.	false
bool	"and" "or"	Sets the logical operator used when searching through multiple fields or tags.	"or"

Tokenizer (prefix search)

The tokenizer affects the required memory as well as query time and the flexibility of partial matches. Try to choose the uppermost of these tokenizers that fits your needs:

Option	Description	Example	Memory factor (n = length of word)
"strict"	Index whole words	`foobar`	* 1
"forward"	Incrementally index words in forward direction	`fo`obar `foob`ar	* n
"reverse"	Incrementally index words in both directions	foob`ar` fo`obar`	* 2n - 1
"full"	Index every possible combination	fo`oba`r f`oob`ar	* n * (n - 1)
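
To make the memory factors for “forward” and “reverse” more tangible, the sketch below lists the partials a single word would contribute under each mode. It is only an illustration of the idea (prefixes for “forward”, prefixes plus suffixes for “reverse”), not the library’s implementation; the function names are assumptions, and “full” is left out.

```javascript
// Illustrative only: partials of a word under "forward" (all prefixes).
function forwardPartials(word) {
    const partials = [];
    for (let end = 1; end <= word.length; end++) {
        partials.push(word.slice(0, end));
    }
    return partials;
}

// Illustrative only: partials under "reverse" (prefixes plus suffixes).
function reversePartials(word) {
    const partials = new Set(forwardPartials(word));
    for (let start = 1; start < word.length; start++) {
        partials.add(word.slice(start));
    }
    return [...partials];
}

console.log(forwardPartials("foobar").length); // 6  (n)
console.log(reversePartials("foobar").length); // 11 (2n - 1)
```

For words with repeated letters the set may hold fewer entries, so the factors in the table read best as upper bounds.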

Encoders

Encoding also affects the required memory as well as query time and phonetic matches. Try to choose the uppermost of these encoders that fits your needs, or pass in a custom encoder:

Option	Description	False positives	Compression
false	Turn off encoding	no	0%
"default"	Case-insensitive encoding	no	0%
"simple"	Case-insensitive encoding, charset normalization	no	~ 3%
"balance"	Case-insensitive encoding, charset normalization, literal transformations	no	~ 30%
"advanced"	Case-insensitive encoding, charset normalization, literal transformations, phonetic normalization	no	~ 40%
"extra"	Case-insensitive encoding, charset normalization, literal transformations, phonetic normalization, Soundex transformations	yes	~ 65%
function()	Pass a custom encoder via function(string):[words]
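
The last row allows a custom encoder of the shape function(string):[words]. The following is a minimal sketch of such a function, assuming you want case-insensitive matching and simple removal of diacritics. The name simpleEncoder is hypothetical, and the built-in encoders may work differently.

```javascript
// Hypothetical custom encoder with the documented shape function(str) => [words].
function simpleEncoder(str) {
    return str
        .toLowerCase()                    // case-insensitive
        .normalize("NFD")                 // separate letters from diacritics
        .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks
        .split(/[^a-z0-9]+/)              // split on everything else
        .filter(Boolean);                 // remove empty strings
}

console.log(simpleEncoder("Björn  Phillipp-Mayer"));
// [ 'bjorn', 'phillipp', 'mayer' ]
```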

Usage

Create an index

var index = new Index();

Create a new index and select a preset:

var index = new Index("performance");

Create a new index with custom options:

var index = new Index({
    charset: "latin:extra",
    tokenize: "reverse",
    resolution: 9
});

Create a new index and extend a preset with custom options:

var index = new Index({
    preset: "memory",
    tokenize: "forward",
    resolution: 5
});

Add a text item to the index

Every piece of content which should be added to the index needs an ID. When your content has no ID, you need to create one by passing an index, a counter, or something else as the ID (a value of type number is highly recommended). These IDs are unique references to a given piece of content. This matters when you update or add content under an existing ID. When referencing is not a concern, you can simply use something like count++.

Index.add(id, string)

index.add(0, "John Doe");

Search items

Index.search(string | options, <limit>, <options>)

index.search("John");

Limit the number of search results:

index.search("John", 10);

Check whether an ID is already indexed

You can check if an ID has been indexed:

if(index.contain(1)){
    console.log("ID is already in index");
}

Async

You can call each method in its asynchronous version, e.g. index.addAsync or index.searchAsync.

You can assign callbacks to each asynchronous function:

index.addAsync(id, content, function(){
    console.log("Task Done");
});

index.searchAsync(query, function(result){
    console.log("Results: ", result);
});

Or instead of passing the callback, return a Promise:

index.addAsync(id, content).then(function(){
    console.log("Task Done");
});

index.searchAsync(query).then(function(result){
    console.log("Results: ", result);
});

Or use async await:

async function add(){
    await index.addAsync(id, content);
    console.log("Task Done");
}

async function search(){
    const results = await index.searchAsync(query);
    console.log("Results: ", results);
}

Append contents

You can append contents to an existing index entry like this:

index.append(id, content);

This does not overwrite the old indexed contents, as index.update(id, content) would. Keep in mind that index.add(id, content) also performs an “update” under the hood when the ID is already indexed.

Appended contents get their own context and their own full resolution. The relevance is therefore not stacked onto the existing content, but scored in its own context. Here’s an example:

index.add(0, "some index");
index.append(0, "some appended content");

index.add(1, "some text");
index.append(1, "index appended content");

When you query index.search("index"), you get ID 1 as the first entry in the result, because the context starts from zero for the appended data (it is not stacked onto the old context), and there “index” is the first term.

If you don’t want this behavior, just use the standard index.add(id, content) and provide the full content.

Update an item in the index

Index.update(id, string)

index.update(0, "Max Miller");

Remove an item from the index

Index.remove(id)

index.remove(0);

Add a custom tokenizer

A tokenizer splits words/terms into components or partials.

Define a private custom tokenizer during creation/initialization:

var index = new Index({
    tokenize: function(str){
        // split on whitespace, hyphens and slashes
        return str.split(/[\s\-\/]+/);
    }
});

The tokenizer function receives a string as its parameter and must return an array of strings, each representing a word or term. In some languages every character is a term and terms are not separated by whitespace.
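
The contract described above (string in, array of terms out) can be tried without an index. The sketch below shows two hypothetical tokenizer functions: one that splits on whitespace, hyphens and slashes, and one that treats every character as a term, as may suit languages without word separators. Both names are assumptions made for illustration.

```javascript
// Hypothetical tokenizer: splits on whitespace, hyphens and slashes.
function separatorTokenizer(str) {
    return str.split(/[\s\-\/]+/).filter(term => term.length > 0);
}

// Hypothetical tokenizer: every non-whitespace character is its own term.
function characterTokenizer(str) {
    return Array.from(str.replace(/\s+/g, ""));
}

console.log(separatorTokenizer("state-of-the-art search/index"));
// [ 'state', 'of', 'the', 'art', 'search', 'index' ]
console.log(characterTokenizer("検索 エンジン"));
// [ '検', '索', 'エ', 'ン', 'ジ', 'ン' ]
```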

Add a language-specific stemmer and/or filter

Stemmer: several linguistic mutations of the same word (e.g. “run” and “running”)

Filter: a blacklist of words to be filtered out from indexing entirely (e.g. “and”, “to” or “be”)

Assign a private custom stemmer or filter during creation/initialization:

var index = new Index({
    stemmer: {
        // object {key: replacement}
        "ational": "ate",
        "tional": "tion",
        "enci": "ence",
        "ing": ""
    },
    filter: [
        // array blacklist
        "in",
        "into",
        "is",
        "isn't",
        "it",
        "it's"
    ]
});

Use a custom filter function, for example:

var index = new Index({
    filter: function(value){
        // just add values with length > 1 to the index
        return value.length > 1;
    }
});

Or assign stemmers/filters globally to a language:

Stemmers are passed as an object (key-value pairs), filters as an array.

FlexSearch.registerLanguage("us", {
    stemmer: { /* ... */ },
    filter:  [ /* ... */ ]
});
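
To illustrate what a stemmer object and a filter array express, here is a small sketch that applies both to a list of words. It assumes the stemmer keys are treated as word endings and that the longest ending wins; the library may apply the rules differently, and the helper names are hypothetical.

```javascript
// Illustrative sketch, not the library's implementation.
const stemmer = { "ational": "ate", "tional": "tion", "enci": "ence", "ing": "" };
const filter = ["in", "into", "is", "isn't", "it", "it's"];

// Assumption: rules are matched as suffixes, longest suffix first.
function applyStemmer(word, rules) {
    const suffixes = Object.keys(rules).sort((a, b) => b.length - a.length);
    for (const suffix of suffixes) {
        if (word.length > suffix.length && word.endsWith(suffix)) {
            return word.slice(0, -suffix.length) + rules[suffix];
        }
    }
    return word;
}

// Drop blacklisted words first, then stem what remains.
function normalizeWords(words) {
    const blacklist = new Set(filter);
    return words
        .filter(word => !blacklist.has(word))
        .map(word => applyStemmer(word, stemmer));
}

console.log(normalizeWords(["it", "is", "searching", "into", "relational", "data"]));
// [ 'search', 'relate', 'data' ]
```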

Right-to-left support

When using RTL, set the tokenizer to at least “reverse” or “full”.

Just set the field “rtl” to true and use a compatible tokenizer:

var index = new Index({
    encode: str => str.toLowerCase().split(/[^a-z]+/),
    tokenize: "reverse",
    rtl: true
});

Index documents (field search)

The document descriptor

Suppose our document has this data structure:

{ 
    "id": 0, 
    "content": "some text"
}

Old syntax of FlexSearch v0.6.3 (no longer supported!):

const index = new Document({
    doc: {
        id: "id",
        field: ["content"]
    }
});

The document descriptor has changed slightly: there is no “field” branch anymore. Its contents are applied one level higher, so the key becomes a main member of the options.

In the new syntax, the field “doc” was renamed to “document” and the field “field” was renamed to “index”:

const index = new Document({
    document: {
        id: "id",
        index: ["content"]
    }
});

index.add({ 
    id: 0, 
    content: "some text"
});

The field “id” describes where the ID or unique key lives inside your documents. It defaults to “id” when not passed, so you can shorten the example above to:

const index = new Document({
    document: {
        index: ["content"]
    }
});

The member “index” holds the list of fields you want indexed from your documents. When selecting just one field, you can pass a string. When also using the default key “id”, this shortens to:

const index = new Document({ document: "content" });
index.add({ id: 0, content: "some text" });

Assuming you have several fields, you can add multiple fields to the index:

var docs = [{
    id: 0,
    title: "Title A",
    content: "Body A"
},{
    id: 1,
    title: "Title B",
    content: "Body B"
}];

const index = new Document({
    id: "id",
    index: ["title", "content"]
});

You can pass custom options for each field:

const index = new Document({
    id: "id",
    index: [{
        field: "title",
        tokenize: "forward",
        optimize: true,
        resolution: 9
    },{
        field:  "content",
        tokenize: "strict",
        optimize: true,
        resolution: 5,
        minlength: 3,
        context: {
            depth: 1,
            resolution: 3
        }
    }]
});

Field options inherit from the global options when those are passed as well, for example:

const index = new Document({
    tokenize: "strict",
    optimize: true,
    resolution: 9,
    document: {
        id: "id",
        index:[{
            field: "title",
            tokenize: "forward"
        },{
            field: "content",
            minlength: 3,
            context: {
                depth: 1,
                resolution: 3
            }
        }]
    }
});

Note: The context options of the field “content” inherit from the corresponding field options, which in turn inherit from the global options.

Nested data fields (complex objects)

Suppose the document array looks more complex (with nested branches, etc.), for example:

{
  "record": {
    "id": 0,
    "title": "some title",
    "content": {
      "header": "some text",
      "footer": "some text"
    }
  }
}

Then use the colon-separated notation “root:child” to define the hierarchy within the document descriptor:

const index = new Document({
    document: {
        id: "record:id",
        index: [
            "record:title",
            "record:content:header",
            "record:content:footer"
        ]
    }
});

Just add the fields you want to query against. Do not add fields to the index which you only need in the result (but do not query against). For this purpose you can store documents independently of the index (see below).

When you want to query a specific field, you have to pass the exact key of the field as you defined it in the document descriptor (colon notation):

index.search(query, {
    index: [
        "record:title",
        "record:content:header",
        "record:content:footer"
    ]
});

Same as:

index.search(query, [
    "record:title",
    "record:content:header",
    "record:content:footer"
]);

Using field-specific options:

index.search([{
    field: "record:title",
    query: "some query",
    limit: 100,
    suggest: true
},{
    field: "record:title",
    query: "some other query",
    limit: 100,
    suggest: true
}]);

You can use different queries to perform searches through the same fields.

When passing field-specific options, you need to provide complete configuration for each field. They are not inherited like document descriptors.

Complex documents

Your document should follow two rules:

The document cannot start with an array at the root level. This would introduce sequential data, which is currently not supported. See below for a workaround for such data.

[ // not allowed as document start!
  {
    "id": 0,
    "title": "title"
  }
]

The ID cannot be nested inside an array (and none of its parent fields can be an array either). This would introduce sequential data, which is currently not supported. See below for a workaround for such data.

{
  "records": [ // not allowed when ID or tag lives inside!
    {
      "id": 0,
      "title": "title"
    }
  ]
}

Here is an example of a supported complex document:

{
  "meta": {
    "tag": "cat",
    "id": 0
  },
  "contents": [
    {
      "body": {
        "title": "some title",
        "footer": "some text"
      },
      "keywords": ["some", "key", "words"]
    },
    {
      "body": {
        "title": "some title",
        "footer": "some text"
      },
      "keywords": ["some", "key", "words"]
    }
  ]
}

The corresponding document descriptor (when all fields should be indexed) looks like this:

const index = new Document({
    document: {
        id: "meta:id",
        tag: "meta:tag",
        index: [
            "contents[]:body:title",
            "contents[]:body:footer",
            "contents[]:keywords"
        ]
    }
});

Again, when searching, you must use the same colon-delimited string as the field definition.

index.search(query, { 
    index: "contents[]:body:title"
});
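
To see which values a colon-separated field path addresses, the sketch below resolves such a path against a plain object. It is a simplified illustration, not FlexSearch’s parser: the helper resolveField is hypothetical, and it flattens every array it meets, whether or not the path uses the “[]” suffix.

```javascript
// Illustrative sketch: resolves a path such as "contents[]:body:title"
// against a document and returns all values found at that path.
function resolveField(doc, path) {
    let current = [doc];
    for (const rawKey of path.split(":")) {
        const key = rawKey.replace("[]", "");
        const next = [];
        for (const node of current) {
            const value = node == null ? undefined : node[key];
            if (Array.isArray(value)) next.push(...value);
            else if (value !== undefined) next.push(value);
        }
        current = next;
    }
    return current;
}

const doc = {
    meta: { tag: "cat", id: 0 },
    contents: [
        { body: { title: "first title" }, keywords: ["some", "key"] },
        { body: { title: "second title" }, keywords: ["words"] }
    ]
};

console.log(resolveField(doc, "meta:id"));               // [ 0 ]
console.log(resolveField(doc, "contents[]:body:title")); // [ 'first title', 'second title' ]
console.log(resolveField(doc, "contents[]:keywords"));   // [ 'some', 'key', 'words' ]
```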

Unsupported documents (sequential data)

This example breaks both of the above rules:

[ // not allowed as document start!
  {
    "tag": "cat",
    "records": [ // not allowed when ID or tag lives inside!
      {
        "id": 0,
        "body": {
          "title": "some title",
          "footer": "some text"
        },
        "keywords": ["some", "key", "words"]
      },
      {
        "id": 1,
        "body": {
          "title": "some title",
          "footer": "some text"
        },
        "keywords": ["some", "key", "words"]
      }
    ]
  }
]

You need to apply some kind of structural normalization.

The solution to such a data structure looks like this:

const index = new Document({
    document: {
        id: "record:id",
        tag: "tag",
        index: [
            "record:body:title",
            "record:body:footer",
            "record:keywords"
        ]
    }
});

function add(sequential_data){

    for(let x = 0, data; x < sequential_data.length; x++){

        data = sequential_data[x];

        for(let y = 0, record; y < data.records.length; y++){

            record = data.records[y];

            index.add({
                id: record.id,
                tag: data.tag,
                record: record
            });
        }
    }  
}

// now just use add() helper method as usual:

add([{
    // sequential structured data
    // take the data example above
}]);

You can skip the first loop when your document data has just one entry in the outer array.

Add/update/remove documents in the index

Simply pass an array of documents (or a single object) to the index:

index.add(docs);

Update an index with a single object or an array of objects:

index.update({
    data:{
        id: 0,
        title: "Foo",
        body: {
            content: "Bar"
        }
    }
});

To remove a single object or array of objects from an index:

index.remove(docs);

When the ID is known, you can also simply delete it (faster):

index.remove(id);

Join / append arrays

In the complex example above, the field keywords is an array, but the notation there does not use brackets like keywords[]. The array is still detected, but instead of appending each entry to a new context, the array is joined into one large string and added to the index.

The difference between the two ways of adding array contents is the relevance when searching. When each item of an array is added via append() to its own context by using the notation field[], the relevance of the last entry is on par with the first entry. When you leave the brackets out of the notation, the array is joined into one whitespace-separated string. Here the first entry has the highest relevance, whereas the last entry has the lowest.

So, assuming the keywords in the example above are pre-sorted by relevance according to their popularity, you will want to keep this order (the relevance information). For this purpose, do not add brackets to the notation. Otherwise each entry is taken into a new scoring context (and the old order gets lost).

Leaving the brackets out also gives better performance and a smaller memory footprint. Use it when you do not need relevance granularity per entry.

Field search

Search all fields:

index.search(query);

Search for specific fields:

index.search(query, { index: "title" });

Search for a given set of fields:

index.search(query, { index: ["title", "content"] });

Same as:

index.search(query, ["title", "content"]);

Passing custom modifiers and queries to each field:

index.search([{
    field: "content",
    query: "some query",
    limit: 100,
    suggest: true
},{
    field: "content",
    query: "some other query",
    limit: 100,
    suggest: true
}]);

You can use different queries to perform searches through the same fields.

The result set

Pattern of result set:

fields[] => { field, result[] => { document }}

The first index is an array of the fields the query was applied to. Each field has a record (object) with the two properties “field” and “result”. The “result” is also an array and contains the results for this specific field. The result can be an array of IDs, or an array enriched with the stored document data.

A non-enriched result set looks like:

[{
    field: "title",
    result: [0, 1, 2]
},{
    field: "content",
    result: [3, 4, 5]
}]

An enriched result set looks like:

[{
    field: "title",
    result: [
        { id: 0, doc: { /* document */ }},
        { id: 1, doc: { /* document */ }},
        { id: 2, doc: { /* document */ }}
    ]
},{
    field: "content",
    result: [
        { id: 3, doc: { /* document */ }},
        { id: 4, doc: { /* document */ }},
        { id: 5, doc: { /* document */ }}
    ]
}]

When using “pluck” instead of “field”, you can explicitly select just one field and get back a flat representation:

index.search(query, { pluck: "title", enrich: true });

[
    { id: 0, doc: { /* document */ }},
    { id: 1, doc: { /* document */ }},
    { id: 2, doc: { /* document */ }}
]

This result set replaces “boolean search”. Instead of applying bool logic to a nested object, you can apply your own logic dynamically on top of the result set. This gives you a lot of freedom in how you process the results. The results of the individual fields are therefore no longer squashed into one result. That keeps important information, such as the name of the field and the relevance of each field’s results, which no longer get mixed.

By default, a field search applies the query with boolean “or” logic. Each field has its own result for the given query.

There is one situation where the bool property is still supported: when you want to switch the default “or” logic of the field search to “and”, e.g.:

index.search(query, { 
    index: ["title", "content"],
    bool: "and" 
});

You only get results that contain the query in both fields.
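
Because each field keeps its own result, your own logic can be applied on top of the result set, as described above. The following sketch shows two hypothetical helpers that work on a non-enriched result set of the shape [{ field, result: [ids] }]: one intersects the fields (comparable to “and”), one merges them (comparable to “or”). The helper names are assumptions, not part of the library.

```javascript
// Keeps only the IDs which occur in the result of every field.
function intersectFields(resultSet) {
    if (resultSet.length === 0) return [];
    const [first, ...rest] = resultSet;
    return first.result.filter(id =>
        rest.every(entry => entry.result.includes(id))
    );
}

// Merges the IDs of all fields; a Set keeps the first occurrence,
// so the order of the fields decides the order of the merged IDs.
function uniteFields(resultSet) {
    return [...new Set(resultSet.flatMap(entry => entry.result))];
}

const resultSet = [
    { field: "title",   result: [0, 1, 2] },
    { field: "content", result: [2, 3, 0] }
];

console.log(intersectFields(resultSet)); // [ 0, 2 ]
console.log(uniteFields(resultSet));     // [ 0, 1, 2, 3 ]
```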

Tag search

You can also get results from one or more tags when no query is passed:

index.search({ tag: ["cat", "dog"] });

In this case, the result set looks like:

[{
    tag: "cat",
    result: [ /* all cats */ ]
},{
    tag: "dog",
    result: [ /* all dogs */ ]
}]

Limit & Offset

By default, every query is limited to 100 entries. Unbounded queries lead to problems. To adjust the size, set the limit as an option.

You can set limits and offsets for each query:

index.search(query, { limit: 20, offset: 100 });

The size of the result set cannot be pre-counted. That is a limit by design of FlexSearch. When you really need a count of all results you are able to page through, just assign a high enough limit, get back all results, and apply your paging offset manually (this also works on the server side). FlexSearch is fast enough that this is not an issue.
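
The manual paging described above could be sketched as follows. The helper paginate and the array allResults are hypothetical; allResults stands in for what a search with a high enough limit would return.

```javascript
// Applies a paging offset manually to a complete list of results.
function paginate(results, page, pageSize) {
    const offset = (page - 1) * pageSize;
    return {
        total: results.length,
        pages: Math.ceil(results.length / pageSize),
        items: results.slice(offset, offset + pageSize)
    };
}

// Stand-in for the IDs returned by a search with a high limit.
const allResults = Array.from({ length: 45 }, (_, id) => id);

const page3 = paginate(allResults, 3, 20);
console.log(page3.total); // 45
console.log(page3.pages); // 3
console.log(page3.items); // [ 40, 41, 42, 43, 44 ]
```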

Document storage

Only a document index can have a store. To get this functionality when storing only ID-content pairs, you can use a document index instead of a flat index.

You can define independently which fields should be indexed and which fields should be stored. This way you can index fields which should not be included in the search result.

Do not use a store when: 1. an array of IDs as the result is good enough, or 2. you already have the contents/documents stored elsewhere (outside the index).

When the store attribute is set, you have to explicitly include all fields which should be stored (it acts like a whitelist).

When the store attribute is not set, the original document is stored as a fallback.

This adds the entire original content to the store:

const index = new Document({
    document: { 
        index: "content",
        store: true
    }
});

index.add({ id: 0, content: "some text" });

Access documents from internal storage

You can get indexed documents from the store:

var data = index.get(1);

You can update/change the stored content directly without changing the index by:

index.set(1, data);

To update the store and the index together, just use index.update, index.add or index.append.

When you perform a query, whether on a document index or a flat index, you always get back an array of IDs.

Optionally, you can have the query results enriched automatically with the stored contents:

index.search(query, { enrich: true });

Your results now look like:

[{
    id: 0,
    doc: { /* content from store */ }
},{
    id: 1,
    doc: { /* content from store */ }
}]

Configuring storage (recommended)

This will add specific fields from the document to the store (ID is not required in the store):

const index = new Document({
    document: {
        index: "content",
        store: ["author", "email"]
    }
});

index.add(id, content);

You can independently configure what should be indexed and what should be stored. It is strongly recommended that you use it wherever possible.

Here is a useful example of configuring the document index together with a store:

const index = new Document({
    document: { 
        index: "content",
        store: ["author", "email"] 
    }
});

index.add({
    id: 0,
    author: "Jon Doe",
    email: "[email protected]",
    content: "Some content for the index ..."
});

You can query the content and will get the stored value:

index.search("some content", { enrich: true });

Your results now look like:

[{
    field: "content",
    result: [{
        id: 0,
        doc: {
            author: "Jon Doe",
            email: "[email protected]",
        }
    }]
}]

The fields “author” and “email” are not indexed.
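
The whitelist behavior of the store option can be pictured with a plain helper that copies only the listed fields. This is a sketch of the idea, not the library’s storage code; pickStoredFields and the sample address are hypothetical.

```javascript
// Copies only the whitelisted fields of a document, similar in spirit
// to what store: ["author", "email"] keeps for each document.
function pickStoredFields(doc, storeFields) {
    const stored = {};
    for (const field of storeFields) {
        if (field in doc) stored[field] = doc[field];
    }
    return stored;
}

const sampleDoc = {
    id: 0,
    author: "Jon Doe",
    email: "jon@example.com", // placeholder address for this sketch
    content: "Some content for the index ..."
};

console.log(pickStoredFields(sampleDoc, ["author", "email"]));
// { author: 'Jon Doe', email: 'jon@example.com' }
```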

The chain

The simple chain method is as follows:

Var index = FlexSearch. The create (). AddMatcher ({' a ':' a '}), add (0, 'foo'). The add (1, "bar");Copy the code

index.remove(0).update(1, 'foo').add(2, 'foobar');

Contextual scoring

Create the index and use the default context:

var index = new FlexSearch({

    tokenize: "strict",
    context: true
});

Create an index and apply custom options to the context:

var index = new FlexSearch({

    tokenize: "strict",
    context: { 
        resolution: 5,
        depth: 3,
        bidirectional: true
    }
});

Currently, the context index only supports the tokenizer "strict".

Context indexes require additional memory, depending on the depth.

Auto-balanced cache (by popularity)

You need to initialize the cache and its limits at index creation time:

const index = new Index({ cache: 100 });

const results = index.searchCache(query);

A common scenario for using caching is auto-complete or instant search as you type.

When passing a number as a limit, the cache automatically balances the stored items relative to their popularity.

When just "true" is passed, the cache is unbounded and actually performs 2-3 times faster (because the balancer does not have to run).
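The idea of balancing a bounded cache by popularity can be sketched in a few lines. This is only an illustration under simple assumptions (a hit counter per query, eviction of the least-hit entry); it is not FlexSearch's actual balancer, and `PopularityCache` and `cachedSearch` are hypothetical names.

```javascript
// Hypothetical sketch of a size-limited cache that evicts the least
// popular entry. Not FlexSearch's balancer, only an illustration.
class PopularityCache {
    constructor(limit) {
        this.limit = limit;
        this.entries = new Map(); // query -> { value, hits }
    }

    get(query) {
        const entry = this.entries.get(query);
        if (!entry) return undefined;
        entry.hits++; // every hit raises the popularity
        return entry.value;
    }

    set(query, value) {
        if (!this.entries.has(query) && this.entries.size >= this.limit) {
            // Find and evict the entry with the fewest hits.
            let coldest = null;
            for (const [key, entry] of this.entries) {
                if (coldest === null || entry.hits < this.entries.get(coldest).hits) {
                    coldest = key;
                }
            }
            this.entries.delete(coldest);
        }
        this.entries.set(query, { value, hits: 0 });
    }
}

// Wrap any search function with the cache.
function cachedSearch(cache, search, query) {
    let result = cache.get(query);
    if (result === undefined) {
        result = search(query);
        cache.set(query, result);
    }
    return result;
}

// Usage with a stand-in search function that counts its invocations.
const cache = new PopularityCache(2);
let calls = 0;
const search = query => { calls++; return [query.length]; };

cachedSearch(cache, search, "foo"); // miss
cachedSearch(cache, search, "foo"); // hit, "foo" becomes more popular
cachedSearch(cache, search, "bar"); // miss
cachedSearch(cache, search, "baz"); // miss, evicts the less popular "bar"
```

The eviction scan is the extra work a bounded cache has to do, which is why an unbounded cache can answer faster at the cost of memory.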

Worker parallelism (Browser + Node.js)

The new worker model in v0.7.0 splits a document into its fields (1 worker = 1 field index). This way, each worker can solve its task (subtask) completely on its own. The downside of this approach is that the workers may not be perfectly balanced in how much content they store (fields may have different content lengths). On the other hand, there is no indication that balancing the storage would bring any benefit (the same total amount is required either way).

When using a document index, simply apply the option “worker”:

const index = new Document({
    index: ["tag", "name", "title", "text"],
    worker: true
});

index.add({ 
    id: 1, tag: "cat", name: "Tom", title: "some", text: "some" 
}).add({
    id: 2, tag: "dog", name: "Ben", title: "title", text: "content" 
}).add({ 
    id: 3, tag: "cat", name: "Max", title: "to", text: "to" 
}).add({ 
    id: 4, tag: "dog", name: "Tim", title: "index", text: "index" 
});

Worker 1: { 1: "cat", 2: "dog", 3: "cat", 4: "dog" }
Worker 2: { 1: "Tom", 2: "Ben", 3: "Max", 4: "Tim" }
Worker 3: { 1: "some", 2: "title", 3: "to", 4: "index" }
Worker 4: { 1: "some", 2: "content", 3: "to", 4: "index" }

When you perform a field search across all fields, the task is perfectly balanced across all workers, which can independently solve their subtasks.
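The per-field layout listed above can be reproduced with a small plain-JavaScript sketch. It only models how the data is partitioned and does not start any workers; `splitByField` is a hypothetical helper, not part of the library.

```javascript
// Hypothetical sketch: build one "id -> value" table per field, mirroring
// the "1 worker = 1 field index" layout described above.
function splitByField(fields, docs) {
    const perField = {};
    for (const field of fields) {
        perField[field] = {};
        for (const doc of docs) {
            perField[field][doc.id] = doc[field];
        }
    }
    return perField;
}

const docs = [
    { id: 1, tag: "cat", name: "Tom", title: "some", text: "some" },
    { id: 2, tag: "dog", name: "Ben", title: "title", text: "content" },
    { id: 3, tag: "cat", name: "Max", title: "to", text: "to" },
    { id: 4, tag: "dog", name: "Tim", title: "index", text: "index" }
];

const tables = splitByField(["tag", "name", "title", "text"], docs);

// Each table corresponds to the content one worker would hold.
console.log(tables.tag);
console.log(tables.name);
```

Because every table is complete for its field, a search on one field never needs data held by another worker.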

Worker index

As shown above, a Document automatically creates a worker for each field. You can also create a WorkerIndex directly (similar to using Index instead of Document).

Use as an ES6 module:

import WorkerIndex from "./worker/index.js";
const index = new WorkerIndex(options);
index.add(1, "some")
     .add(2, "content")
     .add(3, "to")
     .add(4, "index");

Or when using the bundled version:

var index = new FlexSearch.Worker(options);
index.add(1, "some")
     .add(2, "content")
     .add(3, "to")
     .add(4, "index");

Such a WorkerIndex works in much the same way as an instance of Index.

A WorkerIndex only supports the asynchronous variant of every method. This means that when you call index.search() on a WorkerIndex, it also runs asynchronously, the same way index.searchAsync() does.

Worker thread (Node.js)

Node.js’s worker thread model is based on “worker threads” and works in exactly the same way:

const { Document } = require("flexsearch");

const index = new Document({
    index: ["tag", "name", "title", "text"],
    worker: true
});

Or create a single worker instance for a non-document index:

const { Worker } = require("flexsearch");
const index = new Worker({ options });

Worker asynchronous model (best practice)

A worker always runs asynchronously. When calling a query method, you should always handle the returned promise (for example, using await) or pass a callback function as the last argument.

const index = new Document({
    index: ["tag", "name", "title", "text"],
    worker: true
});

All requests and subtasks run in parallel (prioritizing "all tasks completed"):

index.searchAsync(query, callback);
index.searchAsync(query, callback);
index.searchAsync(query, callback);

Likewise (prioritizing "all tasks completed"):

index.searchAsync(query).then(callback);
index.searchAsync(query).then(callback);
index.searchAsync(query).then(callback);

Or, when you need just one callback once all requests are done, simply use Promise.all(), which also prioritizes "all tasks completed":

Promise.all([
    index.searchAsync(query),
    index.searchAsync(query),
    index.searchAsync(query)
]).then(callback);

Inside the callback of Promise.all(), you also get an array of results as the first argument, with one entry for each query you passed in.
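The ordering of that results array is worth seeing once. The sketch below uses a stand-in for index.searchAsync() (the hypothetical `fakeSearchAsync`, which simply resolves after a delay) so it runs without any index; it shows that the requests finish in any order while the results array follows the order of the input.

```javascript
// Hypothetical stand-in for index.searchAsync(): resolves after `delay` ms.
function fakeSearchAsync(query, delay) {
    return new Promise(resolve => {
        setTimeout(() => resolve([query + "-hit"]), delay);
    });
}

// Record the order in which the requests actually finish.
const finished = [];
function track(query, delay) {
    return fakeSearchAsync(query, delay).then(result => {
        finished.push(query);
        return result;
    });
}

// All three requests start immediately and run concurrently.
const allDone = Promise.all([
    track("slow", 30),
    track("fast", 5),
    track("medium", 15)
]).then(results => {
    // `results` follows the input order, not the completion order.
    console.log("results:", JSON.stringify(results));
    console.log("finished:", finished.join(", "));
    return results;
});
```

This makes it safe to match each result to its query by position, even when a later query happens to finish first.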

When using await, you can prioritize the order (prioritizing "first task completed"): the requests are solved one by one, and only the subtasks are processed in parallel:

await index.searchAsync(query);
await index.searchAsync(query);
await index.searchAsync(query);

The same applies to index.add(), index.append(), index.remove() or index.update(). There is one special case here that the library does not prevent, but that you need to keep in mind when using workers.

When you call the "synchronous" version on a worker index:

index.add(doc);
index.add(doc);
index.add(doc);
// contents aren't indexed yet,
// they are just queued on the message channel

Of course, you can do this, but keep in mind that the main thread has no additional queue for distributing tasks to the workers. Running these calls in a long loop floods the message channel through internal worker.postMessage() calls. Fortunately, the browser and Node.js handle such incoming tasks for you automatically (as long as enough free RAM is available). When you use the "synchronous" version on a worker index, the content is not yet indexed on the next line, because all calls are treated as async by default.

When adding, updating or removing large amounts of content (or at high frequency), it is recommended to use the async versions together with async/await to keep the memory footprint low during long-running processes.
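A minimal sketch of that recommendation follows. It assumes the index offers an `addAsync(id, content)` method that returns a promise; the `stubIndex` object is a hypothetical stand-in that only counts how many tasks are pending at the same time, so the example runs without a real worker.

```javascript
// Hypothetical stub standing in for a worker index. It only tracks how many
// tasks are pending at once; a real index would do the actual indexing.
const stubIndex = {
    pending: 0,
    maxPending: 0,
    size: 0,
    addAsync(id, content) {
        this.pending++;
        this.maxPending = Math.max(this.maxPending, this.pending);
        return new Promise(resolve => setTimeout(() => {
            this.pending--;
            this.size++;
            resolve();
        }, 1));
    }
};

// Await each call so that at most one task is in flight at any time.
async function addAll(index, items) {
    for (const [id, content] of items) {
        await index.addAsync(id, content);
    }
}

const items = [[1, "some"], [2, "content"], [3, "to"], [4, "index"]];

const added = addAll(stubIndex, items).then(() => {
    console.log("added:", stubIndex.size, "max pending:", stubIndex.maxPending);
});
```

Awaiting every call trades some throughput for a bounded queue. Awaiting small batches with Promise.all() would be a middle ground if one task at a time turns out to be too slow.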

Export / Import

Export

The export has changed slightly. An export now consists of several smaller parts instead of one large chunk. You need to pass a callback function that takes two arguments, "key" and "data". This callback is called once for each part, for example:

index.export(function(key, data){ 
    
    // you need to store both the key and the data!
    // e.g. use the key for the filename and save your data
    
    localStorage.setItem(key, data);
});

Exporting data to localStorage is not really good practice, but if size is not a concern you can use it if you like. The export exists primarily for use in Node.js, or to store indexes that you want to hand over from a server to the client.

The size of the export corresponds to the memory consumption of the library. To reduce the size of the export, you must use a configuration with less memory footprint (use the table at the bottom to get information about the configuration and its memory allocation).

When your save routine runs asynchronously, you have to return a promise:

index.export(function(key, data){ 
    
    return new Promise(function(resolve){
        
        // do the saving as async

        resolve();
    });
});

The additional tables of the "fastupdate" feature cannot be exported. These tables consist of references, and when stored they are fully serialized and become too large. The library handles this for you automatically. When data is imported, "fastupdate" is automatically disabled on the index.

Import

Before you can import data, you need to create your index first. For a document index, provide the same document descriptor you used when exporting the data. This configuration is not stored in the export.

var index = new Index({ ... });

To import data, just pass a key and data:

index.import(key, localStorage.getItem(key));

You need to import every key! Otherwise, your index will not work. You need to store the keys from the export and use those keys for the import (the order of the keys can differ).

This is just a demo and not recommended, as you may have other keys in your localStorage that are not supported as imports:

var keys = Object.keys(localStorage);
for(let i = 0, key; i < keys.length; i++){
    key = keys[i];
    index.import(key, localStorage.getItem(key));
}
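A way around the stray-key problem is to keep the exported parts in a dedicated container, so the import touches exactly the keys the export produced. The sketch below shows the round trip with a hypothetical stub that only has the export(callback) / import(key, data) shape described above. The stub's export() returning a promise, the part names and the part contents are assumptions made for this example and may not match a real index.

```javascript
// Hypothetical stub with the export(callback) / import(key, data) shape
// described above. A real index would take its place.
function createStubIndex() {
    return {
        parts: {},
        // Assumption of this sketch: export() returns a promise that
        // settles after the callback has been called for every part.
        async export(callback) {
            for (const key of Object.keys(this.parts)) {
                await callback(key, this.parts[key]);
            }
        },
        import(key, data) {
            this.parts[key] = data;
        }
    };
}

const source = createStubIndex();
source.parts = { partA: "[1,2]", partB: "{\"foo\":[1]}" }; // placeholder data

// Keep every exported part, together with its key, in a dedicated container.
const saved = new Map();
const restored = createStubIndex();

const roundTrip = source
    .export((key, data) => { saved.set(key, data); })
    .then(() => {
        // Import exactly the keys that were exported, nothing else.
        for (const [key, data] of saved) {
            restored.import(key, data);
        }
        return restored;
    });
```

In practice the Map could be replaced by files named after the keys, or by a single namespaced entry in whatever storage you use.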

Best Practices

Use numeric IDs

When adding content to an index, it is recommended to use numeric ID values as the reference. The byte length of the IDs you pass can significantly affect memory consumption. If this is not possible, you should consider using an index table that maps the IDs to numeric indexes. This becomes very important, especially when context indexes are used for large amounts of content.
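Such an index table can be as small as a Map plus an array. The following is a minimal sketch; `IdTable` and the sample IDs are hypothetical and not part of the library.

```javascript
// Hypothetical helper: map arbitrary (e.g. string) IDs to compact numeric
// indexes and back again.
class IdTable {
    constructor() {
        this.toIndex = new Map(); // external ID -> numeric index
        this.toId = [];           // numeric index -> external ID
    }

    // Return the numeric index for an ID, assigning a new one if needed.
    indexOf(id) {
        let n = this.toIndex.get(id);
        if (n === undefined) {
            n = this.toId.length;
            this.toIndex.set(id, n);
            this.toId.push(id);
        }
        return n;
    }

    // Translate a numeric index back to the external ID.
    idOf(n) {
        return this.toId[n];
    }
}

const table = new IdTable();
const a = table.indexOf("user-8f3a2c");
const b = table.indexOf("user-1b9d77");

// The numeric values a and b are what would be passed to index.add().
// IDs returned by a search are then translated back with idOf():
const ids = [b, a].map(n => table.idOf(n));
console.log(a, b, ids);
```

The long IDs are then stored once in the table instead of once per index entry.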

Whenever you can, try to divide the content into categories and add each category to its own index, for example:

var action = new FlexSearch();
var adventure = new FlexSearch();
var comedy = new FlexSearch();

This way, you can also provide different settings for each category. This is actually the fastest way to perform a fuzzy search.

To make this solution more scalable, you can use a short helper:

var index = {};

function add(id, cat, content){
    (index[cat] || (
        index[cat] = new FlexSearch
    )).add(id, content);
}

function search(cat, query){
    return index[cat] ?
        index[cat].search(query) : [];
}

Add content to index:

add(1, "action", "Movie Title");
add(2, "adventure", "Movie Title");
add(3, "comedy", "Movie Title");

Execute query:

var results = search("action", "movie title"); // --> [1]

Partitioning indexes by category can significantly improve performance.
