FlexSearch v0.7.0
The new version is finally available. FlexSearch v0.7.0 is a modern re-implementation developed from the ground up. The result is an improvement in every area, covering a large number of enhancements collected over the past three years.
This new version is largely compatible with the old one, but it may require some migration steps in your code.
In terms of raw search speed, FlexSearch outperforms every other search library and also offers flexible search capabilities such as multi-field search, phonetic transformations, or partial matching.
Depending on the options used, it also provides the most memory-efficient index. FlexSearch introduces a new scoring algorithm called “contextual index”, based on a pre-scored lexical dictionary architecture, which performs queries up to a million times faster than other libraries. FlexSearch also provides a non-blocking asynchronous processing model as well as web workers to perform any update or query on the index in parallel through dedicated balanced threads.
Supported platforms:
- Browser
- Node.js
Get the latest stable version (recommended)
Build | File | CDN |
---|---|---|
flexsearch.bundle.js | Download | Rawcdn.githack.com/nextapps-de… |
flexsearch.light.js | Download | Rawcdn.githack.com/nextapps-de… |
flexsearch.compact.js | Download | Rawcdn.githack.com/nextapps-de… |
flexsearch.es5.js * | Download | Rawcdn.githack.com/nextapps-de… |
ES6 Modules | Download | The /dist/module/ folder of this Github repository |
* The file “flexsearch.es5.js” includes polyfills for ECMAScript 5 support.
Get the latest version via NPM:
npm install flexsearch
Contextual search
Note: This feature is disabled by default because of its extended memory usage. Further below you can read more about it and how to enable it.
FlexSearch introduces a new scoring mechanism called contextual search, which was invented by Thomas Wilkerling, the author of this library. A contextual search takes queries to a whole new level, but also requires some additional memory (depending on the depth). The basic idea of this concept is to limit relevance by its context instead of calculating relevance across the whole distance of the corresponding document. This way, contextual search also improves the results of relevance-based queries on large amounts of text data.
Load the library
There are three types of indexes:
- Index is a flat, high-performance index which stores ID-content pairs.
- Worker / WorkerIndex is also a flat index which stores ID-content pairs, but it runs in the background as a dedicated worker thread.
- Document is a multi-field index which can store complex JSON documents (it can also consist of worker indexes).
Most users probably need just one of them, depending on their scenario.
ES6 Modules (Browser):
import Index from "./index.js";
import Document from "./document.js";
import WorkerIndex from "./worker/index.js";
const index = new Index(options);
const document = new Document(options);
const worker = new WorkerIndex(options);
Bundle (Browser)
<html>
<head>
<script src="js/flexsearch.bundle.js"></script>
</head>
...
Or via CDN:
<script src="https://cdn.jsdelivr.net/gh/nextapps-de/[email protected]/dist/flexsearch.bundle.js"></script>
AMD:
var FlexSearch = require("./flexsearch.js");
Load one of the builds from the folder dist within your html as a script and use as follows:
var index = new FlexSearch.Index(options);
var document = new FlexSearch.Document(options);
var worker = new FlexSearch.Worker(options);
Node.js
npm install flexsearch
In your code include as follows:
const { Index, Document, Worker } = require("flexsearch");
const index = new Index(options);
const document = new Document(options);
const worker = new Worker(options);
Basic usage and variations
index.add(id, text);
index.search(text);
index.search(text, limit);
index.search(text, options);
index.search(text, limit, options);
index.search(options);
document.add(doc);
document.add(id, doc);
document.search(text);
document.search(text, limit);
document.search(text, options);
document.search(text, limit, options);
document.search(options);
worker.add(id, text);
worker.search(text);
worker.search(text, limit);
worker.search(text, options);
worker.search(text, limit, options);
worker.search(text, limit, options, callback);
worker.search(options);
The worker inherits from the Index type, not from the Document type. Thus, a WorkerIndex basically works like a standard FlexSearch Index. Worker support in documents needs to be enabled by passing the option {worker: true} during creation.
Each method invoked on the Worker index is treated as an asynchronous method. You’ll get a Promise, or you can provide a callback function as the last argument.
API overview
Global methods:
- FlexSearch.registerCharset(name, charset)
- FlexSearch.registerLanguage(name, language)
Index methods:
- Index.add(id, string) *
- Index.append(id, string) *
- Index.update(id, string) *
- Index.remove(id) *
- Index.search(string, <limit>, <options>) *
- Index.search(options) *
- async Index.export(handler)
- async Index.import(key, data)
WorkerIndex methods:
- async Index.add(id, string)
- async Index.append(id, string)
- async Index.update(id, string)
- async Index.remove(id)
- async Index.search(string, <limit>, <options>)
- async Index.search(options)
- async Index.export(handler) (WIP)
- async Index.import(key, data) (WIP)
Document methods:
- Document.add(<id>, document) *
- Document.append(<id>, document) *
- Document.update(<id>, document) *
- Document.remove(id || document) *
- Document.search(string, <limit>, <options>) *
- Document.search(options) *
- async Document.export(handler)
- async Document.import(key, data)
* For each of these methods there is an asynchronous equivalent:
Async version:
- async .addAsync( ... )
- async .appendAsync( ... )
- async .updateAsync( ... )
- async .removeAsync( ... )
- async .searchAsync( ... )
Async methods return a Promise; alternatively, you can pass a callback function as the last parameter.
The methods export and import are always asynchronous, as is every method you call on a worker-based index.
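The "Promise or callback as last argument" convention described above can be sketched in plain JavaScript. This is only an illustration of the calling pattern, not FlexSearch's implementation; the helper toAsync and the stand-in function searchSync are hypothetical names.

```javascript
// Hypothetical helper, not part of FlexSearch: wraps a synchronous task so it
// either returns a Promise or invokes a callback passed as the last argument.
function toAsync(task) {
  return function (...args) {
    const last = args[args.length - 1];
    const callback = typeof last === "function" ? args.pop() : null;
    const promise = new Promise((resolve) => {
      // Defer the work so the caller is not blocked.
      setTimeout(() => resolve(task(...args)), 0);
    });
    if (callback) {
      promise.then(callback);
      return undefined;
    }
    return promise;
  };
}

// Stand-in for a synchronous search over a tiny in-memory list.
const names = ["John Doe", "Jane Roe"];
const searchSync = (query) => names.filter((name) => name.includes(query));
const searchAsync = toAsync(searchSync);

// Both calling styles deliver the same result.
searchAsync("John").then((result) => console.log("Promise:", result));
searchAsync("John", (result) => console.log("Callback:", result));
```

Supporting both styles from one wrapper keeps older callback-based code working while newer code can use await.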
Options
FlexSearch is highly customizable. Using the right options can really improve your results as well as memory consumption and query time.
Index options
Option | Values | Description | Default |
---|---|---|---|
preset | “memory” “performance” “match” “score” “default” | The configuration profile, used as a shortcut or as a base for your custom settings. | “default” |
tokenize | “strict” “forward” “reverse” “full” | The indexing mode (tokenizer). Choose one of the built-ins or pass a custom tokenizer function. | “strict” |
cache | Boolean Number | Enables/disables the cache and/or sets the capacity of cached entries. When a number is passed as a limit, the cache automatically balances stored entries by their popularity. Note: When just “true” is used, the cache has no limit and grows unbounded. | false |
resolution | Number | Sets the scoring resolution. | 9 |
context | Boolean Context Options | Enables/disables contextual indexing. When “true” is passed as the value, the default values for the context are used. | false |
optimize | Boolean | When enabled, a memory-optimized stack flow is used for the index. | true |
boost | function(arr, str, int) => float | A custom boost function used when indexing contents. The function has the signature function(words[], term, index) => float. It receives three parameters: an array of all words, the current term, and the index of the current term within the array of words. You can apply your own calculation, e.g. based on the occurrences of a term, and return a factor (<1 lowers relevance, >1 increases relevance). Note: This feature is currently limited to the tokenizer “strict”. | null |
charset | Charset Payload String (key) | Provide a custom charset payload or pass the key of one of the built-in charsets. | “latin” |
language | Language Payload String (key) | Provide a custom language payload or pass the language shorthand flag (ISO-3166) of a built-in language. | null |
encode | false “default” “simple” “balance” “advanced” “extra” function(str) => [words] | The encoding type. Choose one of the built-ins or pass a custom encoding function. | “default” |
stemmer | false String Function | | false |
filter | false String Function | | false |
matcher | false String Function | | false |
worker | Boolean | Enables/disables workers and sets the count of running worker threads. | false |
document | Document Descriptor | Includes the definitions for the document index and storage. | |
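To make the boost signature from the table more tangible, here is a minimal sketch of such a function. The weighting scheme (earlier words score higher) and the name boostEarlyTerms are assumptions chosen purely for illustration.

```javascript
// Hypothetical boost function following the signature from the table:
// (words[], term, index) => float. The weighting itself is an assumption.
function boostEarlyTerms(words, term, index) {
  // The first word gets a factor of 2; later words approach 1.
  return 1 + (words.length - index) / words.length;
}

// Evaluate the factor for every word of a sample text.
const words = ["flexsearch", "is", "fast"];
const factors = words.map((term, i) => boostEarlyTerms(words, term, i));
console.log(factors);
```

According to the table, a function like this would be passed as the boost option together with the tokenizer “strict”.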
The Context options
Option | Values | Description | Default |
---|---|---|---|
resolution | Number | Sets the scoring resolution of the context | 1 |
depth | false Number | Enable/disable context index and set context association distance. Depth is the maximum number of words/tokens a term is considered relevant to. | 1 |
bidirectional | Boolean | | true |
The Document options
Option | Values | Description | Default |
---|---|---|---|
id | String | | “id” |
tag | false String | | “tag” |
index | String Array<String> Array<Object> | | |
store | Boolean String Array<String> | | false |
Character set options
Option | Values | Description | Default |
---|---|---|---|
split | false RegExp String | The rule used to split words when using a non-custom tokenizer (built-ins, e.g. “forward”). Use a string/char or a regular expression (default: /\W+/). | /[\W_]+/ |
rtl | Boolean | Enables right-to-left encoding. | false |
encode | function(str) => [words] | Custom encoding functions. | /lang/latin/default.js |
Search options
Option | Values | Description | Default |
---|---|---|---|
limit | number | Sets the limit of results. | 100 |
offset | number | Applies an offset (skips items). | 0 |
suggest | Boolean | Enables suggestions in results. | false |
Document search options
The following options are available in addition to the index search options above.
Option | Values | Description | Default |
---|---|---|---|
index | String Array<String> Array<Object> | | |
tag | String Array<String> | | |
enrich | Boolean | Enriches the IDs in the results with the corresponding documents. | false |
bool | “and” “or” | Sets the logical operator used when searching through multiple fields or tags. | “or” |
Tokenizer (prefix search)
The tokenizer affects the required memory as well as query time and the flexibility of partial matches. Try to choose the uppermost of these tokenizers that fits your needs:
Option | Description | Example | Memory Factor (n = length of word) |
---|---|---|---|
“strict” | Index whole words | foobar | * 1 |
“forward” | Incrementally index words in forward direction | fo obar foob ar | * n |
“reverse” | Incrementally index words in both directions | foobar foobar | * 2n - 1 |
“full” | Index every possible combination | fooba r foob ar | * n * (n - 1) |
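To get a feeling for why the tokenizer choice drives memory usage, the following sketch enumerates the partials each mode would cover for a single word. It is an illustration only, not FlexSearch's internal code, and the function name partials is hypothetical; the printed numbers are plain substring counts and need not match the memory factors in the table exactly.

```javascript
// Illustrative only: lists the partial strings each tokenizer mode covers.
function partials(word, mode) {
  const out = [];
  const n = word.length;
  if (mode === "strict") {
    out.push(word);
  } else if (mode === "forward") {
    // Every prefix: "f", "fo", ..., "foobar".
    for (let end = 1; end <= n; end++) out.push(word.slice(0, end));
  } else if (mode === "reverse") {
    // Every prefix plus every proper suffix.
    for (let end = 1; end <= n; end++) out.push(word.slice(0, end));
    for (let start = 1; start < n; start++) out.push(word.slice(start));
  } else if (mode === "full") {
    // Every contiguous substring.
    for (let start = 0; start < n; start++) {
      for (let end = start + 1; end <= n; end++) out.push(word.slice(start, end));
    }
  }
  return out;
}

for (const mode of ["strict", "forward", "reverse", "full"]) {
  console.log(mode, partials("foobar", mode).length);
}
```

The growth from one entry to a quadratic number of entries per word is the reason to pick the least permissive tokenizer that still satisfies your matching needs.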
Encoders
Encoding affects the required memory as well as query time and phonetic matches. Try to choose the uppermost of these encoders that fits your needs, or pass in a custom encoder:
Option | Description | False-Positives | Compression |
---|---|---|---|
false | Turn off encoding | no | 0% |
“default” | Case-insensitive encoding | no | 0% |
“simple” | Case-insensitive encoding, charset normalizations | no | ~ 3% |
“balance” | Case-insensitive encoding, charset normalizations, literal transformations | no | ~ 30% |
“advanced” | Case-insensitive encoding, charset normalizations, literal transformations, phonetic normalizations | no | ~ 40% |
“extra” | Case-insensitive encoding, charset normalizations, literal transformations, phonetic normalizations, Soundex transformations | yes | ~ 65% |
function() | Pass a custom encoding via function(string):[words] | | |
Usage
Create a new index
var index = new Index();
Create a new index and select a preset:
var index = new Index("performance");
Create a new index with custom options:
var index = new Index({
charset: "latin:extra",
tokenize: "reverse",
resolution: 9
});
Create a new index and extend the presets with custom options:
var index = new Index({
preset: "memory",
tokenize: "forward",
resolution: 5
});
Add a text item to an index
Every content that should be added to the index needs an ID. When your content has no ID, you need to create one by passing an index, a count, or something else as an ID (a value of type number is highly recommended). These IDs are unique references to a given content. This is important when you update or add content through existing IDs. When referencing is not a concern, you can simply use something like count++.
Index.add(id, string)
index.add(0, "John Doe");
Search items
Index.search(string | options, <limit>, <options>)
index.search("John");
Limit the number of search results:
index.search("John", 10);
Check existence of already indexed IDs
You can check whether an ID has already been indexed:
if(index.contain(1)){
console.log("ID is already in index");
}
Async
You can call each method in its asynchronous version, e.g. index.addAsync or index.searchAsync.
You can assign callbacks to each asynchronous function:
index.addAsync(id, content, function(){
console.log("Task Done");
});
index.searchAsync(query, function(result){
console.log("Results: ", result);
});
Or instead of passing the callback, return a Promise:
index.addAsync(id, content).then(function(){
console.log("Task Done");
});
index.searchAsync(query).then(function(result){
console.log("Results: ", result);
});
Or use async await:
async function add(){
await index.addAsync(id, content);
console.log("Task Done");
}
async function search(){
const results = await index.searchAsync(query);
console.log("Results: ", results);
}
Append contents
You can add content to an existing index, such as:
index.append(id, content);
This will not overwrite the old indexed contents as index.update(id, content) would. Keep in mind that index.add(id, content) also performs an update under the hood when the ID has already been indexed.
Appended contents have their own context and their own full resolution. Therefore, the relevance is not stacked but gets its own context. Here is an example:
index.add(0, "some index");
index.append(0, "some appended content");
index.add(1, "some text");
index.append(1, "index appended content");
When you query index.search("index"), you get index ID 1 as the first entry in the result, because the context starts from zero for the appended data (it is not stacked onto the old context), and there “index” is the first term.
If you do not want this behavior, just use the standard index.add(id, content) and provide the full length of the content.
Update an item in an index
Index.update(id, string)
index.update(0, "Max Miller");
Remove an item from an index
Index.remove(id)
index.remove(0);
Add a custom tokenizer
A tokenizer splits words/terms into components or partials.
Define a private custom tokenizer during creation/initialization:
var index = new Index({
tokenize: function(str){
return str.split(/\s-\//g);
}
});
The tokenizer function receives a string as its parameter and has to return an array of strings representing words or terms. In some languages every character is a term, and terms are not separated by whitespace.
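Two small tokenizer functions that satisfy this contract (string in, array of strings out) might look like the following. Both function names and the chosen split rules are assumptions for illustration; adapt the rules to your own content.

```javascript
// Splits on whitespace, hyphens and slashes, and drops empty strings.
function tokenizeWords(str) {
  return str.split(/[\s\-\/]+/).filter(Boolean);
}

// Treats every character as its own term, which suits scripts that do not
// separate words with whitespace.
function tokenizeChars(str) {
  return Array.from(str.replace(/\s+/g, ""));
}

console.log(tokenizeWords("well-known client/server model"));
console.log(tokenizeChars("全文検索"));
```

Either function could then be passed as the tokenize option in place of a built-in tokenizer name.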
Add a language-specific stemmer and/or filter
Stemmer: reduces several linguistic variants of the same word to a common stem (e.g. “run” and “running”).
Filter: a blacklist of words to be filtered out of indexing entirely (e.g. “and”, “to” or “be”).
Assign a private custom stemmer or filter during creation/initialization:
var index = new Index({
stemmer: {
// object {key: replacement}
"ational": "ate",
"tional": "tion",
"enci": "ence",
"ing": ""
},
filter: [
// array blacklist
"in",
"into",
"is",
"isn't",
"it",
"it's"
]
});
Using a custom filter function, e.g.:
var index = new Index({
filter: function(value){
// just add values with length > 1 to the index
return value.length > 1;
}
});
Or assign stemmers/filters globally to a language:
Stemmers are passed as an object (key-value pairs), filters as an array.
FlexSearch.registerLanguage("us", {
stemmer: { /* ... */ },
filter: [ /* ... */ ]
});
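Conceptually, a stemmer object and a filter array act on the list of words before they reach the index. The sketch below shows that idea in isolation. It assumes suffix-only replacement and is not how FlexSearch applies these options internally; the function name normalize is hypothetical.

```javascript
// Stemmer as {suffix: replacement}, filter as a blacklist array.
const stemmer = { ational: "ate", tional: "tion", enci: "ence", ing: "" };
const filter = ["in", "into", "is", "it"];

// Hypothetical helper: drops filtered words, then rewrites the first
// matching suffix of each remaining word.
function normalize(words) {
  return words
    .filter((word) => !filter.includes(word))
    .map((word) => {
      for (const [suffix, replacement] of Object.entries(stemmer)) {
        if (word.length > suffix.length && word.endsWith(suffix)) {
          return word.slice(0, -suffix.length) + replacement;
        }
      }
      return word;
    });
}

console.log(normalize(["it", "is", "running", "relational"]));
```

Because fewer and shorter terms end up in the index, both options reduce memory usage at the cost of some precision.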
Right-to-left support
When using RTL, set the tokenizer to at least “reverse” or “full”.
Simply set the field “rtl” to true and use a compatible tokenizer:
var index = new Index({
encode: str => str.toLowerCase().split(/[^a-z]+/),
tokenize: "reverse",
rtl: true
});
Index documents (field search)
The document descriptor
Suppose our document has this data structure:
{
"id": 0,
"content": "some text"
}
Old syntax of FlexSearch v0.6.3 (no longer supported!):
const index = new Document({
doc: {
id: "id",
field: ["content"]
}
});
The document descriptor has changed slightly: there is no field branch anymore. Instead, everything moves one level up, so key becomes a main member of the options.
For the new syntax, the field “doc” was renamed to document and the field “field” was renamed to index:
const index = new Document({
document: {
id: "id",
index: ["content"]
}
});
index.add({
id: 0,
content: "some text"
});
The field id describes where the ID or unique key lives inside your documents. The key defaults to “id” when it is not passed, so you can shorten the example above to:
const index = new Document({
document: {
index: ["content"]
}
});
The member index holds a list of the fields you want to index from your documents. When just one field is selected, you can pass a string. When the default key “id” is also used, this shortens to:
const index = new Document({ document: "content" });
index.add({ id: 0, content: "some text" });
Assuming you have several fields, you can add multiple fields to the index:
var docs = [{
id: 0,
title: "Title A",
content: "Body A"
},{
id: 1,
title: "Title B",
content: "Body B"
}];
const index = new Document({
id: "id",
index: ["title", "content"]
});
You can pass custom options for each field:
const index = new Document({
id: "id",
index: [{
field: "title",
tokenize: "forward",
optimize: true,
resolution: 9
},{
field: "content",
tokenize: "strict",
optimize: true,
resolution: 5,
minlength: 3,
context: {
depth: 1,
resolution: 3
}
}]
});
Field options are also inherited when passing global options, for example:
const index = new Document({
tokenize: "strict",
optimize: true,
resolution: 9,
document: {
id: "id",
index:[{
field: "title",
tokenize: "forward"
},{
field: "content",
minlength: 3,
context: {
depth: 1,
resolution: 3
}
}]
}
});
Note: The context options of the field “content” are also inherited from the corresponding field options, which in turn are inherited from the global options.
Nested data fields (complex objects)
Suppose the document array looks more complex (with nested branches, etc.), for example:
{
"record": {
"id": 0,
"title": "some title",
"content": {
"header": "some text",
"footer": "some text"
}
}
}
Then use the colon-separated notation “root:child” to define the hierarchy within the document descriptor:
const index = new Document({
document: {
id: "record:id",
index: [
"record:title",
"record:content:header",
"record:content:footer"
]
}
});
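To clarify what the colon notation refers to, the sketch below resolves such a path against the nested example document. The helper resolvePath is hypothetical and only demonstrates the meaning of the notation; it is not the library's parser.

```javascript
// Hypothetical helper: walks a "root:child" path through a nested object.
function resolvePath(doc, path) {
  return path.split(":").reduce(
    (node, key) => (node == null ? undefined : node[key]),
    doc
  );
}

const doc = {
  record: {
    id: 0,
    title: "some title",
    content: { header: "some text", footer: "some text" }
  }
};

console.log(resolvePath(doc, "record:id"));
console.log(resolvePath(doc, "record:content:header"));
```

Each path segment is simply one property lookup, so “record:content:header” corresponds to doc.record.content.header.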
Just add the fields you want to query against. Do not add fields to the index that you only need in the result (but do not query against). For this purpose, you can store documents independently of the index (see below).
When you want to query through a field, you have to pass the exact key of the field as you defined it in the document descriptor as the field name (with colon syntax):
index.search(query, {
index: [
"record:title",
"record:content:header",
"record:content:footer"
]
});
Same as:
index.search(query, [
"record:title",
"record:content:header",
"record:content:footer"
]);
Using field-specific options:
index.search([{
field: "record:title",
query: "some query",
limit: 100,
suggest: true
},{
field: "record:title",
query: "some other query",
limit: 100,
suggest: true
}]);
You can use different queries to perform searches through the same fields.
When passing field-specific options, you need to provide complete configuration for each field. They are not inherited like document descriptors.
Complex documents
Your document should follow two rules:
- The document cannot start with an array at the root level. This would introduce sequential data, which is currently not supported. See below for a workaround for such data.
[ // not allowed as document start!
{
"id": 0,
"title": "title"
}
]
- The ID cannot be nested inside an array (and no parent field of it can be an array either). This would introduce sequential data, which is currently not supported. See below for a workaround for such data.
{
"records": [ // not allowed when ID or tag lives inside!
{
"id": 0,
"title": "title"
}
]
}
Here is an example of a supported complex document:
{
"meta": {
"tag": "cat",
"id": 0
},
"contents": [
{
"body": {
"title": "some title",
"footer": "some text"
},
"keywords": ["some", "key", "words"]
},
{
"body": {
"title": "some title",
"footer": "some text"
},
"keywords": ["some", "key", "words"]
}
]
}
The corresponding document descriptor (when all fields should be indexed) looks like this:
const index = new Document({
document: {
id: "meta:id",
tag: "meta:tag",
index: [
"contents[]:body:title",
"contents[]:body:footer",
"contents[]:keywords"
]
}
});
Again, when searching, you must use the same colon-delimited string as the field definition.
index.search(query, {
index: "contents[]:body:title"
});
Unsupported documents (sequential data)
This example breaks both of the above rules:
[ // not allowed as document start!
{
"tag": "cat",
"records": [ // not allowed when ID or tag lives inside!
{
"id": 0,
"body": {
"title": "some title",
"footer": "some text"
},
"keywords": ["some", "key", "words"]
},
{
"id": 1,
"body": {
"title": "some title",
"footer": "some text"
},
"keywords": ["some", "key", "words"]
}
]
}
]
You need to apply some kind of structure normalization.
A workaround for such a data structure looks like this:
const index = new Document({
document: {
id: "record:id",
tag: "tag",
index: [
"record:body:title",
"record:body:footer",
"record:keywords"
]
}
});
function add(sequential_data){
for(let x = 0, data; x < sequential_data.length; x++){
data = sequential_data[x];
for(let y = 0, record; y < data.records.length; y++){
record = data.records[y];
index.add({
id: record.id,
tag: data.tag,
record: record
});
}
}
}
// now just use add() helper method as usual:
add([{
// sequential structured data
// take the data example above
}]);
The first loop can be skipped when the document data has only one index as the outer array.
Add/update/remove documents in the index
Simply pass an array of documents (or a single object) to the index:
index.add(docs);
Update an index with a single object or an array of objects:
index.update({
data:{
id: 0,
title: "Foo",
body: {
content: "Bar"
}
}
});
To remove a single object or array of objects from an index:
index.remove(docs);
When the ID is known, you can also simply delete it (faster):
index.remove(id);
Join / append arrays
In the complex example above, the field keywords is an array, but its notation has no brackets like keywords[]. The array is still detected, but instead of appending each entry to a new context, the array is joined into one large string and added to the index.
The difference between the two ways of adding array contents is the relevance when searching. When each item of an array is added via append() to its own context by using the syntax field[], the relevance of the last entry is on par with that of the first entry. When the brackets are left out of the notation, the array is joined into one whitespace-separated string. Here the first entry has the highest relevance, whereas the last entry has the lowest.
So assume the keywords from the example above are pre-sorted by relevance according to their popularity, and you want to keep this order (the relevance information). In that case, do not add brackets to the notation. Otherwise, the entries would be taken into a new scoring context (and the old order would be lost).
You can also leave out the bracket notation for better performance and a smaller memory footprint. Use it when you do not need relevance at the granularity of single entries.
Field search
Search all fields:
index.search(query);
Search for specific fields:
index.search(query, { index: "title" });
Search for a given set of fields:
index.search(query, { index: ["title", "content"] });
Same as:
index.search(query, ["title", "content"]);
Passing custom modifiers and queries to each field:
index.search([{
field: "content",
query: "some query",
limit: 100,
suggest: true
},{
field: "content",
query: "some other query",
limit: 100,
suggest: true
}]);
You can use different queries to perform searches through the same fields.
The result set
Schema of the result set:
fields[] => { field, result[] => { document }}
The first level is an array of the fields the query was applied to. Each field has a record (object) with the two properties “field” and “result”. “result” is also an array and holds the results for this specific field. The result can be an array of IDs or an array enriched with stored document data.
A non-enriched result set looks like:
[{
field: "title",
result: [0, 1, 2]
},{
field: "content",
result: [3, 4, 5]
}]
An enriched result set looks like:
[{
field: "title",
result: [
{ id: 0, doc: { /* document */ }},
{ id: 1, doc: { /* document */ }},
{ id: 2, doc: { /* document */ }}
]
},{
field: "content",
result: [
{ id: 3, doc: { /* document */ }},
{ id: 4, doc: { /* document */ }},
{ id: 5, doc: { /* document */ }}
]
}]
When using pluck instead of “field”, you can explicitly select just one field and get back a flat representation:
index.search(query, { pluck: "title", enrich: true });
[
{ id: 0, doc: { /* document */ }},
{ id: 1, doc: { /* document */ }},
{ id: 2, doc: { /* document */ }}
]
This result set is a replacement for “boolean search”. Instead of applying bool logic to a nested object, you can apply your own logic dynamically on top of the result set. This opens up great flexibility in how you process the results. Consequently, the results from the fields are no longer squashed into a single result. This keeps important information, such as the name of the field and the relevance of each field's results, which are no longer mixed.
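Applying your own logic on top of the per-field result set can be as simple as the following sketch. The sample data follows the non-enriched result schema shown above; the helper names intersect and union are hypothetical.

```javascript
// Sample non-enriched result set in the documented shape.
const resultSet = [
  { field: "title", result: [0, 1, 2] },
  { field: "content", result: [1, 2, 5] }
];

// IDs that appear in every field ("and" logic).
function intersect(fields) {
  if (fields.length === 0) return [];
  return fields
    .map((entry) => entry.result)
    .reduce((acc, ids) => acc.filter((id) => ids.includes(id)));
}

// IDs that appear in at least one field ("or" logic), without duplicates.
function union(fields) {
  return [...new Set(fields.flatMap((entry) => entry.result))];
}

console.log(intersect(resultSet));
console.log(union(resultSet));
```

Since the field names are preserved, you could also weight fields differently, e.g. rank title matches before content matches.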
By default, a field search applies the query with the boolean “or” logic. Each field has its own result for the given query.
There is one case where the bool attribute is still supported. When you want to convert the default “OR” logic from field search to “and”, for example:
index.search(query, {
index: ["title", "content"],
bool: "and"
});
You only get results that contain the query in both fields.
Tags
As with the id key, just define the path to the tag:
const index = new Document({
document: {
id: "id",
tag: "tag",
index: "content"
}
});
index.add({
id: 0,
tag: "cat",
content: "Some content ..."
});
Your data can also have multiple tags as an array:
index.add({
id: 1,
tag: ["animal", "dog"],
content: "Some content ..."
});
You can perform searches for specific tags by:
index.search(query, {
index: "content",
tag: "animal"
});
This will only give you results tagged with the given tag.
Use multiple tags when searching:
index.search(query, {
index: "content",
tag: ["cat", "dog"]
});
This gives you results that are tagged with one of the given tags.
By default, multiple tags are applied with the boolean “or”. Only one of the tags needs to be present.
This is another case where the bool property is still supported: when you want to switch the default “or” logic of the tag search to “and”, e.g.:
index.search(query, {
index: "content",
tag: ["dog", "animal"],
bool: "and"
});
You will only get results that contain both tags (in this example there is just one record that has both the tags “dog” and “animal”).
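The difference between the “or” and “and” tag logic can be shown with a plain filter over the two example records. This is a conceptual sketch with a hypothetical helper filterByTags, not the library's tag index.

```javascript
// The two tagged example records from above (content omitted).
const docs = [
  { id: 0, tag: "cat" },
  { id: 1, tag: ["animal", "dog"] }
];

// Hypothetical helper: returns the IDs whose tags satisfy the given logic.
function filterByTags(documents, tags, bool = "or") {
  return documents
    .filter((doc) => {
      const own = [].concat(doc.tag); // accepts a single tag or an array
      return bool === "and"
        ? tags.every((tag) => own.includes(tag))
        : tags.some((tag) => own.includes(tag));
    })
    .map((doc) => doc.id);
}

console.log(filterByTags(docs, ["cat", "dog"]));
console.log(filterByTags(docs, ["dog", "animal"], "and"));
```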
Tag search
You can also get results from one or more tags when no query is passed:
index.search({ tag: ["cat", "dog"] });
In this case, the result set looks like:
[{
tag: "cat",
result: [ /* all cats */ ]
},{
tag: "dog",
result: [ /* all dogs */ ]
}]
Limit & Offset
By default, every query is limited to 100 entries. Unbounded queries lead to issues. Set the limit as an option to adjust the size.
You can set limits and offsets for each query:
index.search(query, { limit: 20, offset: 100 });
The size of the result set cannot be pre-counted. This is a limitation of FlexSearch's design. When you really need a count of all the results you can page through, just assign a high enough limit, get back all results, and apply your paging offset manually (this also works on the server side). FlexSearch is fast enough that this is not an issue.
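Manual paging over a complete result array could be sketched like this. The array allResults stands in for the IDs returned by a search with a high limit, and the helper paginate is a hypothetical name.

```javascript
// Hypothetical helper: slices one page out of a complete result array and
// reports the totals needed to render paging controls.
function paginate(allResults, page, pageSize) {
  const offset = (page - 1) * pageSize;
  return {
    total: allResults.length,
    pages: Math.ceil(allResults.length / pageSize),
    items: allResults.slice(offset, offset + pageSize)
  };
}

// Stand-in for the IDs returned by a search with a high limit.
const allResults = Array.from({ length: 45 }, (_, i) => i);

const lastPage = paginate(allResults, 3, 20);
console.log(lastPage.total, lastPage.pages, lastPage.items);
```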
Document storage
Only a document index can have a store. You can use a document index instead of a flat index to get this functionality even when you only store ID-content pairs.
You can define independently which fields should be indexed and which fields should be stored. This way you can index fields that should not be included in the search result.
Do not use a store when: 1. an array of IDs as the result is good enough, or 2. you already have the contents/documents stored elsewhere (outside the index).
When the store attribute is set, you have to explicitly include all fields that should be stored (it acts like a whitelist).
When the store attribute is not set, the original document is stored as a fallback.
This will add the whole original content to the store:
const index = new Document({
document: {
index: "content",
store: true
}
});
index.add({ id: 0, content: "some text" });
Access documents from the internal store
You can get indexed documents from the store:
var data = index.get(1);
You can update/change the stored content directly without changing the index by:
index.set(1, data);
When you want to update both the store and the index, just use index.update, index.add or index.append.
When you perform a query, whether on a document index or a flat index, you always get back an array of IDs.
Optionally, you can enrich the query results automatically with the stored contents:
index.search(query, { enrich: true });
Your results now look like:
[{
id: 0,
doc: { /* content from store */ }
},{
id: 1,
doc: { /* content from store */ }
}]
Configuring storage (recommended)
This will add specific fields from the document to the store (ID is not required in the store):
const index = new Document({
document: {
index: "content",
store: ["author", "email"]
}
});
index.add(id, content);
You can independently configure what should be indexed and what should be stored. It is strongly recommended that you use it wherever possible.
Here is a useful example of configuring doc and store:
const index = new Document({
document: {
index: "content",
store: ["author", "email"]
}
});
index.add({
id: 0,
author: "Jon Doe",
email: "[email protected]",
content: "Some content for the index ..."
});
You can query the content and will get the stored value:
index.search("some content", { enrich: true });
Your results now look like:
[{
field: "content",
result: [{
id: 0,
doc: {
author: "Jon Doe",
email: "[email protected]",
}
}]
}]
Neither of the fields “author” and “email” is indexed.
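The whitelist behavior of the store option can be pictured as picking the listed fields from each document. The helper pickStored below is hypothetical and only mirrors the idea; the email address is a placeholder.

```javascript
// Hypothetical helper: keeps only the whitelisted fields of a document.
function pickStored(doc, storeFields) {
  const stored = {};
  for (const field of storeFields) {
    if (field in doc) stored[field] = doc[field];
  }
  return stored;
}

const sourceDoc = {
  id: 0,
  author: "Jon Doe",
  email: "jon.doe@example.org", // placeholder address
  content: "Some content for the index ..."
};

console.log(pickStored(sourceDoc, ["author", "email"]));
```

The indexed field content is searchable but absent from the stored record, which keeps the store small.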
Chaining
Simply chain methods like:
var index = FlexSearch.create().addMatcher({'â': 'a'}).add(0, 'foo').add(1, 'bar');
index.remove(0).update(1, 'foo').add(2, 'foobar');
Contextual scoring
Create an index and use the default context:
var index = new Index({
tokenize: "strict",
context: true
});
Create an index and apply custom options to the context:
var index = new FlexSearch({
tokenize: "strict",
context: {
resolution: 5,
depth: 3,
bidirectional: true
}
});
The contextual index currently supports only the “strict” tokenizer.
Context indexes require additional memory, depending on the depth.
Auto-balanced cache (by popularity)
You need to initialize the cache and its limits at index creation time:
const index = new Index({ cache: 100 });
const results = index.searchCache(query);
A common scenario for using caching is auto-complete or instant search as you type.
When passing a number as a limit, the cache automatically balances the stored items relative to their popularity.
When only “true” is used, the cache is unlimited and execution is actually 2-3 times faster (because there is no need to run the balancer).
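To illustrate what a size limit combined with popularity can mean, here is a minimal sketch of a bounded query cache that evicts the least-requested entry once the limit is reached. It is only an illustration of the idea and not FlexSearch's actual balancer; the class name PopularityCache and the searchFn parameter are hypothetical.

```javascript
// Illustration only: a bounded cache that keeps the most requested queries.
// This is NOT the FlexSearch implementation, just a sketch of the idea.
class PopularityCache {
  constructor(limit, searchFn) {
    this.limit = limit;        // maximum number of cached queries
    this.searchFn = searchFn;  // function that performs the real search
    this.entries = new Map();  // query -> { results, hits }
  }

  search(query) {
    const cached = this.entries.get(query);
    if (cached) {
      cached.hits++;           // popularity grows with every cache hit
      return cached.results;
    }
    const results = this.searchFn(query);
    if (this.entries.size >= this.limit) {
      this.evictLeastPopular();
    }
    this.entries.set(query, { results: results, hits: 1 });
    return results;
  }

  evictLeastPopular() {
    let coldestKey;
    let coldestHits = Infinity;
    for (const [key, entry] of this.entries) {
      if (entry.hits < coldestHits) {
        coldestHits = entry.hits;
        coldestKey = key;
      }
    }
    if (coldestKey !== undefined) {
      this.entries.delete(coldestKey);
    }
  }
}
```

The eviction scan is the extra work a limited cache has to do on a miss when it is full, which gives an intuition for why an unlimited cache can skip that step entirely.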
Worker parallelism (Browser + Node.js)
The new worker model in v0.7.0 is divided by document “fields” (1 worker = 1 field index). This way, each worker can solve its task (subtask) completely on its own. The downside of this model is that the stored content may not be perfectly balanced across workers, because fields can differ in content length. On the other hand, there is no indication that balancing the storage would bring any benefit (the workers require the same amount of memory in total).
When using a document index, simply apply the option “worker”:
const index = new Document({
index: ["tag", "name", "title", "text"],
worker: true
});
index.add({
id: 1, tag: "cat", name: "Tom", title: "some", text: "some"
}).add({
id: 2, tag: "dog", name: "Ben", title: "title", text: "content"
}).add({
id: 3, tag: "cat", name: "Max", title: "to", text: "to"
}).add({
id: 4, tag: "dog", name: "Tim", title: "index", text: "index"
});
Worker 1: { 1: "cat", 2: "dog", 3: "cat", 4: "dog" }
Worker 2: { 1: "Tom", 2: "Ben", 3: "Max", 4: "Tim" }
Worker 3: { 1: "some", 2: "title", 3: "to", 4: "index" }
Worker 4: { 1: "some", 2: "content", 3: "to", 4: "index" }
When you perform a field search across all fields, the task is perfectly balanced across all workers, which can independently solve their subtasks.
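The per-field distribution listed for Worker 1 to Worker 4 can be reproduced in a few lines. This sketch only models how the content is partitioned by field; it does not start any workers, and the helper name splitByField is hypothetical.

```javascript
// Hypothetical helper: builds one "id -> value" map per field, mirroring
// the "1 worker = 1 field index" distribution described above.
function splitByField(docs, fields) {
  const perField = {};
  for (const field of fields) {
    perField[field] = {};
  }
  for (const doc of docs) {
    for (const field of fields) {
      perField[field][doc.id] = doc[field];
    }
  }
  return perField;
}

const docs = [
  { id: 1, tag: "cat", name: "Tom", title: "some", text: "some" },
  { id: 2, tag: "dog", name: "Ben", title: "title", text: "content" },
  { id: 3, tag: "cat", name: "Max", title: "to", text: "to" },
  { id: 4, tag: "dog", name: "Tim", title: "index", text: "index" }
];

const parts = splitByField(docs, ["tag", "name", "title", "text"]);
console.log(parts.tag);
console.log(parts.name);
```

Each map corresponds to the content a single field worker would receive, so a search across all fields can be answered by the workers independently.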
Worker index
As shown above, a document index automatically creates one worker per field. You can also create a WorkerIndex directly (similar to using Index instead of Document).
When using ES6 modules:
import WorkerIndex from "./worker/index.js";
const index = new WorkerIndex(options);
index.add(1, "some")
.add(2, "content")
.add(3, "to")
.add(4, "index");
Or when using the bundled version:
var index = new FlexSearch.Worker(options);
index.add(1, "some")
.add(2, "content")
.add(3, "to")
.add(4, "index");
A WorkerIndex works much like an instance of Index.
WorkerIndex supports only the asynchronous variant of every method. This means that calling index.search() on a WorkerIndex runs asynchronously, in the same way as index.searchAsync().
Worker thread (Node.js)
Node.js’s worker thread model is based on “worker threads” and works in exactly the same way:
const { Document } = require("flexsearch");
const index = new Document({
index: ["tag", "name", "title", "text"],
worker: true
});
Or create a single worker instance for a non-document index:
const { Worker } = require("flexsearch");
const index = new Worker({ options });
Worker asynchronous model (best practice)
A worker always runs asynchronously. When calling a query method, you should always handle the returned promise (for example, using await) or pass a callback function as the last argument.
const index = new Document({
index: ["tag", "name", "title", "text"],
worker: true
});
All requests and subtasks run in parallel (prioritizing “all tasks completed”):
index.searchAsync(query, callback);
index.searchAsync(query, callback);
index.searchAsync(query, callback);
This also works with promises (again prioritizing “all tasks completed”):
index.searchAsync(query).then(callback);
index.searchAsync(query).then(callback);
index.searchAsync(query).then(callback);
Or, when you need just one callback once all requests have completed, simply use Promise.all(), which also prioritizes “all tasks completed”:
Promise.all([
index.searchAsync(query),
index.searchAsync(query),
index.searchAsync(query)
]).then(callback);
In the callback of Promise.all(), the first argument is an array holding the results of each query you passed in.
When using await, you can prioritize the order instead (prioritizing “first task completed”): requests are solved one by one, and only the subtasks of each request are processed in parallel:
await index.searchAsync(query);
await index.searchAsync(query);
await index.searchAsync(query);
The same applies to index.add(), index.append(), index.remove(), and index.update(). There is one special case that the library does not prevent, but that you need to keep in mind when using workers.
When you call the “synchronous” version on a worker index:
index.add(doc);
index.add(doc);
index.add(doc);
// contents aren't indexed yet,
// they just queued on the message channel
You can certainly do this, but keep in mind that the main thread has no additional queue for distributing tasks to the workers. Running these calls in a long loop floods the message channel with worker.postMessage() calls internally. Fortunately, the browser and Node.js handle such incoming tasks for you automatically (as long as enough free RAM is available). When you use the “synchronous” version on a worker index, the content is not yet indexed on the next line, because all calls are treated as asynchronous by default.
When adding, updating, or removing large amounts of content (or at high frequency), it is recommended to use the async versions together with async/await to keep the memory footprint low during long-running processes.
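As a minimal sketch of this recommendation, the helper below awaits every call before issuing the next one, so tasks do not pile up on the message channel. It assumes the index exposes an addAsync() method that returns a promise (the async counterpart of add()); the helper name addSequentially is hypothetical.

```javascript
// Sketch under assumptions: `index` is any object with an addAsync(doc)
// method returning a promise. Each document is awaited before the next
// one is sent, so at most one add task is pending at any time.
async function addSequentially(index, docs) {
  let added = 0;
  for (const doc of docs) {
    await index.addAsync(doc);
    added++;
  }
  return added;
}

// Possible usage inside an async function:
//   const count = await addSequentially(index, documents);
```

Awaiting each call trades some throughput for a flat memory profile; for very large imports, a small batch awaited with Promise.all() would be a middle ground.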
Export / Import
Export
The export has changed slightly. An export now consists of several smaller parts rather than one big chunk. You need to pass a callback function that takes two arguments, “key” and “data”. This callback is called once for each part, for example:
index.export(function(key, data){
// you need to store both the key and the data!
// e.g. use the key for the filename and save your data
localStorage.setItem(key, data);
});
Exporting data to localStorage is not really good practice, but you can use it if size is not a concern. The export exists primarily for use in Node.js, or for storing indexes that you want to delegate from a server to the client.
The size of the export corresponds to the memory consumption of the library. To reduce the export size, use a configuration with a smaller memory footprint (see the table at the bottom for information about configurations and their memory allocation).
When your save routine runs asynchronously, you have to return a promise:
index.export(function(key, data){
return new Promise(function(resolve){
// do the saving as async
resolve();
});
});
The additional tables used by the “fastupdate” feature cannot be exported. These tables hold references, and when stored they would be fully serialized and become too large. The library handles this for you automatically: when data is imported, the index automatically disables “fastupdate”.
Import
Before you can import data, you need to create the index first. For a document index, provide the same document descriptor that was used when exporting the data. This configuration is not stored in the export.
var index = new Index({ ... });
To import data, just pass a key and data:
index.import(key, localStorage.getItem(key));
You need to import every key, otherwise the index will not work. Store the keys during export and use them for the import (the keys may be imported in any order).
The following is just a demonstration and is not recommended, because localStorage may contain other keys that are not part of the export:
var keys = Object.keys(localStorage);
for(let i = 0, key; i < keys.length; i++){
    key = keys[i];
    index.import(key, localStorage.getItem(key));
}
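One way to avoid importing unrelated keys is to store every exported part under a common prefix and to import only the entries carrying that prefix. The sketch below uses a Map as a stand-in for the real storage; the helper name createIndexStore and the prefix are hypothetical, and only the export(callback) and import(key, data) calls described above are assumed.

```javascript
// Hypothetical helper: keeps exported parts under a prefix so that only
// keys written by the export are passed back to index.import().
// `storage` is assumed to be a Map (a stand-in for files or a database).
function createIndexStore(storage, prefix) {
  return {
    // Pass this function as the callback to index.export().
    save(key, data) {
      storage.set(prefix + key, data);
    },
    // Calls index.import() for every stored part that carries the prefix
    // and returns the number of imported parts.
    restore(index) {
      let count = 0;
      for (const [storedKey, data] of storage) {
        if (storedKey.startsWith(prefix)) {
          index.import(storedKey.slice(prefix.length), data);
          count++;
        }
      }
      return count;
    }
  };
}

// Possible usage:
//   const store = createIndexStore(new Map(), "flexsearch:");
//   index.export(store.save);
//   ...later, on a freshly created index with the same configuration:
//   store.restore(newIndex);
```

Because save() does not rely on `this`, it can be handed to export() directly without binding.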
Best Practices
Use numeric IDs
It is recommended to use numeric ID values as references when adding content to the index. The byte length of the passed IDs significantly affects memory consumption. If this is not possible, consider using an index table that maps your IDs to numeric indexes. This becomes especially important when using contextual indexes on large amounts of content.
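Such an index table can be sketched as follows. The class name IdTable and its methods are hypothetical and not part of FlexSearch; the sketch only shows how arbitrary string IDs can be mapped to compact numeric IDs and back.

```javascript
// Hypothetical helper: maps arbitrary external IDs (e.g. long strings)
// to compact numeric IDs and translates search results back.
class IdTable {
  constructor() {
    this.toNumeric = new Map(); // external ID -> numeric ID
    this.toExternal = [];       // numeric ID -> external ID
  }

  // Returns the numeric ID for an external ID, assigning a new one if needed.
  numericId(externalId) {
    let id = this.toNumeric.get(externalId);
    if (id === undefined) {
      id = this.toExternal.length;
      this.toNumeric.set(externalId, id);
      this.toExternal.push(externalId);
    }
    return id;
  }

  // Translates an array of numeric IDs back to the external IDs.
  externalIds(numericIds) {
    return numericIds.map(id => this.toExternal[id]);
  }
}

// Possible usage with an index:
//   index.add(table.numericId("article-8f3a2c"), "Some content");
//   const ids = table.externalIds(index.search("content"));
```

The long IDs are then stored only once in the table instead of being repeated throughout the index.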
Whenever you can, try to divide the content into categories and add each category to its own index, for example:
var action = new FlexSearch();
var adventure = new FlexSearch();
var comedy = new FlexSearch();
This way, you can also provide different settings for each category. This is actually the fastest way to perform a fuzzy search.
To make this solution more scalable, you can use a short helper:
var index = {};
function add(id, cat, content){
(index[cat] || (
index[cat] = new FlexSearch
)).add(id, content);
}
function search(cat, query){
return index[cat] ?
index[cat].search(query) : [];
}
Add content to index:
add(1, "action", "Movie Title");
add(2, "adventure", "Movie Title");
add(3, "comedy", "Movie Title");
Execute query:
var results = search("action", "movie title"); // --> [1]
Partitioning indexes by category can significantly improve performance.