Image Vector Similarity Retrieval Service (3): Based on ES

Overview

To support the “search by image” scenario, that is, finding images similar to a query image, we designed an image search system based on ES (Elasticsearch) vector indexing and the VGG16 image feature extraction model.
Open source: github.com/thirtyonele…

Retrieval scenario

The workflow consists of three stages:
Inference: each image is read and the model generates its feature vector
Feature storage: the feature vectors are stored in ES
Retrieval: online, real-time vector retrieval
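The three stages can be sketched end to end in plain Python. This is a minimal illustration, not the project's code: `fake_extract` is a hypothetical stand-in for VGG16 inference (with Keras, `VGG16(include_top=False, pooling='max')` yields a 512-dimensional vector per image), an in-memory list stands in for ES, and retrieval is a linear scan ranked by cosine similarity on L2-normalized vectors.

```python
import hashlib
import math
from typing import Dict, List, Tuple


def fake_extract(image_name: str, dims: int = 8) -> List[float]:
    """Hypothetical stand-in for VGG16 inference: derives a deterministic,
    non-negative vector from the file name instead of from the pixels."""
    digest = hashlib.sha256(image_name.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dims]]


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def store(index: List[Dict], name: str, vec: List[float]) -> None:
    """Stage 2: persist the normalized vector together with its metadata."""
    index.append({"name": name, "image_vector": l2_normalize(vec)})


def search(index: List[Dict], query_vec: List[float], k: int = 3) -> List[Tuple[str, float]]:
    """Stage 3: linear scan; for unit vectors the dot product equals cosine similarity."""
    q = l2_normalize(query_vec)
    scored = [(doc["name"], sum(a * b for a, b in zip(q, doc["image_vector"]))) for doc in index]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    index: List[Dict] = []
    for name in ["accordion_1.jpg", "accordion_2.jpg", "camera_1.jpg"]:
        store(index, name, fake_extract(name))          # stages 1 and 2
    print(search(index, fake_extract("camera_1.jpg")))  # stage 3
```

In the real system, stage 2 writes to an ES index and stage 3 becomes a `script_score` query, but the ranking logic is the same linear scan.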

ES vector index

Dense vector (dense_vector): stores a dense vector as a single-valued array of floats with at most 2048 elements; the array length may differ from document to document.
Sparse vector (sparse_vector): stores a sparse vector as a non-nested JSON object whose keys are vector positions, written as integer strings in the range [0, 65535], and whose values are the vector components. sparse_vector is deprecated as of version 7.6, so use it with caution.
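Both layouts can be sketched as the Python dicts that would be sent to ES. This is a sketch under assumptions: the 512-dimension size assumes VGG16 features after global pooling (the article does not state the project's dimension), and `DENSE_MAPPING` and `to_sparse_vector` are hypothetical names. With elasticsearch-py 7.x, the mapping body could be passed as `client.indices.create(index=..., body=DENSE_MAPPING)`.

```python
from typing import Dict, List

# Assumption: 512 dimensions, the length of a VGG16 feature vector after global pooling
DIMS = 512

# Index mapping with a dense_vector field; in ES 7.5 the vector length is declared via "dims"
DENSE_MAPPING = {
    "mappings": {
        "properties": {
            "id": {"type": "integer"},
            "name": {"type": "keyword"},
            "image_vector": {"type": "dense_vector", "dims": DIMS},
        }
    }
}


def to_sparse_vector(values: List[float]) -> Dict[str, float]:
    """Convert a dense list into the sparse_vector JSON shape: keys are
    positions written as strings, and zero entries are left out."""
    if len(values) > 65536:
        raise ValueError("sparse_vector positions must lie in [0, 65535]")
    return {str(i): v for i, v in enumerate(values) if v != 0.0}
```

The sparse form only pays off when most components are zero; CNN features such as VGG16's are usually dense, which is why the dense_vector field is the natural fit here.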

ES retrieval implementation

Four similarity measures are supported: cosine, Manhattan, Euclidean and dot product. The code is as follows:

```python
# Cosine similarity; "+ 1.0" shifts the range to [0, 2] because script_score does not accept negative scores
script_query = {
    "script_score": {
        "query": {"match_all": {}},
        "script": {
            "source": "cosineSimilarity(params.query_vector, doc['image_vector']) + 1.0",
            "params": {"query_vector": query_vector},
        },
    }
}

# Manhattan (L1) distance, turned into a similarity in (0, 1]
script_query = {
    "script_score": {
        "query": {"match_all": {}},
        "script": {
            "source": "1 / (1 + l1norm(params.query_vector, doc['image_vector']))",
            "params": {"query_vector": query_vector},
        },
    }
}

# Euclidean (L2) distance, turned into a similarity in (0, 1]
script_query = {
    "script_score": {
        "query": {"match_all": {}},
        "script": {
            "source": "1 / (1 + l2norm(params.query_vector, doc['image_vector']))",
            "params": {"query_vector": query_vector},
        },
    }
}

# Dot product; documents without a vector score 0.
# A raw dot product can be negative, which script_score does not accept.
script_query = {
    "script_score": {
        "query": {"match_all": {}},
        "script": {
            "source": """
                double value = doc['image_vector'].size() == 0 ? 0 : dotProduct(params.query_vector, doc['image_vector']);
                return value;
            """,
            "params": {"query_vector": query_vector},
        },
    }
}

response = self.client.search(
    index=self.index_name,
    body={
        "size": search_size,
        "query": script_query,
        "_source": {"includes": ["id", "name", "image_vector"]},
    },
)
```
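To sanity-check the scores ES returns, the arithmetic of the four scripts can be mirrored in plain Python. A minimal sketch with hypothetical function names; it reproduces only the formulas in the scripts, and since ES stores dense_vector values as 32-bit floats, its scores may differ from these in the last digits.

```python
import math
from typing import Sequence


def cosine_score(q: Sequence[float], d: Sequence[float]) -> float:
    """cosineSimilarity(...) + 1.0: maps [-1, 1] onto [0, 2]."""
    dot = sum(a * b for a, b in zip(q, d))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in d))
    return dot / norm + 1.0


def manhattan_score(q: Sequence[float], d: Sequence[float]) -> float:
    """1 / (1 + l1norm(...)): identical vectors score 1.0, distant ones approach 0."""
    return 1.0 / (1.0 + sum(abs(a - b) for a, b in zip(q, d)))


def euclidean_score(q: Sequence[float], d: Sequence[float]) -> float:
    """1 / (1 + l2norm(...)): same mapping, using straight-line distance."""
    return 1.0 / (1.0 + math.sqrt(sum((a - b) ** 2 for a, b in zip(q, d))))


def dot_score(q: Sequence[float], d: Sequence[float]) -> float:
    """dotProduct(...), with 0 for a document that has no vector (the size() == 0 check)."""
    if len(d) == 0:
        return 0.0
    return sum(a * b for a, b in zip(q, d))
```

For example, `cosine_score([1.0, 0.0], [0.0, 1.0])` gives 1.0 for orthogonal vectors, the midpoint of the [0, 2] range. Note that only cosine is scale-invariant; the Manhattan, Euclidean and dot-product scores depend on vector magnitude, so they are easiest to compare across images when the features are normalized first.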

ES server installation

```shell
docker run -it -d -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.5.0
```
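Once the container has started, one way to confirm the node is reachable is to request the root endpoint, whose JSON reply includes `version.number`. A minimal stdlib-only sketch; `get_es_version` and `parse_es_version` are hypothetical helper names, and `localhost:9200` assumes the port mapping in the docker command.

```python
import json
import urllib.request
from typing import Optional

ES_URL = "http://localhost:9200"  # assumption: the HTTP port published by the docker run command


def parse_es_version(info: dict) -> Optional[str]:
    """Extract the version string from the JSON returned by GET /."""
    return info.get("version", {}).get("number")


def get_es_version(url: str = ES_URL, timeout: float = 2.0) -> Optional[str]:
    """Return the running Elasticsearch version, or None if the node is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_es_version(json.load(resp))
    except (OSError, ValueError):
        # connection refused, timeout, HTTP error or a non-JSON reply
        return None


if __name__ == "__main__":
    version = get_es_version()
    print(f"Elasticsearch {version} is up" if version else "Elasticsearch is not reachable yet")
```

The node can take a little while to start accepting requests after `docker run` returns, so a `None` right after startup is not necessarily an error; for the container above, a successful check should report 7.5.0.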

Usage

Download the project source code: github.com/thirtyonele…
Step 1: Build the base index

```
python index.py
    --train_data   Path to the folder of training images (default: '<ROOT_DIR>/data/train')
    --index_file   Path where the index file is saved (default: '<ROOT_DIR>/index/train.h5')
```
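The article does not show how the extracted vectors are written to ES. One possible approach, sketched here under assumptions, is to turn each (image name, feature vector) pair into a bulk-index action for the `helpers.bulk` function of elasticsearch-py 7.x. `bulk_actions` is a hypothetical helper, the image names are made up, and the `image_vector` field name matches the one the scoring scripts read.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def bulk_actions(index_name: str, features: Iterable[Tuple[str, List[float]]]) -> Iterator[Dict]:
    """Yield one bulk-index action per (image name, feature vector) pair."""
    for doc_id, (name, vector) in enumerate(features):
        yield {
            "_index": index_name,
            "_id": doc_id,
            "_source": {"id": doc_id, "name": name, "image_vector": vector},
        }


if __name__ == "__main__":
    # Hypothetical features; in the real pipeline they come from the VGG16 extractor.
    features = [("image_a.jpg", [0.1, 0.2]), ("image_b.jpg", [0.3, 0.4])]
    for action in bulk_actions("image_retrieval", features):
        print(action)
    # With a running cluster (elasticsearch-py 7.x):
    #   from elasticsearch import Elasticsearch, helpers
    #   helpers.bulk(Elasticsearch("http://localhost:9200"), bulk_actions("image_retrieval", features))
```

Because `bulk_actions` is a generator, `helpers.bulk` can stream a large image set in chunks instead of building every document in memory first.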

Step 2: Run a similarity search

```
python retrieval.py --engine=es
    --test_data    Path to the query image (default: '<ROOT_DIR>/data/test/001_accordion_image_0001.jpg')
    --index_file   Path to the index file (.h5)
    --db_name      ES or Milvus index name (default: 'image_retrieval')
    --engine       Search engine type: numpy, faiss, es or milvus (default: 'numpy')
```
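With `--engine=es`, retrieval comes down to one `search` call plus reading `hits.hits` from the response. A sketch assuming the elasticsearch-py 7.x client (`client.search(index=..., body=...)`); `es_search`, `SCORE_SCRIPTS` and `RecordingClient` are hypothetical names, and the canned reply only lets the sketch run without a cluster. `image_retrieval` is the documented default `--db_name`.

```python
from typing import Any, Dict, List, Tuple

# Score scripts using the ES 7.5 vector functions (field name assumed: image_vector)
SCORE_SCRIPTS = {
    "cosine": "cosineSimilarity(params.query_vector, doc['image_vector']) + 1.0",
    "manhattan": "1 / (1 + l1norm(params.query_vector, doc['image_vector']))",
    "euclidean": "1 / (1 + l2norm(params.query_vector, doc['image_vector']))",
}


def es_search(client: Any, index_name: str, query_vector: List[float],
              metric: str = "cosine", size: int = 10) -> List[Tuple[str, float]]:
    """Run a script_score query and return (image name, score) pairs, best first."""
    body = {
        "size": size,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {"source": SCORE_SCRIPTS[metric], "params": {"query_vector": query_vector}},
            }
        },
        "_source": {"includes": ["id", "name"]},
    }
    response = client.search(index=index_name, body=body)
    return [(hit["_source"]["name"], hit["_score"]) for hit in response["hits"]["hits"]]


class RecordingClient:
    """Offline stand-in for elasticsearch.Elasticsearch: records the request, returns a canned reply."""

    def search(self, index: str, body: Dict) -> Dict:
        self.last_index, self.last_body = index, body
        return {"hits": {"hits": [{"_score": 1.98, "_source": {"id": 0, "name": "001_accordion_image_0001.jpg"}}]}}


if __name__ == "__main__":
    client = RecordingClient()
    print(es_search(client, "image_retrieval", [0.1, 0.2], metric="euclidean", size=5))
```

Leaving `image_vector` out of `_source` keeps responses small: the caller usually needs only the name and score, not the stored 512 floats per hit.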

Conclusion

Extends Elasticsearch with vector retrieval.
Inherits Elasticsearch's distributed, scalable architecture.
Elasticsearch's query features and plug-ins make it easy to combine vector similarity with search on other dimensions.
ES vector scoring is a linear scan: query time grows with the number of documents and depends on hardware performance, so benchmark it before use.
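Both of the last two points show up in the inner query of `script_score`: the script runs only on documents that inner query matches. Replacing `match_all` with a filter therefore adds a condition on another field and also shrinks the linear scan. For a rough sense of scale, an unfiltered dot-product query over 1,000,000 images with 512-dimensional vectors performs about 512 million multiply-adds. A sketch; the `category` keyword field and `filtered_vector_query` are hypothetical and not part of the project's mapping.

```python
from typing import Dict, List


def filtered_vector_query(query_vector: List[float], category: str) -> Dict:
    """Cosine script_score restricted to one (hypothetical) category."""
    return {
        "script_score": {
            # Only documents passing this filter are scored by the script below
            "query": {"bool": {"filter": [{"term": {"category": category}}]}},
            "script": {
                "source": "cosineSimilarity(params.query_vector, doc['image_vector']) + 1.0",
                "params": {"query_vector": query_vector},
            },
        }
    }
```

A `filter` clause is used rather than `must` because it does not contribute to relevance and can be cached; the vector script alone decides the ranking.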

That’s all!
