Overview
- To support the similar-image retrieval scenario known as "search by image", this project implements an image search system built on Elasticsearch (ES) vector fields and the VGG16 image feature extraction model.
- Open source: github.com/thirtyonele…
Retrieval scenario
- Inference: the image is read and the model generates a feature vector
- Feature storage: the feature vector is stored in ES
- Retrieval: vectors are searched online in real time
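The inference step can be sketched as follows. This is a minimal sketch, not the project's actual code: it assumes TensorFlow's bundled Keras is installed, uses VGG16 without its classification head and with global max pooling (which yields a 512-dimensional vector), and L2-normalizes the result. The names build_model, extract_feature, and l2_normalize are hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        return list(vector)
    return [v / norm for v in vector]


def build_model():
    """Load VGG16 as a feature extractor (assumption: ImageNet weights, max pooling)."""
    # Imported lazily so l2_normalize works even without TensorFlow installed.
    from tensorflow.keras.applications.vgg16 import VGG16

    return VGG16(weights="imagenet", include_top=False, pooling="max")


def extract_feature(model, image_path):
    """Read one image and return its normalized feature vector as a list of floats."""
    import numpy as np
    from tensorflow.keras.applications.vgg16 import preprocess_input
    from tensorflow.keras.preprocessing import image

    img = image.load_img(image_path, target_size=(224, 224))  # VGG16 input size
    batch = np.expand_dims(image.img_to_array(img), axis=0)   # shape (1, 224, 224, 3)
    features = model.predict(preprocess_input(batch))[0]      # shape (512,)
    return l2_normalize(features.tolist())
```

Building the model once and passing it to extract_feature avoids reloading the weights for every image.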
ES vector index
- Dense vector (dense_vector): stores a dense vector of floats as a single-valued array field with a maximum length of 2048. Early 7.x releases allowed the array length to differ between documents; later releases fix the length with the dims mapping parameter.
- Sparse vector (sparse_vector): stores a sparse vector as a non-nested JSON object. Each key is a dimension index, written as an integer string in the range [0, 65535], and each value is the vector component at that index. The sparse_vector type was deprecated in 7.6 and removed in 8.0, so use it with caution.
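To make the dense vector field concrete, here is a hedged sketch of an index mapping and of the bulk actions that would store extracted features. The field names id, name, and image_vector follow the query code in this post; the dimension 512, the helper names, and the use of the 7.x elasticsearch Python client are assumptions.

```python
def build_mapping(dims=512):
    """Index body with a dense_vector field (dims=512 is an assumed feature size)."""
    return {
        "mappings": {
            "properties": {
                "id": {"type": "long"},
                "name": {"type": "keyword"},
                "image_vector": {"type": "dense_vector", "dims": dims},
            }
        }
    }


def to_bulk_actions(index_name, items):
    """Yield one bulk action per (name, vector) pair."""
    for i, (name, vector) in enumerate(items):
        yield {
            "_index": index_name,
            "_id": str(i),
            "_source": {"id": i, "name": name, "image_vector": list(vector)},
        }


def store_features(client, index_name, items, dims=512):
    """Create the index and bulk-load vectors.

    `client` is assumed to be an elasticsearch.Elasticsearch instance (7.x client).
    """
    from elasticsearch import helpers  # requires the elasticsearch package

    # ignore=400 skips the error raised when the index already exists.
    client.indices.create(index=index_name, body=build_mapping(dims), ignore=400)
    helpers.bulk(client, to_bulk_actions(index_name, items))
```

Keeping the mapping and the action generator as pure functions makes them easy to inspect without a running cluster.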
ES retrieval implementation
- Four similarity measures are provided: cosine, Manhattan (L1), Euclidean (L2), and dot product. The code is as follows:

    # Cosine similarity (+ 1.0 keeps the score non-negative)
    script_query = {
        "script_score": {
            "query": {"match_all": {}},
            "script": {
                "source": "cosineSimilarity(params.query_vector, doc['image_vector']) + 1.0",
                "params": {"query_vector": query_vector}
            }
        }
    }

    # Manhattan (L1) distance
    script_query = {
        "script_score": {
            "query": {"match_all": {}},
            "script": {
                "source": "1 / (1 + l1norm(params.queryVector, doc['image_vector']))",
                "params": {"queryVector": query_vector}
            }
        }
    }

    # Euclidean (L2) distance
    script_query = {
        "script_score": {
            "query": {"match_all": {}},
            "script": {
                "source": "1 / (1 + l2norm(params.queryVector, doc['image_vector']))",
                "params": {"queryVector": query_vector}
            }
        }
    }

    # Dot product
    script_query = {
        "script_score": {
            "query": {"match_all": {}},
            "script": {
                "source": """
                    double value = doc['image_vector'].size() == 0 ? 0 : dotProduct(params.query_vector, doc['image_vector']);
                    return value;
                """,
                "params": {"query_vector": query_vector}
            }
        }
    }

    # Run the search with the selected script_query
    response = self.client.search(
        index=self.index_name,
        body={
            "size": search_size,
            "query": script_query,
            "_source": {"includes": ["id", "name", "face_vector"]}
        }
    )
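For intuition, the four scores can be reproduced in plain Python. This sketch only mirrors the arithmetic of the script sources above; it does not call ES, and the function names are hypothetical. The + 1.0 and 1 / (1 + d) forms keep the scores non-negative, which script_score requires.

```python
import math


def dot(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_score(query, doc):
    """Cosine similarity + 1.0, so the result lies in [0, 2]. Assumes non-zero vectors."""
    norm = math.sqrt(dot(query, query)) * math.sqrt(dot(doc, doc))
    return dot(query, doc) / norm + 1.0


def manhattan_score(query, doc):
    """1 / (1 + L1 distance); identical vectors score 1.0."""
    l1 = sum(abs(x - y) for x, y in zip(query, doc))
    return 1.0 / (1.0 + l1)


def euclidean_score(query, doc):
    """1 / (1 + L2 distance); identical vectors score 1.0."""
    l2 = math.sqrt(sum((x - y) ** 2 for x, y in zip(query, doc)))
    return 1.0 / (1.0 + l2)


def dot_product_score(query, doc):
    """Raw dot product, or 0 for an empty document vector (mirrors the size() guard)."""
    return dot(query, doc) if len(doc) > 0 else 0.0
```

Note that the raw dot product can be negative for arbitrary vectors, so that variant is only safe when the stored features are non-negative.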
ES server installation
    docker run -it -d -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.5.0
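After the container starts, the node may need some seconds before it accepts requests. A small standard-library sketch for waiting until it responds is shown below; the URL assumes the port mapping from the command above, and the helper names fetch_json and wait_for_es are hypothetical.

```python
import json
import time
import urllib.request


def fetch_json(url, timeout=2.0):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_es(url="http://localhost:9200", retries=30, delay=1.0, fetch=fetch_json):
    """Poll the ES root endpoint and return the version string it reports."""
    for _ in range(retries):
        try:
            return fetch(url)["version"]["number"]
        except (OSError, ValueError, KeyError):
            time.sleep(delay)  # node not ready yet; try again
    raise RuntimeError("Elasticsearch did not respond at " + url)
```

The fetch parameter exists only so the polling logic can be exercised without a running server.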
Usage
- Download the project source code: github.com/thirtyonele…
- Step 1: Build the base index

    python index.py
      --train_data: path to the training images folder; default is '<ROOT_DIR>/data/train'
      --index_file: path where the index file is stored; default is '<ROOT_DIR>/index/train.h5'

- Step 2: Run a similarity search

    python retrieval.py --engine=es
      --test_data: path to the test image; default is '<ROOT_DIR>/data/test/001_accordion_image_0001.jpg'
      --index_file: path to the index file; default is '<ROOT_DIR>/index/train.h5'
      --db_name: ES or Milvus index name; default is 'image_retrieval'
      --engine: search engine type; default is 'numpy'; options are numpy, faiss, es, or milvus
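The retrieval options map naturally onto argparse. The following is a sketch of how such a command line could be declared, not the project's actual parser; ROOT_DIR is assumed here to be the current working directory, and build_parser is a hypothetical name.

```python
import argparse
import os

ROOT_DIR = os.getcwd()  # assumption: the project root is the working directory


def build_parser():
    """Declare the retrieval options with the defaults listed in this post."""
    parser = argparse.ArgumentParser(description="Similar-image retrieval")
    parser.add_argument(
        "--test_data",
        default=os.path.join(ROOT_DIR, "data", "test", "001_accordion_image_0001.jpg"),
        help="path to the query image",
    )
    parser.add_argument(
        "--index_file",
        default=os.path.join(ROOT_DIR, "index", "train.h5"),
        help="path to the feature index file",
    )
    parser.add_argument(
        "--db_name",
        default="image_retrieval",
        help="ES or Milvus index name",
    )
    parser.add_argument(
        "--engine",
        default="numpy",
        choices=["numpy", "faiss", "es", "milvus"],
        help="search engine type",
    )
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Declaring the engines through choices makes argparse reject an unknown engine name before any model or index is loaded.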
Conclusion
- Vector fields extend Elasticsearch so that it supports vector retrieval.
- The approach inherits Elasticsearch's distributed, scalable architecture.
- Elasticsearch's query features and plug-ins make it easy to combine vector search with conditions on other dimensions.
- ES vector scoring is a linear scan, so query time grows with the number of documents and depends on hardware performance. Benchmark it on your own data before relying on it.
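The last two points can be combined: because script_score wraps an ordinary query, a filter can shrink the set of documents that the linear scan has to score. A minimal sketch, assuming a hypothetical keyword field named category that is not part of the original index:

```python
def filtered_cosine_query(query_vector, category):
    """Build a script_score query that scores only documents matching a term filter."""
    return {
        "script_score": {
            # Replaces match_all: only documents in this category are scored.
            # "category" is a hypothetical field used for illustration.
            "query": {"bool": {"filter": [{"term": {"category": category}}]}},
            "script": {
                "source": "cosineSimilarity(params.query_vector, doc['image_vector']) + 1.0",
                "params": {"query_vector": list(query_vector)},
            },
        }
    }
```

The returned dictionary can be passed as the "query" value of the search body in place of the match_all variants.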
That’s all!