Foreword
Recently I tried out SPTAG, a nearest-neighbor search tool, and wanted a small project to test it with. So I thought of building an index over a batch of old photos of beautiful women to see whether I could retrieve the faces I was looking for.
The following environment is required:
- Python 3.7
- SPTAG (Docker version)
- MongoDB
The code is at github.com/nladuo/MMFi…
Preparing the pictures
First of all, prepare the pictures. I used some photos collected a long time ago, nearly 10,000 in total.
If you don't have any, here is a download link: Google Drive drive.google.com/file/d/1shZ… (unzip password: nladuo).
Image filtering
The next step is some preprocessing of the images. I don't intend to search over whole images, since that would be more troublesome; here I only search for faces.
Using the face_recognition library, I keep only the pictures that contain exactly one face, crop the face out, and store it in MongoDB.
```
cd MMFinder/data_preprocess
python3 filter_images.py
```
This left 9,477 face images after filtering.
Feature engineering
The next step is to convert each picture into a vector, i.e. feature engineering. Since I don't work on images myself, I'm not very familiar with the specific methods; from a rough survey, the traditional approach is SIFT and the like, while nowadays neural networks are used.
For general images, a common approach is to run the image through a network pre-trained on ImageNet and take the last layer's output as the feature vector. Here I found a model pre-trained specifically for face recognition: sefiks.com/2018/08/06/… , which should be more accurate.
The model is a VGG, and its last layer is a vector of dimension 2622. After processing, every face is converted into a 2,622-dimensional vector and saved to MongoDB.
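Once faces are vectors, comparing two faces reduces to comparing two vectors. Cosine similarity is a common choice for this kind of descriptor (the repository may use a different metric; this is only an illustration, with random vectors standing in for real VGG face descriptors):

```python
import numpy as np

def cosine_similarity(a, b):
    # cosine of the angle between two feature vectors: 1.0 means same direction
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(42)
face_a = rng.normal(size=2622)                       # stand-in for a face vector
face_b = face_a + rng.normal(scale=0.1, size=2622)   # a slightly perturbed copy
face_c = rng.normal(size=2622)                       # an unrelated face

sim_ab = cosine_similarity(face_a, face_b)           # close to 1.0
sim_ac = cosine_similarity(face_a, face_c)           # close to 0.0
```

Similar faces end up with similarity near 1, unrelated ones near 0.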
```
cd MMFinder/data_preprocess
python3 feature_extraction.py
```
Indexing
The next step is to build the nearest-neighbor index, also called a dense index. Text search engines generally use TF-IDF with an inverted index, which is a sparse index; a dense index is generally graph-structured.
Why build an index?
It's simply to speed up search. Normally, to search for an image we would have to scan all 9,477 vectors in the database to get an exact similarity ranking. That's O(N) time, which doesn't look too high, but it's O(N) on every single query; with 100 million images, who knows how long a search would take.
Once you have an index, a search can return in roughly constant time O(C); if it gets slow, you can just add machines.
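The O(N) linear scan described above can be sketched as follows (random vectors stand in for the real face descriptors):

```python
import numpy as np

def brute_force_search(query, vectors, k=5):
    # normalize everything, then rank by cosine similarity: O(N) work per query
    vecs = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    sims = vecs @ q
    return np.argsort(-sims)[:k]          # indices of the k most similar vectors

rng = np.random.default_rng(0)
database = rng.normal(size=(9477, 2622))  # one 2622-dim vector per face
top = brute_force_search(database[42], database, k=3)
```

Querying with a vector that is already in the database should rank that vector first; an approximate index like SPTAG avoids touching all N rows for each query.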
Installing the index system
For the choice of index, I use SPTAG here because it is an efficient and scalable nearest-neighbor search system. A similar system is Facebook's Faiss. See my previous article: Installing and testing SPTAG under Docker.
If you just want to experiment, you can also look at KDTree, LSH, and other approximate nearest-neighbor methods in scikit-learn.
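For example, a quick KDTree lookup with scikit-learn (note that exact tree methods degrade in very high dimensions, so this toy uses 16-dimensional vectors rather than 2,622-dimensional face descriptors):

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 16))     # low-dimensional toy vectors

tree = KDTree(X)                    # build the index once...
dist, ind = tree.query(X[:1], k=3)  # ...then each query avoids a full scan
```

The nearest neighbor of a database point is the point itself, at distance 0.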
Building the index
After installing SPTAG under Docker, export all the image data into the input format SPTAG expects.
```
cd MMFinder/index_construction
python3 export_SPTAG_indexbuilder_input.py
docker cp mm_index_input.txt <your container ID>:/app/Release/
```
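As I understand it, SPTAG's text input format is one vector per line: the metadata, a tab, then the dimensions joined by `|` (verify against the SPTAG docs for your version; this is only a sketch of what the export script might produce):

```python
import numpy as np

def export_sptag_input(ids, vectors, path):
    # one line per vector: "<metadata>\t<d1>|<d2>|...|<dN>"
    with open(path, "w") as f:
        for face_id, vec in zip(ids, vectors):
            dims = "|".join(str(x) for x in vec)
            f.write(f"{face_id}\t{dims}\n")

vecs = np.round(np.random.default_rng(2).normal(size=(3, 4)), 3)
export_sptag_input(["face_0", "face_1", "face_2"], vecs, "mm_index_input.txt")
```

In the real export, the metadata would be the MongoDB id of each face and the vectors would be the 2,622-dimensional features.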
With the exported data inside the SPTAG Docker container, build the index with indexbuilder.
```
docker attach <your container ID>
./indexbuilder -d 2622 -v Float -i ./mm_index_input.txt -o data/mm_index -a BKT -t 2
```
After the index is built, start the RPC service for search.
```
python3 SPTAG_rpc_search_service.py
```
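The repository's RPC protocol isn't shown here, but as an illustration only, such a search service could be wired up with Python's built-in xmlrpc (the real SPTAG_rpc_search_service.py will differ; `search` below is a placeholder for a call into the SPTAG index):

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

def search(vector, k):
    # placeholder: a real service would query the SPTAG index here
    # and return the ids of the k nearest faces
    return list(range(k))

# port 0 lets the OS pick a free port
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(search, "search")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# the web demo would act as the client, sending a query vector over RPC
proxy = xmlrpc.client.ServerProxy(f"http://127.0.0.1:{port}")
result = proxy.search([0.0] * 2622, 5)
server.shutdown()
```

Keeping the index behind an RPC service means the heavy index stays loaded in one process while the web front end stays lightweight.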
Query test
Now it's time for a query test. If you're a Mac user, you can install imgcat and view pictures from the command line.
```
python3 search_test.py
```
Here's what it looks like; it seems OK.
Front-end Demo
Finally, integrate the modules and write the upload and search interfaces to form a complete demo application.
```
cd web_demo
python3 main.py
```
The effect is as follows:
(The beauty pictures tend to get blocked, so here is a different set of images as a substitute.)