Foreword
Get started with Django REST Framework
MySQL's FULLTEXT index can be used directly for full-text retrieval in a project. Today we continue the discussion with Whoosh, this time used inside a Django project.
The code in this article builds on the project from Get started with Django REST Framework (1).
Extending the project
Django-haystack is a third-party Django search app that supports multiple search backends such as Solr, Elasticsearch, Whoosh, and Xapian. Combined with jieba, a well-known Chinese natural language processing library, it can provide an effective full-text search system.
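Under the hood, engines like Whoosh are built around an inverted index that maps each term to the documents containing it. A minimal pure-Python sketch of the idea (illustrative only, not any library's actual API):

```python
from collections import defaultdict

def build_index(docs):
    # A toy inverted index: term -> set of document ids containing that term.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, term):
    # Look up a single term; real engines also handle ranking, phrases, etc.
    return sorted(index.get(term.lower(), set()))

docs = {1: "you look at me", 2: "I look at the clouds", 3: "wet with tears"}
index = build_index(docs)
print(search(index, "look"))   # -> [1, 2]
print(search(index, "tears"))  # -> [3]
```

Real backends add tokenization, stemming, and relevance scoring on top of this core structure, which is exactly where the analyzer (discussed later for Chinese) plugs in.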
Configure haystack
Since this project is based on Django REST Framework, we configure haystack through drf-haystack.
1. Install dependencies
```shell
pipenv install drf-haystack whoosh jieba
```
2. Configure the project
- Create the file article/models.py
```python
from django.db import models


class Article(models.Model):
    creator = models.CharField(max_length=50, null=True, blank=True)
    tag = models.CharField(max_length=50, null=True, blank=True)
    title = models.CharField(max_length=50, null=True, blank=True)
    content = models.TextField()
```
- Create the file article/search_indexes.py
```python
from .models import Article
from haystack import indexes


class ArticleIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    title = indexes.CharField(model_attr="title")
    content = indexes.CharField(model_attr="content")
    tag = indexes.CharField(model_attr="tag")
    creator = indexes.CharField(model_attr="creator")
    id = indexes.CharField(model_attr="pk")
    autocomplete = indexes.EdgeNgramField()

    @staticmethod
    def prepare_autocomplete(obj):
        return "".join((
            obj.title,
        ))

    def get_model(self):
        return Article

    def index_queryset(self, using=None):
        return self.get_model().objects.all()
```
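The EdgeNgramField above supports autocomplete by indexing leading prefixes ("edge n-grams") of each title, so partially typed input can still match. A hand-rolled illustration of what such a field conceptually stores (not haystack's actual implementation; the 2-character minimum is an assumption):

```python
def edge_ngrams(token, min_gram=2):
    # Generate the leading prefixes ("edge n-grams") of a token,
    # which is roughly what an EdgeNgramField indexes for autocomplete.
    return [token[:i] for i in range(min_gram, len(token) + 1)]

print(edge_ngrams("django"))  # -> ['dj', 'dja', 'djan', 'djang', 'django']
```

A query for "dja" then matches the stored gram "dja" and therefore the document titled "django".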
- Create the file article/serializers.py
```python
from rest_framework import serializers
from drf_haystack.serializers import HaystackSerializer

from .search_indexes import ArticleIndex
from .models import Article


class ArticleSerializer(serializers.ModelSerializer):
    class Meta:
        model = Article
        fields = '__all__'


class ArticleHaystackSerializer(HaystackSerializer):
    def update(self, instance, validated_data):
        pass

    def create(self, validated_data):
        pass

    class Meta:
        index_classes = [ArticleIndex]
        fields = ['title', 'creator', 'content', 'tag']
```
- Create the file article/urls.py
```python
from django.conf.urls import url, include
from rest_framework import routers

from . import views

router = routers.DefaultRouter()
# router.register('article', views.ArticleViewSet)
router.register("article/search", views.ArticleSearchView, basename='article-search')

urlpatterns = [
    url(r'^', include(router.urls)),
]
```
- Create the article/views.py file
```python
from .models import Article
from rest_framework import viewsets
from .serializers import ArticleSerializer, ArticleHaystackSerializer
from drf_haystack.viewsets import HaystackViewSet


class ArticleSearchView(HaystackViewSet):
    index_models = [Article]
    serializer_class = ArticleHaystackSerializer


class ArticleViewSet(viewsets.ModelViewSet):
    """API endpoint that allows articles to be viewed or edited."""
    queryset = Article.objects.all()
    serializer_class = ArticleSerializer
```
- Add the following to the project's settings.py
```python
# Specify the search engine
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.whoosh_backend.WhooshEngine',
        'PATH': os.path.join(BASE_DIR, 'whoosh_index'),
    }
}
# Paginate search results (10 per page; the default is 20)
HAYSTACK_SEARCH_RESULTS_PER_PAGE = 10
# Automatically update the index whenever the database changes, very convenient
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
```
- Add the apps to INSTALLED_APPS

```python
INSTALLED_APPS = [
    ...
    'haystack',
    'article',
]
```
3. Configure data
- Generate the database tables

```shell
# Generate the migration files
python manage.py makemigrations
# Apply the migrations
python manage.py migrate
```
- Initialize data
```sql
INSERT INTO article_article (creator, tag, title, content)
VALUES ('admin', 'Modern Poetry', 'If', 'I''ll never think of you again in this life, except on some night wet with tears, if you will'),
       ('admin', 'Modern Poetry', 'Love', 'One day the signpost changes, I hope you take it easy; one day the pier breaks, I hope you cross; one day the beams fall, I hope you stay strong; one day expectations wither, I hope you understand'),
       ('admin', 'Modern Poetry', 'Far and Near', 'You look at me, and you look at the clouds; I think you look at me very far away, and you look at the clouds very close'),
       ('admin', 'Modern Poetry', 'Fragment', 'You stand on the bridge and look at the scenery; the sightseer looks at you from upstairs. The moon adorns your window; you adorn someone else''s dream.'),
       ('admin', 'Modern Poetry', 'Soliloquy', 'I pour out my thoughts to you like a statue of stone; silence should not be; if silence is your sorrow, you know it hurts the most');
```
4. Build indexes
- Create the template file templates/search/indexes/article/article_text.txt

```django
{{ object.title }}
{{ object.tag }}
{{ object.content }}
{{ object.creator }}
```

- Create the index

```shell
$ python manage.py rebuild_index
WARNING: This will irreparably remove EVERYTHING from your search index in connection 'default'.
Your choices after this are to restore from backups or rebuild via the `rebuild_index` command.
Are you sure you wish to continue? [y/N] y
Removing all documents from your index because you said so.
All documents removed.
Indexing 5 articles
```
5. View the results
Use curl to verify the results; multi-condition queries are supported.

```shell
$ curl -H 'Accept: application/json; indent=4' -u admin:admin http://127.0.0.1:8000/api/article/search/\?content__contains\=tears
{
    "count": 1,
    "next": null,
    "previous": null,
    "results": [
        {
            "title": "If",
            "content": "Never in this life will I think of you again\nexcept\nexcept on some\nnights\nwet with tears\nif you will",
            "tag": "Modern Poetry",
            "creator": "admin"
        }
    ]
}

$ curl -H 'Accept: application/json; indent=4' -u admin:admin http://127.0.0.1:8000/api/article/search/\?content__contains\=tears\&title\=If
{
    "count": 1,
    "next": null,
    "previous": null,
    "results": [
        {
            "title": "If",
            "content": "Never in this life will I think of you again\nexcept\nexcept on some\nnights\nwet with tears\nif you will",
            "tag": "Modern Poetry",
            "creator": "admin"
        }
    ]
}
```
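The same queries can also be scripted from Python. A small standard-library sketch that only builds the request URL (the endpoint path and field lookups are the ones assumed in this project; `urllib.request` or `requests` could then fetch it with basic auth):

```python
from urllib.parse import urlencode

BASE = "http://127.0.0.1:8000/api/article/search/"

def search_url(**filters):
    # drf-haystack accepts field lookups such as content__contains
    # as ordinary query-string parameters.
    return BASE + "?" + urlencode(filters)

url = search_url(content__contains="tears", title="If")
print(url)
# -> http://127.0.0.1:8000/api/article/search/?content__contains=tears&title=If
```

Each keyword argument becomes one filter, so multi-condition queries are just multiple parameters in the same URL.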
Change the word segmentation tool
Since the default word segmentation tool does not fully support Chinese, we switch to the jieba tokenizer.
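The problem is easy to demonstrate: Whoosh's default analyzer splits text on word boundaries, but written Chinese has no spaces between words, so a whole sentence survives as a single token. A stdlib-only illustration (jieba itself is left out so the snippet runs without third-party packages):

```python
sentence = "你站在桥上看风景"  # "You stand on the bridge and look at the scenery"

# Whitespace tokenization yields one opaque token for Chinese text,
# so a term query like "风景" (scenery) can never match.
tokens = sentence.split()
print(tokens)  # -> ['你站在桥上看风景']

# A segmenter such as jieba (e.g. jieba.cut(sentence)) would instead cut
# the sentence into words, producing tokens the index can match against.
```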
1. Modify the whoosh configuration file
The original file can be found in the site-packages directory of the current Python environment.

```python
# 1. Copy haystack/backends/whoosh_backend.py into the current app,
#    saving it as article/whoosh_backend.py
# 2. Import jieba's analyzer and change every StemmingAnalyzer to ChineseAnalyzer.
#    Note: find the existing line and modify it; do not simply add a new one.
from jieba.analyse import ChineseAnalyzer

schema_fields[field_class.index_fieldname] = TEXT(stored=True, analyzer=ChineseAnalyzer(), field_boost=field_class.boost, sortable=True)
```
2. Modify Settings configuration files
Change only the HAYSTACK configuration:

```python
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'article.whoosh_backend.WhooshEngine',
        'PATH': os.path.join(BASE_DIR, 'whoosh_index'),
    }
}
```
3. Generate indexes again
```shell
$ python manage.py rebuild_index
WARNING: This will irreparably remove EVERYTHING from your search index in connection 'default'.
Your choices after this are to restore from backups or rebuild via the `rebuild_index` command.
Are you sure you wish to continue? [y/N] y
Removing all documents from your index because you said so.
All documents removed.
Indexing 5 articles
Building prefix dict from the default dictionary ...
Loading model from cache /var/folders/st/b16fyn3s57x_5vszjl599njw0000gn/T/jieba.cache
Loading model cost 0.764 seconds.
Prefix dict has been built successfully.
```
4. Verify the results
The search now returns the expected results.

```shell
$ curl -H 'Accept: application/json; indent=4' -u admin:admin http://127.0.0.1:8000/api/article/search/\?content__contains\=scenery
{
    "count": 1,
    "next": null,
    "previous": null,
    "results": [
        {
            "title": "Fragment",
            "content": "You stand on the bridge and look at the scenery,\nthe sightseer looks at you from upstairs.\nThe bright moon adorns your window,\nyou adorn someone else's dream.",
            "tag": "Modern Poetry",
            "creator": "admin"
        }
    ]
}
```
Project directory
Here is the directory tree of the project:

```shell
$ tree | grep -v pyc
.
├── Pipfile
├── Pipfile.lock
├── article
│   ├── migrations
│   │   ├── 0001_initial.py
│   │   └── __init__.py
│   ├── models.py
│   ├── search_indexes.py
│   ├── serializers.py
│   ├── urls.py
│   ├── views.py
│   └── whoosh_backend.py
├── demo
│   ├── __init__.py
│   ├── asgi.py
│   ├── serializers.py
│   ├── settings.py
│   ├── urls.py
│   ├── views.py
│   └── wsgi.py
├── manage.py
├── templates
│   └── search
│       └── indexes
│           └── article
│               └── article_text.txt
└── whoosh_index
    ├── MAIN_WRITELOCK
    ├── MAIN_ox1iJ98muwsyw2qv.seg
    └── _MAIN_1.toc
```
Conclusion
So far we have seen two different implementations of full-text search. For simple projects, MySQL can solve the problem on its own, with no restrictions on the development language. If your project happens to be developed with Django, whoosh + jieba is also a good choice.
References
- Haystack
- Whoosh