Python Elasticsearch DSL: query, filter, and aggregation examples

Column address: github.com/yongxinz/te…

You are also welcome to follow my WeChat official account, AlwaysBeta, for more content.

Basic concepts of Elasticsearch

Index: The logical area Elasticsearch uses to store data, similar to a database in a relational database system. An index can be spread across one or more shards, and each shard can have multiple replicas.

Document: An entity stored in Elasticsearch, similar to a row in a table of a relational database.

A document is composed of multiple fields, and fields with the same name in different documents must have the same type. A document can contain multiple fields, and a single field can hold multiple values (a multivalued field).

Document type: For query purposes, an index may contain multiple document types. A document type is similar to a table in a relational database. Note, however, that fields with the same name in different document types must be of the same type.

Mapping: Similar to a schema definition in a relational database. It stores the field mapping information, and different document types have different mappings.

Here is a comparison of Elasticsearch and relational database terms:

Relational database	Elasticsearch
Database	Index
Table	Type
Row	Document
Column	Field
Schema	Mapping
Index	Everything is indexed
SQL	Query DSL
SELECT * FROM table…	GET http://…
UPDATE table SET	PUT http://…
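
As a rough illustration of the last rows of the table, the sketch below puts a simple SQL condition next to a comparable Query DSL request body. The index name bank is borrowed from the examples later in this article; the field firstname and the helper match_body are made up for this illustration. Only the {"query": {"match": ...}} shape comes from the Query DSL itself, and a match query is only roughly comparable to an SQL equality test.

```python
import json


def match_body(field, value):
    """Build the JSON body of a simple match query (hypothetical helper)."""
    return {"query": {"match": {field: value}}}


# SQL:  SELECT * FROM bank WHERE firstname = 'Holmes'
# A roughly comparable search request would send this body to the bank index.
body = match_body("firstname", "Holmes")
print(json.dumps(body, indent=2))
```

The point is only that the condition moves from a WHERE clause into a nested JSON structure, which is what the DSL library in the rest of this article builds for you.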

Introduction to the Python Elasticsearch DSL

Connect to Elasticsearch:

import elasticsearch

# Connect to a node running on the local machine
es = elasticsearch.Elasticsearch([{'host': '127.0.0.1', 'port': 9200}])

size specifies the number of hits to return, from_ specifies the starting offset, and filter_path specifies which parts of the response to return. In the second example below, only _id and _type appear in the final result.

res_3 = es.search(index="bank", q="Holmes", size=1, from_=1)
res_4 = es.search(index="bank", q="39225, 5686", size=1000, filter_path=['hits.hits._id', 'hits.hits._type'])
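
To make the effect of filter_path easier to picture, here is a small local imitation that trims a response-shaped dictionary down to the requested dotted paths. It is only a sketch: the functions pick, merge, and apply_filter_path are hypothetical helpers written for this article, the sample response is invented, and the real filtering is done by Elasticsearch itself and supports more than the plain dotted paths handled here.

```python
def pick(obj, keys):
    """Keep only the branch of obj addressed by the list of path parts."""
    if not keys:
        return obj
    if isinstance(obj, list):
        return [pick(item, keys) for item in obj]
    head, rest = keys[0], keys[1:]
    if isinstance(obj, dict) and head in obj:
        return {head: pick(obj[head], rest)}
    return {}


def merge(a, b):
    """Recursively merge two trimmed branches into one structure."""
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for key, value in b.items():
            merged[key] = merge(merged[key], value) if key in merged else value
        return merged
    if isinstance(a, list) and isinstance(b, list):
        return [merge(x, y) for x, y in zip(a, b)]
    return b


def apply_filter_path(response, paths):
    """Imitate filter_path for plain dotted paths (no wildcards)."""
    result = {}
    for path in paths:
        result = merge(result, pick(response, path.split(".")))
    return result


# Invented sample response with the usual hits.hits nesting
response = {
    "took": 3,
    "hits": {
        "total": 2,
        "hits": [
            {"_id": "1", "_type": "account", "_source": {"balance": 39225}},
            {"_id": "2", "_type": "account", "_source": {"balance": 5686}},
        ],
    },
}

trimmed = apply_filter_path(response, ["hits.hits._id", "hits.hits._type"])
print(trimmed)
```

Everything outside the two requested paths (took, total, _source) is dropped, which is why the second search above returns only _id and _type for each hit.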

Query all data in the specified index:

Here index specifies the index to search. A single string names one index; a list names several, for example index=["bank", "banner", "country"]; and a wildcard pattern selects every index that matches it, for example index=["apple*"] selects all indexes whose names start with apple.

You can also specify a particular doc type in the search.

from elasticsearch_dsl import Search

# A search with no query matches every document in the index
s = Search(using=es, index="index-test")
response = s.execute()
print(response.to_dict())
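
The wildcard form of index described above can be imitated locally to see which names a pattern would select. This is a sketch using the standard library's fnmatchcase; resolve_indexes and the list of index names are invented for illustration, and the actual resolution happens on the Elasticsearch server, so only the behaviour of * is meant to carry over.

```python
from fnmatch import fnmatchcase


def resolve_indexes(patterns, existing):
    """Return the existing index names selected by a list of names or patterns."""
    return [
        name
        for name in existing
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


# Hypothetical index names on a cluster
existing = ["apple-2017", "apple-2018", "bank", "banner", "country"]

print(resolve_indexes(["apple*"], existing))
# ['apple-2017', 'apple-2018']
print(resolve_indexes(["bank", "country"], existing))
# ['bank', 'country']
```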

Multiple search conditions on particular fields can be chained:

s = Search(using=es, index="index-test").query("match", sip="192.168.1.1")
s = s.query("match", dip="192.168.1.2")
response = s.execute()

Multi-field query:

from elasticsearch_dsl.query import MultiMatch, Match

multi_match = MultiMatch(query='hello', fields=['title', 'content'])
s = Search(using=es, index="index-test").query(multi_match)
response = s.execute()

print(response.to_dict())

You can also run multi-field queries with Q() objects, where fields is the list of fields to search and query is the value to search for.

from elasticsearch_dsl import Q

q = Q("multi_match", query="hello", fields=['title', 'content'])
s = Search(using=es, index="index-test").query(q)
response = s.execute()

print(response.to_dict())

The first argument to Q() is the query type, which can also be bool.

q = Q('bool', must=[Q('match', title='hello'), Q('match', content='world')])
s = Search(using=es, index="index-test").query(q)
response = s.execute()

print(response.to_dict())

Q() objects can also be combined with the operators | (or), & (and), and ~ (not), which is another way of writing the bool query above.

# | builds a bool query with a should clause
q = Q("match", title='python') | Q("match", title='django')
print(q.to_dict())
# {"bool": {"should": [...]}}

# & builds a bool query with a must clause
q = Q("match", title='python') & Q("match", title='django')
print(q.to_dict())
# {"bool": {"must": [...]}}

# ~ builds a bool query with a must_not clause
q = ~Q("match", title="python")
print(q.to_dict())
# {"bool": {"must_not": [...]}}

# Any of the combined queries can then be passed to a search
response = Search(using=es, index="index-test").query(q).execute()
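
For readers who want to see the full bodies behind the [...] placeholders above, the following sketch builds the same three bool structures with plain dictionaries. The helpers match, q_or, q_and, and q_not are hypothetical stand-ins written for this article, not part of elasticsearch_dsl; they only mirror the JSON shapes shown in the comments.

```python
def match(field, value):
    """A single match clause as a plain dict."""
    return {"match": {field: value}}


def q_or(*queries):
    """Counterpart of the | operator: a bool query with should clauses."""
    return {"bool": {"should": list(queries)}}


def q_and(*queries):
    """Counterpart of the & operator: a bool query with must clauses."""
    return {"bool": {"must": list(queries)}}


def q_not(query):
    """Counterpart of the ~ operator: a bool query with a must_not clause."""
    return {"bool": {"must_not": [query]}}


either = q_or(match("title", "python"), match("title", "django"))
both = q_and(match("title", "python"), match("title", "django"))
neither = q_not(match("title", "python"))

print(either)
print(both)
print(neither)
```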

Filters: the example below uses a range filter, where range is the filter type, timestamp is the name of the field to filter on, gte means greater than or equal to, and lt means less than; set the bounds as needed.

The difference between term and match is that term matches the exact value and is not analyzed, whereas match analyzes (tokenizes) the query text, matches more loosely, and returns a relevance score. (For example, on an analyzed text field a term query for a string containing uppercase letters may return nothing, because the indexed tokens are lowercased, while match is case-insensitive and returns the same results either way.)

import time

# Range filter combined with a match query
s = s.filter("range", timestamp={"gte": 0, "lt": time.time()}).query("match", country="in")
# Plain terms filter
res_3 = s.filter("terms", balance_num=["39225", "5686"]).execute()
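
The term versus match difference described above can be reproduced with a toy inverted index. This is a deliberately simplified model, not how Elasticsearch is implemented: analyze is a stand-in analyzer that only lowercases and splits on whitespace, the three documents are invented, and no relevance score is computed.

```python
def analyze(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


# Toy inverted index for one analyzed text field: token -> set of document ids
docs = {1: "Sherlock Holmes", 2: "Mycroft Holmes", 3: "John Watson"}
index = {}
for doc_id, text in docs.items():
    for token in analyze(text):
        index.setdefault(token, set()).add(doc_id)


def term_query(value):
    """Look the value up exactly as given, without analysis."""
    return sorted(index.get(value, set()))


def match_query(text):
    """Analyze the query text first, then look up each resulting token."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return sorted(hits)


print(term_query("Holmes"))   # [] because the indexed token is "holmes"
print(term_query("holmes"))   # [1, 2]
print(match_query("Holmes"))  # [1, 2]
```

The term lookup fails on "Holmes" only because the stored tokens were lowercased at index time, which is the situation the parenthetical note above describes.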

Other ways to write filters:

s = Search()
s = s.filter('terms', tags=['search', 'python'])
print(s.to_dict())
# {'query': {'bool': {'filter': [{'terms': {'tags': ['search', 'python']}}]}}}

# Equivalent form, written out as a bool query
s = Search()
s = s.query('bool', filter=[Q('terms', tags=['search', 'python'])])
print(s.to_dict())
# {'query': {'bool': {'filter': [{'terms': {'tags': ['search', 'python']}}]}}}

# exclude() adds the negated filter
s = Search()
s = s.exclude('terms', tags=['search', 'python'])
# or, equivalently
s = Search()
s = s.query('bool', filter=[~Q('terms', tags=['search', 'python'])])
print(s.to_dict())
# {'query': {'bool': {'filter': [{'bool': {'must_not': [{'terms': {'tags': ['search', 'python']}}]}}]}}}

Aggregations can be stacked on top of queries, filters, and other operations, using the aggs attribute.

A bucket is a grouping: the first parameter is the name of the group, the second is the aggregation type, and the third specifies the field.

Metrics work the same way, with types such as sum, avg, max, and min. Note that two types return several of these values at once: stats and extended_stats, the latter of which also returns variance and similar values.

from elasticsearch_dsl import A

# Example 1
s.aggs.bucket("per_country", "terms", field="timestamp").metric("sum_click", "stats", field="click").metric("sum_request", "stats", field="request")

# Example 2
s.aggs.bucket("per_age", "terms", field="click.keyword").metric("sum_click", "stats", field="click")

# Example 3
s.aggs.metric("sum_age", "extended_stats", field="impression")

# Example 4
s.aggs.bucket("per_age", "terms", field="country.keyword")

# Example 5: this aggregation groups by ranges
a = A("range", field="account_number", ranges=[{"to": 10}, {"from": 11, "to": 21}])

res = s.execute()

execute() still has to be called at the end. Note that the s.aggs operations modify the search in place, so their result should not be assigned to a variable (for example, res = s.aggs... is incorrect). The aggregation results are stored in res, where they can be displayed.
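
To show what a terms bucket with a stats metric computes, here is a local emulation over a handful of invented rows. It is a sketch only: stats and terms_with_stats are hypothetical helpers, the data is made up, and the real work is done inside Elasticsearch, whose response also has its own structure. What carries over is the idea of grouping by one field and summarizing another with count, min, max, avg, and sum.

```python
from collections import defaultdict


def stats(values):
    """Compute the five values a stats metric reports."""
    values = list(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
    }


def terms_with_stats(rows, bucket_field, metric_field):
    """Group rows by bucket_field, then summarize metric_field per group."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[bucket_field]].append(row[metric_field])
    return {key: stats(values) for key, values in buckets.items()}


# Invented sample documents
rows = [
    {"country": "in", "click": 3},
    {"country": "in", "click": 5},
    {"country": "us", "click": 10},
]

result = terms_with_stats(rows, "country", "click")
print(result["in"])
# {'count': 2, 'min': 3, 'max': 5, 'avg': 4.0, 'sum': 8}
```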

Sorting

s = Search().sort(
    'category', '-title',
    {"lines": {"order": "asc", "mode": "avg"}})
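
The sort arguments above take three forms: a plain field name (ascending), a field name prefixed with - (descending), and a dictionary with explicit options. The sketch below shows how these could be turned into the sort section of a request body. to_sort_body is a hypothetical helper written for this article, not a function of elasticsearch_dsl.

```python
def to_sort_body(*keys):
    """Translate sort arguments into a list usable as the "sort" section."""
    body = []
    for key in keys:
        if isinstance(key, str) and key.startswith("-"):
            # A leading minus sign means descending order on that field
            body.append({key[1:]: {"order": "desc"}})
        else:
            # Plain field names and explicit dicts are passed through unchanged
            body.append(key)
    return body


sort_body = to_sort_body(
    "category", "-title",
    {"lines": {"order": "asc", "mode": "avg"}})
print(sort_body)
```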

Paging

s = s[10:20]
# {"from": 10, "size": 10}
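
The relationship between the slice and the from/size values in the comment above is simple arithmetic, sketched here with a hypothetical helper slice_to_from_size (not part of any library): from is the start of the slice, and size is the number of items between start and stop.

```python
def slice_to_from_size(window):
    """Convert a Python slice with a stop value into from/size paging values."""
    start = window.start or 0
    if window.stop is None:
        raise ValueError("an open-ended slice has no fixed size")
    return {"from": start, "size": max(0, window.stop - start)}


print(slice_to_from_size(slice(10, 20)))
# {'from': 10, 'size': 10}
print(slice_to_from_size(slice(None, 5)))
# {'from': 0, 'size': 5}
```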

Some additional methods, for interested readers:

s = Search()

# Set extra properties with the .extra() method
s = s.extra(explain=True)

# Set request parameters with .params()
s = s.params(search_type="count")

# To restrict the fields returned, use the .source() method
# only return the selected fields
s = s.source(['title', 'body'])
# don't return any fields, just the metadata
s = s.source(False)
# explicitly include/exclude fields
s = s.source(include=["title"], exclude=["user.*"])
# reset the field selection
s = s.source(None)

# Build a search from a dict
s = Search.from_dict({"query": {"match": {"title": "python"}}})

# Modify an existing search from a dict
s.update_from_dict({"query": {"match": {"title": "python"}}, "size": 42})

