Use Redis to implement a lightweight search engine, awesome!

scenario

If everyone is doing the backend development, must implement the list query interface, and of course some query condition is very simple, have to a SQL, but some query conditions are extremely complex, coupled with the library table design all kinds of unreasonable, cause the query interface is particularly difficult to write, and work overtime of what not to mention the (I wonder if you have the feeling ~).

Let’s start with an example. This is the search criteria of a shopping website. If you implement such a search interface, how will you implement it? (Of course you said you could do it with a search engine like Elasticsearch. But what I’m saying is, what if you had to do it yourself?)

As you can see from the figure above, the search is divided into six categories, each subcategory. In the middle, the intersection of various categories of conditions is taken, and there are single selection, multiple selection, and custom cases in each subclass, and the final output meets the conditions of the result set.

Ok, now that the requirements are clear, let’s start implementing them. Follow the public account of Java technology stack reply interview, you can get organized Redis series of interview questions and all the answers.

To achieve 1

The first entry is A student, he is writing SQL “expert”. Small A confidently said: “is not A query interface? It looks like a lot of conditions, but with my rich SQL experience, this is not difficult for me.”

The result is the following code (MYSQL as an example) :

select ... from table_1
left join table_2
left join table_3
left join (select ... from table_x where ...) tmp_1
...
where ...
order by ...
limit m,n
Copy the code

The code ran through the test environment and seemed to match, so it was ready to be preshipped. This advance, the problem began to reveal.

In order to make the online environment as realistic as possible, the amount of data is naturally much larger than the test. So such a complex SQL, its execution efficiency can be imagined. Test students decisively to the small A code to hit back.

The 2

Summarized the lessons of small A failure, small B began to optimize SQL, first through the EXPLAIN keyword SQL performance analysis, the index of the place are added to the index. At the same time, a complex SQL is split into multiple SQL, and the calculation results are calculated in the program memory. This article explains the most complete summary, recommended to read.

The pseudocode is as follows:

$result_1 = query('select ... from table_1 where ... '); $result_2 = query('select ... from table_2 where ... '); $result_3 = query('select ... from table_3 where ... '); . $result = array_intersect($result_1, $result_2, $result_3, ...) ;Copy the code

This solution was significantly better in terms of performance than the first solution, but at the time of feature acceptance, the product manager still felt that the query was not fast enough. MySQL implements a simple version of the search engine.

Small B also knows that every query will query the database for many times, and some historical reasons, part of the conditions are not able to do a single table query, so the query waiting time is unavoidable.

Implement 3

Little C sees room for optimization from the above scheme. He found that little B had no problem in thinking. He split the complex conditions, calculated the result sets of each sub-dimension, and finally combined all the sub-result sets to get the final result he wanted.

So he wondered if it would be possible to cache the result sets of each subdimension in advance, so that when querying, he could directly fetch the subset he wanted instead of checking the library for calculation every time.

Here little C uses Redis to store cached data, the main reason for using it is that it provides a variety of data structures, and it is very easy to do the union operation of sets in Redis.

The specific scheme is shown in the figure:

Here, each condition stores the calculated result Set ID into the corresponding key in advance, and the data structure chosen is Set. The query operations include:

Single subclass: directly obtain the corresponding result set according to the condition key;
Subclass multiple selection: Perform the union operation according to multiple condition keys to obtain the corresponding result set.
Final result: The final result is obtained by the intersection operation of all subclass result sets;

This is called a reverse index.

And you’ll notice that there’s a price condition missing. We know from demand that the price condition is a range and is infinite. Therefore, the above key-value method of exhaustive conditions cannot be done. Here, we use another data structure of Redis, Sorted Set:

Add all goods into the ordered set with Key as price, value as commodity ID, and the score corresponding to each value is the value of commodity price. In this way, the ZRANGEBYSCORE command can be used to obtain the corresponding result set according to the score (price) interval in the ordered set of Redis.

At this point, the optimization of scheme 3 has been completed, and the data query and calculation have been separated by means of caching. In each search, the result can be obtained by simply searching Redis several times. The query speed meets the acceptance requirements.

extension

paging

Here you may have found a serious flaw in how list queries can be paginated. Yes, we’ll see how Redis implements paging in a minute. Follow the public account of Java technology stack reply interview, you can get organized Redis series of interview questions and all the answers.

Paging is mostly about sorting, so for simplicity, let’s use creation time as an example.

As shown in the figure:

In the figure, the blue part is the ordered set of goods with the creation time as the score value, and the result set below the blue is the result obtained by the conditional calculation. Through the ZINTERSTORE command, the centralized weight of the result is 0, and the product time result is 1. The result set obtained from the intersection is given a new ordered set with the creation time score value. An operation on a new result set yields the various data required for paging:

Total number of pages: ZCOUNT command
Current page contents: ZRANGE command
In reverse order: ZREVRANGE command

Data update

There are two ways to update index data. One is through the modification of commodity data, to trigger the update operation immediately, the other is through the timing script for batch update.

Note here that the index content update, if the violence of the delete Key, then reset the Key. Since the two operations in Redis will not be atomic, there may be a blank gap between them. It is suggested to remove only invalid elements in the set and add new elements.

Performance optimization

Redis is a memory-level operation, so a single query is fast. However, if there are multiple Redis operations in our implementation, the Redis multiple connection time may be unnecessary time consumption. By using MULTI command, a transaction is started, multiple operations of Redis are placed in one transaction, and atomic execution is performed by EXEC (note: transactions are only performed in one connection and are not rolled back if the execution fails).

conclusion

This is just a simple Demo of using Redis to optimize query search, which is lighter and cheaper to learn than existing open source search engines. Second, some of its ideas are similar to those of open source search engines, and if word parsing is added, full-text search functions can also be achieved.