Introduction
I recently found an interesting open source project called Sonic on GitHub. The project's description is very simple:
🦔 Fast, lightweight & schema-less search backend. An alternative to Elasticsearch that runs on a few MBs of RAM.
From this description, we can quickly learn Sonic's main features.
First, it is fast: much faster than Elasticsearch, and in the official benchmark it searches in milliseconds.
Second, it is lightweight. Elasticsearch has grown heavier over the years, embracing big data as well as search, storage, analysis and visualization, which makes its learning curve steep and its running cost high. Ordinary machines struggle to run it, whereas Sonic is light and fast, with a small API focused on search.
Third, it is schema-less. Defining a fixed structure for stored data is usually the database's job, but because Elasticsearch also acts as a data store, you must define mappings before you can use it. Sonic is schema-less: it does not store data, it only does search, so you do not need to define mappings.
Fourth, it saves money. In the development and operation of any real project, cost is often the first priority, and Sonic can save you a lot of money with its low runtime requirements and low memory footprint.
Having said that, would you like to try Sonic too? Next, let's try it out and get a first feel for it.
use
The installation
First of all, Sonic doesn't support Windows, so the easiest way to run it is with Docker. You only need to know a few basic Docker concepts.
Type the following command in the terminal:
docker pull valeriansaliou/sonic:v1.2.0
Wait a little while Docker pulls the image. Once that is done, we need a simple Sonic configuration file, config.cfg. The configuration file is as follows:
# Sonic
# Fast, lightweight and schema-less search backend
# Configuration file
# Example: https://github.com/valeriansaliou/sonic/blob/master/config.cfg
[server]
log_level = "debug"
[channel]
inet = "0.0.0.0:1491"
tcp_timeout = 300
auth_password = "SecretPassword"
[channel.search]
query_limit_default = 10
query_limit_maximum = 100
query_alternates_try = 4
suggest_limit_default = 5
suggest_limit_maximum = 20
[store]
[store.kv]
path = "/var/lib/sonic/store/kv/"
retain_word_objects = 1000
[store.kv.pool]
inactive_after = 1800
[store.kv.database]
flush_after = 900
compress = true
parallelism = 2
max_files = 100
max_compactions = 1
max_flushes = 1
write_buffer = 16384
write_ahead_log = true
[store.fst]
path = "/var/lib/sonic/store/fst/"
[store.fst.pool]
inactive_after = 300
[store.fst.graph]
consolidate_after = 180
There are only two settings you need to pay attention to in this configuration file:
- inet: the address and port Sonic listens on, set here to "0.0.0.0:1491".
- auth_password: Sonic's password, set here to "SecretPassword".
Sonic communicates over plain TCP, which is efficient, and defines its own simple command language, which amounts to no more than a handful of simple operations.
Save the configuration file in a suitable location, such as /Users/pedro/Desktop/sonic-test/config.cfg.
We start a Sonic service by entering the following command in the terminal:
docker run -p 1491:1491 -v ~/Desktop/sonic-test/config.cfg:/etc/sonic.cfg valeriansaliou/sonic:v1.2.0
Wait a moment. If the following output is displayed in the terminal, Sonic has started successfully:
(INFO) - starting up
(INFO) - started
(DEBUG) - spawn managed thread: tasker
(DEBUG) - spawn managed thread: channel
(INFO) - tasker is now active
(INFO) - listening on tcp://0.0.0.0:1491
concept
It's important to understand how Sonic works before we start manipulating data. Knowing this gives you a clear view of the big picture.
Sonic’s operation can be divided into three modes:
- Search mode. In Search mode you can only perform search operations, not data insertion or backup. Its core operations are QUERY and SUGGEST: QUERY searches for words, and SUGGEST completes them.
- Ingest mode. Remember that Sonic can only insert data in Ingest mode. Ingestion has three core operations: PUSH, POP and FLUSH. PUSH adds an element to the store, POP removes it from the store, and FLUSH clears the store entirely.
- Control mode. In Control mode Sonic can consolidate, back up and restore data. Its core operations are TRIGGER and INFO: TRIGGER consolidates, backs up and restores data, while INFO reports Sonic's operating status.
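The three modes above can be summarized as a small lookup table. This is only a sketch of the list in this article: the names MODE_COMMANDS, is_allowed and mode_for are my own, and the real protocol has more commands and variants than the ones listed here.

```python
from typing import Optional

# Commands per mode, exactly as listed in this article.
# Assumption: this table is illustrative and not exhaustive.
MODE_COMMANDS = {
    "search": {"QUERY", "SUGGEST"},
    "ingest": {"PUSH", "POP", "FLUSH"},
    "control": {"TRIGGER", "INFO"},
}


def is_allowed(mode: str, command: str) -> bool:
    """Return True if `command` is listed under `mode` in the table above."""
    return command.upper() in MODE_COMMANDS.get(mode.lower(), set())


def mode_for(command: str) -> Optional[str]:
    """Return the mode that lists `command`, or None if no mode does."""
    for mode, commands in MODE_COMMANDS.items():
        if command.upper() in commands:
            return mode
    return None
```

For example, is_allowed("search", "PUSH") is False, which mirrors the rule that data can only be inserted in Ingest mode.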
Earlier we mentioned Sonic's protocol, which is called the Sonic Channel protocol. It is built on top of TCP, and if you are familiar with Redis, you may notice that the two are very similar.
As you can see, the core concepts and usage of Sonic are quite simple. Of course, I cannot cover all of them here. The Sonic documentation describes the Sonic Channel protocol in detail, along with practical usage. If you are interested, be sure to check it out.
operation
Once the Sonic service is up and running, we can use Telnet to operate it.
Enter:
telnet localhost 1491
If the following output is displayed, the connection was successful:
Trying ::1...
Connected to localhost.
Escape character is '^]'.
CONNECTED <sonic-server v1.2.0>
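The same check can be done from Python instead of Telnet. The sketch below assumes the server sends a single greeting line of the form CONNECTED <sonic-server v1.2.0>, as in the output above. The helper names parse_greeting and read_greeting are my own, and the regular expression is a guess based on that one line, not a documented format.

```python
import re
import socket

# Pattern inferred from the greeting shown above (an assumption, not a spec).
GREETING = re.compile(r"CONNECTED <(\S+) v([^>\s]+)>")


def parse_greeting(line: str) -> tuple:
    """Split a greeting line into (server_name, version)."""
    match = GREETING.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unexpected greeting: {line!r}")
    return match.group(1), match.group(2)


def read_greeting(host: str = "localhost", port: int = 1491,
                  timeout: float = 5.0) -> str:
    """Open a TCP connection, read the first line the server sends, and close."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("r", encoding="utf-8") as reader:
            return reader.readline().strip()


# Usage, with a Sonic container listening on localhost:1491:
#   name, version = parse_greeting(read_greeting())
```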
Before we actually insert anything, we need a brief overview of Sonic's storage. At the beginning of the article, I said that Sonic only focuses on search and leaves data storage to other databases. So does Sonic really need no storage at all?
The answer is that it does need some. Is this cheating? Of course not. Sonic does not store your data, but it does need to index and store the part of the data that is searched. That may sound a little convoluted, so let's look at an example.
An article may have a title, summary, body, author and a series of other fields. When searching for this article, we cannot search through all of its fields. We usually compromise and search only a few of them. For example, we search summaries and titles instead of the large body text, which both improves search efficiency and reduces search cost.
Sonic does not store all the fields of the article (title, summary, body, author and so on), but it does need to store the data it uses for searching, namely the summary and title. The summary and title are a tiny fraction of the data that a store holding every field would contain.
So how does Sonic store this searchable data? Sonic has two stores, a KV store and an FST store. The KV store is easy to understand: it is a key-value store. We combine the summary and title into a single value and give it a unique key. This key usually corresponds to the primary key in the database.
Many people may be a little confused about combining the summary and title into one value. How can you search once they are combined? Don't worry: Sonic automatically splits the text into words and stores them in an inverted index. When you search, you usually search with just a few words rather than the whole text, so combining the fields has little impact. Of course, you can also push just one field as the value, so that nothing needs to be merged.
In the previous paragraph we brought up the concept of an inverted index, which I won't explain in detail here. If you want to know more, just look it up. In short, the index stores associations between words and the texts that contain them, so a search for a word can find those texts in reverse. At this point you may be wondering where the FST store fits in. As I understand Sonic's documentation, the inverted index lives in the KV store, while the FST store holds the graph of known words that Sonic uses for word completion and typo correction, nicely separated from the KV store.
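To make the inverted index idea concrete, here is a toy sketch in Python. It is not how Sonic implements its index. The whitespace tokenizer and the function names tokenize, build_index and search are assumptions of mine, used only to show how a word leads back to the keys of the texts that contain it.

```python
from collections import defaultdict


def tokenize(text: str) -> list:
    """Split text into lowercase words (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(objects: dict) -> dict:
    """Map each word to the set of object keys whose text contains it."""
    index = defaultdict(set)
    for key, text in objects.items():
        for word in tokenize(text):
            index[word].add(key)
    return index


def search(index: dict, query: str) -> set:
    """Return the keys of objects that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

The keys play the role of the database primary keys mentioned above: the index returns keys, and the full records are then fetched from the real database.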
Insert data
With all that said, we can finally get down to practice. After connecting to Sonic via Telnet, we try to insert a piece of data.
telnet localhost 1491
Trying ::1...
Connected to localhost.
Escape character is '^]'.
CONNECTED <sonic-server v1.2.0>
# SecretPassword is the password; it must be supplied
START ingest SecretPassword
# Sonic's response
STARTED ingest protocol(1) buffer(20000)
# Insert data via PUSH
# movie is the collection name
# douban is the bucket name
# 1 is the key of the object
# "the knight" is the value
PUSH movie douban 1 "the knight"
# OK is returned after a successful insertion
OK
# exit
QUIT
ENDED quit
I've explained each line in the comments, but a little more detail may help. Each connection can be thought of as a session, and a session begins with the START command. If START has not been sent within a certain time after connecting through Telnet, Sonic closes the connection automatically.
After the START command is executed, a session is started. The command format is START <mode> <password>.
The data is then inserted using the PUSH command, in the format PUSH <collection> <bucket> <object> "<text>".
Some people may ask what the collection and bucket are for. In the statement PUSH movie douban 1 "the knight", you can see the hierarchy in action: it categorizes the search data, so "the knight" is placed in the douban bucket under the movie collection. When there are other collections, such as song, we can still search efficiently within a single bucket of a single collection.
After successful insertion, an OK is returned.
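When you send PUSH from a program rather than typing it by hand, the text needs to be quoted safely. Below is a minimal sketch of a command builder. The function names are my own, and the escaping rules (backslash-escape double quotes and backslashes, flatten newlines to spaces) are an assumption, so check the Sonic documentation before relying on them.

```python
def quote_text(text: str) -> str:
    """Wrap text in double quotes, escaping characters that would break the line.

    Assumption: backslash-escaping quotes and flattening newlines is enough;
    this is a sketch, not the official normalization rule.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    escaped = escaped.replace("\r", " ").replace("\n", " ")
    return f'"{escaped}"'


def push_command(collection: str, bucket: str, key: str, text: str) -> str:
    """Build one PUSH line: PUSH <collection> <bucket> <key> "<text>"."""
    return f"PUSH {collection} {bucket} {key} {quote_text(text)}"


# Reproduces the command typed in the Telnet session above.
print(push_command("movie", "douban", "1", "the knight"))
```

Keeping the builder separate from the socket code makes it easy to check the exact line that will be sent before any data reaches Sonic.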
Search data
After inserting the data, we try to connect again and enter a session in search mode.
# Start a search session
START search SecretPassword
STARTED search protocol(1) buffer(20000)
# Search the data under movie -> douban; the search keyword is "the"
QUERY movie douban "the"
PENDING Q5Z3lY25
# The key (1) is returned
EVENT QUERY Q5Z3lY25 1
Search, the single most important part of Sonic, is extremely simple to use yet powerful. The command format is QUERY <collection> <bucket> "<terms>" [LIMIT(<count>)]? [OFFSET(<count>)]?. If you are familiar with SQL, you will immediately understand how to use it. Collection and bucket identify the level of the hierarchy to search, terms is the keyword to search for, limit caps the number of results returned, and offset is the offset into the results.
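As a small illustration of the optional LIMIT and OFFSET parts, here is a sketch of a QUERY builder in Python. The function name query_command is hypothetical, and for brevity it assumes the terms contain no double quotes.

```python
from typing import Optional


def query_command(collection: str, bucket: str, terms: str,
                  limit: Optional[int] = None,
                  offset: Optional[int] = None) -> str:
    """Build a QUERY line, appending LIMIT(...) and OFFSET(...) only when given.

    Assumption: `terms` contains no double quotes, so no escaping is done here.
    """
    parts = ["QUERY", collection, bucket, f'"{terms}"']
    if limit is not None:
        parts.append(f"LIMIT({limit})")
    if offset is not None:
        parts.append(f"OFFSET({offset})")
    return " ".join(parts)


# The plain query from the session above, then a paginated variant.
print(query_command("movie", "douban", "the"))
print(query_command("movie", "douban", "the", limit=10, offset=20))
```

Used this way, limit and offset work like pagination in SQL: limit=10, offset=20 asks for the third page of ten results.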
PENDING Q5Z3lY25
EVENT QUERY Q5Z3lY25 1
Both lines are returned by Sonic after the search. The PENDING line acknowledges the query and gives it the event ID Q5Z3lY25; the EVENT line with the same ID carries the result, which here is the key 1.
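A client has to pair the two lines by their ID. The sketch below parses both line types, assuming results are separated by spaces as in the output above. The function names parse_pending and parse_event are my own.

```python
def parse_pending(line: str) -> str:
    """Return the event ID from a line such as 'PENDING Q5Z3lY25'."""
    parts = line.split()
    if len(parts) != 2 or parts[0] != "PENDING":
        raise ValueError(f"not a PENDING line: {line!r}")
    return parts[1]


def parse_event(line: str) -> tuple:
    """Return (kind, event_id, results) from a line such as 'EVENT QUERY Q5Z3lY25 1'."""
    parts = line.split()
    if len(parts) < 3 or parts[0] != "EVENT":
        raise ValueError(f"not an EVENT line: {line!r}")
    return parts[1], parts[2], parts[3:]


# Pair the two lines from the session above by their shared ID.
event_id = parse_pending("PENDING Q5Z3lY25")
kind, seen_id, keys = parse_event("EVENT QUERY Q5Z3lY25 1")
if seen_id == event_id:
    print(kind, keys)
```

The returned keys are the object keys that were pushed earlier, so the next step in a real application would be to load those records from the database.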
Sonic also supports auto-completion of words. For example, given "th", it returns the word "the", which lets you auto-complete searches and improve the user experience. The command format is SUGGEST <collection> <bucket> "<word>" [LIMIT(<count>)]?.
START search SecretPassword
STARTED search protocol(1) buffer(20000)
# Enter the letters th
SUGGEST movie douban "th"
PENDING SukqsbYk
# return the completed word
EVENT SUGGEST SukqsbYk the
Note that SUGGEST supports only LIMIT, not OFFSET. Write commands in upper case.
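To get a feel for what SUGGEST does, here is a toy prefix completion in Python. It is only a stand-in: it uses a sorted word list and binary search rather than the FST that Sonic uses, and the function name suggest and its default limit are assumptions of mine.

```python
import bisect


def suggest(sorted_words: list, prefix: str, limit: int = 5) -> list:
    """Return up to `limit` words that start with `prefix`.

    `sorted_words` must already be sorted; binary search finds the first
    candidate, then we scan forward while the prefix still matches.
    """
    if limit <= 0:
        return []
    start = bisect.bisect_left(sorted_words, prefix)
    results = []
    for word in sorted_words[start:]:
        if not word.startswith(prefix):
            break
        results.append(word)
        if len(results) == limit:
            break
    return results


words = sorted(["the", "knight", "then", "dark", "there"])
print(suggest(words, "th"))
```

The limit parameter plays the same role as LIMIT in the SUGGEST command: it caps how many completions come back.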
other
In Control mode, Sonic can consolidate data, back up data, restore data, report INFO about its status, and so on.
These operations are important for data maintenance and service operations, but they are not the focus of this article. All of them are covered in the Sonic documentation. If you are interested, be sure to read it; it is short and very easy to follow.
conclusion
In this article, I introduced Sonic's features, some of its concepts, and a little of how it works. If you want to use Sonic, familiarize yourself with the concepts mentioned here, make sure you understand the big picture, and read its documentation in detail before trying it out.
At this point, we've covered the basics of Sonic. Compared to Elasticsearch, it is small and simple, and it keeps search down to its essentials.
In the next article, I will use Python and MongoDB to build a simple search application. Stay tuned.
Overencapsulation brings simplicity, not true simplicity, but more complexity. — Reflections on Sonic vs. Elasticsearch