preface
I’ve found my technical output really good this year. Since the release of First Line of Code 3rd edition, I’ve had a lot of free time, not only updating and maintaining the open source libraries I’ve written, but also attending GDG events to share my technology.
Currently, I am maintaining two open source libraries, LitePal and PermissionX, which belong to the state of cross-maintenance. After upgrading this one, I will write another one as soon as possible. In fact, I was planning to write a new open source project this year, but NOW I don’t know if I can find enough time. I have a complete idea, but I haven’t started yet.
To get back to today’s topic, LitePal has nearly all the functionality required for a basic database since the last time it supported transactions in version 3.1, but for a long time, there have been calls from some friends for LitePal to support indexing.
What is an index
As for the index function, I considered adding it when I was making LitePal 1.x version. At that time, half of the code was written, but the result of measurement was not ideal, so I finally removed this part of function. Why not? Because in the mobile device database, the index actually does not play a big role, only when the amount of data is very large, the index can reflect the efficiency of the query advantage, and mobile devices usually do not have a very large amount of data.
But the lack of indexing can end up being a pain in the neck for me, because every now and then friends ask LitePal to support it. So I decided to add index support in Version 3.2 of LitePal to make up for this missing feature. This is also the last version of the 3. X series. The next version will have major architectural changes, and LitePal will gradually align with the design and use of Room.
In addition, I should note that although LitePal 3.2 supports indexing, I don’t think most people should use it. Because first, you really don’t need it (more on why); Second, you may not be able to use it well (using the index incorrectly can be inefficient). So, when you really know what you’re doing, use indexes.
Reading this, is there any friends think I have been trying to dissuade? True, but that doesn’t stop you from reading this article, because it’s good to know what an index is, even if you don’t use it.
In short, an index is a tool for speeding up database queries.
In our traditional impression, the database query speed is very fast, through a SQL statement in the database to find the data that meets the specified conditions can be completed almost instantaneously.
Have you ever wondered how a database can find the data that meets certain criteria from a mass of data?
In fact, there is no special technology, that is, all the data in the database table all query once, is the so-called full table search.
It sounds crazy, but it’s true, but thanks to the efficient execution of the database itself, queries are often very fast.
In SQLite, to find out if your query is a full table search, simply add the explain Query plan keyword before your query, as shown in the figure below.
As you can see, in the detail column, the information is SCAN TABLE Song, which means that SQLite searched the whole TABLE for Song.
This full-table search approach is a problem for any sane person, because full-table searches are bound to take longer and longer as more data is added to the table. For example, taobao has hundreds of millions of users of the database table, if EVERY time I log in to need to search all the hundreds of millions of user data, find out the account I logged in, it is obviously impractical.
So how to solve this problem? In order to quickly find a specific data from a large amount of data, all major databases provide indexing.
How indexes work
The principle of indexing is simple and complex, so I will try to put it as simple as possible. In simple terms, the principle of indexing is essentially binary search.
Binary search is a very magical algorithm, which can reduce the time complexity of search by one magnitude, so as to significantly improve the query efficiency, but the premise is that the data must be ordered.
So, for example, let’s say I have 2 billion pieces of data in an array, and I want to find one piece of data in that array, and if I use a traversal query, it would take me 2 billion queries in the worst case.
What about binary search? We can take the middle value each time, discard the half of the data that does not meet the criteria, and repeat the operation so that we can find the result only 31 times at most.
(Image from the Internet)
Are you scared by these two different magnitudes?
However, databases store very complex relational data that cannot be represented by simple arrays, and maintaining an orderly array is itself a very expensive thing.
This is where another high-level data structure needs to be introduced: binary search trees. Binary search tree is a tree-like data structure, which consists of three parts: the root node, the left subtree and the right subtree. The value of the left subtree is always less than the root node, and the value of the right subtree is always greater than the root node.
(Image from the Internet)
Did you find anything? Binary search trees can also use binary search properties, because it can also discard half of the data that does not meet the condition each time. In addition, maintaining a binary search tree is not as costly as maintaining an ordered array, because there are already many solutions, such as the well-known red-black tree.
However, the binary search tree scheme still does not work for database indexes, mainly because indexes are not only stored in memory, but also on disk. Hard disk storage is based on data blocks, and disk reading between different data blocks is time-consuming. If we use a binary search tree as the storage structure of the index, the tree height will be very high, so when using the index query, it may have to cross many disk data blocks in order to read data, resulting in low query efficiency.
Therefore, in order to reduce the height of the tree, almost all major databases use the data structure of n-fork tree to build indexes. An n-fork tree is similar to a binary tree, except that each root node can have more than one subtree, rather than being limited to two.
According to the data I queried, the N fork tree used by MySQL (B + tree to be exact), N is about 1200. If we try it out, 1200 to the third power is about 1.7 billion, which means that the height of the n-fork tree is only three levels, and we can store about 31 levels of data in the binary tree. In this way, a more suitable balance is found between memory query efficiency and disk query efficiency.
Although different databases in the specific implementation will be a little different, but the general idea is the same.
Now that you know what an index is, let’s take a look at how indexes are used.
The use of indexes
The use of indexes is very simple, at least in LitePal, because everything in LitePal is very simple.
If you want to add an index to a Column, just add an @column annotation above the Column and specify index = true, as shown below:
public class Song extends LitePalSupport { @Column(index = true) String name; String lyric; . }Copy the code
Then update the version number in litepal.xml so that litepal will automatically index the name field of the Song table.
That’s right. That’s OK. I did write a lot of code to support indexing, but as far as the user is concerned, there’s only so much you need to do.
Now we can check the same query again using the explain query plan keyword:
As you can see, the detail column has different information than before, which means that instead of a full table search, our query will use an index to speed up the query.
So what happens when you use indexes? To be honest, it’s not easy to verify the index, because on mobile we often don’t have a ton of data to verify it.
But unverified indexing is not convincing, so I’ll try my best to show you the results.
You don’t have a ton of data, so build your own.
The storage efficiency of LitePal is actually quite good. With the litepal.saveall () method, it takes about 700 milliseconds to store 10,000 pieces of data. I can simulate 100,000 pieces of data in 7 seconds. The code is as follows:
int loopCount = 10;
for (int i = 0; i < loopCount; i++) {
List<Song> songList = new ArrayList<>();
for (int j = 0; j < 10000; j++) {
Song song = new Song();
song.setName("name" + i * loopCount + j);
song.setLyric("lyric" + i * loopCount + j);
songList.add(song);
}
LitePal.saveAll(songList);
}
Copy the code
So let’s test with 100,000 data. First check the total data in the Song table:
It’s 100,000. That’s correct.
Then use the following query code to find the specified data from these 100,000 data:
long start = System.currentTimeMillis(); LitePal.where("name = ?" , "name10086").find(Song.class); long end = System.currentTimeMillis(); Log.d("TAG", "find with index cost " + (end - start) + "ms");Copy the code
The results are shown in the figure below.
As you can see, with the index, it only takes 6 milliseconds to query the specified data among 100,000 pieces of data. That’s pretty good, right?
So how long does the query take in the case of a full table search without indexes? Let’s try the same thing.
The lyric column is not indexed, so now we are using this column as a condition for a full table search. The code looks like this:
long start = System.currentTimeMillis(); LitePal.where("lyric = ?" , "lyric10086").find(Song.class); long end = System.currentTimeMillis(); Log.d("TAG", "find without index cost " + (end - start) + "ms");Copy the code
The results are shown in the figure below.
As you can see, the total time is 23 milliseconds.
Twenty-three milliseconds is about four times as long as six milliseconds, but on mobile devices, twenty-three milliseconds isn’t that long, and the query is almost over without you even noticing it.
And that’s 100,000 pieces of data, and we typically store far less than 100,000 pieces of data in database tables, which further reduces the performance benefits of indexes.
Of course, we all know that as the data volume increases, the performance advantage of the index will increase, but even if I scaled up the data volume to 1 million, the full table search is still very fast, almost all can be completed in 150ms time.
That’s why LitePal didn’t support indexing for a long time, because you can’t really store that much data on mobile, and even if you do, the performance gains are limited.
However, if the amount of data stored in a table is really large, then indexes must be used, so this technique is quite common in server-side databases.
With great rigor, I expanded the table to 10 million. This level of data is not very good to simulate. It took me more than ten minutes to store the data, and I had to make sure I had enough storage space on my phone. 10 million pieces of data might take up about 1 gigabyte of space (it varies from phone to phone, I tested it on another phone with 700 megabytes).
At this level of data, let’s first try the query speed with an index:
Still very fast, 10 milliseconds to get the data out.
What about not using indexes? Let’s try it:
This is a big comparison. Without indexes, a full table search of 10 million pieces of data would take 2.5 seconds, which is long enough for users to feel a noticeable lag.
Therefore, like the server database may have hundreds of millions of billions of data, this time is necessary to use the index, and mobile terminal may be difficult to imagine that there will be such a data magnitude scenario.
So far, we’ve analyzed what an index is, what an index is used in LitePal, and what an index actually does.
When to use indexes
So the final question is, should we use indexes at all?
In fact, whether you should use it or not depends on whether you need it or not. My personal opinion is that most people should not use it, because the mobile database is almost unlikely to store so much data. Using an index when you don’t need it will reduce the efficiency of your other database operations (such as adding, deleting, and modifying) because maintaining the B + tree of the index is time-consuming.
In addition, try to add the index column to keep the data repetition rate as low as possible, otherwise the index will be ineffective. And it makes sense, for example, if I index the gender column, because there are only men and women, and I have 10 million pieces of data and I have to look up maybe 5 million pieces of data to get all the data for that gender, it’s basically useless.
If you are already familiar with the index itself and know exactly what you are doing, use the index. LitePal is ready for you.
The upgrade is as simple as modifying the configuration in build.gradle:
Dependencies {implementation 'org. Litepal. Guolindev: core: 3.2.0'}Copy the code
All features in version 3.2.0 are backward compatible, so you can upgrade at no cost.
LitePal’s home page is:
Github.com/guolindev/L…
In addition, this article is intended for people who already have a basic understanding of LitePal to help them quickly learn about the new features in version 3.2.0. If you haven’t worked with LitePal before, you can read my technical column, The Android Database Master’s Secrets, for a very detailed explanation of how to use LitePal.
If you want to learn about Kotlin and the latest on Android, check out my new book, First Line of Code Version 3, here for more details.
Follow my technical public number, every day there are quality technical articles push.
Scan the qr code below to follow wechat: