The harder you work, the more lucky you are. This article has been collected in GitHub JavaCommunity, which has an interview to share, source code analysis series articles. Welcome to collect, like github.com/Ccww-lx/Jav…
In actual development, Redis will be used frequently, so how should we choose the right data type during the use process? Which data types apply to which scenarios. It is also common to be asked questions about Redis data structure in interviews:
- Why is Redis fast?
- Why are queries slowing down?
- Redis Hash Rehash process
- Why use a hash table as an index of Redis?
When we analyze and understand Redis data structure, we can make the correct choice of data type when we use Redis and improve system performance.
Redis
Underlying data structure
Redis is a memory key-value key-value database, and key-value pair data is stored in memory, so Redis based on memory data operation, its efficiency is high, fast;
Redis supports String, List, Hash, Set, Sorted Set, BitMap, and other value types. Redis can be widely applied to many business scenarios based on its diverse types of values.
The data type of Redis Value is based on the redisObject system, which is customized for Redis.
typedef struct redisObject{
/ / type
unsigned type:4;
/ / code
unsigned encoding:4;
// A pointer to the underlying implementation data structure
void*ptr; ... . }Copy the code
In addition to the actual data, a redisObject requires additional memory space to record metadata information such as data length and space usage. The redisObject contains 8-byte metadata and an 8-byte pointer that points to the location of the actual data of a specific data type:
The pointer points to the location where data is stored by Redis’s underlying data structure: SDS, bidirectional linked list, hop list, hash table, compressed list and integer set.
So how does Redis implement the underlying data structure?
Redis underlying data structure implementation
Let’s take a look at Redis’ simpler **SDS, bidirectional linked lists, integer sets **.
SDS
, bidirectional linked lists, and integer sets
SDS, which uses len to record the number of bytes used, reduces the complexity of retrieving string length to O(1), and SDS is lazy to free space. You free space, and the system records the data for use next time it wants to use it. Don’t apply for new space.
Integer set, allocate a piece of contiguous address space in memory, data elements will be stored next to each other, no need for additional Pointers to bring space overhead, it is characterized by compact memory saving memory space, query complexity O(1) high efficiency, other operations complexity O(N);
Bidirectionally linked lists, which in memory can be non-contiguous, non-sequential Spaces, concatenate the order between elements with an extra pointer overhead.
It is characterized by high efficiency and O(N) query complexity as well as O(1) section insert/update data complexity.
Hash
Hash table
A hash table, in fact, is similar to an array, each element of the array is called a hash bucket, each hash bucket holds the key-value pair data, and the elements in the hash bucket usedictEntry
The structure,
Therefore, the hash bucket element does not store the key-value pair itself, but rather the pointer to the specific Value. ** Therefore, the storage of each key-value pair costs at least 24 bytes. ** Especially for String key-value pairs, each key-value pair costs 24 bytes of space. When saving data is small and the extra overhead is larger than the data, then to save space, consider changing the data structure.
Let’s have a look at the global hash table:
Although hash table operations are fast, there is a potential risk as Redis data gets larger: hash table conflicts and rehash overhead, which explains why hash table operations are slow?
When more data is written into the hash table, hash conflicts are inevitable. Redis solves hash conflicts by using chained hash. Multiple elements in the same hash bucket are stored in a linked list, and they are connected by Pointers in turn, as shown in the figure:
When there are more and more hash conflicts, it will lead to the excessively long chain of some hash conflicts, which will lead to the time-consuming and inefficient element search on the chain.
To solve the problem of long hashing chains, rehash the existing hash bucket to spread out the number of elements in a bucket. So how does the rehash process work?
Rehash
To make the rehash operation more efficient, two global hash tables are used: hash table 1 and hash table 2, as follows:
- Hash table 2 into a larger space,
- Remap and copy the data from hash table 1 into hash table 2;
- Free space for hash table 1
However, due to the large amount of data in table 1 and Table 2 during remapping and replication, if all the data in table 1 are migrated at one time, the Redis thread will be blocked and unable to serve other requests.
To avoid this problem and ensure that Redis can properly process client requests, Redis uses progressive Rehash.
When processing a request, copy all entries in index positions from hash table 1 to hash table 2 in turn, allocating the cost of a large number of copies to multiple processing requests, avoiding time-consuming operations and ensuring fast data access.
After understanding Hash tables, take a look at unusual compressed lists and hop tables.
Compressed lists and hops
List of compressionOn the basis of array, there are three fields zlBytes, ZLTail and zllen in the header of the compressed list, representing the length of the list, the offset at the end of the list and the number of entries in the list respectively. The compressed list also has a Zlend at the end of the table, indicating the end of the list.
Advantages: Compact memory saves memory space. In memory, a piece of contiguous address space is allocated, and data elements are stored next to each other. There is no need for extra Pointers to bring space overhead. The first and last elements can be located directly by the length of the first three fields in the table. The complexity is O(1).
Skip table, on the basis of linked list, added multi-level index, through several jumps of index position, to achieve fast location of data, as shown in the following figure:
For example, query 33
Features: When the data volume is large, the search complexity of the hop table is O(logN).
To sum up, we can know the time complexity of the underlying data structure:
Data structure type | Time complexity |
---|---|
Hash table | O(1) |
An integer array | O(N) |
Two-way linked list | O(N) |
List of compression | O(N) |
Jump table | O(logN) |
The Redis custom object system type is the data type of Redis Value. The data type of Redis is based on the underlying data structure. What are the data types?
Redis data type
String, List, Hash, Sorted Set, and Set are common types. Their mappings to underlying data structures are as follows:
The data type | The data structure |
---|---|
String | SDS(Simple Dynamic String) |
List | Two-way linked list List of compression |
Hash | List of compression Hash table |
Sorted Set | List of compression Jump table |
Set | Hash table An integer array |
The corresponding characteristics of data types are similar to the underlying data structures implemented by them, and the properties are the same
String, based on SDS implementation, suitable for simple key-value storage, setnx key value implementation distributed lock, counter (atomicity), distributed global unique ID.
List, sorted by the order in which the elements enter the List, follows the FIFO rule and is commonly used in sorting statistics and simple message queues.
Hash is a mapping between the string key and the string value. It is very suitable for representing an object. The complexity of adding and deleting operations is O(1).
A Set is an unordered Set of String elements whose members are unique, which means that no duplicate data can occur in the Set. Hash table based implementation, so add, delete, search complexity is O(1).
Sorted Set is an upgrade of the Set type. The difference is that each element is associated with a score of type double.
So let’s look at these data types, Redis Geo, HyperLogLog, BitMap?
Redis Geo considers the earth as approximately a sphere and converts two-dimensional latitude and longitude into strings based on GeoHash to achieve position division and query of specified distances. Features are commonly used in location-dependent applications.
HyperLogLog is a probabilistic data structure that uses probabilistic algorithms to calculate approximate cardinality of sets with an error rate of about 0.81%. When the set has a large number of elements, the space it needs to calculate the cardinality is always fixed and small enough to be used for UV statistics.
BitMap, with a bit to map the state of an element, only 0 and 1 two states, very typical binary state, and its own is a statistical binary state data type with String type as the underlying data structure, advantages of saving a lot of memory space, but used in binary statistics scenarios.
With this knowledge in mind, what are the strategies for selecting Redis data types for application scenarios?
Choose the right oneRedis
Data type policy
In practical development applications, Redis can be applied to many business scenarios, but how do we choose data type storage?
Based primarily on time/space complexity, the following points can be considered in practical development:
- Data volume, the size of the data itself
- Collection type statistical mode
- Supports single point query/range query
- Special Application Scenario
Data volume, the size of the data itself
When the amount of data is large and the data itself is small, using **String** will use extra space greatly increased, because using hash tables to save key-value pairs and using dictEntry structure will cause the overhead of saving three Pointers to dictEntry for each key-value pair. As a result, the data itself is less than the overhead of the extra space, and ultimately the data size of the storage space is much larger than the original data storage size.
List, Hash, and Sorted sets can be implemented using integer arrays and compressed lists, because both allocate a contiguous space in memory, and then place the elements of the collection one by one. It is very compact, and there is no need to concatenate the elements through extra Pointers. This avoids the space overhead of extra Pointers. In addition, when using the set type, a key corresponds to a set of data, which can save a lot of data, but also only a dictEntry, so as to save memory.
Collection type statistical mode
The common Redis collection type statistical patterns are:
- Aggregate statistics (intersection, difference, and union statistics) : This parameter can be selected for aggregate calculation of multiple sets
Set
; - Collation statistics (requires collection types to preserve order for elements) :
Redis
In theList
和Sorted Set
Is an ordered set,List
Enter by elementList
In the order of,Sorted Set
You can sort by the weight of the elements; - Binary state statistics (set elements have only 0 and 1 values) :
Bitmap
Itself is to useString
Bitmap uses BITCOUNT to count the number of 1 after BITOP operations by bit and, or, xOR. - Cardinality statistics (counting the number of non-repeating elements in a set) :
HyperLogLog
It is a kind of data set type used for statistical cardinality. The statistical result has certain error, the standard error rate is 0.81%. For precise statistics, use Set or Hash.
The Set type is applicable to the aggregation operation of users/friends/followers/fans/interested people, for example
- Count the number of new users of mobile APP every day
- Mutual friends of two users
In Redis, List and Sorted Set are ordered sets, which are used to sort collection elements, such as
- List of recent comments
- list
Bitmap binary status statistics, applicable to statistics that have a large amount of data and can be represented by binary status. For example:
- Check in, the number of users check in that day
- Weekly User Activity
- User Online Status
HyperLogLog is a type of data set used to count cardinality, the number of non-repeating elements in a set, for example
- Count the UV of a web page. Multiple visits by a user in a day only count as one
Supports single point query/range query
In Redis, List and Sorted Set are ordered collections that support range query, but Hash does not support range query
Special Application Scenario
Message queue, using Redis as the realization of message queue, to the basic requirements of message sequence preservation, processing repeated messages and ensure message reliability, the scheme is as follows:
- List-based message queue solution
- A Stream-based message queue solution
Based on the List | Based on the Strems | |
---|---|---|
Message order preserving | useLPUSH/RPOP |
useXADD/XREAD |
Block read | useBRPOP |
useXREAD block |
Duplicate message processing | The producer implements its own globally unique ID | Streams automatically generates globally unique ids |
Message reliability | useBRPOPLPUSH |
usePENDING List Automatically saves messages |
Applicable scenario | Small amount of messages | The total number of messages is large and data needs to be read in the form of consumer groups |
Location-based LBS service is implemented with Redis specific GEO data type. GEO can record the geographical location information in the form of latitude and longitude, which is widely used in LBS service. For example: how taxi hailing apps offer location-based services.
conclusion
The reason why Redis is so fast is that its memory-based data operation and the use of Hash table as index are efficient and fast. Moreover, due to the diversity of its underlying data, Redis can be applied to many scenarios. Selecting the right data type in different scenarios can improve its query performance.
Finally, wechat search “Ccww Technology Blog” to watch more articles, but also welcome to pay attention to a wave.