String object

The string data type is the most commonly used type in Redis. Its key and value are strings, which are very convenient to use. Although values of string data types are generally called strings, the appropriate encoding is automatically selected for actual storage depending on the value. There are three encoding types for string objects: int, RAW, and embstr.

Redis object

Redis uses a uniform data structure to represent an object, defined as follows:

typedef struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    // The LRU algorithm is used to clear objects in memory when the memory limit is exceeded
    unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or * LFU data (least significant 8 bits frequency * and most significant 16 bits access time). */
    // The number of references to this object
    int refcount;
    // The value pointer to the object
    void *ptr;
} robj;
Copy the code

The type field represents the type of the object. There are seven values:

/* A redis object, that is a type able to hold a string / list / set */

/* The actual Redis Object */
#define OBJ_STRING 0    /* String object. */
#define OBJ_LIST 1      /* List object. */
#define OBJ_SET 2       /* Collection object. */
#define OBJ_ZSET 3      /* Ordered collection object. */
#define OBJ_HASH 4      /* Hash object. */

/* The "module" object type is a special one that signals that the object * is one directly managed by a Redis module. In this case the value points * to a moduleValue struct, which contains the object value (which is only * handled by the module itself) and the RedisModuleType struct which lists * function pointers in order to serialize, deserialize, AOF-rewrite and * free the object. * * Inside the RDB file, module types are encoded as OBJ_MODULE followed * by a 64 bit module type ID, which has a 54 bits module-specific signature * in order to dispatch the loading to the right module, plus a 10 bits * encoding version. */
#define OBJ_MODULE 5    /* Module object. */
#define OBJ_STREAM 6    /* Stream object. */
Copy the code

Then there is the Encoding field, which represents the actual encoding type of the object value. There are 11 values:

/* Objects encoding. Some kind of objects like Strings and Hashes can be * internally represented in multiple ways. The 'encoding' field of the object * is set to one of this fields for this object. */
#define OBJ_ENCODING_RAW 0     /* Simple dynamic string */
#define OBJ_ENCODING_INT 1     /* Integer of type long */
#define OBJ_ENCODING_HT 2      * / / * dictionary
#define OBJ_ENCODING_ZIPMAP 3  /* Compress dictionary */
#define OBJ_ENCODING_LINKEDLIST 4 /* No longer used for the old list, use a double-ended list. */
#define OBJ_ENCODING_ZIPLIST 5 /* Compress list */
#define OBJ_ENCODING_INTSET 6  /* Set of integers */
#define OBJ_ENCODING_SKIPLIST 7  /* Skip lists and dictionaries */
#define OBJ_ENCODING_EMBSTR 8  /* embstr-encoded simple dynamic string */
#define OBJ_ENCODING_QUICKLIST 9 /* List encoded as ziplist */
#define OBJ_ENCODING_STREAM 10 /* Code the base tree for listPacks */
Copy the code

As mentioned earlier, string objects use only long integers, simple dynamic strings, and simple dynamic strings encoded by EMbstr.

OBJ_ENCODING_INT

When the value of a string object is an integer and can be represented by a long, the encoding of the string object is the OBJ_ENCODING_INT encoding.

As you can see, OBJ_ENCODING_RAW is still used when the value is very large.

OBJ_ENCODING_RAW

When the string object’s value is a string larger than 44 bytes, the encoding of the string object is OBJ_ENCODING_RAW. The specific structure is described below.

OBJ_ENCODING_EMBSTR

When the value of a string object is a string of 44 bytes or less, the encoding of the string object is the OBJ_ENCODING_EMBSTR encoding. The main differences between OBJ_ENCODING_EMBSTR and OBJ_ENCODING_RAW are as follows:

  • OBJ_ENCODING_RAWEncoded objects are allocated twice when memory is allocated, and created separatelyredisObjectThe objects andSDSObject. whileOBJ_ENCODING_EMBSTRThe code is assigned once.
  • In the same way,OBJ_ENCODING_RAWThe encoded object also needs to free memory twice,OBJ_ENCODING_EMBSTRCoding is once.
  • OBJ_ENCODING_EMBSTREncoded data is stored in contiguous memory,OBJ_ENCODING_RAWCoding is not.
/* Create a string object with EMBSTR encoding if it is smaller than * OBJ_ENCODING_EMBSTR_SIZE_LIMIT, otherwise the RAW encoding is * used. * * The current limit of 44 is chosen so that the biggest string object * we allocate as EMBSTR will still fit into the 64 byte arena of jemalloc. */
#define OBJ_ENCODING_EMBSTR_SIZE_LIMIT 44
robj *createStringObject(const char *ptr, size_t len) {
    if (len <= OBJ_ENCODING_EMBSTR_SIZE_LIMIT)
        return createEmbeddedStringObject(ptr,len);
    else
        return createRawStringObject(ptr,len);
}
Copy the code

SDS

Strings are a very common type in Redis, and Redis implemented in C is different from Java. In C strings are implemented as an array of characters of length N+1, with the empty string ‘\0’ as the terminator. Getting the length of the string requires traversing the empty string ‘\0’ to find the length of the string, which is O(N).

If you have a very large string length, single-threaded Redis might block for a long time to get its length, which is unacceptable, so Redis needs a more efficient string type.

Redis implements a string type called SIMPLE Dynamic String (SDS), which has two variables representing the length of the string and the number of unused characters in the character array, so that the length of the string can be obtained with O(1) complexity. It also uses the empty string ‘\0’ as the closing symbol.

struct sdshdr {
    // String length
    int len;
    // The number of unused characters in the character array
    int free;
    // Saves a character array of strings
    char buf[];
}

Copy the code

Expansion mechanism

SDS automatically expands when the character array space is insufficient to accommodate new strings.

If a C string is concatenated to an SDS, when the space of the character array is insufficient, the SDS will first expand to just the length of the new string, and then expand the length of the empty characters of the new string. Finally, the length of the character array of the SDS is equal to 2 * the new string + 1 (ending sign ‘\0’). However, when the size of the new string exceeds 1MB, the size of the expanded null character is fixed at 1MB.

The reason for this mechanism is that Redis, as a NoSQL database, will frequently modify strings. The expansion mechanism is equivalent to making a buffer pool for SDS. The StringBuilder implementation is the same as that of Java StringBuilder, which optimizes the memory reallocation of SDS for N consecutive string increases to N times.

Afterword.

I have read two books about Redis, both of which are about how Redis works, but not about the design and implementation of Redis. This also leads to the interview is very awkward, because the interviewer most like to ask about the principle of things, so in the future when learning technology do not start from the actual class of books, or to understand the principle of the better.

The resources

This is a summary of the string section in Redis Design and Implementation.