1. Foreword

KV storage is common and important on both clients and servers. On Android, SharedPreferences (SP), provided by the SDK, is the most widely used option, but its low efficiency and ANR problems have long been criticized. The Kotlin-based DataStore was released later, but its Preferences DataStore changes little: the underlying storage strategy is still the same. At the end of 2018, WeChat open-sourced MMKV, which became very popular. I had previously written a KV storage component for Android called LightKV, open-sourced a little earlier than MMKV, but it did not get much attention. Then again, LightKV's design was not mature enough, because my understanding at the time was limited.

1.1 Deficiencies of SP

There is a lot of discussion online about the disadvantages of SP. Here are two main points:

  • The saving speed is slow

SP keeps its data in a HashMap in memory and in an XML file on disk. On every change, the entire HashMap is serialized to XML and then written to the file. It is slow for two reasons: 1. every change rewrites the whole file; 2. serialization is time-consuming.

  • Can lead to ANR
public void apply() {
    // ... Omit extraneous code ...
    QueuedWork.addFinisher(awaitCommit);
    Runnable postWriteRunnable = new Runnable() {
        @Override
        public void run() {
            awaitCommit.run();
            QueuedWork.removeFinisher(awaitCommit);
        }
    };
    SharedPreferencesImpl.this.enqueueDiskWrite(mcr, postWriteRunnable);
}

public void handleStopActivity(IBinder token, boolean show, int configChanges,
                               PendingTransactionActions pendingActions, boolean finalStateRequest, String reason) {
    // ... Omit extraneous code ...
    // Make sure any pending writes are now committed.
    if (!r.isPreHoneycomb()) {
        QueuedWork.waitToFinish();
    }
}

When an Activity stops, the main thread waits for pending SP writes to finish. If many writes are queued, or they run slowly, the main thread can be blocked for a long time, which is how the ANR arises.
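
To make the blocking concrete, here is a small plain-Java model of the two snippets above. It is only a sketch under my own assumptions, not Android's implementation: the class ApplyBlockingDemo, the FINISHERS queue, the DISK_WRITER thread and the 100 ms write time are all made up for illustration.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ApplyBlockingDemo {
    // Stand-ins for QueuedWork's finisher list and its single writer thread.
    static final Queue<Runnable> FINISHERS = new ConcurrentLinkedQueue<>();
    static final ExecutorService DISK_WRITER = Executors.newSingleThreadExecutor(task -> {
        Thread thread = new Thread(task, "disk-writer");
        thread.setDaemon(true); // lets the JVM exit without an explicit shutdown
        return thread;
    });

    // Simplified apply(): returns at once, but registers a finisher that
    // blocks until the queued disk write has completed.
    static void apply(long writeMillis) {
        CountDownLatch written = new CountDownLatch(1);
        Runnable awaitCommit = () -> {
            try {
                written.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        FINISHERS.add(awaitCommit);
        DISK_WRITER.execute(() -> {
            try {
                Thread.sleep(writeMillis); // simulated serialization and file write
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            written.countDown();
            FINISHERS.remove(awaitCommit);
        });
    }

    // Simplified waitToFinish(): the caller runs every pending finisher, so it
    // blocks until all queued writes are done. Returns the blocked time in ms.
    static long waitToFinish() {
        long start = System.nanoTime();
        Runnable finisher;
        while ((finisher = FINISHERS.poll()) != null) {
            finisher.run();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            apply(100); // each call returns immediately
        }
        // Plays the role of handleStopActivity() on the main thread.
        System.out.println("main thread blocked for about " + waitToFinish() + " ms");
    }
}
```

Each apply() call is cheap on its own, but the cost is only deferred: whoever calls waitToFinish() pays for every write still in the queue.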

1.2 Deficiencies of MMKV

  • No type information, so getAll is not supported. MMKV encodes keys and values in a Protobuf-like way and stores no type information (Protobuf at least tags its fields; MMKV records even less). Because no types are recorded, MMKV cannot deserialize values automatically, so it cannot implement a getAll interface.
  • Reads are relatively slow. SP deserializes every value into the HashMap when it loads the file, so a read only needs a lookup and returns the stored reference. MMKV has to decode the value again on every read, which costs time and also creates a new object each time. This is not a big problem, though: it is not much slower than SP.
  • Larger package size. Adding MMKV increases the app size noticeably: leaving aside the JAR and AIDL files, a single arm64-v8a .so alone is more than 400 KB.

Apps are generally large these days, but a bigger package still affects packaging, distribution, and installation time.

  • Files only grow, never shrink. MMKV's expansion strategy is fairly aggressive, and it does not proactively trim the file after expanding. For example, if a large value makes the file grow to 1 MB and you then delete that value, the file stays at 1 MB even after GC is triggered, even if the valid content is only a few KB.
  • Data may be lost

    None of the issues above is usually critical, but losing data is a real headache.

    MMKV has an official statement like this:

    Through mmap memory-mapped files, MMKV provides a block of memory that can be written at any time. The app only writes data into it, and the operating system is responsible for writing the memory back to the file, so there is no need to worry about data loss caused by a crash.

This statement is half right and half wrong. If the data has been written to the memory block and the system itself does not crash, the buffer is flushed to disk even if the process crashes. But if the system crashes or loses power before the flush, which is rare, the data is lost. Alternatively, if the process crashes or is killed in the middle of a write, the system still flushes whatever was written, and the file is incomplete the next time it is opened. For example, MMKV reclaims invalid space when the remaining space is insufficient, and if the process is interrupted during that step, the data may be left incomplete. MMKV's own documentation supports this:

After CRC fails, MMKV has two strategies: discard all data directly, or try to read data (which can be set by the user during initialization). Trying to read the data may not recover the data, and you may even read some wrong data, depending on your luck.

This is relatively easy to reproduce. One path is as follows:

  1. Add and delete a number of key-values to prepare some data.
  2. Insert a large string to trigger expansion. Garbage collection is triggered before the expansion.
  3. Set a breakpoint in the loop that executes memmove, let part of the memmove run, and then kill the process on the phone.
  4. Open the app again: data has been lost.

In contrast, SP is inefficient, but at least it doesn’t lose data.
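
The failure in the steps above can be modelled in a few lines of Java. This is a toy model and not MMKV's code: the 28-byte "file", the 4-byte chunk size, and the use of CRC32 over the whole data area are my assumptions, chosen only to show why a half-finished move leaves a file that is neither the old content nor the new one.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class TornCompactionDemo {
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Slides the live bytes at [from, from + length) down to position `to`,
    // one chunk at a time, and stops after maxChunks chunks to simulate the
    // process being killed in the middle of the move.
    static void compact(byte[] file, int from, int to, int length, int chunkSize, int maxChunks) {
        int moved = 0;
        for (int chunk = 0; chunk < maxChunks && moved < length; chunk++) {
            int n = Math.min(chunkSize, length - moved);
            System.arraycopy(file, from + moved, file, to + moved, n);
            moved += n;
        }
    }

    public static void main(String[] args) {
        // 8 bytes of deleted data followed by two live entries (20 bytes).
        byte[] file = "DEADDEADkey1=alphakey2=bravo".getBytes(StandardCharsets.US_ASCII);
        int liveLength = file.length - 8;

        // Checksum recorded before compaction starts.
        long storedChecksum = checksum(file);

        // The process dies after a single 4-byte chunk has been moved.
        compact(file, 8, 0, liveLength, 4, 1);

        // prints "after kill: key1DEADkey1=alphakey2=bravo"
        System.out.println("after kill: " + new String(file, StandardCharsets.US_ASCII));
        // prints "checksum ok: false"
        System.out.println("checksum ok: " + (checksum(file) == storedChecksum));
    }
}
```

The checksum can tell you that the file is damaged, but with a single copy of the data there is nothing intact left to fall back on.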

2. FastKV

Drawing on this experience, I worked out a design that is both efficient and reliable, and named it FastKV.

2.1 Features

FastKV has the following features:

  1. Fast read/write speed
    • Binary encoding: the encoded size is much smaller than XML and other text encodings.
    • Incremental encoding: FastKV records the offset of each key-value in the file (including invalid key-values), so an update can be written directly to the right location.
    • mmap is used by default: an update is written straight to memory, with no blocking I/O.
  2. Multiple write modes are supported
    • Besides non-blocking mmap writes, FastKV also supports conventional blocking I/O, in both synchronous and asynchronous forms (similar to commit and apply in SharedPreferences, respectively).
  3. Support for multiple types
    • Supports the common basic types: boolean/int/float/long/double/String.
    • Supports ByteArray (byte[]).
    • Supports storing custom objects.
    • Has a built-in encoder for Set (for easy compatibility with SharedPreferences).
  4. Convenient and easy to use
    • FastKV provides a rich API and works out of the box.
    • The API includes getAll() and putAll(), so it is easy to migrate data from frameworks such as SharedPreferences to FastKV, and from FastKV to other frameworks.
  5. Stable and reliable
    • Data integrity is ensured through techniques such as double-write.
    • Falls back gracefully when an API throws an I/O exception.
  6. Compact code
    • FastKV is implemented in pure Java and compiles to a JAR of just over 30 KB.

2.2 Implementation Principles

2.2.1 Encoding

File layout:

[data_len | checksum | key-value | key-value | ...]

  • data_len: 4 bytes; records the total number of bytes occupied by all key-values.
  • checksum: 8 bytes; records the checksum of the key-value section.

Key-value data layout:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| delete_flag | external_flag | type  | key_len | key_content | value |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    1bit     |     1bit      | 6bits | 1 byte  |             |       |
  • delete_flag: marks whether this key-value has been deleted.
  • external_flag: marks whether the value is stored in a separate file. Note: storing a very large value in the main file would slow down access to the other key-values, so such a value is saved in its own file and the main file records only that file's name.
  • type: the type of the value; boolean/int/float/long/double/String/ByteArray and custom objects are currently supported.
  • key_len: the length of the key. key_len itself occupies 1 byte, so a key can be at most 255 bytes long.
  • key_content: the key itself, encoded in UTF-8.
  • value: basic types are encoded directly (little-endian); for other types, the length is recorded first (as a varint), followed by the content. String uses UTF-8; ByteArray needs no encoding; custom objects implement the Encoder interface, serializing and deserializing in its encode and decode methods respectively.
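
As a rough illustration of the layout above, the following sketch encodes one int entry and one String entry. It is not FastKV's actual encoder: the type codes, the choice of the two highest bits for delete_flag and external_flag, and all class and method names are assumptions of mine, since the layout does not pin them down.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RecordEncodingSketch {
    // Hypothetical type codes; the article does not list the real values.
    static final int TYPE_INT = 2;
    static final int TYPE_STRING = 6;

    // Assumed bit positions inside the first byte.
    static final int DELETE_FLAG = 0x80;   // 1 bit
    static final int EXTERNAL_FLAG = 0x40; // 1 bit (unused in this sketch)
    static final int TYPE_MASK = 0x3F;     // 6 bits

    // Writes a length as a varint: 7 bits per byte, high bit set while more bytes follow.
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Writes the flag/type byte, key_len and key_content.
    static void writeHeader(ByteArrayOutputStream out, int type, String key) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        if (keyBytes.length > 255) {
            throw new IllegalArgumentException("key longer than 255 bytes");
        }
        out.write(type & TYPE_MASK); // delete_flag and external_flag are both 0
        out.write(keyBytes.length);
        out.write(keyBytes, 0, keyBytes.length);
    }

    static byte[] encodeInt(String key, int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeHeader(out, TYPE_INT, key);
        byte[] raw = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(value).array();
        out.write(raw, 0, raw.length);
        return out.toByteArray();
    }

    static byte[] encodeString(String key, String value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeHeader(out, TYPE_STRING, key);
        byte[] content = value.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, content.length); // length first, then the content
        out.write(content, 0, content.length);
        return out.toByteArray();
    }

    // Marks an encoded record as deleted by setting delete_flag in place.
    static void markDeleted(byte[] record) {
        record[0] |= (byte) DELETE_FLAG;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encodeInt("a", 1)));       // [2, 1, 97, 1, 0, 0, 0]
        System.out.println(Arrays.toString(encodeString("k", "hi"))); // [6, 1, 107, 2, 104, 105]
    }
}
```

Because delete_flag lives in the first byte of the record, a delete only has to flip one bit in place, which fits the update policy described later.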

2.2.2 Storage

  • mmap. To improve write performance, FastKV uses mmap by default.

  • Fallback. When the mmap API throws an I/O exception, FastKV falls back to regular blocking I/O and moves the writes to an asynchronous thread so that the calling thread is not affected.

  • Data integrity. If a write is interrupted partway (by the process or the system), the file may be left incomplete, so some mechanism is needed to ensure data integrity. When opened in mmap mode, FastKV uses double-write: data is written to two files, A and B, in sequence, so that at least one of them is complete at any time. When loading, it verifies the data with the checksum, the flags, and validity checks. Double-write prevents incomplete data after a process crash. However, mmap only flushes to disk periodically, so if the system crashes or loses power, recent updates can still be lost (the earlier data remains usable; only the updates are lost). You can force a flush by calling force(), but that gives up the advantage of mmap; if you would have to call force() on every update, you might as well use blocking I/O. For this reason, FastKV also supports writing files with blocking I/O. In that mode it writes a temporary file first, deletes the main file once the write is complete, and then renames the temporary file to the main file. FastKV supports both synchronous and asynchronous blocking I/O, similar to SP's commit and apply, but it serializes key-values incrementally, which is much faster than SP serializing the entire HashMap.

  • Update policy (add/delete/modify). Add: write to the end of the data. Delete: set delete_flag to 1. Modify: if the value has the same length as the original, write it in place; otherwise, write the key-value to the end of the data, set the original's delete_flag to 1, and update the file's data_len and checksum.

  • GC/truncate. When key-values are deleted, FastKV collects information about them, such as their number, location, and the space they occupy. GC is triggered at two points: 1. when adding a key-value, if the remaining space is insufficient, the deleted space has reached the threshold, and reclaiming it would leave enough room for the new key-value; 2. when deleting a key-value, if the deleted space or the number of deleted key-values has reached the threshold. If the unused space reaches the threshold after GC, truncate is triggered to reduce the file size.
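
The double-write idea from the data integrity item can be sketched with two in-memory "files". Treat this as an illustration under assumptions rather than FastKV's real code: CRC32 stands in for the 8-byte checksum, byte arrays stand in for the mmap-backed A and B files, and the names DoubleWriteSketch, pack, unpack and load are invented here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class DoubleWriteSketch {
    static final int HEADER_SIZE = 12; // data_len (4 bytes) + checksum (8 bytes)

    static long checksum(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return crc.getValue();
    }

    // Builds a file image: [data_len | checksum | payload].
    static byte[] pack(byte[] payload) {
        ByteBuffer buffer = ByteBuffer.allocate(HEADER_SIZE + payload.length)
                .order(ByteOrder.LITTLE_ENDIAN);
        buffer.putInt(payload.length).putLong(checksum(payload)).put(payload);
        return buffer.array();
    }

    // Returns the payload if the image passes validation, otherwise null.
    static byte[] unpack(byte[] image) {
        if (image == null || image.length < HEADER_SIZE) {
            return null;
        }
        ByteBuffer buffer = ByteBuffer.wrap(image).order(ByteOrder.LITTLE_ENDIAN);
        int length = buffer.getInt();
        long stored = buffer.getLong();
        if (length < 0 || length > buffer.remaining()) {
            return null;
        }
        byte[] payload = new byte[length];
        buffer.get(payload);
        return checksum(payload) == stored ? payload : null;
    }

    // Prefers file A and falls back to file B when A is damaged.
    static byte[] load(byte[] fileA, byte[] fileB) {
        byte[] payload = unpack(fileA);
        return payload != null ? payload : unpack(fileB);
    }

    public static void main(String[] args) {
        byte[] oldImage = pack("value1".getBytes(StandardCharsets.UTF_8));
        byte[] newImage = pack("value2".getBytes(StandardCharsets.UTF_8));

        // Killed while updating A: the new header landed, the new payload did not.
        byte[] tornA = oldImage.clone();
        System.arraycopy(newImage, 0, tornA, 0, HEADER_SIZE);
        // B has not been touched yet, so the previous state is recovered.
        System.out.println(new String(load(tornA, oldImage), StandardCharsets.UTF_8)); // value1

        // Killed while updating B: A is already complete, so the update survives.
        byte[] tornB = oldImage.clone();
        System.arraycopy(newImage, 0, tornB, 0, HEADER_SIZE);
        System.out.println(new String(load(newImage, tornB), StandardCharsets.UTF_8)); // value2
    }
}
```

Whichever file is being written when the process dies, the other one still passes validation, so at worst the latest update is lost and never the whole store.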

2.3 Usage

2.3.1 Import

dependencies {
    implementation 'io.github.billywei01:fastkv:1.0.2'
}

2.3.2 Initialization

    FastKVConfig.setLogger(FastKVLogger)
    FastKVConfig.setExecutor(ChannelExecutorService(4))

Initialization can set log callbacks and executors as needed. It is recommended to pass in your own thread pool to reuse threads.

The logging interface provides three levels of callback that can be implemented on demand.

    public interface Logger {
        void i(String name, String message);

        void w(String name, Exception e);

        void e(String name, Exception e);
    }

2.3.3 Reading and writing data

  • Basic usage
    FastKV kv = new FastKV.Builder(path, name).build();
    if (!kv.getBoolean("flag")) {
        kv.putBoolean("flag", true);
    }
  • Save custom objects
    FastKV.Encoder<?>[] encoders = new FastKV.Encoder[]{LongListEncoder.INSTANCE};
    FastKV kv = new FastKV.Builder(path, name).encoder(encoders).build();

    String objectKey = "long_list";
    List<Long> list = new ArrayList<>();
    list.add(100L);
    list.add(200L);
    list.add(300L);
    kv.putObject(objectKey, list, LongListEncoder.INSTANCE);

    List<Long> list2 = kv.getObject("long_list");

FastKV supports saving custom objects. So that they can be deserialized automatically when the file is loaded, you need to pass in the object's encoder when building the FastKV instance. An encoder is an object that implements FastKV.Encoder. For example, LongListEncoder is implemented as follows:

public class LongListEncoder implements FastKV.Encoder<List<Long>> {
    public static final LongListEncoder INSTANCE = new LongListEncoder();

    @Override
    public String tag() {
        return "LongList";
    }

    @Override
    public byte[] encode(List<Long> obj) {
        return new PackEncoder().putLongList(0, obj).getBytes();
    }

    @Override
    public List<Long> decode(byte[] bytes, int offset, int length) {
        PackDecoder decoder = PackDecoder.newInstance(bytes, offset, length);
        List<Long> list = decoder.getLongList(0);
        decoder.recycle();
        return (list != null) ? list : new ArrayList<>();
    }
}

Encoding objects involves serialization and deserialization. For that, I recommend another framework of mine: github.com/BillyWei01/…

  • Blocking I/O

To use blocking I/O, just call blocking() or asyncBlocking() when you construct FastKV. Usage:

    FastKV kv = new FastKV.Builder(TestHelper.DIR, "test").blocking().build();
    // Auto commit
    kv.putLong("time", System.currentTimeMillis());

    // Batch commit
    kv.disableAutoCommit();
    kv.putLong("time", System.currentTimeMillis());
    kv.putString("str", "hello");
    kv.putInt("int", 100);
    boolean success = kv.commit();
    if (success) {
        // handle success
    } else {
        // handle failed
    }

2.3.4 For Android

Usage on Android is the same as the general usage, except that the Android version also provides a SharedPreferences-compatible API and Kotlin support. Because the FastKV API is compatible with SharedPreferences, it is easy to migrate SharedPreferences data to FastKV. For more information: github.com/BillyWei01/…

3. Performance test

  • Test data: part of the key-value data stored in SharedPreferences was collected from an app (and randomly obfuscated), giving a little over 400 key-values. Since some key-values are accessed more often than others in daily use, a normally distributed access sequence was constructed.
  • Compared against: SharedPreferences and MMKV
  • Test device: Honor 20S
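
The article does not show how the access sequence was built, so the following is only one plausible way to generate a normally distributed sequence of key indices. The class and method names, the fixed seed, and the choice of mean and standard deviation are all my own assumptions.

```java
import java.util.Arrays;
import java.util.Random;

public class AccessSequenceSketch {
    // Returns `count` key indices in [0, keyCount), clustered around the middle
    // so that some keys are accessed far more often than others.
    static int[] gaussianIndices(int keyCount, int count, long seed) {
        Random random = new Random(seed);
        double mean = (keyCount - 1) / 2.0;
        double stdDev = keyCount / 6.0; // about three standard deviations on each side
        int[] indices = new int[count];
        for (int i = 0; i < count; i++) {
            long index = Math.round(mean + random.nextGaussian() * stdDev);
            // Clamp the rare outliers that fall outside the key range.
            indices[i] = (int) Math.max(0, Math.min(keyCount - 1, index));
        }
        return indices;
    }

    public static void main(String[] args) {
        int[] sequence = gaussianIndices(400, 10_000, 42L);
        System.out.println("first accesses: " + Arrays.toString(Arrays.copyOf(sequence, 10)));
    }
}
```

A fixed seed keeps the sequence identical across runs, so each storage component is measured against exactly the same accesses.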

Reads and writes were each run 10 times; the time taken is as follows:

                    Write (ms)   Read (ms)
SharedPreferences   1490         6
MMKV                34           9
FastKV              14           1
  • SharedPreferences writes were submitted with apply and still took considerable time; with commit, it takes more than five seconds on this device.
  • MMKV reads more slowly than SharedPreferences but writes much faster.
  • FastKV is faster than the other two at both reading and writing.

4. Conclusion

This article discussed several KV storage options on the Android platform, then proposed and implemented a new storage component that focuses on the efficiency and data reliability of KV storage. The code is available on GitHub: github.com/BillyWei01/…