Basic concepts

  • Redis instances with a large memory footprint can cause a series of problems during instance recovery and master-slave synchronization

    • For example, recovery time increases
    • Primary/secondary switchover becomes expensive
    • Buffers are prone to overflow
  • Pika was developed by 360’s DBA and infrastructure group

  • Pika's goals (using SSDs as a smooth replacement for Redis)

    • A single instance can hold a large amount of data while avoiding the potential problems of instance recovery and master-slave synchronization

    • Compatible with Redis data types, allowing smooth migration of applications using Redis to Pika

Potential problems with large memory Redis instances

  • Potential problems with large memory

    • RDB snapshots are inefficient to generate and restore

      • A long fork blocks the main thread
      • Memory may be swapped out to disk
    • Full synchronization takes longer, which can cause buffer overflow

      • Full synchronization transfers a large RDB file, so it takes longer to complete
      • Primary/secondary switchover takes longer, which also affects service availability

Pika overall architecture

  • The overall architecture

    • Network framework
    • Pika thread module
    • Nemo Storage module
    • RocksDB
    • binlog
  • Network framework

    • Function: handles sending and receiving of requests over the underlying network

    • Implementation:

      • Wraps the operating system's low-level socket networking functions

      • The Pika thread module uses a multithreaded model dedicated to handling client requests

        • A request-dispatching thread (DispatchThread)

        • A set of worker threads that wrap requests into tasks

        • A ThreadPool

    • Tuning: increase the number of worker threads and the number of threads in the thread pool

  • Nemo

    • Implements compatibility between Pika and Redis data types, which lowers the cost of learning Pika
  • binlog

    • Records write commands and is used for command synchronization between master and slave nodes (this avoids replicating a large memory image, since commands are much smaller than the data)
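
The thread model described above (a dispatch step, a set of worker threads that wrap requests into tasks, and a thread pool that runs them) can be sketched roughly as follows. This is a minimal illustration in Python, not Pika's actual C++ implementation: the names `serve`, `worker_loop`, `execute`, `NUM_WORKERS` and `POOL_SIZE` are all hypothetical, and an in-memory dict stands in for the storage layer.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 2   # hypothetical tuning knob: number of worker threads
POOL_SIZE = 4     # hypothetical tuning knob: number of threads in the pool

store = {}                     # stand-in for the storage layer
store_lock = threading.Lock()


def execute(command):
    """Task body run by the thread pool: apply one parsed command."""
    op, key, *rest = command
    with store_lock:
        if op == "SET":
            store[key] = rest[0]
            return "OK"
        if op == "GET":
            return store.get(key)
    return "ERR unknown command"


def worker_loop(inbox, pool, results):
    """Worker thread: wrap each raw request into a task and submit it."""
    while True:
        item = inbox.get()
        if item is None:       # shutdown signal
            break
        request_id, raw = item
        command = raw.split()  # a very simplified "parse" step
        results[request_id] = pool.submit(execute, command)


def serve(requests):
    """Dispatch step: hand requests to workers round-robin, then collect replies."""
    results = {}
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        inboxes = [queue.Queue() for _ in range(NUM_WORKERS)]
        workers = [
            threading.Thread(target=worker_loop, args=(inbox, pool, results))
            for inbox in inboxes
        ]
        for worker in workers:
            worker.start()
        for i, raw in enumerate(requests):
            inboxes[i % NUM_WORKERS].put((i, raw))
        for inbox in inboxes:
            inbox.put(None)
        for worker in workers:
            worker.join()
        return [results[i].result() for i in range(len(requests))]
```

Requests inside one `serve` call may run in any order, so the sketch says nothing about how a real server orders commands from the same client. Raising `NUM_WORKERS` or `POOL_SIZE` corresponds to the tuning advice above.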

How does Pika store more data on SSDs?

  • Basic concept: Pika uses RocksDB, a persistent key-value database widely used in the industry

  • RocksDB’s read and write mechanism (which does not take up much memory)

    • RocksDB uses two small memory regions, Memtable1 and Memtable2, to cache written data alternately

      • Each is usually a few MB to tens of MB in size
    • Data is written to Memtable1 first; when Memtable1 is full, its contents are written to SSD

    • Meanwhile, Memtable2 takes over from Memtable1 to cache new writes

    • Once Memtable1 has been written out and Memtable2 is full, writes switch back to Memtable1

  • Why doesn’t Pika suffer from inefficient large-file synchronization or buffer overflow?

    • Data is saved in files managed by RocksDB, so it no longer needs to be recovered from memory snapshots

    • Incremental command synchronization saves memory and avoids buffer overflow

  • Advantages of Pika

    • Pika uses RocksDB to save large amounts of data on SSD, avoiding the problems of generating and restoring memory snapshots

    • Pika uses the binlog mechanism for master/slave synchronization, avoiding the impact of large memory
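
The alternating use of two Memtables described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the class name `TwoMemtableStore` is hypothetical, capacity is counted in entries rather than MB, a Python list stands in for files on SSD, and the flush is synchronous, whereas a real engine writes the full table out in the background while the other table absorbs new writes.

```python
class TwoMemtableStore:
    """Toy model: two small in-memory tables that take turns absorbing writes."""

    def __init__(self, capacity=2):
        self.capacity = capacity     # entries per memtable (real ones are sized in MB)
        self.memtables = [{}, {}]
        self.active = 0              # index of the memtable currently receiving writes
        self.disk = []               # stand-in for files on SSD, oldest first

    def put(self, key, value):
        table = self.memtables[self.active]
        table[key] = value
        if len(table) >= self.capacity:
            self._flush_and_switch()

    def _flush_and_switch(self):
        full = self.memtables[self.active]
        # Write the full table out as one sorted "file", then free its memory.
        self.disk.append(dict(sorted(full.items())))
        full.clear()
        # The other memtable takes over for new writes.
        self.active = 1 - self.active

    def get(self, key):
        # Check memory first, then the flushed files from newest to oldest.
        for table in self.memtables:
            if key in table:
                return table[key]
        for flushed in reversed(self.disk):
            if key in flushed:
                return flushed[key]
        return None
```

Memory use stays bounded by the two small tables no matter how much data has been written, which is the property the notes rely on when they say this mechanism does not take up much memory.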


How does Pika implement Redis data type compatibility?

  • Basic concept: RocksDB provides only single-valued key-value pairs, which directly match only Redis’s String type

  • The Nemo module converts Redis collection types into single-valued key-value pairs

    • Redis collection types

      • List and Set: each element in the collection is a single value

      • Hash (field-value) and Sorted Set (member-score): each element in the collection is a pair

    • List conversion

      • Key: built from several fields, including one that preserves the order of elements in the list
      • Value: the preceding and succeeding elements, the element value, the version, and the time to live
    • Set conversion

      • Key: stores both the set's key and the element value
      • Value: stores the version and the time to live
    • Hash conversion

      • Key: size hashkey field1
      • Value: value1, field2, value2…..
    • zset

      • Similar to the hash conversion, but elements are sorted by score
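
To make the idea of flattening collections into single-valued key-value pairs concrete, here is a minimal sketch for the Set and Hash cases. The byte layout is an assumption made for illustration (a one-byte type tag, a one-byte key length, the collection key, then the member or field); it is not Nemo's actual on-disk format. The helper names and the `(value, version, ttl)` tuples are likewise hypothetical, and a Python dict stands in for RocksDB.

```python
import struct

kv = {}  # stand-in for the single-valued key-value store


def composite_key(tag: bytes, key: bytes, suffix: bytes) -> bytes:
    """Build tag + 1-byte key length + key + suffix.

    The length byte keeps key b"ab" + suffix b"c" distinct from
    key b"a" + suffix b"bc".
    """
    if len(key) > 255:
        raise ValueError("key too long for a 1-byte length field")
    return tag + struct.pack("B", len(key)) + key + suffix


def hset(key: bytes, field: bytes, value: bytes, version: int = 0, ttl: int = 0):
    # Hash: the field goes into the flat key, the field's value into the flat value.
    kv[composite_key(b"h", key, field)] = (value, version, ttl)


def hget(key: bytes, field: bytes):
    entry = kv.get(composite_key(b"h", key, field))
    return None if entry is None else entry[0]


def sadd(key: bytes, member: bytes, version: int = 0, ttl: int = 0):
    # Set: key and member both live in the flat key; the value holds only metadata.
    kv[composite_key(b"s", key, member)] = (version, ttl)


def sismember(key: bytes, member: bytes) -> bool:
    return composite_key(b"s", key, member) in kv


def smembers(key: bytes):
    # A prefix scan over the flat keys recovers the whole collection.
    prefix = composite_key(b"s", key, b"")
    return sorted(k[len(prefix):] for k in kv if k.startswith(prefix))
```

A sorted store such as RocksDB can serve the prefix scan in `smembers` efficiently because all entries of one collection share a key prefix; the dict here simply checks every key.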


Other advantages and disadvantages of Pika

  • Advantages of Pika

    • Fast instance restart: data is read directly from SSD, with no need to replay data

    • Low risk of full synchronization: incremental synchronization uses the on-disk binlog, which has no buffer size limit

    • The multithreaded model reduces the performance impact of SSD reads and writes on Pika

  • Disadvantages of Pika

    • Access performance is lower than Redis

      • Data is stored on SSD rather than in memory
      • Recording the binlog slows down writes
      • In my view, the converted data structures are also inefficient
  • Application scenario: when storing large volumes of data is the primary need, Pika is a good solution
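
The binlog-based incremental synchronization mentioned above can be sketched as a master that appends every write command to a log and a slave that remembers how far it has replayed. This is a simplified model under stated assumptions: the `Master` and `Slave` classes are hypothetical, the log is a Python list rather than a file on disk, and there is no networking or failure handling.

```python
class Master:
    def __init__(self):
        self.data = {}
        self.binlog = []   # append-only record of write commands

    def set(self, key, value):
        self.data[key] = value
        self.binlog.append(("SET", key, value))

    def delete(self, key):
        self.data.pop(key, None)
        self.binlog.append(("DEL", key))


class Slave:
    def __init__(self):
        self.data = {}
        self.offset = 0    # index of the next binlog entry to apply

    def sync(self, master):
        """Replay only the commands recorded since the last sync."""
        pending = master.binlog[self.offset:]
        for op, key, *args in pending:
            if op == "SET":
                self.data[key] = args[0]
            elif op == "DEL":
                self.data.pop(key, None)
        self.offset += len(pending)
        return len(pending)
```

Because the slave only needs its offset to resume, a slow slave does not force the master to hold pending commands in a fixed-size memory buffer, which is the point the notes make about the on-disk binlog.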


Conclusion

  • Pika advantages

    • Supports the Redis operation interface and can also store large amounts of data

    • Supports migration from Redis

  • Tuning

    • Increase the number of threads to improve the capacity for handling concurrent requests
    • Use high-spec SSDs to improve SSD access performance
  • Migrating from Redis to Pika

    • Migrate Redis data to Pika

      • aof_to_pika -i [Redis AOF file] -h [Pika IP] -p [Pika port] -a [authentication information]
    • Forward Redis requests to Pika

  • Github.com/Qihoo360/pi…
