Preface

In the architecture article, we introduced the architecture of modern IM messaging systems, the abstract Timeline model, and the typical architecture of a Timeline-based messaging system that supports “message roaming”, “multi-device synchronization” and “message retrieval”. To make the Tablestore Timeline model easier to understand, that article briefly introduced the basic logical model of Timeline and explained the basic concepts of synchronization modes, message storage and message indexing in a messaging system.

This article supplements the architecture article. It interprets the Tablestore Timeline model in detail, down to the implementation level, so that readers gain a deep understanding of the basic functions and core components of Timeline. Finally, using an IM messaging system as the scenario, we show how basic functions such as message synchronization, storage and indexing can be implemented on Tablestore Timeline.

Timeline model

The Timeline model is designed with simplicity as its goal, and its core modules are clearly delineated. They mainly include:

  • Store: the Timeline repository, similar to a table in a database.
  • Identifier: a unique identifier that distinguishes one Timeline from another.
  • Meta: metadata that describes the Timeline. Metadata uses a schema-free structure and can contain any columns.
  • Queue: all messages in a Timeline are stored in a Queue.
  • Message: the message body transmitted within the Timeline. It is also schema-free and can contain any columns.
  • Index: includes the Meta Index and the Message Index. You can define custom indexes on any column in Meta or Message, which enables flexible multi-condition queries and search.

Timeline Store



The Timeline Store is the repository for Timelines and corresponds to a table in a database. The figure above shows the structure of the Timeline Store, where all Timeline data is stored. Timeline is a data model designed for massive volumes of messages. It is used for both the message repository and the synchronization library, so it must meet the following requirements:

  • Massive data storage: if messages in the message repository are kept permanently, the data volume keeps growing over time. The repository must be able to store massive amounts of message data for a long time, with capacity reaching the PB level.
  • Low storage cost: message data has a clear hot/cold split, and most queries target hot data. A low-cost storage option for cold data is therefore necessary. Otherwise, storage costs become very high as the data volume grows over time.
  • Data life cycle management: both stored and synchronized message data need a defined life cycle. The message repository stores the message data itself online and usually requires a long retention period. The synchronization library is used for online or offline push in write diffusion mode and usually keeps data for only a short time.
  • High write throughput: apart from feed-stream systems such as microblogs and news-headline apps, most messaging scenarios, such as instant messaging and friend-circle updates, synchronize messages in write diffusion mode. Write diffusion requires the underlying storage to have extremely high write throughput to absorb peak message traffic.
  • Low-latency reads: messaging systems usually serve online scenarios, so queries must have low latency.

Tablestore Timeline is built on a distributed database that uses an LSM (log-structured merge) storage engine. The biggest advantage of LSM is that it is very write-friendly, which naturally suits write diffusion of messages. Reads are also heavily optimized, for example through hot-data caching and Bloom filters. Data tables use range partitioning. This allows the service to scale horizontally, and a load-balancing policy automatically detects and handles hotspot partitions. To meet the different storage requirements of the synchronization library and the message repository, flexible custom configurations are also provided, including:

  • Time to Live (data life cycle): you can customize the data life cycle, for example, keeping data permanently or only for N days.
  • Storage type: you can choose the storage type. HDD is the best choice for the message repository, and SSD is the best choice for the synchronization library.
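To make these two settings concrete, here is a minimal Python sketch of how the synchronization library and the message repository might be configured differently. `StoreConfig`, its fields, and the table names `im_sync` and `im_store` are hypothetical illustrations and not the Tablestore SDK API. The 14-day TTL is an assumed example value.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class StorageType(Enum):
    SSD = "ssd"  # low latency, suited to hot data
    HDD = "hdd"  # low cost, suited to long-lived cold data


@dataclass(frozen=True)
class StoreConfig:
    """Hypothetical per-store settings: TTL plus storage type."""
    name: str
    ttl_seconds: Optional[int]  # None means data is kept permanently
    storage_type: StorageType

    def is_expired(self, written_at: float, now: float) -> bool:
        """Return True if a row written at `written_at` has outlived the TTL."""
        if self.ttl_seconds is None:
            return False
        return now - written_at > self.ttl_seconds


DAY = 24 * 60 * 60

# Short-lived, SSD-backed synchronization library (assumed 14-day TTL).
SYNC_STORE = StoreConfig("im_sync", ttl_seconds=14 * DAY, storage_type=StorageType.SSD)
# Permanent, HDD-backed message repository.
MESSAGE_STORE = StoreConfig("im_store", ttl_seconds=None, storage_type=StorageType.HDD)
```

Keeping the two stores as separate configurations lets each one trade cost against latency independently.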

Timeline Module



The Timeline Store holds a massive number of Timelines. The figure shows the detailed structure of a single Timeline, which consists of three main parts:

  • Timeline Meta: metadata section used to describe the Timeline, including:

    • Identifier: Uniquely identifies the Timeline and can contain multiple fields.
    • Meta: Metadata used to describe the Timeline. It can contain any number of fields of any type.
    • Meta Index: the metadata index. It can index any attribute in the metadata and supports multi-field queries and retrieval.
  • Timeline Queue: A Queue for storing and synchronizing messages. The elements in the Queue consist of two parts:

    • Sequence Id: the sequence number used to locate a message in the queue. Sequence Ids increase within the queue.
    • Message: the entity in the queue that carries the message and contains its complete content.
  • Timeline Data: the data part of the Timeline, that is, its messages. It mainly includes:

    • Message: Message entity, which can also contain any number of fields of any type.
    • Message Index: the index on message data. It can index any column in a message entity and supports multi-field queries and retrieval.
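The structure above can be sketched as a few Python data classes. This is a hypothetical in-memory illustration of the concepts and not the Tablestore Timeline library. The class and method names are assumptions, sequence ids here come from a simple counter starting at 1, and the Meta Index and Message Index are omitted.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Identifier:
    """Uniquely identifies a Timeline; may consist of several fields."""
    fields: Tuple[Tuple[str, Any], ...]


@dataclass
class QueueEntry:
    sequence_id: int         # locates the message within the queue
    message: Dict[str, Any]  # schema-free message body (any columns)


@dataclass
class Timeline:
    identifier: Identifier
    meta: Dict[str, Any] = field(default_factory=dict)  # schema-free metadata
    queue: List[QueueEntry] = field(default_factory=list)

    def append(self, message: Dict[str, Any]) -> int:
        """Append a message and return its sequence id (increasing within the queue)."""
        next_id = self.queue[-1].sequence_id + 1 if self.queue else 1
        self.queue.append(QueueEntry(next_id, message))
        return next_id

    def read_after(self, sequence_id: int, limit: int = 100) -> List[QueueEntry]:
        """Return up to `limit` entries with a sequence id greater than `sequence_id`."""
        return [e for e in self.queue if e.sequence_id > sequence_id][:limit]


class TimelineStore:
    """A Store holds many Timelines, much as a table holds many rows."""

    def __init__(self) -> None:
        self._timelines: Dict[Identifier, Timeline] = {}

    def get_or_create(self, identifier: Identifier) -> Timeline:
        if identifier not in self._timelines:
            self._timelines[identifier] = Timeline(identifier)
        return self._timelines[identifier]
```

Because `Identifier` is a frozen data class, it is hashable and can serve directly as the lookup key for a Timeline in the store.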

IM messaging system modeling



Let's use a simple IM system as an example of how to model data with Tablestore Timeline. In the example in the figure above, there are three users, A, B and C. A and B have a single chat, A and C have a single chat, and A, B and C form a group chat. Let's see how message synchronization, storage, and the read and write processes are each modeled on Tablestore Timeline in this scenario.

Message synchronization model

Using the write diffusion model for message synchronization takes full advantage of Tablestore Timeline. Because IM scenarios involve far more reads than writes, write diffusion balances reads, writes and resource usage across the entire system. In the write diffusion model, every message receiver has an inbox, and every message that must be synchronized to that receiver is delivered to its inbox. In the example above, users A, B and C each have an inbox, and each user pulls new messages from the same inbox on all of their devices.
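As a minimal sketch of write diffusion for the A/B/C example, the Python below keeps one inbox per user and copies every message into the inbox of each session member. The session names, the `send` function, and the plain lists standing in for inbox Timelines are hypothetical. This sketch also delivers a copy to the sender's own inbox so that the sender's other devices receive the message; that choice is an assumption.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical session membership for the A/B/C example.
SESSIONS: Dict[str, List[str]] = {
    "A-B": ["A", "B"],
    "A-C": ["A", "C"],
    "group-ABC": ["A", "B", "C"],
}

# One inbox per receiver; a list stands in for an inbox Timeline.
inboxes: Dict[str, List[dict]] = defaultdict(list)


def send(session_id: str, sender: str, text: str) -> None:
    """Write diffusion: write one copy of the message into every member's inbox."""
    message = {"session": session_id, "sender": sender, "text": text}
    for member in SESSIONS[session_id]:
        # The sender is a member too, so its other devices also see the message.
        inboxes[member].append(message)
```

The write cost grows with the number of session members. In exchange, each device only has to read a single inbox to catch up.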

Message synchronization library

Inboxes are stored in the synchronization library, where each inbox corresponds to one Timeline. In the example in the figure, there are three inbox Timelines in total. Each receiver records the SequenceID of the latest message it has pulled, and each pull resumes from this SequenceID. The synchronization library is queried frequently, usually for the latest messages. Hot data should therefore be cached in memory to provide high concurrency and low latency, and for the same reason the synchronization library is usually configured with SSD storage. Once a message has been synchronized to all devices, the message in the inbox has been consumed and could in theory be deleted. Rather than cleaning it up actively, however, the design lets the data expire automatically through a short life cycle, typically one or two weeks. If a device is still pulling new messages after the data has expired, it must fall back to read diffusion mode and pull the messages from the message repository.
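The pull-and-fallback logic above can be sketched as follows. `pull_new_messages`, `SyncResult` and the caller-supplied `read_repository` callback are hypothetical names. The TTL check is an assumed, conservative way to detect possible expiry. A message written right after the last sync expires `ttl_seconds` later, so once that much time has passed, the inbox may no longer be complete.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Entry = Tuple[int, str]  # (SequenceID, message text) as kept in an inbox Timeline


@dataclass
class SyncResult:
    messages: List[str]
    checkpoint: int        # SequenceID to resume from on the next pull
    from_repository: bool  # True if the pull fell back to read diffusion


def pull_new_messages(
    inbox: List[Entry],
    checkpoint: int,
    last_sync_at: float,
    now: float,
    ttl_seconds: float,
    read_repository: Callable[[], Tuple[List[str], int]],
) -> SyncResult:
    """Pull messages newer than `checkpoint`, or fall back to the message repository."""
    # Messages written since the last sync may already have expired from the inbox.
    if now - last_sync_at > ttl_seconds:
        # Hypothetical callback: rebuilds missed messages from session outboxes
        # and returns them with the checkpoint to resume from.
        messages, resume_at = read_repository()
        return SyncResult(messages, resume_at, from_repository=True)
    newer = [(seq, msg) for seq, msg in inbox if seq > checkpoint]
    if not newer:
        return SyncResult([], checkpoint, from_repository=False)
    return SyncResult([msg for _, msg in newer], newer[-1][0], from_repository=False)
```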

Message repository

The message repository stores the messages of each session, and each session's outbox corresponds to a Timeline. Messages in the outbox can be pulled per session. For example, a session's historical messages can be read from its outbox. New messages are usually delivered to receivers through online push or by querying the synchronization library, so the repository is queried relatively rarely. The repository stores messages for a long time, possibly permanently, and holds a larger volume of data than the synchronization library. HDD is therefore the typical storage choice for the repository. Its data life cycle is set by how long messages must be retained, which is usually a long time.
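A minimal sketch of reading a session's history from its outbox might look like this, paging from the newest message to the oldest. `MessageRepository`, `append` and `history` are hypothetical names backed by in-memory lists rather than the Tablestore API. The `before` cursor is assumed to be the smallest sequence id from the previous page.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class MessageRepository:
    """Hypothetical in-memory stand-in: one outbox Timeline per session."""

    def __init__(self) -> None:
        self._outboxes: Dict[str, List[Tuple[int, str]]] = {}

    def append(self, session_id: str, text: str) -> int:
        """Store a message in the session's outbox and return its sequence id."""
        outbox = self._outboxes.setdefault(session_id, [])
        seq = outbox[-1][0] + 1 if outbox else 1
        outbox.append((seq, text))
        return seq

    def history(self, session_id: str, before: Optional[int] = None,
                limit: int = 20) -> List[Tuple[int, str]]:
        """Return up to `limit` messages older than `before`, newest first."""
        outbox = self._outboxes.get(session_id, [])
        # Sequence ids increase, so binary-search the first entry with seq >= before.
        end = len(outbox) if before is None else bisect.bisect_left(outbox, (before,))
        start = max(0, end - limit)
        return list(reversed(outbox[start:end]))
```

Paging by sequence id keeps each history request a bounded range read over one session's outbox.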

Message index library

The message index library is attached to the message repository and uses the Timeline's Message Index to index the messages stored there. Examples include a full-text index on the text content and indexes on the recipient, the sender and the send time. These indexes support advanced queries and searches such as full-text search.
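To illustrate what the Message Index provides, here is a toy in-memory inverted index that combines a full-text match with filters on sender and send time. It is a hypothetical sketch of the idea and not how Tablestore implements the Message Index. The class, its methods and the simple word tokenizer are all assumptions.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Set


def tokenize(text: str) -> List[str]:
    """Naive tokenizer: lowercase word characters only (an assumption)."""
    return re.findall(r"\w+", text.lower())


class MessageIndex:
    """Hypothetical inverted index over repository messages."""

    def __init__(self) -> None:
        self._by_term: Dict[str, Set[int]] = defaultdict(set)
        self._docs: Dict[int, dict] = {}

    def add(self, msg_id: int, sender: str, receiver: str, sent_at: int, text: str) -> None:
        self._docs[msg_id] = {"sender": sender, "receiver": receiver,
                              "sent_at": sent_at, "text": text}
        for term in tokenize(text):
            self._by_term[term].add(msg_id)

    def search(self, text_query: str, sender: Optional[str] = None,
               since: Optional[int] = None) -> List[int]:
        """Return ids of messages containing every query term, optionally filtered."""
        terms = tokenize(text_query)
        if not terms:
            return []
        hits = set.intersection(*(self._by_term.get(t, set()) for t in terms))
        result = []
        for msg_id in sorted(hits):
            doc = self._docs[msg_id]
            if sender is not None and doc["sender"] != sender:
                continue
            if since is not None and doc["sent_at"] < since:
                continue
            result.append(msg_id)
        return result
```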

Conclusion

This article explained the Tablestore Timeline model in detail. It introduced the modules of Timeline, including Store, Meta, Queue, Data and Index, and finally used a simple IM scenario to show how to model data on Timeline. In the next article, on implementation, we will build a simple IM system directly on Tablestore Timeline that supports single chat, group chat, metadata management and message retrieval. Stay tuned.



This article is original content from the Alibaba Cloud Yunqi community and may not be reproduced without permission.