background

In a distributed scenario, globally unique ids need to be generated in many places. For example, after the database is divided into tables, the unique ID needs to be used instead of the auto-increment ID of the stand-alone version. The basic requirements of the transmitter are

  • Globally unique, can’t be repeated in any way

Some scenarios also require monotonous increment, such as sorting requirements.

There are many articles on the Internet, such as “Leaf – Meituan-Review Distributed ID Generation System” by Meituan-Review, and “How to Make a Reliable Generator” by meituan-Review. This article focuses on high availability and high performance

  • High availability: No service unavailability or duplicate number due to system failure
  • High performance: The transmitter is usually a very high concurrency system that can be scaled horizontally while performing well

What are common solutions to basic requirements? Are they highly available and high-performance?

Snowflake scheme

Snowflake is generated using a 41-bit timestamp plus a 10-bit machine ID plus a 12-bit serial number, which can be generated in a single process using an AtomicLong. A 10-bit machine number supports 1024 machines

The advantages of this algorithm are:

  • Algorithm is simple, easy to implement, do not rely on any third party system, performance is very high;
  • The cluster is stateless and can be expanded or shrunk at will. Therefore, it can be considered as a high availability system.

The disadvantage is that:

  • The time stamp of high 10 bits and the sequence number of low self increment can be guaranteed to increase monotonically, but the machine number cannot be guaranteed. For example, if the machine number is 2, the id will be generated at a certain time, and the machine number is 1, the ID will be generated at the same time, then the monotonicity cannot be guaranteed.
  • Depending on the timestamp, duplicate ids may be generated if the clock is set back.

All in all, the Snowflake solution meets the basic requirements and is very high performance, but it has the clock back problem, so it is a high performance but not a high availability solution.

Database based scheme

Using the auto-increasing ID feature of the database, the scheme has the following advantages:

  • Implementation is relatively simple, only rely on the database;
  • No clock callback problem;
  • The generated ID is monotonically increasing.

Disadvantages:

  • The performance is limited by the database. The single write performance of the database is limited, and the capacity of the database cannot be expanded.
  • The database has a single point of failure, if it is a master/slave schema, depending on yes
    Asynchronous replication,
    Semi-synchronous replication,
    All synchronousReplication configuration, only
    Full synchronous replicationOther configurations cannot ensure the consistency of primary and secondary data. If the primary database is faulty and the changes in the primary database are not applied to the secondary database, duplicate numbers may occur after the primary and secondary switchover.

Similarly, the database here can also be replaced by Redis, using The INCR of Redis to achieve, but Redis only has asynchronous replication, which is more unable to ensure data consistency.

In general, a database-based solution is not a high availability solution if full synchronous replication is not enabled. If full synchronous replication is enabled, performance problems will occur (even if full synchronous replication is not enabled).

Number segment scheme based on database

This scheme is a kind of performance optimization of database schema, is not an id for each of the retrieved from a database, but a them roughly, in a separate process through the lock to ensure every time a unique id, you can even when the system is going to issue the number asynchronously to get the next them roughly, the scheme performance significantly higher than the database schema, However, the monotonically increasing ID feature is also lost. If full synchronous replication is enabled, it can be considered as a highly available solution. By increasing the length of the number segment, high performance can be achieved.

Database scheme based on multi-master database

This scheme is also an optimization of the database number scheme. Multiple databases are used. Suppose that the auto-increment ID of three primary databases is set to start with 1, 2, 3 and step size is set to 3, so that the auto-increment ID of database 1 is 1, 4, 7… , 2 database access since the id for 2,5,8… Select * from database 3 where id = 3, id = 6, id = 9… They will never repeat. The system uses a polling policy to obtain the number segment each time. If the number segment fails to be obtained from one database, the system continues to obtain the number segment from the next database. The solution solves the problem of high availability of the database. The breakdown of some databases does not affect the normal operation of the system. High performance is also solved through number segments. It is difficult to expand the database horizontally during the operation.

Conformance protocol based scheme

The high availability problem of the above database mainly comes from the inconsistency of master and slave data. If the consistency protocol is used to ensure the consistency of data, the high availability problem can be solved. The most commonly used raft algorithm can ensure that the data can be copied to more than half of the machines. After we get a number segment, the number segment that has been issued is persisted to more than half of the machines and eventually copied to all the machines, and raft re-elects when the master dies. The great How to Make a Reliable Transmitter takes this approach, using an ETCD component. It is even possible to implement a transmitter based on the open source RAFT library. If you want to implement a reliable RAFT protocol by yourself, it is still difficult. The open source RAFT library can choose SOFAJRaft.

conclusion

  • Number of
    A high performanceRely mainly on
    Them roughlyTo solve;
  • Number of
    High availabilityYou can rely on
    The databaseHigh availability, multiple master libraries,
    Conformance protocolTo implement.