Read the directory
- What does the document number mean
- What’s the difference from a unique ID
- Why is a globally unique bill number generator needed
- What are the ways of doing it
- I recommend the way
- conclusion
What does the document number mean
As a software system, we must be full of all kinds of documents everywhere, and we must have all kinds of document numbers corresponding to it. For example: e-commerce industry order number, payment serial number, refund number and so on. SCM purchase order number, purchase order number, shipment order number, inventory order number, etc. In an enterprise or a 2C platform, it is inevitable to communicate through a certain document number. So a good document number is necessarily easy to communicate, in simple terms, priority is easy to remember > good input > good look, of course, the shorter the better.
What is the difference between unique ID and unique ID
Some people may ask, it seems that the most heard is unique ID, including a large number of articles on distributed unique ID generation, seems to have little to do with the document number. But in fact, I think there is no conflict between the two, just different importance and different scenes. Let’s analyze it from different angles:
1) Uniqueness: The unique ID is actually more to ensure that the ID is unique in the whole system, and its definition of uniqueness is broader. As for the document number, it only needs to ensure that it is unique under the document type. For example, the order number: 00001 and the logistics number: 00001 do not affect each other.
2) Readability: The simplest and most crude way to use a unique ID is to use a UUID, because it is only used by programs and people do not need to understand the meaning of the ID. But the document number is different. As mentioned above, it needs to be readable to facilitate communication between people. Imagine what it would be like if you were on the phone with someone else to get a list of Uids.
3) Business: Document number in most cases also needs to reflect certain business meanings, such as T = Trade in order Number T00001, P = Pay in payment number P00001, etc. There may even be a certain correlation between multiple document numbers, for example, the relevant payment numbers under an order number T00001 must be P00001-1, P00001-2. Some scenarios even need to include some date information in them.
Three, why need global unique bill number generation program
Like unique ID, the generation of document number itself is also a relatively stable and general rule, so refining it into a separate program can provide better reusability and avoid the repetitive labor of maintaining document number in respective projects. Especially in the Internet industry, enterprises with heavy traffic also need to consider performance and high availability issues. So really want to generate the document number this “small function” to do a good job, or need a certain investment. So why not use it as a separate program to amplify the ROI of your investment?
Four, what are the ways to achieve
Here is a list of common implementations and their pros and cons:
1) Prefix column + global increment column
This is similar to the unique ID scheme, which uses the numbers in the increment column. And the easiest way to do this is to rely on the database’s increment column.
Advantages:
Simple implementation, continuous ++ can ensure global uniqueness can ensure increasing readability
Disadvantages: Need to rely on a persistent place to store the current generated “cursor” position, so the performance has an upper limit, basically the TPS upper limit of single application or TPS upper limit of DB is easy to leak some commercial information on some external document numbers. For example, competitors can guess your order quantity every day or even every hour or every minute through the order number.
Improvements to eliminate single points:
① Horizontal split multi-write + synchronous length (example: machine 1 autoincrement is 1, 4, 7… ; The autoincrement of machine 2 is 2, 5, 8… ; The autoincrement of machine 3 is 3, 6, 9…) :
New disadvantage: because it is multi-write, it needs to rely on load balancing policy and network communication delay. It cannot guarantee that the generated serial number is 100% increasing. (For example, even if the round Robin policy requests 1 and then 2, it is still possible for 2 to return the response first.)
② Vertical split multi-write + autoincrement column (machine 1 is specially used to generate order number, machine 2 is specially used to generate payment order number) :
New disadvantages:
A. Due to the distribution according to business, there is a single point of bottleneck for some documents with large requests due to uneven flow.
B. Poor scalability. Each additional business document requires an additional procedure
③ Horizontal split + increase machine code point (assign a number to each program that generates the document number: 1, 2, and 3 to the front of the autoincrement column) :
New disadvantages:
A. This code is either hardcoded into a configuration file or depends on a separate program with an assigned number. And the number length is longer.
B. Increments cannot be guaranteed.
Improvements to improve performance:
(1) Pre-generate to cache, reduce dependence on DB
New disadvantages:
A. If the dependence on DB needs to be completely reduced, then every time the document number is consumed, it should not be written back to DB, which also leads to a large serial number hole once the program restarts.
B. The cache size is negatively correlated with the frequency at which DB obtains the next cache data. When the frequency is high, double caching is required to preload the next cache data to avoid the blocking caused by pulling the latest data from DB after the cache is exhausted.
2) Prefix column + date + increment column:
I think this is the solution that most systems will adopt. The precision of this date is related to the data length of the increment. The higher the date precision, the shorter the data length required for autoincrement, and vice versa.
Advantages:
This is easier to implement to ensure uniqueness and to ensure that the incremental include date represents more business information
Disadvantages: The disadvantages of Option 1 are all present
Resetting the date increment column requires a certain amount of logical judgment, increased complexity (thread safety issues in multi-threading), and reduced performance.
Improvements to eliminate single points:
Improvement scheme in ① 1).
Improvements to improve performance:
Improvement scheme in ① 1).
② The reset of the increment column can ignore the date change (that is, even if the next time period, the increment does not reset, but directly ++ integer, until the next loop automatically entered. In C#, you can write:
var uint32 = (long)UInt32.MaxValue; Interlocked.Add(ref uint32, 1); Console.WriteLine((UInt32)uint32);Copy the code
It is important to note, however, that the numeric upper limit of the increment column must be such that no duplicates occur within the minimum precision of the date.
New disadvantages:
A. Even if the number of requests is not large, it will generate an excessively long document number, because the increment will not be reset actively.
Five, the author recommended the way
I personally think that, taken together,
Add machine code points (give each program that generates the bill number a number: 1, 2, and 3 to the front of the increment column)
This plan is relatively the most permanent. But there is a trade-off between data length and readability. Firstly, in order to ensure incremental increase, we must increase the time to the front of the whole document number. You can either use a regular date format or you can use a timestamp, which is shorter for the same accuracy. Considering that in most of the actual scenarios, as long as the document number can be identified as the type of document, the remaining documents generally need to find the detailed information of the document in the corresponding document list. So it’s not really that readable. (For example, when a customer gives an order number to our customer service staff, the customer service staff will need to check the details of the order.)
OK, then we can design its length like this:
The timestamp and increment are shared globally, so the single document number of a certain type is not continuous, but is increasing trend, which solves the problem of guessing the order quantity according to the order number.
So what is the upper limit that can support the non-repetition of document number under such a design? In fact, the maximum number of single dots in 1 second is 100000000/1000 = 100000/ms, 10W in 1 millisecond. Take snowflake’s generation speed of 4000/ms as the calculation (network source, not verified), and consider the impact of CPU upgrade according to Moore’s Law. It takes about 50 years before duplication is possible. And the theoretical maximum is 100 programs load balancing, 1000W/ms, estimated that this life do not have to worry about the problem of repetition.
Some people may ask, why not directly timestamp to the millisecond bit, will increase the length of 3 bits, then increment can be shorter. In terms of having one more bit of redundancy than Snowflake, it still takes 5 bits to get to the millisecond of the timestamp (Snowflake is 4 bits: 4000/ms), so it doesn’t make a difference. So what’s the advantage of getting to seconds? I think there are two points:
1) Solved the preloading problem, because the accuracy is to second, so even if the program restarts, my increment from 0 will not be repeated.
2) If the precision is millisecond, that means that no matter how many requests I have per second, even if only one request comes in every second, it will take up 3 bits. But if it is seconds, then these three bits are omitted. I think except for companies with the size of Alibaba and Tencent, in the actual environment, millisecond concurrency of 1W is no longer acceptable.
Some of the details are:
1. If the machine code is a bit, add 0 in front of it to avoid duplication after combining with the auto-increment column behind (example: machine 1, serial number 11). And machine 11, serial number 1 is repeated).
2. Clock synchronization on each application server needs to be done well, because we rely on this to ensure incremental problems.
In the end, the number length in the actual production environment ranges from 15 to 19.
Six, the concluding
A well-designed documents, not only can be used for the primary key, depots table can also be used to do, such as we put the user ID according to certain rules of several digital spell to the end of the document number, then use this number to locate the database directly, can ensure that a user’s order all fell on a the same database.
But it is worth reminding that we can not blindly pursue one step in place, we need to combine their own actual situation to choose the right way. Some of the common solutions listed earlier also work well in the early days of the system.
By Zachary_Fan www.cnblogs.com/Zachary-Fan…
If you want to be notified of your own articles, please scan the qr code below.
Pay attention to my
Collect the paper
Talk about the significance of crossover in software development
» Next up:
Empathy is a bridge to a successful architecture