In the previous chapter, we looked at how national ID numbers are generated to see how unique identifiers can be issued under a central authority. This chapter looks at how to generate unique identifiers in the vast computing world, where no such central authority exists.


Getting to know UUIDs

Let's start with the well-known UUID, which has an implementation in practically every language and is even available out of the box on some Unix systems (for example, through the uuidgen command).

What is a UUID?

UUID stands for Universally Unique Identifier. A UUID consists of 32 hexadecimal digits (128 bits), so the theoretical total number of UUIDs is $16^{32} = 2^{128} \approx 3.4 \times 10^{38}$ (in practice a few bits are reserved for the version and variant, so the usable space is somewhat smaller). In other words, if 1 trillion UUIDs were generated every nanosecond, it would take about 10 billion years to use them all up, so only after roughly 10 billion years would a repeat become inevitable. Whether Earth will even exist 10 billion years from now is uncertain, so there is no need to worry about the very long term.
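The arithmetic can be checked with a few lines of Python. This is only a back-of-the-envelope sketch, assuming the rate used above (10^12 UUIDs per nanosecond) and a 365.25-day year:

```python
# Total number of distinct 128-bit values: 32 hex digits, 16 choices each.
total = 16 ** 32
assert total == 2 ** 128

# Assumed generation rate from the text: 1 trillion (10**12) UUIDs per nanosecond.
per_second = 10 ** 12 * 10 ** 9
seconds_per_year = 365.25 * 24 * 3600

years_to_exhaust = total / per_second / seconds_per_year
print(f"total = {total:.2e}")             # total = 3.40e+38
print(f"years = {years_to_exhaust:.2e}")  # years = 1.08e+10
```

That is roughly 1.08 × 10^10 years, which matches the 10-billion-year figure.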

Representation of a UUID

Like a national ID number, a UUID follows a few simple rules. Its standard form is 32 hexadecimal digits split by hyphens into five groups in the pattern 8-4-4-4-12, for 36 characters in total, as follows:

xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx

Here M is the UUID version. RFC 4122 defines five versions, 1 through 5. The one to three most significant bits of N indicate the UUID variant; for the standard (RFC 4122) variant these bits are binary 10, so N can only be 8, 9, a, or b.
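To make the M and N positions concrete, here is a small Python sketch. The helper name `version_and_variant_digits` is hypothetical; it simply reads the two digits from the string, and the result is compared with what the standard `uuid` module reports for the same value:

```python
import uuid

def version_and_variant_digits(s: str) -> tuple:
    """Hypothetical helper: return the M (version) and N (variant) hex digits."""
    groups = s.split("-")
    # The third group is Mxxx and the fourth group is Nxxx.
    return groups[2][0], groups[3][0]

sample = "e852b72e-ba4d-11e9-8e8e-acde48001122"
m, n = version_and_variant_digits(sample)
u = uuid.UUID(sample)
print(m, n)                  # 1 8
print(u.version, u.variant)  # 1 specified in RFC 4122

# For the RFC 4122 variant, the top two bits of N are binary 10,
# which is why N is always one of 8, 9, a, b.
assert int(n, 16) >> 2 == 0b10
```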

The evolution of UUID versions

Version 1: UUID based on time and space

The first version of UUID resembles the design of ID numbers: it combines time and place to make each identifier unique. In the computing world, time can be measured far more precisely than in everyday life, down to tiny fractions of a second (version 1 counts 100-nanosecond intervals). Space is the harder problem. With the Web spanning the globe, there is no government-like authority to set a unified standard, so the only practical identifier of place is the unique code a computer ships with: the MAC address of its network card.
Ideally, every computer has a unique MAC address, so a UUID generated by a given computer at a given moment is unique worldwide. Picture a two-dimensional grid: the horizontal axis is time, which never goes backward, and the vertical axis is the machine, which never performs two generations at the same instant. Each UUID is then a unique intersection point on that grid.
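As a minimal sketch of the time-plus-space idea, the Python function below (the name `make_v1` is hypothetical) packs a timestamp, a clock sequence, and a 48-bit node into the version 1 layout. It assumes the RFC 4122 conventions: the timestamp counts 100-nanosecond intervals since 1582-10-15, and the node is normally the MAC address. The result is checked with the standard `uuid` module's field accessors rather than against any particular library's generator:

```python
import time
import uuid

# Offset between 1582-10-15 (the UUID epoch) and 1970-01-01, in 100 ns units.
UUID_EPOCH_OFFSET = 0x01B21DD213814000

def make_v1(node: int, clock_seq: int, ns_since_unix: int) -> uuid.UUID:
    """Hypothetical helper: assemble a version 1 UUID from time and space."""
    ts = ns_since_unix // 100 + UUID_EPOCH_OFFSET            # 60-bit timestamp
    time_low = ts & 0xFFFFFFFF                               # low 32 bits
    time_mid = (ts >> 32) & 0xFFFF                           # middle 16 bits
    time_hi_version = ((ts >> 48) & 0x0FFF) | (1 << 12)      # high 12 bits, M = 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # variant bits 10 (N)
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))

u = make_v1(node=0xACDE48001122, clock_seq=0x0E8E, ns_since_unix=time.time_ns())
print(u)
assert u.version == 1 and u.node == 0xACDE48001122 and u.clock_seq == 0x0E8E
```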

But reality is not that tidy. Looking again along both axes, and at the generation itself:

1. Time. Computers keep time precisely, but machines scattered around the world do not get the current time from a central source; each reads its own internal clock. If a clock is wrong and is later corrected backward, UUIDs generated after the correction can repeat earlier ones. In general this can be ignored: time is relative, and as long as a machine's clock keeps running consistently, there is no problem.

2. MAC addresses are not unique. There are many network card manufacturers in the world, and even with an agreed specification there is no guarantee that every manufacturer assigns unique MAC addresses without mistakes. In addition, the computer is in the user's hands: can a user who understands how computers work change the MAC address? Yes.

3. Simultaneous generation. If two processes on the same machine (and therefore with the same MAC address) run UUID-generation code at exactly the same moment, they can produce the same UUID.
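Problems 1 and 3 are usually mitigated inside the generator. The sketch below shows one common approach, similar in spirit to the pure-Python path of CPython's `uuid1`: remember the last timestamp and, if the clock returns the same or an earlier value, step one tick past it. The class name `MonotonicTicker` is hypothetical, and the lock only protects threads within one process; separate processes would still need the clock sequence or some other mechanism:

```python
import threading
import time

class MonotonicTicker:
    """Hypothetical helper: hands out strictly increasing 100 ns timestamps."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last = None

    def next_timestamp(self) -> int:
        with self._lock:
            ts = time.time_ns() // 100
            # Same tick (two calls within one 100 ns window) or clock moved back:
            # step one tick past the last value instead of repeating it.
            if self._last is not None and ts <= self._last:
                ts = self._last + 1
            self._last = ts
            return ts

ticker = MonotonicTicker()
stamps = [ticker.next_timestamp() for _ in range(10_000)]
assert len(set(stamps)) == len(stamps)  # no duplicates
assert stamps == sorted(stamps)         # increasing order
```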

These are the reasons why version 1 UUIDs are not strictly guaranteed to be unique. Still, each of these situations is very unlikely, so version 1 remains one of the most reliable ways to achieve global uniqueness in practice, and for that reason it is common in scenarios that require uniqueness.


Usage examples

Node.js version

I looked through the source code of the uuid package. Although it is widely used, its v1 implementation does not actually read the machine's MAC address; it fills that part with random numbers instead.

const uuidv1 = require('uuid').v1;
const logger = console.log;
logger('uuid v1 version :%s', uuidv1());
// uuid v1 version :10e10f40-bd02-11e9-b241-97aa7a999bec

Python version

Python's built-in uuid module, on the other hand, does use the MAC address of the machine's network card.

import uuid
uuid.uuid1()
# UUID('e852b72e-ba4d-11e9-8e8e-acde48001122')

Use the ifconfig command to check the MAC address of the network adapter.

Both examples show that the M digit is 1 and the N digit is one of 8, 9, a, b, which matches the format described at the beginning. The last 12 digits, acde48001122, are the MAC address of my machine's network card, and they stay the same across calls.
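The same observation can be made programmatically. This Python sketch takes the sample UUID printed above and pulls out the node (the last 12 hex digits) and the 60-bit timestamp, converting the latter back to a date using the 1582-10-15 epoch that version 1 assumes:

```python
import datetime
import uuid

u = uuid.UUID("e852b72e-ba4d-11e9-8e8e-acde48001122")

# Node: the 48-bit "space" component, normally the MAC address.
print(f"{u.node:012x}")  # acde48001122

# Time: 100 ns intervals since the start of the Gregorian calendar.
gregorian_start = datetime.datetime(1582, 10, 15)
created = gregorian_start + datetime.timedelta(microseconds=u.time // 10)
print(created)
```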

Security problems caused by exposing MAC addresses

A big problem with this version is that it contains the MAC address of the user's machine. If each computer belongs to one user, the MAC address effectively identifies that user, so exposing it raises privacy and security concerns.
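One mitigation, described in RFC 4122 section 4.5 and similar to the random node the Node.js package uses, is to replace the real MAC address with a random 48-bit node and set its multicast bit so it cannot collide with a real network card's address. A Python sketch using the standard `uuid1(node=...)` parameter:

```python
import secrets
import uuid

# 48 random bits; OR in 0x010000000000 to set the multicast bit
# (the least significant bit of the first octet), as RFC 4122 suggests.
random_node = secrets.randbits(48) | 0x010000000000

u = uuid.uuid1(node=random_node)
print(u)
assert u.version == 1
assert u.node == random_node       # the real MAC address is not used
assert (u.node >> 40) & 0x01 == 1  # multicast bit is set
```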

Catching a virus author with a UUID

In 1999, an American named David L. Smith created a computer virus (later known as Melissa) using Word macros. It spread mainly by e-mail, usually with a message telling the recipient not to show it to anyone else. Once a recipient opened the infected document, the virus automatically sent the same e-mail to the first 50 contacts in the user's address book. Although the virus did not delete system files, the resulting flood of e-mail clogged and paralyzed mail servers, causing considerable damage. Ultimately, the UUID embedded in the infected document exposed the MAC address of David L. Smith's machine, which helped investigators locate and arrest him.

Version 2: DCE security UUID, a more secure variant of version 1

This version builds on the first: it is also generated from time plus space, but its internal implementation is modified for security. Specifically, it replaces the least significant 8 bits of the clock sequence with a local domain number, and the least significant 32 bits of the timestamp with an integer identifier that is meaningful within that domain (for example, a POSIX user ID).
I checked the Node.js uuid package and Python's uuid module, and neither implements version 2, so I could not find sample code. This version seems to be used very rarely.
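Since neither library offers version 2, here is a rough, hypothetical Python sketch of the substitution described above: start from a version 1 UUID, put a local identifier in the low 32 timestamp bits, put a domain number in the low byte of the clock sequence, and mark the version as 2. The helper name `make_dce_like` and the sample values are assumptions for illustration; this is not a complete DCE implementation:

```python
import uuid

def make_dce_like(local_id: int, domain: int) -> uuid.UUID:
    """Hypothetical helper: derive a version-2-style UUID from a version 1 UUID."""
    base = uuid.uuid1()
    time_hi_version = (base.time_hi_version & 0x0FFF) | (2 << 12)  # M = 2
    return uuid.UUID(fields=(
        local_id & 0xFFFFFFFF,      # replaces the low 32 bits of the timestamp
        base.time_mid,
        time_hi_version,
        base.clock_seq_hi_variant,  # variant bits are kept as they were
        domain & 0xFF,              # replaces the low 8 bits of the clock sequence
        base.node,
    ))

u = make_dce_like(local_id=1000, domain=0)  # e.g. UID 1000; domain 0 assumed to mean "person"
print(u)
assert u.version == 2 and u.time_low == 1000 and u.clock_seq_low == 0
```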


Version 3: UUID based on the MD5 hash algorithm

Unlike the two versions above, this version is built on a hash algorithm, so the same input always produces the same UUID. Its implementation involves two concepts: a namespace and the input content (the name). To generate a UUID, you first pick a namespace, concatenate the namespace with the input value, and hash the result with MD5; the version and variant bits are then overwritten to mark the result as a version 3 UUID.
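The steps can be reproduced in a few lines of Python. This sketch mirrors the description above (namespace bytes plus the UTF-8 name, MD5, then overwrite the version and variant bits) and checks the result against the standard library's `uuid.uuid3`; the helper name `manual_uuid3` is hypothetical:

```python
import hashlib
import uuid

def manual_uuid3(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Hypothetical helper: name-based UUID using MD5."""
    digest = hashlib.md5(namespace.bytes + name.encode("utf-8")).digest()
    # MD5 yields exactly 16 bytes; version=3 sets M to 3 and the
    # variant bits of N to binary 10.
    return uuid.UUID(bytes=digest, version=3)

u = manual_uuid3(uuid.NAMESPACE_DNS, "myString")
print(u)
assert u == uuid.uuid3(uuid.NAMESPACE_DNS, "myString")
```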

Predefined namespaces

In Node.js:

// namespaces predefined in the Node.js uuid package source
generateUUID.DNS = '6ba7b810-9dad-11d1-80b4-00c04fd430c8';
generateUUID.URL = '6ba7b811-9dad-11d1-80b4-00c04fd430c8';

Python:

# default predefined namespaces in Python
import uuid
uuid.NAMESPACE_DNS   # UUID('6ba7b810-9dad-11d1-80b4-00c04fd430c8')
uuid.NAMESPACE_URL   # UUID('6ba7b811-9dad-11d1-80b4-00c04fd430c8')
uuid.NAMESPACE_OID   # UUID('6ba7b812-9dad-11d1-80b4-00c04fd430c8')
uuid.NAMESPACE_X500  # UUID('6ba7b814-9dad-11d1-80b4-00c04fd430c8')


Properties of this version:

1. In the same namespace, different input values produce different UUIDs (a hash collision is possible in principle but extremely unlikely).

2. In the same namespace, the same input value always produces the same UUID.

3. UUIDs generated in different namespaces are different, assuming no MD5 collision.

4. If two UUIDs are identical, they almost certainly came from the same input value in the same namespace.
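The first three properties are easy to observe with Python's `uuid.uuid3` (the input strings here are arbitrary examples):

```python
import uuid

a1 = uuid.uuid3(uuid.NAMESPACE_DNS, "example.com")
a2 = uuid.uuid3(uuid.NAMESPACE_DNS, "example.com")
b = uuid.uuid3(uuid.NAMESPACE_DNS, "example.org")
c = uuid.uuid3(uuid.NAMESPACE_URL, "example.com")

assert a1 == a2  # same namespace, same input: same UUID every time
assert a1 != b   # same namespace, different input: different UUID
assert a1 != c   # different namespace, same input: different UUID
```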


Usage examples

Node.js version

const uuidv3 = require('uuid/v3');
const logger = console.log;
logger('UUID V3 version :%s', uuidv3('myString', uuidv3.DNS))
// 21fc48e5-63f0-3849-8b9d-838a012a5936

Python version

import uuid
uuid.uuid3(uuid.NAMESPACE_DNS, "myString")
# UUID('21fc48e5-63f0-3849-8b9d-838a012a5936')

Version 4: UUID based on random numbers

This is the most widely used version. It is built from random or pseudorandom numbers, and its biggest concern is the repetition (collision) rate. That rate can be calculated, and it grows as more UUIDs are generated, so this version is not recommended for large, long-running websites: the longer it is used, the higher the chance of a duplicate and the more problems you may run into.
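The repetition rate can be estimated with the birthday-problem approximation. A version 4 UUID has 122 random bits (6 of the 128 are fixed for version and variant), so for n generated UUIDs the probability of at least one duplicate is roughly 1 - exp(-n²/(2·2^122)). A small Python sketch, assuming an ideal random source (the helper name is hypothetical):

```python
import math

RANDOM_BITS = 122  # 128 bits minus 4 version bits and 2 variant bits

def collision_probability(n: int) -> float:
    """Hypothetical helper: birthday approximation for n random v4 UUIDs."""
    space = 2 ** RANDOM_BITS
    # 1 - exp(-n^2 / 2N); expm1 keeps precision when the result is tiny.
    return -math.expm1(-n * n / (2 * space))

for n in (10**6, 10**9, 10**12, 10**15):
    print(f"{n:.0e} UUIDs -> p = {collision_probability(n):.2e}")
```

The probability grows with the square of n, which is the effect described above; where it becomes unacceptable depends on the scale and lifetime of the system.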

A compact JavaScript implementation (note that it relies on Math.random(), which is not a cryptographically secure random number generator):

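function uuidv4() {
  return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, function (c) {
    // r: a random integer from 0 to 15
    var r = Math.random() * 16 | 0;
    // 'x' gets any hex digit; 'y' (the N position) is forced into 8, 9, a, b
    var v = c === 'x' ? r : (r & 0x3 | 0x8);
    return v.toString(16);
  });
}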

Usage examples

Node.js version

const uuidv4 = require('uuid/v4');
const logger = console.log;
logger('uuid v4 version :%s', uuidv4());

Python version

import uuid
uuid.uuid4()
# UUID('1a9e40e2-3862-41d4-bd4e-0dd928e81055')

Source code analysis of v4 in the Node.js uuid package

The v4 implementation in the Node.js uuid package is fairly simple, and you can read it yourself. Below I have trimmed some of the code and kept only the main body.

const crypto = require('crypto');

// crypto.randomBytes(size) generates cryptographically strong pseudorandom data;
// size is the number of bytes to generate. Here we get 16 bytes as a Buffer.
function rng() {
  return crypto.randomBytes(16);
}

// Utility that formats 16 bytes as an 8-4-4-4-12 UUID string (simplified here).
function bytesToUuid(buf) {
  const hex = buf.toString('hex');
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16),
    hex.slice(16, 20), hex.slice(20, 32)].join('-');
}

function v4() {
  const rnds = rng();
  // Bitwise AND (&): a result bit is 1 only when both bits are 1.
  // Bitwise OR (|): a result bit is 1 when either bit is 1.
  // Force M to 4 and N to one of 8, 9, a, b.
  rnds[6] = (rnds[6] & 0x0f) | 0x40;
  rnds[8] = (rnds[8] & 0x3f) | 0x80;
  return bytesToUuid(rnds);
}

module.exports = v4;
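The same two bit operations can be written in Python. This sketch (the helper name `manual_uuid4` is hypothetical) applies them to 16 bytes from `os.urandom`, much like the standard library's `uuid.uuid4`:

```python
import os
import uuid

def manual_uuid4() -> uuid.UUID:
    """Hypothetical helper: port of the Node.js v4 bit manipulation."""
    rnds = bytearray(os.urandom(16))
    rnds[6] = (rnds[6] & 0x0F) | 0x40  # keep the low nibble, force M = 4
    rnds[8] = (rnds[8] & 0x3F) | 0x80  # keep the low 6 bits, force N into 8, 9, a, b
    return uuid.UUID(bytes=bytes(rnds))

u = manual_uuid4()
print(u)
assert u.version == 4
assert str(u)[14] == "4" and str(u)[19] in "89ab"
```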

Version 5: UUID based on the SHA-1 hash algorithm

This version is similar to version 3 but uses SHA-1 instead of MD5 as its hash algorithm; everything else is the same. It is recommended over version 3 (RFC 4122 itself prefers SHA-1 when backward compatibility is not an issue).


Differences between SHA-1 and MD5

Both are hash functions. SHA-1 takes a message shorter than 2^64 bits and produces a 160-bit digest, while MD5 produces a 128-bit digest, 32 bits shorter. SHA-1 is stronger than MD5, although practical collisions have since been demonstrated for both. Because a UUID holds only 128 bits, version 5 keeps just the first 128 bits of the SHA-1 digest.
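The digest sizes are easy to confirm with Python's `hashlib`; the input bytes are an arbitrary example:

```python
import hashlib

data = b"hello.example.com"
md5_digest = hashlib.md5(data).digest()
sha1_digest = hashlib.sha1(data).digest()

print(len(md5_digest) * 8)   # 128
print(len(sha1_digest) * 8)  # 160
# SHA-1's digest is 32 bits longer, so a version 5 UUID must truncate it,
# while an MD5 digest fills a 128-bit UUID exactly.
assert len(sha1_digest) * 8 - len(md5_digest) * 8 == 32
```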

In the Node.js uuid package, the only difference between the v3 and v5 implementations is the hash function:

// v3
crypto.createHash('md5').update(bytes).digest();
// v5
crypto.createHash('sha1').update(bytes).digest();
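A Python counterpart of that one-line change: the sketch below (the helper name `manual_uuid5` is hypothetical) hashes the namespace bytes plus the name with SHA-1, truncates the digest to 16 bytes, and sets version 5. It should agree with the standard library's `uuid.uuid5`:

```python
import hashlib
import uuid

def manual_uuid5(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Hypothetical helper: name-based UUID using SHA-1."""
    digest = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()
    # SHA-1 gives 20 bytes; only the first 16 fit in a UUID.
    return uuid.UUID(bytes=digest[:16], version=5)

u = manual_uuid5(uuid.NAMESPACE_DNS, "hello.example.com")
print(u)
assert u == uuid.uuid5(uuid.NAMESPACE_DNS, "hello.example.com")
```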


Usage examples

Node.js version

const uuidv5 = require('uuid/v5');
const logger = console.log;
logger('uuid v5 version :%s', uuidv5('hello.example.com', uuidv5.DNS));
// uuid v5 version :fdda765f-fc57-5604-a269-52a7df8164ec

Python version

import uuid
uuid.uuid5(uuid.NAMESPACE_DNS, "hello.example.com")
# UUID('fdda765f-fc57-5604-a269-52a7df8164ec')



The above is my own summary and may contain mistakes or misunderstandings. If you spot any, please leave a comment so I can correct them and avoid misleading others; I will also do my best to answer any questions.