preface

If you have ever wondered about any of the following questions about time in computer systems but never reached a satisfying answer, this article will give you a detailed one:

  1. How did leap seconds come into being, and why did so many Linux servers go down when a leap second was inserted into UTC on June 30, 2012?
  2. How does a computer system keep its time correct?
  3. How do computer systems provide time at microsecond or even nanosecond precision?
  4. A computer is a machine with no inherent concept of time, so how does it calculate and manage time?

background

Time is a deeply abstract problem, one that has drawn many great theologians, philosophers, and physicists to spend their lifetimes trying to explain its nature. Fortunately, we only need to deal with time as it appears in computer systems, so we can set aside the universe, black holes, relativity, and quantum mechanics. Yet even confined to this small domain, the topic is not as simple as it seems.

The clock of a computer system

There are two main kinds of clocks in computer systems: the wall clock and the monotonic clock. Both measure time, but they are fundamentally different, and we'll look at each in turn.

The wall clock

The wall clock is also known as clock time; as the name implies, it is the time we use day to day, expressed as a date and a time. The Linux wall clock is kept in UTC and records the number of seconds (plus a sub-second fraction) elapsed since 00:00:00 UTC on January 1, 1970, excluding leap seconds. Note the wrinkle: Linux tracks UTC, but the value it records does not include leap seconds. More on leap seconds later.
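
On Linux the wall clock can be read with clock_gettime(2) using CLOCK_REALTIME; a minimal example:

    // Reading the Linux wall clock: seconds and nanoseconds since the
    // Unix epoch (1970-01-01 00:00:00 UTC), leap seconds excluded.
    #include <stdio.h>
    #include <time.h>

    int main(void) {
        struct timespec ts;
        clock_gettime(CLOCK_REALTIME, &ts);   // the wall clock
        printf("%lld.%09ld seconds since the epoch\n",
               (long long)ts.tv_sec, ts.tv_nsec);
        return 0;
    }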

Time synchronization

By definition, the reference for the wall clock lives outside the computer, so keeping the wall clock accurate becomes a problem. The clock inside a computer is driven by a quartz oscillator, which is not very accurate: it runs slightly fast or slow depending on, among other things, the temperature of the machine. A computer therefore cannot keep its wall clock accurate on its own. The common solution today is to synchronize the computer with an NTP time server periodically over the network. This method is of course limited by network conditions: generally there will be at least 35 milliseconds of deviation, and at worst more than 1 second.
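
To make the mechanism concrete, here is a minimal sketch of an SNTP query (the simplified form of the NTP protocol): it sends a 48-byte client request to a public server and decodes the seconds field of the transmit timestamp from the reply. The server name is only illustrative, and a real client would add error handling, round-trip compensation, and repeated sampling:

    // Minimal SNTP query sketch: ask a public NTP server for its time.
    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>
    #include <time.h>
    #include <sys/socket.h>
    #include <netdb.h>
    #include <unistd.h>
    #include <arpa/inet.h>

    int main(void) {
        uint8_t pkt[48] = {0};
        pkt[0] = (4 << 3) | 3;            // LI=0, version=4, mode=3 (client)

        struct addrinfo hints = {0}, *srv;
        hints.ai_family = AF_INET;
        hints.ai_socktype = SOCK_DGRAM;
        if (getaddrinfo("pool.ntp.org", "123", &hints, &srv) != 0) return 1;

        int fd = socket(srv->ai_family, srv->ai_socktype, srv->ai_protocol);
        sendto(fd, pkt, sizeof pkt, 0, srv->ai_addr, srv->ai_addrlen);
        recvfrom(fd, pkt, sizeof pkt, 0, NULL, NULL);

        uint32_t secs;                    // transmit timestamp, bytes 40-43
        memcpy(&secs, pkt + 40, 4);
        secs = ntohl(secs) - 2208988800u; // NTP epoch (1900) -> Unix epoch (1970)
        time_t t = secs;
        printf("server time: %s", ctime(&t));
        close(fd);
        freeaddrinfo(srv);
        return 0;
    }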

For systems with stricter requirements on time accuracy, synchronizing through NTP is not enough. Instead, a reference wall clock is obtained from a GPS receiver and then distributed inside the machine room using the Precision Time Protocol (PTP). PTP is a high-precision time synchronization protocol that can reach sub-microsecond accuracy, with reported deviations of around 30 nanoseconds. However, nanosecond-level synchronization is only achievable when the network nodes (switches) along the path also support PTP.

Google takes an even cooler approach to time synchronization: it obtains a reference wall clock from GPS receivers, and also deploys atomic clocks, accurate to one second every 20 million years, in its machine rooms to guard against GPS receiver failures. These time references feed a set of time master servers, which in turn distribute time readings (exposed through the TrueTime API) to the other machines running across Google's network. On top of this time-accuracy guarantee Google built Spanner, the first scalable, globally distributed database.
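
The Spanner paper describes TrueTime as returning not a single timestamp but an interval guaranteed to contain the true time. A rough sketch of what such an interval API could look like (the names and shape here are illustrative assumptions, not Google's actual interface):

    // Sketch of a TrueTime-style API: now() returns an uncertainty
    // interval [earliest, latest] rather than a single timestamp.
    #include <stdint.h>

    typedef struct {
        int64_t earliest_ns;   // true time is no earlier than this
        int64_t latest_ns;     // ...and no later than this
    } tt_interval;

    tt_interval tt_now(int64_t local_ns, int64_t uncertainty_ns) {
        tt_interval iv = { local_ns - uncertainty_ns,
                           local_ns + uncertainty_ns };
        return iv;
    }

Spanner waits out the uncertainty before committing a transaction, which is why such tight clock bounds matter.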

Leap seconds

There are two systems for measuring time. Universal Time (UT1) is based on the Earth's rotation; but because the Earth's rotation is gradually slowing, the length of a UT1 second varies slightly, by a few thousandths of a second per day. Atomic time is defined in the microscopic world, by the radiation frequency of the transition between two hyperfine levels of the cesium atom; it is extremely stable, drifting by less than one ten-millionth of a second per day. Atomic time is thus a uniform scale for measuring time, but it is independent of the Earth's position in space; Universal Time is not a uniform scale, but it ties one day to the Earth's rotation and one year to its revolution around the Sun, which matters greatly for people's daily lives.

Coordinated Universal Time (UTC) was created to reconcile the difference between atomic time and Universal Time. Since 00:00 on January 1, 1972, UTC has adopted the atomic-time second as its second length, while the difference between UTC and Universal Time is kept within plus or minus 0.9 seconds, adjusted in steps of one whole second when necessary. This one-second step adjustment is called a leap second (an inserted second is a positive leap second; a removed second is a negative one). UTC, the official international standard time since January 1972, is thus a compromise between the two time scales, atomic time and Universal Time.

Leap second processing

Since Linux records the number of seconds since 00:00:00 UTC on January 1, 1970 but excludes leap seconds, every minute in Linux has exactly 60 seconds and every day has a fixed 86,400 seconds. Linux systems therefore need extra logic to handle leap seconds.
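
A quick check makes this concrete: the instants 2012-06-30 23:59:59 UTC and 2012-07-01 00:00:00 UTC were two SI seconds apart (the leap second 23:59:60 sat between them), yet their Unix timestamps differ by exactly one:

    // Unix time pretends every day has 86,400 seconds: the leap second
    // inserted between these two instants is simply not counted.
    #define _GNU_SOURCE                    // for timegm(3), a GNU extension
    #include <stdio.h>
    #include <time.h>

    int main(void) {
        struct tm before = { .tm_year = 112, .tm_mon = 5, .tm_mday = 30,
                             .tm_hour = 23, .tm_min = 59, .tm_sec = 59 };
        struct tm after  = { .tm_year = 112, .tm_mon = 6, .tm_mday = 1 };
        printf("difference: %.0f second(s)\n",     // prints 1, not 2
               difftime(timegm(&after), timegm(&before)));
        return 0;
    }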

Jump adjustment

When a positive leap second is inserted into UTC, Linux steps its clock back by one second, so one timestamp value repeats, because the extra second (23:59:60) cannot be represented in Linux. When a negative leap second occurs, Linux steps its clock forward by one second, skipping the removed second, because that second does not exist in Linux either. This is how Linux currently handles leap seconds. When a positive leap second was inserted into UTC on June 30, 2012, a bug in the leap-second handling logic of some Linux kernel versions triggered a deadlock, taking down a massive number of Linux servers.
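
The kernel's leap-second machinery is visible from user space through adjtimex(2), which NTP daemons use to arm an upcoming insertion; a read-only query of the clock state looks like this:

    // Query the kernel clock's leap-second state via adjtimex(2).
    #include <stdio.h>
    #include <sys/timex.h>

    int main(void) {
        struct timex tx = { .modes = 0 };          // read-only query
        switch (adjtimex(&tx)) {
        case TIME_OK:   puts("no leap second pending");         break;
        case TIME_INS:  puts("leap second will be inserted");   break;
        case TIME_DEL:  puts("leap second will be deleted");    break;
        case TIME_OOP:  puts("leap second in progress");        break;
        case TIME_WAIT: puts("leap second has just occurred");  break;
        default:        puts("clock not synchronized");         break;
        }
        return 0;
    }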

The slew mode of the NTP service

The NTP service's slew mode doesn't step the clock; it makes gradual, incremental adjustments instead. When a positive leap second is inserted into UTC, for example, the NTP service slews the clock by a small number of milliseconds each second until the extra second has been absorbed. Synchronizing from such an NTP service, the Linux operating system never perceives the leap second at all, and the kernel's leap-second logic does not need to be enabled.
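
A well-known variant of this idea is the leap smear, which spreads the extra second linearly over a long window (Google's public NTP servers, for example, smear over a 24-hour window around the leap). For scale: at a slew rate of 0.5 ms per second, absorbing one full second takes 2,000 seconds. A minimal sketch of the linear smear calculation, with illustrative names:

    // Linear leap smear sketch: the fraction of the extra second that
    // has been applied by time t. Parameter names are illustrative.
    double smear_offset(double t, double window_start, double window_len) {
        if (t <= window_start) return 0.0;               // smear not started
        if (t >= window_start + window_len) return 1.0;  // fully applied
        return (t - window_start) / window_len;          // in progress
    }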

Monotonic clock

A monotonic clock always moves forward, so it does not suffer from the wall clock's problem of jumping backward. It is ideal for measuring durations: read the monotonic clock at one point in time, read it again after completing some work, and the difference between the two values is the elapsed time.

But the absolute value of a monotonic clock doesn't mean anything; it might be the number of nanoseconds since the computer started, or something equally arbitrary. It is therefore meaningless to compare monotonic clock values across different nodes.
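
On Linux the monotonic clock is read with clock_gettime(2) using CLOCK_MONOTONIC; measuring a duration looks like this:

    // Measuring a duration with the monotonic clock: immune to wall-clock
    // jumps caused by NTP steps or leap seconds.
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    int main(void) {
        struct timespec start, end;
        clock_gettime(CLOCK_MONOTONIC, &start);
        sleep(1);                             // stand-in for some real work
        clock_gettime(CLOCK_MONOTONIC, &end);
        double elapsed = (end.tv_sec - start.tv_sec)
                       + (end.tv_nsec - start.tv_nsec) / 1e9;
        printf("elapsed: %.6f s\n", elapsed);
        return 0;
    }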

Time management

The concept of time is somewhat fuzzy to a computer, which needs help from hardware to calculate and manage it. A quartz oscillator drives the computer's system timer, which fires a clock interrupt at a fixed frequency. Because that frequency is programmed into it, the kernel knows the interval between two consecutive clock interrupts (this interval is called a tick). Driven by clock interrupts, the kernel periodically updates the system's wall clock and monotonic clock, and can thus count and manage time.
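
Conceptually, each clock interrupt just advances the kernel's counters. The sketch below is illustrative C, not actual kernel code; a real kernel also expires timers, updates load statistics, and so on:

    // Conceptual sketch of periodic-tick timekeeping (not real kernel code).
    #define HZ 1000                         // ticks per second (assumed)
    static unsigned long ticks;             // e.g. the kernel's jiffies counter
    static long long wall_ns, mono_ns;      // simplified clocks, in nanoseconds

    void timer_interrupt(void) {
        ticks++;                            // count the tick
        wall_ns += 1000000000LL / HZ;       // advance the wall clock by 1/HZ
        mono_ns += 1000000000LL / HZ;       // advance the monotonic clock
    }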

Accuracy of time

With a system timer interrupt frequency of 1000 Hz, the finest time granularity the computer can handle this way is 1 ms. But much finer timing is often required, such as 1 microsecond. How does the computer solve this problem?

Every time it boots, the computer calibrates a value called BogoMIPS, which measures how many (do-nothing) instructions the processor executes in a given period. Using BogoMIPS, the computer can achieve very fine-grained delays: if it executes N loop iterations in 1 second, it can time intervals as short as 1/N of a second. Since N is a very large number, the resulting precision is very high.
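
This is essentially how calibrated busy-wait delays (in the style of the kernel's udelay) work. A simplified sketch; loops_per_sec stands in for the boot-time BogoMIPS calibration, and its value here is made up:

    // Busy-wait delay calibrated in loop iterations per second.
    static unsigned long loops_per_sec = 400000000UL;  // hypothetical value

    static void delay_loop(unsigned long loops) {
        for (volatile unsigned long i = 0; i < loops; i++)
            ;                               // burn exactly 'loops' iterations
    }

    void udelay_us(unsigned long usecs) {
        // usecs microseconds ~= usecs * (loops_per_sec / 1,000,000) iterations
        delay_loop(usecs * (loops_per_sec / 1000000UL));
    }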

conclusion

In this article, we've looked at how computer systems synchronize time, what causes leap seconds and how Linux handles them, how Linux calculates and manages time, and how Linux improves time precision.
