We use Git every day, yet one question keeps coming up: if information is lost or a file is corrupted during transmission, how does Git find out?
1. Git's official explanation
In fact, everything in Git is checksummed before it is stored, and is then referred to by that checksum. This means it is impossible to change the contents of any file or directory without Git knowing about it.
The mechanism Git uses for this checksumming is called a SHA-1 hash.
A SHA-1 hash is a string of 40 hexadecimal characters (0-9 and a-f), calculated from the contents of a file or the structure of a directory in Git.
A SHA-1 hash looks like this:
24b9da6552252987aa493b52f8696cd6d3b00373
The information stored in a Git database is indexed by the hash value of the file’s content, not the file name.
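Concretely, for a file's contents (a "blob"), Git does not hash the raw bytes alone: it prepends a header made of the word `blob`, a space, the content length in bytes, and a NUL byte, then takes the SHA-1 of the result. The file name is not part of this input; Git records names separately in tree objects, which is why the ID depends on content only. Below is a minimal sketch, assuming a repository that uses the default SHA-1 object format; `git_blob_id` is a hypothetical helper name, not part of any Git API.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID Git would give `content` when stored as a blob.

    Assumes the default SHA-1 object format. Git hashes a header
    ("blob", a space, the length in bytes, a NUL byte) followed by the content.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Same bytes that `echo 'test content'` writes to stdout.
    print(git_blob_id(b"test content\n"))
```

For this input the sketch should print `d670460b4b4aece5915caf5c68d12f560a9fe3e4`, the same ID that `echo 'test content' | git hash-object --stdin` reports in the Pro Git book. Running `shasum` on a file with the same content gives a different value, because `shasum` hashes the bytes without Git's header.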
So the question is, what is a checksum? What is SHA-1? 🤔
2. What is a checksum?
Here is Wikipedia's definition:
A checksum is a form of redundancy check.
It is a simple way of protecting the integrity of data by detecting errors in data transmitted through space (for example, telecommunications) or through time (for example, computer storage).
Common checksum methods in computing include the cyclic redundancy check (CRC), MD5, and the SHA family.
In practice, a checksum is generated by feeding a given block of data into a check function (a checksum algorithm). A good checksum algorithm will usually produce a very different value even when the input has been changed only slightly.
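To see why the quality of the algorithm matters, the sketch below compares a deliberately naive checksum with SHA-1. The byte-sum-modulo-256 function `additive_checksum` is a hypothetical illustration, not something Git or any tool mentioned here uses: it catches a changed byte but cannot notice two bytes swapping places, whereas SHA-1 changes for both edits.

```python
import hashlib


def additive_checksum(data: bytes) -> int:
    """A deliberately weak checksum: the sum of all bytes, modulo 256."""
    return sum(data) % 256


def sha1_hex(data: bytes) -> str:
    """SHA-1 digest of `data`, as 40 hexadecimal characters."""
    return hashlib.sha1(data).hexdigest()


original = b"hello world"
swapped = b"hello wrold"  # two neighbouring letters transposed
edited = b"hello worle"   # last letter changed from d to e

# Reordering bytes leaves their sum unchanged, so the weak checksum misses it...
print(additive_checksum(original) == additive_checksum(swapped))  # True
# ...although it does catch a changed byte.
print(additive_checksum(original) == additive_checksum(edited))   # False
# SHA-1 produces a different value for both modifications.
print(sha1_hex(original) == sha1_hex(swapped))  # False
print(sha1_hex(original) == sha1_hex(edited))   # False
```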
So Git calculates a checksum from the contents of each file (and from the structure of each directory). Only when two checksums match exactly can you be confident that the file or directory has not been corrupted or lost, because even a tiny change produces a completely different checksum.
3. SHA-1
SHA-1 is a commonly used algorithm for calculating checksums. It produces a 160-bit value, which is written as 40 hexadecimal characters. Other commonly used algorithms include MD5 and SHA-256.
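The three algorithms differ mainly in output size. The sketch below uses Python's standard `hashlib` module to hash one sample sentence (chosen only for illustration) with each of them and prints the digest length: 128 bits for MD5, 160 for SHA-1, 256 for SHA-256. `hex_digests` is a hypothetical helper name. Note that MD5 and SHA-1 both have known practical collision attacks, so SHA-256 is generally preferred when the check must resist deliberate tampering.

```python
import hashlib


def hex_digests(data: bytes) -> dict[str, str]:
    """Hex digests of `data` under MD5, SHA-1 and SHA-256."""
    return {name: hashlib.new(name, data).hexdigest() for name in ("md5", "sha1", "sha256")}


if __name__ == "__main__":
    for name, digest in hex_digests(b"The quick brown fox jumps over the lazy dog").items():
        # Each hexadecimal character encodes 4 bits.
        print(f"{name:>6}: {len(digest) * 4} bits, {len(digest)} hex chars  {digest}")
```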
4. How to view a file's checksum
On both Linux and macOS, the CRC, MD5, SHA-1, and SHA-256 values can be obtained as follows:
```shell
cksum file_name              # CRC
md5sum file_name             # MD5
shasum file_name             # SHA-1 (the default)
# shasum -a selects the algorithm: 1 (default), 224, 256, 384, 512, 512224, 512256
shasum -a 256 file_name      # SHA-256
```
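The same MD5, SHA-1, and SHA-256 values can also be computed in a script. The sketch below reads the file in chunks so that large files need not fit in memory, and prints a "digest, two spaces, file name" line in the same layout `shasum` uses. `file_checksum` and the 64 KiB chunk size are hypothetical choices. CRC is left out on purpose: `cksum` uses the POSIX CRC variant, which does not give the same number as Python's `zlib.crc32`.

```python
import hashlib
import sys


def file_checksum(path: str, algorithm: str = "sha1", chunk_size: int = 64 * 1024) -> str:
    """Hash a file chunk by chunk; `algorithm` is any name hashlib accepts, e.g. "md5", "sha256"."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python checksum.py FILE [ALGORITHM], e.g. python checksum.py test.txt sha256
    path = sys.argv[1]
    algorithm = sys.argv[2] if len(sys.argv) > 2 else "sha1"
    print(f"{file_checksum(path, algorithm)}  {path}")
```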
Let's try it 👨🏻‍💻:
As shown in the figure:
- The first operation uses the shasum command to get the SHA-1 value of a file named test.txt on my computer.
- The second operation renames the file to test1.txt. The SHA-1 value is the same as the one from the first operation, so renaming a file does not change its checksum.
- The third operation modifies the contents of test1.txt, and this time the SHA-1 value changes.
In other words, the checksum is calculated from the file's contents alone: the file name is not part of the input, so renaming a file leaves its checksum unchanged, while changing even a small part of its contents produces a different checksum.
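The three steps of the experiment can be reproduced in a throwaway directory. This is a sketch under stated assumptions: the file content `hello\n` and the appended line are made up for illustration (the original test.txt content is not shown), and the helper names are hypothetical.

```python
import hashlib
import os
import tempfile


def sha1_of_file(path: str) -> str:
    """SHA-1 of a file's bytes, the same value `shasum` reports."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


def rename_then_edit_experiment() -> tuple[str, str, str]:
    """Hash a file, rename it and hash again, then edit it and hash again."""
    with tempfile.TemporaryDirectory() as workdir:
        original = os.path.join(workdir, "test.txt")
        renamed = os.path.join(workdir, "test1.txt")

        with open(original, "wb") as f:
            f.write(b"hello\n")
        first = sha1_of_file(original)   # step 1: hash test.txt

        os.rename(original, renamed)
        second = sha1_of_file(renamed)   # step 2: hash after renaming

        with open(renamed, "ab") as f:
            f.write(b"one more line\n")
        third = sha1_of_file(renamed)    # step 3: hash after editing the contents

    return first, second, third


if __name__ == "__main__":
    first, second, third = rename_then_edit_experiment()
    print("rename kept checksum:", first == second)
    print("edit changed checksum:", second != third)
```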
5. Common uses of checksums
- Checking whether a downloaded file has been corrupted
- Checking whether a downloaded file has been maliciously replaced (this only works if the expected checksum itself comes from a trusted source)
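Both checks boil down to recomputing the checksum of the downloaded file and comparing it with the value the publisher lists. A minimal sketch, assuming the publisher provides a SHA-256 hex string; `verify_download` is a hypothetical helper name, and the whitespace and case normalization is a convenience assumption, since published checksums are sometimes copied with trailing spaces or in upper case.

```python
import hashlib


def verify_download(path: str, expected_sha256: str) -> bool:
    """Return True if the file's SHA-256 matches the published hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            h.update(chunk)
    return h.hexdigest() == expected_sha256.strip().lower()
```

For example, `verify_download("downloaded_file", "<checksum copied from the project's download page>")` returns False if even one byte of the file differs from what the publisher hashed.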