Author: Feng Weiyao

How does Git retrieve a historical version of a file when we switch branches or check out a specific commit? In other words:

  1. Where does Git store data?
  2. How does it store different versions of files?
  3. How are different versions associated with the specified commit record?

Before we start, to avoid ambiguity, let’s agree on some terminology for commits. Each commit has a unique 40-character hexadecimal string, which is the hash value Git calculates with the SHA-1 algorithm from the content of the commit. This string goes by several names: commit ID, hash, checksum, or SHA-1 value. Since SHA-1 is a hash algorithm, let’s just call this string a hash.

How Git stores data

Most of the answer to how Git stores data lies in the objects directory inside the .git folder. Before explaining what the objects directory is, let’s look at what is in it. Go to any Git repository and find the objects folder inside the .git directory (which is probably hidden by default).

It contains a number of folders with 2-character names, and each of those folders contains files whose names are 38 characters long. If you are observant, you may guess that a folder name plus a file name makes up a 40-character hash, just like the one that identifies a commit. These files are compressed, so they cannot be read directly; use git cat-file -p followed by the full hash to view the contents of one of them.

In my project, the result is the contents of the Changelog.md file. More precisely, it is the complete contents of the Changelog.md file as of one particular commit. The objects folder is where Git stores its data.
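To make the layout concrete, here is a minimal Python sketch of what reading one of these files involves. It assumes the object is stored as a loose, zlib-compressed file (objects that Git has packed into objects/pack are not handled), and the function name read_loose_object is made up for this illustration. It is not how git cat-file is implemented, only an approximation of the idea.

```python
import zlib
from pathlib import Path


def read_loose_object(git_dir, object_hash):
    """Return (type, body) for a loose object under <git_dir>/objects.

    Illustrative helper, not part of Git. Packed objects are not handled.
    """
    # The first 2 hex characters name the folder, the other 38 name the file.
    path = Path(git_dir) / "objects" / object_hash[:2] / object_hash[2:]
    raw = zlib.decompress(path.read_bytes())
    # After decompression the data looks like: b"<type> <size>\0<body>"
    header, _, body = raw.partition(b"\0")
    obj_type, _, size = header.partition(b" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type.decode(), body


# Example call (the hash is whatever name you see in your own objects folder):
# obj_type, body = read_loose_object(".git", "<40-character hash>")
```

For a file object, the returned type is blob and the body is the file content that git cat-file -p would print.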

For every changed file that goes into a commit, Git uses the SHA-1 algorithm to calculate a 40-character hash from the content and stores the content in the objects folder under that hash. (Strictly speaking, this happens when the file is staged with git add; the commit then refers to the stored object.) The first two characters of the hash name the subdirectory, and the remaining 38 characters name the file. In other words, every version of a file that you commit is kept as a complete snapshot of its contents. To restore a file to an earlier version, all Git needs is the hash of that version.
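The naming scheme can be reproduced in a few lines of Python. One detail worth knowing is that Git does not hash the raw content alone: it first prepends a short header of the form "blob <size>" followed by a NUL byte. The function names blob_hash and object_path below are invented for this sketch.

```python
import hashlib


def blob_hash(content: bytes) -> str:
    """Compute the hash Git assigns to a file with this content."""
    # Git hashes a small header plus the content, not the content alone.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def object_path(object_hash: str) -> str:
    """Map a 40-character hash to its location under .git/objects."""
    # First 2 characters: subdirectory. Remaining 38 characters: file name.
    return f".git/objects/{object_hash[:2]}/{object_hash[2:]}"


digest = blob_hash(b"hello world\n")
print(digest)
print(object_path(digest))
```

You can compare the printed hash with the output of git hash-object on a file that has the same content.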

But how does Git know which hash belongs to which version of a file at the time of a commit? How does Git associate file versions with commits?

In fact, the objects folder stores three main types of information. Besides the file contents described above, it also stores file path information and commit information. These exist as blob objects, tree objects, and commit objects respectively. Each object is a file in the objects directory. Just as with file contents, Git uses the SHA-1 algorithm to compute a hash from the contents of the object, and the resulting 40-character string names the file and is used to locate it. Let’s take a look at what tree objects and commit objects contain.

When you commit, Git saves your current directory structure as tree objects.

A tree object contains the name of each file together with the hash of its blob object, as well as the name of each subfolder together with the hash of the corresponding tree object. So once you have the hash of the root directory’s tree object for a commit, you can find the name and blob hash of every file in that commit, and from there retrieve the historical version of each file.
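As a rough illustration of what a tree object holds, the sketch below parses the raw (already decompressed, header removed) body of a tree object. Each entry is a mode, a space, a name, a NUL byte, and then the hash as 20 raw bytes. The function name parse_tree is made up here, and the hashes in the example are placeholders rather than values from a real repository.

```python
def parse_tree(body: bytes):
    """Split a raw tree object body into (mode, name, hash) entries."""
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        mode = body[pos:space].decode()       # e.g. "100644" file, "40000" folder
        name = body[space + 1:nul].decode()
        # The hash is stored as 20 raw bytes, not as 40 hex characters.
        sha = body[nul + 1:nul + 21].hex()
        entries.append((mode, name, sha))
        pos = nul + 21
    return entries


# Placeholder hashes, for illustration only.
example = (
    b"100644 Changelog.md\0" + bytes.fromhex("aa" * 20)
    + b"40000 src\0" + bytes.fromhex("bb" * 20)
)
for mode, name, sha in parse_tree(example):
    kind = "tree" if mode == "40000" else "blob"
    print(mode, kind, sha, name)
```

An entry with mode 40000 points to another tree object, which is how subfolders are represented.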

So how do we find this tree object?

Through the object we are most familiar with: the commit object. The git log command shows the hash of each commit, and git cat-file -p followed by that hash shows the structure of the commit object.

tree is the hash of the tree object for this commit. parent is the commit object of the previous commit. The remaining fields are the author, committer, date, and commit message.
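The fields of a commit object are plain text, so they are easy to pick apart. The following sketch parses the body of a commit object into its header fields and message. The function name parse_commit is invented, the sample hashes and the author are placeholders, and multi-line headers such as signatures are simply skipped.

```python
def parse_commit(body: bytes):
    """Return (headers, message) for the raw body of a commit object."""
    text = body.decode()
    # Headers and message are separated by the first blank line.
    header_text, _, message = text.partition("\n\n")
    headers = {"parent": []}
    for line in header_text.split("\n"):
        if line.startswith(" "):
            continue  # continuation of a multi-line header, ignored in this sketch
        key, _, value = line.partition(" ")
        if key == "parent":
            headers["parent"].append(value)  # a merge commit has several parents
        else:
            headers[key] = value
    return headers, message


# Placeholder values, for illustration only.
sample = (
    b"tree " + b"a" * 40 + b"\n"
    b"parent " + b"b" * 40 + b"\n"
    b"author Example Author <author@example.com> 1700000000 +0800\n"
    b"committer Example Author <author@example.com> 1700000000 +0800\n"
    b"\n"
    b"Update changelog\n"
)
headers, message = parse_commit(sample)
print(headers["tree"])
print(headers["parent"])
print(message)
```

The very first commit in a repository has no parent line, which is why parent is kept as a list that may be empty.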

Now we can tell which version of a file belongs to a particular commit. This gives an overall picture of how Git stores historical versions of files and relates them to commits.

When you run git commit, Git computes a hash for each subdirectory and stores these as tree objects in the repository. Git then creates a commit object that, in addition to the author, time, and message, contains the hash of the root tree object and the hash of the previous commit. This way, Git can reproduce the saved snapshot whenever it is needed.
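The whole chain from commit to tree to blob can be tried out with a toy model. The sketch below keeps objects in a Python dictionary instead of the .git/objects folder; every class and function name in it is invented for illustration. It also simplifies real Git in several ways: entries are sorted by plain name, the author line is a placeholder, and nothing is compressed or written to disk.

```python
import hashlib


class ObjectStore:
    """Toy in-memory stand-in for .git/objects (illustration only)."""

    def __init__(self):
        self.objects = {}

    def put(self, obj_type, body):
        # Same idea as Git: hash "<type> <size>\0<body>" with SHA-1.
        data = obj_type.encode() + b" " + str(len(body)).encode() + b"\0" + body
        digest = hashlib.sha1(data).hexdigest()
        self.objects[digest] = (obj_type, body)
        return digest

    def get(self, digest):
        return self.objects[digest]


def write_tree(store, entries):
    """entries: list of (mode, name, hex hash). Sorting is simplified."""
    body = b"".join(
        f"{mode} {name}".encode() + b"\0" + bytes.fromhex(sha)
        for mode, name, sha in sorted(entries, key=lambda e: e[1])
    )
    return store.put("tree", body)


def write_commit(store, tree, parent, message):
    lines = [f"tree {tree}"]
    if parent:
        lines.append(f"parent {parent}")
    # Placeholder identity and timestamp.
    lines.append("author Example <example@example.com> 0 +0000")
    lines.append("committer Example <example@example.com> 0 +0000")
    return store.put("commit", ("\n".join(lines) + "\n\n" + message + "\n").encode())


def lookup(tree_body, name):
    """Find the hash of the entry called name inside a raw tree body."""
    pos = 0
    while pos < len(tree_body):
        nul = tree_body.index(b"\0", pos)
        _mode, _, entry_name = tree_body[pos:nul].decode().partition(" ")
        if entry_name == name:
            return tree_body[nul + 1:nul + 21].hex()
        pos = nul + 21
    raise KeyError(name)


def read_file(store, commit, path):
    """Follow commit -> root tree -> subtrees -> blob for a path like 'docs/a.txt'."""
    _, body = store.get(commit)
    current = body.split(b"\n", 1)[0].decode().split(" ")[1]  # "tree <hash>"
    for part in path.split("/"):
        _, tree_body = store.get(current)
        current = lookup(tree_body, part)
    return store.get(current)[1]


store = ObjectStore()
v1 = store.put("blob", b"version 1\n")
c1 = write_commit(store, write_tree(store, [("100644", "Changelog.md", v1)]), None, "first")
v2 = store.put("blob", b"version 2\n")
c2 = write_commit(store, write_tree(store, [("100644", "Changelog.md", v2)]), c1, "second")
print(read_file(store, c1, "Changelog.md"))
print(read_file(store, c2, "Changelog.md"))
```

Both versions of Changelog.md stay in the store side by side, and the commit hash alone is enough to reach the right one.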

Branches in Git

Now we know that once we have the hash of a commit, we can get the snapshot for that commit. But what if we want the file contents of a branch?

A branch is just a mutable pointer to the branch’s latest commit object.

The refs folder in .git keeps the hash of the latest commit object for every branch.

The heads folder holds local branches and the remotes folder holds remote-tracking branches. Take the local branch master as an example. Running cat master inside the heads folder shows that this file holds the hash of the branch’s latest commit object. If you now make a new commit, the contents of the file change to the hash of the new commit object.
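Since a branch is just a small text file, reading it takes very little code. The sketch below assumes the branch exists as a loose file under refs/heads; Git may also record branches in the .git/packed-refs file, which is not handled here. The names read_branch and move_branch are invented, and move_branch only imitates the effect of a new commit on the file. Real Git updates references with additional safeguards.

```python
from pathlib import Path


def read_branch(git_dir, branch):
    """Return the commit hash stored in <git_dir>/refs/heads/<branch>."""
    ref_file = Path(git_dir) / "refs" / "heads" / branch
    return ref_file.read_text().strip()


def move_branch(git_dir, branch, new_commit_hash):
    """Hypothetical helper: point the branch at a new commit by rewriting its file."""
    ref_file = Path(git_dir) / "refs" / "heads" / branch
    ref_file.parent.mkdir(parents=True, exist_ok=True)
    ref_file.write_text(new_commit_hash + "\n")


# Example call against a real repository:
# print(read_branch(".git", "master"))
```

Calling read_branch before and after a commit shows the pointer moving from the old commit hash to the new one.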

Git also has a special reference, HEAD, that always points to the branch or commit object currently checked out. What HEAD points to is stored in the HEAD file in .git. When the master branch is checked out, the file contains ref: refs/heads/master, meaning that HEAD points to the master branch. If we instead check out a specific commit by its hash, such as the latest commit on master, the file contains that commit’s hash directly.
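The two cases can be told apart by looking at how the HEAD file starts. The following sketch resolves HEAD to a commit hash under the same assumption as before, namely that the branch is stored as a loose file rather than in packed-refs. The function name resolve_head is made up for this example.

```python
from pathlib import Path


def resolve_head(git_dir):
    """Return (ref, commit_hash); ref is None when HEAD holds a hash directly."""
    git_dir = Path(git_dir)
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        # HEAD points to a branch, e.g. "ref: refs/heads/master".
        ref = head[len("ref: "):]
        return ref, (git_dir / ref).read_text().strip()
    # HEAD points straight at a commit (a specific commit was checked out).
    return None, head


# Example call against a real repository:
# print(resolve_head(".git"))
```

When the first element of the result is None, no branch moves on the next commit; only HEAD itself does.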

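To summarize the relationship between branches and commit objects: HEAD points to a branch, the branch points to its latest commit object, and each commit object points to its tree object and to its parent commit.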