Foreword
A solid understanding of how Git works internally, combined with real scenarios and problems from practice, makes it easier to use Git and to troubleshoot difficult issues. It also helps us establish a sensible Git workflow for the actual circumstances of our team or project
Git Principle Analysis
What exactly is a commit
Let’s start with a picture
A commit object is a snapshot of a Git repository: it captures every file in the project at the time of the commit. As the picture shows, a commit object actually points to a tree object, which represents the project’s root directory. That tree recursively points to further tree objects (subdirectories) and blob objects (the stored contents of individual files). You might wonder: if every commit contains the entire repository, and only one file changes between two commits, aren’t all the other files stored redundantly? They are not. Let’s look at another picture
Each version in Git is a commit. A file drawn in a dotted box has not been modified in that commit. Git stores only one copy of any given file content (blob object), and different commits simply reference the same blob
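Why identical files are stored only once can be sketched in a few lines of Python. This is a minimal illustration, assuming a repository that uses the default SHA-1 object format; the two snapshots and the in-memory `object_store` dictionary are hypothetical stand-ins, not Git's actual data structures.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical snapshots: file name -> file content.
commit_1 = {"README.md": b"hello\n", "main.py": b"print('v1')\n"}
commit_2 = {"README.md": b"hello\n", "main.py": b"print('v2')\n"}

# The store is keyed by object ID, so identical content lands on the same key.
object_store = {}
for snapshot in (commit_1, commit_2):
    for content in snapshot.values():
        object_store[blob_id(content)] = content

unchanged = blob_id(commit_1["README.md"]) == blob_id(commit_2["README.md"])
print(unchanged, len(object_store))
```

Four files were recorded across the two snapshots, but the store holds only three blobs, because the ID depends only on the content and the unchanged README.md maps to the same ID in both.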
Explore the .git folder
Have you ever wondered why, once you clone a project from a Git repository, you can immediately perform all kinds of Git operations and collaborate with your team? Git is just a computer program plus the data it needs to work, and all of that data is stored in the project’s .git folder. Below is the .git folder of a typical Git project
Let’s focus on the objects folder and cd into it
You can see a number of folders with two-character names. cd into any of them (I picked 00).
Git identifies each object, whether a commit, a tree, or a blob, by a 40-character hash. The first two characters become the folder name and the remaining 38 the file name. Let’s use the cat command to look at the contents of one of these files
The output is unreadable, because Git stores objects in a compressed binary form. Git provides the cat-file command for inspecting them instead
This turns out to be the content of one file as it existed in one particular commit of the repository. Let’s look at another file
The -t option of the cat-file command shows the type of the object. This one is a tree object, which represents a directory
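What cat-file does with such a file can be approximated in Python. The sketch below assumes the loose-object layout used by Git (zlib-compressed bytes consisting of a `<type> <size>` header, a NUL byte, then the content); `parse_loose_object` is a hypothetical helper name, and the object is built in memory rather than read from a real .git folder.

```python
import zlib


def parse_loose_object(raw: bytes):
    """Decompress a loose object and split it into (type, content)."""
    data = zlib.decompress(raw)
    header, _, content = data.partition(b"\0")
    obj_type, _, size = header.partition(b" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type.decode("ascii"), content


# Stand-in for the bytes of a file under .git/objects, built here in memory.
raw = zlib.compress(b"blob 13\0Hello world!\n")

obj_type, content = parse_loose_object(raw)
print(obj_type)  # roughly what `git cat-file -t` reports
print(content)   # roughly what `git cat-file -p` shows for a blob
```

The compression step is why plain cat prints unreadable bytes: the readable header and content only appear after decompression.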
The contents of this tree object are other tree and blob objects: the subdirectories and files of that directory. In the same way, every commit, tree, and blob object in the entire repository can be found in the objects folder. So how does Git store all these files, and how does it organize the various versions (commits) of your repository?
Construct a commit manually
The previous section should have given you a general idea of Git’s underlying storage, but not yet a deep understanding of how it works. Let’s see how Git works by manually constructing a commit with some low-level commands. First, create an empty Git repository
Then we can watch the changes to the .git folder in another window
There are no files in the repository yet, and no commits. Next we will write something into the repository, but instead of creating a file directly, we will use some of Git’s low-level commands
First we echo the string “Hello world!” and pipe it, via stdin (standard input), into Git’s hash-object command
You can run git help hash-object to read about this command: it computes the object ID for the given content and can optionally write the object into the object database
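The effect of hash-object when it writes the object can be imitated in Python. This is a rough sketch, assuming the default SHA-1 object format and loose-object storage; `write_blob` is a hypothetical helper, and it writes into a scratch directory rather than a real .git folder.

```python
import hashlib
import os
import tempfile
import zlib


def write_blob(git_dir: str, content: bytes):
    """Store content as a loose blob object and return (object_id, path)."""
    store = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    object_id = hashlib.sha1(store).hexdigest()
    # The first two hex characters name the folder, the other 38 the file.
    folder = os.path.join(git_dir, "objects", object_id[:2])
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, object_id[2:])
    with open(path, "wb") as f:
        f.write(zlib.compress(store))
    return object_id, path


git_dir = tempfile.mkdtemp()  # scratch directory standing in for .git
# echo appends a newline, so the stored content is "Hello world!\n".
object_id, path = write_blob(git_dir, b"Hello world!\n")
print(object_id)
print(path)
```

Because the ID is derived purely from the content, running this twice with the same input produces the same ID and the same file.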
You can see that there is now an extra file under objects. Let’s cat it
The cat command shows only unreadable bytes, so we use cat-file instead
It turns out to be a blob object, so we really did write a file into the Git repository. Now let’s look at the file contents
It is the string “Hello world!” that we typed with echo. Now that there is a file in the repository, let’s run git log
Then run git status to check the repository’s current state
Again, nothing. So far we have only written a file directly into the repository’s object database with a low-level command, which does not make Git track the file for us. What do we need to do next? Think about what you normally do to get a file into Git: you create and write the file, add it to Git with git add, and commit it to your local repository with git commit. Let’s do the same thing with Git’s low-level commands. Run git help update-index to read about the first of them
You can see that this command updates the index file, which is our staging area (the index of staged files). git add does exactly this: it adds the file to the staging area and updates the index. So next we use this command to add our file to the staging area
The --add option adds a file to the staging area, and the --cacheinfo option writes the given information directly into the index. The mode 100644 denotes, in Unix terms, a regular file (100) with read/write permissions 644. We also give the “Hello world!” text we wrote earlier a file name, “test.txt”. Now check the status of the repository again
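How the mode 100644 breaks down can be checked with Python's standard `stat` module. This is only an illustration of the Unix mode bits; it does not touch the Git index.

```python
import stat

mode = 0o100644  # the mode string given to --cacheinfo, read as an octal number

file_type = stat.S_IFMT(mode)     # upper bits: what kind of entry this is
permissions = stat.S_IMODE(mode)  # lower bits: the permission bits

print(oct(file_type), stat.S_ISREG(mode))  # regular-file bits, and a check
print(oct(permissions))                    # owner/group/other permissions
print(stat.filemode(mode))                 # the form `ls -l` would display
```

The upper part marks the entry as a regular file, and 644 means the owner can read and write while group and others can only read.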
You can see that test.txt has been created and that Git recognizes it as a new file. Let’s go back to the first picture
You can see that a commit object points to a tree object, which is a directory, which in turn points to a blob object, which is a concrete file. Now that we have a file managed by Git, we also need to generate the tree object. Run git help write-tree to read about the command for this
You can see that this command creates a tree object from the current index. Let’s run it
The output is the hash Git generated for the tree object it just wrote. Inspect it with cat-file
The tree object points to a blob object, which is the test.txt file. Take a look at the changes to the .git folder
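How a tree object refers to its blob can be sketched in Python. This assumes the default SHA-1 format, where each tree entry is the mode, a space, the file name, a NUL byte, and the 20 raw bytes of the entry's object ID. The helper names are hypothetical, and sorting by plain file name is a simplification that holds for a flat list of files only.

```python
import hashlib


def object_id(obj_type: bytes, body: bytes) -> str:
    """Hash an object body together with its '<type> <size>' header."""
    header = obj_type + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries) -> bytes:
    """Build a tree body from (mode, name, hex_id) tuples (files only)."""
    body = b""
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1].encode()):
        body += mode.encode() + b" " + name.encode() + b"\0"
        body += bytes.fromhex(hex_id)  # the ID is stored as 20 raw bytes
    return body


blob = object_id(b"blob", b"Hello world!\n")
body = tree_body([("100644", "test.txt", blob)])
tree = object_id(b"tree", body)
print(blob)
print(tree)
```

The tree's own ID is computed over the names and IDs of its entries, so changing any file in a directory changes the ID of that directory's tree as well.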
There are now two objects: the blob for test.txt and the tree that contains it. Take a look at the current status of the repository
As you can see, no commit has been generated yet. We have written the tree object to the repository, and now we need to write the commit object. Run git help commit-tree to read about the command for this
This command takes a tree object (which here represents the project root, that is, the entire project) and writes it to the Git repository as a commit
Git generates another hash, which is the hash of the commit. Take a look at the changes to the .git folder
The objects folder now contains a blob, a tree, and a commit object. Inspect the commit with cat-file
You can see that the object type is commit, and that the commit object points to the tree and contains the author, the committer, the commit message, and so on.
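The text of a commit object is simple enough to assemble by hand. The sketch below assumes the default SHA-1 format; the identity, timestamp, and message are made-up example values, the tree ID used is that of an empty tree, and `commit_body` is a hypothetical helper.

```python
import hashlib


def commit_body(tree_id, ident, message, parents=()):
    """Assemble the text of a commit object (root commit if no parents)."""
    lines = ["tree " + tree_id]
    lines += ["parent " + parent for parent in parents]
    lines += ["author " + ident, "committer " + ident, "", message]
    return ("\n".join(lines) + "\n").encode("utf-8")


def object_id(obj_type: bytes, body: bytes) -> str:
    header = obj_type + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Example values only: the ID of an empty tree and a made-up identity.
tree_id = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
ident = "A U Thor <author@example.com> 1700000000 +0000"

body = commit_body(tree_id, ident, "first commit")
print(body.decode("utf-8"))
print(object_id(b"commit", body))
```

Since the author and committer lines carry a timestamp, creating the same commit at a different time yields a different hash, which is why your commit hash will not match anyone else's even for identical content.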
The commit object has been written successfully, so let’s run git log. It still reports no commits, because a Git repository also needs to maintain a pointer to a commit. When a branch has many commits, this pointer determines which version the repository is currently at, that is, which commit’s snapshot is current. This pointer is also stored in a file in the .git folder
It is the HEAD file. Let’s cat it
The HEAD pointer points to the file refs/heads/master. Take a look at that file
The refs/heads folder is still empty, so we now need to write a file there so that HEAD ends up pointing to the commit object we just created. There are two ways to do this. One is to create the file and edit it by hand, but changing the pointer this way is not recommended. Instead, we use the update-ref command to point it at the commit object
You can see that a master file has appeared under refs/heads. Let’s cat it
The content of the master file is the hash of the commit we just generated. (master is the current branch, the default branch of this Git repository.) Now run git log
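The two-step lookup from HEAD to a commit can be sketched in Python. This simplified version only reads loose ref files and ignores other places Git may keep refs (such as a packed-refs file); `resolve_head` and the fake commit ID are hypothetical, and the files are written into a scratch directory purely for the demonstration, whereas in a real repository update-ref is the safer way to change a ref.

```python
import os
import tempfile


def resolve_head(git_dir):
    """Return the commit ID HEAD resolves to, or None if the branch is empty."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    if not head.startswith("ref: "):
        return head  # detached HEAD: the file holds a commit ID directly
    ref_name = head[len("ref: "):]
    ref_path = os.path.join(git_dir, *ref_name.split("/"))
    if not os.path.isfile(ref_path):
        return None  # the branch has no commit yet
    with open(ref_path) as f:
        return f.read().strip()


git_dir = tempfile.mkdtemp()  # scratch directory standing in for .git
with open(os.path.join(git_dir, "HEAD"), "w") as f:
    f.write("ref: refs/heads/master\n")
before = resolve_head(git_dir)

fake_commit_id = "ab" * 20  # placeholder, not a real commit
os.makedirs(os.path.join(git_dir, "refs", "heads"))
with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as f:
    f.write(fake_commit_id + "\n")
after = resolve_head(git_dir)

print(before, after)
```

Before the branch file exists the lookup finds nothing, which matches git log reporting no commits. Once refs/heads/master holds a commit ID, HEAD resolves to it.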
OK, the commit has been generated successfully. Now run git status
It shows test.txt as deleted, because we wrote the file with low-level commands and the working directory does not contain it. Just check it out
git status is clean now. And we’re done!
Conclusion
From the explanation above and the manual commit construction, you should now have a good understanding of Git’s underlying implementation. Git follows an “everything is an object” model: commit objects contain tree (directory) objects and blob (file) objects, and together they manage the folders, files, and versions of the repository. Switching between version snapshots is then a matter of moving a pointer to a different commit. This is the core working principle of Git. Once you understand it, every Git command and operation, including complex workflows, real-world scenarios, and hard problems, can be reasoned about from this principle, which gets you twice the result with half the effort.