Preface

I was not exposed to a VCS (version control system) in college. I did publish some early projects on Google Code, but only as compressed archives, and I didn't use a VCS even when my roommate and I built the project for our computer networking course. I didn't really use Git until I started at my first company, and I haven't used any other VCS since (I have only heard of SVN, never used it). By now I've been using Git for over six years.

Although that's enough for everyday use, my understanding of Git's inner workings is still fairly rudimentary. I'm not being modest; I really don't understand them. For example, I can't say what Git does behind the scenes when I run git add, git commit, git branch, and so on. Fortunately, there's a lot of information on the Internet, and after working through a lot of documentation and blog posts, I was able to scratch the surface.

Now, let me try to go through this step by step.

What happens when you run git add?

Start by creating a repository and adding a file to it:

mkdir git-test
cd git-test
git init
echo 'hello' > a
git add .

Don't commit the change yet. First, let's look at what Git has done behind the scenes. Git's secrets live in the directory called .git, especially its objects subdirectory. Viewing that directory with the tree command gives the following:

.git/objects
├── ce
│   └── 013625030ba8dba906f756967f9e9ca394464a
├── info
└── pack

Compared with before git add, there is a new directory called ce containing a file named 013625030ba8dba906f756967f9e9ca394464a. This file is effectively a "copy" of a: it stores the contents of file a. You can't view it directly with cat, though, because Git has compressed it with zlib. You can use pigz to recover the original text, as in the following example:

pigz -d < .git/objects/ce/013625030ba8dba906f756967f9e9ca394464a

The result is:

blob 6hello
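The decompression step can also be sketched in Python, since a loose object is a zlib stream. To stay self-contained, the sketch compresses the expected bytes in memory instead of reading from .git/objects, and the helper name parse_loose_object is made up for illustration. Printing the raw bytes also reveals the NUL byte that the terminal output above hides between 6 and hello.

```python
import zlib

def parse_loose_object(compressed: bytes):
    """Split a zlib-compressed loose object into (type, size, body)."""
    raw = zlib.decompress(compressed)
    header, _, body = raw.partition(b"\x00")  # the header ends at the first NUL byte
    obj_type, size = header.split(b" ")
    return obj_type.decode(), int(size), body

# Stand-in for the bytes of the file under .git/objects/ce/; with a real
# repository you would read that file in binary mode instead.
compressed = zlib.compress(b"blob 6\x00hello\n")

obj_type, size, body = parse_loose_object(compressed)
print(obj_type, size, body)
```

With a real repository, passing the bytes of the object file to parse_loose_object should give the same three values.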

Git's rules for generating this file aren't complicated. First, Git calculates the length of the original file in bytes, which is 6 (not 5, because echo appended a newline when writing file a). Git then concatenates the fixed prefix blob followed by a space, the length of the file, a null character (ASCII 0), and the contents of the file into one string, and computes the SHA-1 digest of that string. For file a, you can try the calculation with the following command:

printf "blob 6\0hello\n" | shasum

Or, more simply, use Git's built-in hash-object subcommand:

git hash-object a

Either command produces ce013625030ba8dba906f756967f9e9ca394464a. Git then takes the first two characters (ce) as a directory name and creates that directory under .git/objects. The remaining characters, from the third onward (013625030ba8dba906f756967f9e9ca394464a), become the file name, and Git writes the concatenated content to that file after compressing it. This file is called a blob object in Git terminology; you'll also encounter objects of type tree and commit later.
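The whole naming rule fits in a few lines of Python. This is a minimal sketch of the steps described above, not Git's actual implementation. The function name blob_object is my own, and the compressed bytes may differ from what Git writes if it uses a different compression level.

```python
import hashlib
import zlib

def blob_object(content: bytes):
    """Return (digest, relative path, compressed bytes) for a blob."""
    # "blob", a space, the length in bytes, a NUL byte, then the content
    store = b"blob " + str(len(content)).encode() + b"\x00" + content
    digest = hashlib.sha1(store).hexdigest()
    # first two hex characters name the directory, the other 38 name the file
    path = f".git/objects/{digest[:2]}/{digest[2:]}"
    return digest, path, zlib.compress(store)

digest, path, compressed = blob_object(b"hello\n")
print(digest)  # ce013625030ba8dba906f756967f9e9ca394464a
print(path)
```

Because the digest depends only on the content, two files with the same content map to the same blob, no matter what they are called.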

What happens when you run git commit?

Now I'm going to commit my changes:

git config user.email 'foobar'
git config user.name 'foobar'
git commit -m 'test'

Two new files have been added to the .git/objects directory:

.git/objects
├── 09
│   └── 76950c1fdbcb52435a433913017bf044b3a58f  # new
├── 14
│   └── c77e71bd06df41e1509280cfba045e1db2aa5f  # new
├── ce
│   └── 013625030ba8dba906f756967f9e9ca394464a
├── info
└── pack

You can use git cat-file -t to check the types of these two new files:

git cat-file -t 14c77e71bd06df41e1509280cfba045e1db2aa5f
# output: commit
git cat-file -t 0976950c1fdbcb52435a433913017bf044b3a58f
# output: tree

You can also use git cat-file -p to print the contents of a new file in readable form. For example, running git cat-file -p 0976950c1fdbcb52435a433913017bf044b3a58f prints the contents of the tree object; the result is:

100644 blob ce013625030ba8dba906f756967f9e9ca394464a	a

A tree object contains metadata about the files tracked by Git: each file's mode (permissions), the type of the Git object, the object's digest, and the file name. The other new file, the commit object, contains the commit information. Running git cat-file -p on it gives the following result:

tree 0976950c1fdbcb52435a433913017bf044b3a58f
author foobar <foobar> 1576676836 +0800
committer foobar <foobar> 1576676836 +0800

test

The first line indicates which tree the commit object points to. From this tree, you can reach every file tracked by Git as of this commit. A commit points to a tree; a tree can point to blobs or to other trees; a blob is like a leaf node and does not point to any other object.
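git cat-file -p shows a tree in readable form, but the stored bytes look different. As far as I understand the on-disk format, each entry is the mode, a space, the file name, a NUL byte, and the 20 raw bytes of the digest, and the whole body gets the same kind of header as a blob. Here is a minimal sketch under that assumption. The function name tree_object is invented, and sorting by plain file name is a simplification of Git's real ordering rule.

```python
import hashlib

def tree_object(entries):
    """Build the bytes of a tree object from (mode, name, hex_digest) tuples."""
    body = b""
    for mode, name, hex_digest in sorted(entries, key=lambda entry: entry[1]):
        # mode, space, name, NUL byte, then the digest as 20 raw bytes
        body += mode.encode() + b" " + name.encode() + b"\x00" + bytes.fromhex(hex_digest)
    return b"tree " + str(len(body)).encode() + b"\x00" + body

# The single entry printed by git cat-file -p above.
store = tree_object([("100644", "a", "ce013625030ba8dba906f756967f9e9ca394464a")])
print(hashlib.sha1(store).hexdigest())
```

If the assumption about the format holds, the printed digest should equal the tree digest on the first line of the commit object above. The tree refers to the blob only by its digest, which is how a commit can reach every file through its tree.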

What happens when you run git branch?

The git branch subcommand creates a new branch, although I use git checkout -b more often. Since Git creates blob, tree, and commit objects when adding and committing, does it also create a branch object when creating a new branch? The answer is no.

Git branches are very simple: each is just a reference to a commit object, a bit like a symbolic link on *nix systems. All branches are stored under .git/refs/heads. For example, the .git/refs/heads/master file stores the digest of the most recent commit on the master branch:

cat .git/refs/heads/master
# output: 14c77e71bd06df41e1509280cfba045e1db2aa5f

That's why creating a new branch in Git is so cheap: Git just creates a file named after the new branch under .git/refs/heads and copies the current branch's digest into it. Next, I create a new branch develop and commit a new file b; .git/objects then gains three more files:

git checkout -b develop
echo 'good' > b
git add b
git commit -m 'new branch'

The three new files store the contents of file b (a blob object), the metadata for the tracked files, now including b (a tree object), and the commit itself (a commit object). None of them contains any information about the develop branch. The develop branch is just a file of the same name in .git/refs/heads/.
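To make the "a branch is just a file" point concrete, here is a toy model in Python. It works on a fake .git directory inside a temporary folder, so no real repository is touched. create_branch is a hypothetical helper, not a Git API, and real Git does more than this (for example, git checkout -b also updates HEAD).

```python
import tempfile
from pathlib import Path

def create_branch(git_dir: Path, new_branch: str, from_branch: str) -> None:
    """Model git branch: copy the source branch's digest into a new ref file."""
    heads = git_dir / "refs" / "heads"
    digest = (heads / from_branch).read_text()
    (heads / new_branch).write_text(digest)

# Fake .git directory with a master ref holding the digest shown earlier.
git_dir = Path(tempfile.mkdtemp()) / ".git"
(git_dir / "refs" / "heads").mkdir(parents=True)
(git_dir / "refs" / "heads" / "master").write_text(
    "14c77e71bd06df41e1509280cfba045e1db2aa5f\n"
)

create_branch(git_dir, "develop", "master")
print((git_dir / "refs" / "heads" / "develop").read_text().strip())
```

No object is created in this model; the only new thing is a 41-byte file, which matches how cheap branching feels in practice.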

What happens when you run git merge?

Git does a fast-forward merge when the develop branch forks from master and is merged back while master has gained no new commits of its own. In that case Git simply updates the .git/refs/heads/master file to hold the same digest as develop.
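The fast-forward rule can be modelled with two dictionaries: one mapping branch names to digests, one mapping each commit to its parents. This is a toy sketch, not how Git is implemented, and the function names are my own. The develop digest used here is the one that shows up in the merge commit output later in this article.

```python
def is_ancestor(parents, ancestor, commit):
    """Return True if ancestor is reachable from commit via parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return False

def merge_fast_forward(refs, parents, target, source):
    """Move target to source if that is a pure fast-forward; report success."""
    if is_ancestor(parents, refs[target], refs[source]):
        refs[target] = refs[source]  # only the ref changes, no object is created
        return True
    return False

# Commit graph of the test repository: develop's commit has master's as parent.
parents = {
    "14c77e71bd06df41e1509280cfba045e1db2aa5f": [],
    "db891542d3e44448433ba86c7cd636d8aec3da54": ["14c77e71bd06df41e1509280cfba045e1db2aa5f"],
}
refs = {
    "master": "14c77e71bd06df41e1509280cfba045e1db2aa5f",
    "develop": "db891542d3e44448433ba86c7cd636d8aec3da54",
}
print(merge_fast_forward(refs, parents, "master", "develop"), refs["master"])
```

If master had its own commit after the fork, its digest would not be an ancestor of develop's, and the function would refuse to move the ref.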

You can also ask Git not to fast-forward. Use git reset --hard HEAD^1 to move the master branch back to the first commit, and then merge develop again with the following command:

git merge --no-ff develop

This time, Git creates a new commit object instead of simply updating the .git/refs/heads/master file. On my computer, this new commit object is d1403bb629c7a636c724069b22875ed882b54bcc; use git cat-file -p to look at its content:

tree e960ed43b8e6b5fe9b4e57b806f70796da820056
parent 14c77e71bd06df41e1509280cfba045e1db2aa5f
parent db891542d3e44448433ba86c7cd636d8aec3da54
author foobar <foobar> 1576679608 +0800
committer foobar <foobar> 1576679608 +0800

Merge branch 'develop'

Interestingly, this commit object has two “parent” commits, rather than a single “parent” as is often the case with tree data structures. Obviously, the two parent nodes are the last commit of the master branch before the merge, and the latest commit of develop.

Although a new commit object is created, the latest commit on the develop branch already holds the latest version of the repository, so there is no need to create a new tree. The merge commit simply shares the tree object of develop's latest commit: the digest in the first line of the output above is the digest of the tree that develop's latest commit points to.
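The text of a commit object is simple enough to take apart by hand. The sketch below parses the merge commit printed above into its header fields, which makes the single tree and the two parents easy to see. parse_commit is a made-up helper for illustration and assumes the plain layout shown here.

```python
def parse_commit(text: str):
    """Split a commit object's text into header fields and the message."""
    head, _, message = text.partition("\n\n")  # a blank line ends the headers
    fields = {"parent": []}
    for line in head.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            fields["parent"].append(value)  # a merge commit has several parents
        else:
            fields[key] = value
    return fields, message.strip()

merge_commit = """\
tree e960ed43b8e6b5fe9b4e57b806f70796da820056
parent 14c77e71bd06df41e1509280cfba045e1db2aa5f
parent db891542d3e44448433ba86c7cd636d8aec3da54
author foobar <foobar> 1576679608 +0800
committer foobar <foobar> 1576679608 +0800

Merge branch 'develop'
"""

fields, message = parse_commit(merge_commit)
print(fields["tree"], len(fields["parent"]), message)
```

Parsing the output of git cat-file -p for develop's latest commit the same way and comparing the two tree fields is a quick check that both commits share one tree.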

With that, a puzzle that had bothered me for a long time was finally solved. I had naively thought that when Git merged two branches, it copied all the extra changes from one branch onto the other. That was because I didn't understand the nature of branches. A Git branch is not a pipe, and no commit lives inside a particular branch. Git performs a merge on an immutable graph of objects, creating only a few new commit and tree objects and referencing the existing ones. How else could two branches be merged so quickly?

Afterword

After several experiments, I know a little bit about the core principles of Git, so I don’t plan to go further. If you’re interested, try creating a conflicting merge and see what happens in the.git/objects directory after the conflict is resolved.

Finally, in the process of exploring the principles of Git, I found a number of excellent references, which I enclose here:

  1. Nfarina.com/post/986851…
  2. Maryrosecook.com/blog/post/g…
  3. Www-cs-students.stanford.edu/~blynn/gitm…
  4. Git-scm.com/book/en/v2/…
