Welcome to github.com/hsfxuebao/j… I hope this helps; if you find it useful, please give the repository a star.

1. Low-level (plumbing) and high-level (porcelain) commands

Most day-to-day work with Git uses subcommands such as checkout, branch, and remote. However, because Git started out as a toolkit for building a version control system rather than a complete, user-friendly one, it also includes a set of subcommands that do low-level work. These commands are designed to be chained together UNIX-style or called from scripts. They are usually called “plumbing” commands, while the friendlier commands are called “porcelain” commands.

This chapter focuses on the low-level commands. Because they give you a look at Git’s inner workings, they help explain how Git does its job and why it works the way it does. Most low-level commands are not meant for end users: they are better suited as building blocks for new tools and custom scripts.

When you run git init in a new or existing directory, Git creates a .git directory. This directory contains almost everything Git stores and manipulates. If you want to back up or copy a repository, copying this directory elsewhere gives you nearly everything you need. Everything discussed in this chapter lives in this directory. A newly initialized .git directory typically looks like this:

$ ls -F1
config
description
HEAD
hooks/
info/
objects/
refs/

Depending on your Git version, the directory may contain additional entries, but this is the default layout of a fresh git init repository. The description file is used only by the GitWeb program, so we don’t need to worry about it. The config file contains project-specific configuration options. The info directory holds a global exclude file for ignore patterns that you don’t want to record in a .gitignore file. The hooks directory contains client-side or server-side hook scripts, a topic covered in detail under Git hooks.

The remaining four entries are the important ones: the HEAD file, the index file (yet to be created), and the objects and refs directories. These are the core parts of Git. The objects directory stores all content; the refs directory stores pointers to commit objects (branches, remotes, tags, and so on); the HEAD file points to the branch currently checked out; and the index file holds the staging area information. We’ll look at each of these four in detail to understand how Git works.

Git objects (commit, tree, blob)

Git is a content-addressable filesystem, which sounds pretty cool. But what does that mean? It means that at the heart of Git is a simple key-value data store. You can insert any kind of content into a Git repository, and Git hands back a unique key that you can use to retrieve that content at any time.

You can demonstrate this with the plumbing command git hash-object, which takes some data, stores it in the .git/objects directory (the object database), and returns the unique key that refers to that data object.

First, initialize a new Git repository and confirm that the objects directory contains no files:

$ git init test
Initialized empty Git repository in /tmp/test/.git/
$ cd test
$ find .git/objects
.git/objects
.git/objects/info
.git/objects/pack
$ find .git/objects -type f

You can see that Git initialized the objects directory and created the pack and info subdirectories in it, both of which are empty. Next, we create a new data object with git hash-object and store it manually in the new Git database:

$ echo 'test content' | git hash-object -w --stdin
d670460b4b4aece5915caf5c68d12f560a9fe3e4

In its simplest form, git hash-object takes the content you hand it and returns only the unique key that would be used to store it in the Git database. The -w option tells the command not just to return the key but also to write the object to the database. Finally, the --stdin option tells the command to read the content from standard input; without it, the command expects the path of the file to store as its last argument.

The command prints a 40-character checksum. This is a SHA-1 hash: a checksum of the content you are storing plus a header. The header is discussed briefly later. Now we can see how Git stored the data:

$ find .git/objects -type f
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4

If you look in the objects directory again, you’ll find a file for the new content. This is how Git initially stores content: one file per piece of content, named after the SHA-1 checksum of the content plus its header. The first two characters of the checksum name the subdirectory, and the remaining 38 characters name the file.
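
As a rough sketch of the naming scheme just described, the following Ruby snippet recomputes the key for 'test content' without calling Git at all. The helper names git_object_id and object_path are made up for this illustration; only Digest::SHA1 from Ruby's standard library is used, and the header layout is the one shown in the object storage section later in this article.

```ruby
require 'digest/sha1'

# Hypothetical helper: the key of an object is the SHA-1 of a header
# ("<type> <size in bytes>\0") followed by the content itself.
def git_object_id(content, type = 'blob')
  Digest::SHA1.hexdigest("#{type} #{content.bytesize}\0#{content}")
end

# Hypothetical helper: the first two hex characters name the subdirectory,
# the remaining 38 name the file inside it.
def object_path(sha1)
  File.join('.git', 'objects', sha1[0, 2], sha1[2, 38])
end

sha1 = git_object_id("test content\n") # echo appends the trailing newline
puts sha1
puts object_path(sha1)
```

The printed key should match the d670460b4b4aece5915caf5c68d12f560a9fe3e4 value returned by git hash-object above, and the path should match the file found under .git/objects.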

Once content is in the object database, you can read it back with the git cat-file command. This command is something of a Swiss Army knife for inspecting Git objects. Passing -p tells cat-file to work out the type of the content first and then display it appropriately:

$ git cat-file -p d670460b4b4aece5915caf5c68d12f560a9fe3e4
test content

At this point, you’ve learned how to put things in Git and how to take them out. We can also apply these actions to the contents of a file. For example, you can do simple version control on a file. First, create a new file and store its contents in the database:

$ echo 'version 1' > test.txt
$ git hash-object -w test.txt
83baae61804e65cc73a7201a7252750c76066a30

Next, write the new content to the file and store it to the database again:

$ echo 'version 2' > test.txt
$ git hash-object -w test.txt
1f7a7a472abf3dd9643fd615f6da379c4acb3e3a

The object database now holds both versions of this file, along with the first content you stored there:

$ find .git/objects -type f
.git/objects/1f/7a7a472abf3dd9643fd615f6da379c4acb3e3a
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4

You can now delete the local copy of test.txt and use Git to retrieve the first version from the object database:

$ git cat-file -p 83baae61804e65cc73a7201a7252750c76066a30 > test.txt
$ cat test.txt
version 1

Or the second version:

$ git cat-file -p 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a > test.txt
$ cat test.txt
version 2

However, remembering the SHA-1 value of every version of a file is not practical. Another problem is that this simple version control system does not save the file name, only the file’s contents. This type of object is called a blob object (a data object). Given the SHA-1 key of any object, you can ask Git for its type with git cat-file -t:

$ git cat-file -t 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a
blob

The tree object

The next Git object type to look at is the tree object, which solves the problem of storing file names and also lets you group multiple files together. Git stores content in a way that resembles a UNIX filesystem, but somewhat simplified. All content is stored as tree objects, which correspond to UNIX directory entries, and data objects (blobs), which roughly correspond to inodes or file contents. A tree object contains one or more tree entries. Each entry holds the SHA-1 pointer to a data object or subtree, along with the associated mode, type, and file name. For example, the most recent tree of a project might look like this:

$ git cat-file -p master^{tree}
100644 blob a906cb2a4a904a152e80877d4088654daad0c859      README
100644 blob 8f94139338f9404f26296befa88755fc2598c289      Rakefile
040000 tree 99f1a6d12cb4b6f19c8655fca46c3ecf317074e0      lib

The master^{tree} syntax refers to the tree object that the latest commit on the master branch points to. Note that the lib subdirectory is not a data object but a pointer to another tree object:

$ git cat-file -p 99f1a6d12cb4b6f19c8655fca46c3ecf317074e0
100644 blob 47c6340d6459e05787f644c2447d2595f5d3a54b      simplegit.rb

Note

You may encounter errors using the master^{tree} syntax in some shells.

In CMD on Windows, the ^ character is used for escaping, so you have to double it to avoid problems: git cat-file -p master^^{tree}. In PowerShell, arguments containing {} must be quoted so that they are not parsed incorrectly: git cat-file -p 'master^{tree}'.

In ZSH, the ^ character is used for globbing, so you have to wrap the whole expression in quotes: git cat-file -p "master^{tree}".

Conceptually, Git stores data that looks something like this:

Figure 149. Simplified Git data model.

You can fairly easily create your own tree. Git normally creates a tree by taking the state of the staging area (the index) and writing a tree object from it, so a series of tree objects can be recorded one after another. To create a tree object, then, you first have to set up a staging area by staging some files. You can create a staging area with a single entry, the first version of our test.txt file, using the plumbing command git update-index. With this command you artificially add the first version of test.txt to a new staging area. You must pass the --add option because the file is not yet in the staging area (we haven’t even created one), and the --cacheinfo option because the file you are adding is in the Git database rather than in the working directory. You then specify the file mode, the SHA-1, and the file name:

$ git update-index --add --cacheinfo 100644 \
  83baae61804e65cc73a7201a7252750c76066a30 test.txt

Here we specify file mode 100644, which means a normal file. Other options are 100755, an executable file, and 120000, a symbolic link. The modes are taken from normal UNIX file modes but are much less flexible: these three are the only modes valid for files (data objects) in Git, although other modes are used for directories and submodules.

You can now use git write-tree to write the staging area out to a tree object. No -w option is needed: calling this command automatically creates a tree object from the current state of the staging area if that tree does not exist yet:

$ git write-tree
d8329fc1cc938780ffdd9f94e0d364e0ea74f579
$ git cat-file -p d8329fc1cc938780ffdd9f94e0d364e0ea74f579
100644 blob 83baae61804e65cc73a7201a7252750c76066a30      test.txt

You can verify that this is a tree object with the same git cat-file command you used earlier:

$ git cat-file -t d8329fc1cc938780ffdd9f94e0d364e0ea74f579
tree
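
To make the idea of a tree entry more concrete, here is a minimal sketch of how such a tree could be hashed by hand. It assumes the on-disk layout of a tree entry is the mode, a space, the file name, a null byte, and then the 20 raw bytes of the entry's SHA-1; that layout is not spelled out in this article, and the helper names tree_entry and tree_id are invented for the example.

```ruby
require 'digest/sha1'

# Hypothetical helper: serialize one tree entry as
# "<mode> <name>\0" followed by the 20 raw bytes of the entry's SHA-1.
def tree_entry(mode, name, sha1_hex)
  "#{mode} #{name}\0".b + [sha1_hex].pack('H*')
end

# Hypothetical helper: hash a list of serialized entries as a tree object,
# using the same "<type> <size>\0" header that blobs use.
def tree_id(entries)
  body = entries.join
  Digest::SHA1.hexdigest("tree #{body.bytesize}\0".b + body)
end

entry = tree_entry('100644', 'test.txt', '83baae61804e65cc73a7201a7252750c76066a30')
puts tree_id([entry])
```

If the layout assumed here matches what Git does, the printed ID is the same d8329fc1cc938780ffdd9f94e0d364e0ea74f579 value that git write-tree returned above.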

Next we create a new tree object that contains the second version of the test.txt file and a new file:

$ echo 'new file' > new.txt
$ git update-index --add --cacheinfo 100644 \
  1f7a7a472abf3dd9643fd615f6da379c4acb3e3a test.txt
$ git update-index --add new.txt

The staging area now has the new version of test.txt as well as the new file new.txt. Write out that tree (recording the current state of the staging area as a tree object) and see what it looks like:

$ git write-tree
0155eb4229851634a0f03eb265b69f5a2d56f341
$ git cat-file -p 0155eb4229851634a0f03eb265b69f5a2d56f341
100644 blob fa49b077972391ad58037050f2a75f74e3671e92      new.txt
100644 blob 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a      test.txt

Notice that the new tree object contains two file entries and that the SHA-1 of test.txt is the “version 2” SHA-1 from earlier (1f7a7a). Just for fun, you can add the first tree to the second one as a subdirectory. You read trees into the staging area by calling git read-tree. In this case, you can read an existing tree into the staging area as a subtree by passing the --prefix option:

$ git read-tree --prefix=bak d8329fc1cc938780ffdd9f94e0d364e0ea74f579
$ git write-tree
3c4e9cd789d88d8d89c1073707c3585e41b0e614
$ git cat-file -p 3c4e9cd789d88d8d89c1073707c3585e41b0e614
040000 tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579      bak
100644 blob fa49b077972391ad58037050f2a75f74e3671e92      new.txt
100644 blob 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a      test.txt

If you created a working directory from this new tree, its root would contain the two files and a subdirectory named bak holding the first version of test.txt. You can picture the data that Git stores for these structures like this:

Figure 150. Current Git data content structure.
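
The nesting can also be sketched in a few lines of Ruby. The snippet below is only an in-memory model, not a reader for real Git objects: the trees hash stands in for two of the tree objects above (with abbreviated IDs), and list_files is a hypothetical helper that walks them recursively to produce the file paths a working directory would contain.

```ruby
# In-memory stand-in for two of the tree objects created above, keyed by
# abbreviated ID. Each entry is [mode, type, id, name].
trees = {
  'd8329f' => [%w[100644 blob 83baae test.txt]],
  '3c4e9c' => [
    %w[040000 tree d8329f bak],
    %w[100644 blob fa49b0 new.txt],
    %w[100644 blob 1f7a7a test.txt]
  ]
}

# Hypothetical helper: walk a tree recursively and return { path => blob ID }.
def list_files(trees, tree_id, prefix = '')
  trees.fetch(tree_id).each_with_object({}) do |(_mode, type, id, name), files|
    path = prefix + name
    if type == 'tree'
      files.merge!(list_files(trees, id, "#{path}/")) # descend into subtree
    else
      files[path] = id
    end
  end
end

list_files(trees, '3c4e9c').each { |path, id| puts "#{id} #{path}" }
```

The walk lists bak/test.txt (the first version), new.txt, and test.txt (the second version), which is the layout described above.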

Commit objects

If you have done all of the above, you now have three tree objects that represent the different project snapshots you want to track. The earlier problem remains, though: to recall the snapshots you must remember all three SHA-1 values. You also have no record of who saved the snapshots, when they were saved, or why. This is the basic information that a commit object stores for you.

You create a commit object by calling commit-tree with the SHA-1 of a tree object and the commit objects, if any, that directly precede it (its parents). Start with the first tree you wrote:

$ echo 'first commit' | git commit-tree d8329f
fdf4fc3344e67ab068f836878b6c4951e3b15f3d

You will get a different hash value because your creation time and author data differ. For the rest of this chapter, replace the commit and tag hashes with your own checksums. You can now look at the new commit object with git cat-file:

$ git cat-file -p fdf4fc3
tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579
author Scott Chacon <[email protected]> 1243040974 -0700
committer Scott Chacon <[email protected]> 1243040974 -0700

first commit

The format of a commit object is simple: it names the top-level tree for the project snapshot at that point; the parent commits, if any (the commit above has none); the author and committer information (taken from your user.name and user.email configuration settings, plus a timestamp); a blank line; and then the commit message.
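
The format described above can be sketched as a small Ruby function that assembles the text of a commit object. The helper name commit_body and the author identity are invented for this example, so the resulting hash will not match the one shown earlier; the point is only the order of the fields.

```ruby
require 'digest/sha1'

# Hypothetical helper: assemble the text of a commit object from its parts.
# For simplicity the same identity and timestamp serve as author and committer.
def commit_body(tree:, author:, timestamp:, message:, parents: [])
  lines = ["tree #{tree}"]
  parents.each { |parent| lines << "parent #{parent}" }
  lines << "author #{author} #{timestamp}"
  lines << "committer #{author} #{timestamp}"
  lines.join("\n") + "\n\n" + message + "\n" # blank line, then the message
end

body = commit_body(
  tree: 'd8329fc1cc938780ffdd9f94e0d364e0ea74f579',
  author: 'Example Author <author@example.com>', # made-up identity
  timestamp: '1243040974 -0700',
  message: 'first commit'
)
puts body

# Commit objects are hashed like blobs, with "commit" as the type in the header.
puts Digest::SHA1.hexdigest("commit #{body.bytesize}\0#{body}")
```

Because the author, committer, and timestamps are part of the hashed text, two people committing the same tree end up with different commit IDs.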

Next, we’ll create two more commit objects, each referencing the commit that came directly before it as its parent:

$ echo 'second commit' | git commit-tree 0155eb -p fdf4fc3
cac0cab538b970a37ea1e769cbbde608743bc96d
$ echo 'third commit'  | git commit-tree 3c4e9c -p cac0cab
1a410efbd13591db07496601ebc7a059dd55cfe9

Each of the three commit objects points to one of the three snapshot trees you created. Oddly enough, you now have a real Git history that you can view with git log, if you run it on the SHA-1 of the last commit:

$ git log --stat 1a410e
commit 1a410efbd13591db07496601ebc7a059dd55cfe9
Author: Scott Chacon <[email protected]>
Date:   Fri May 22 18:15:24 2009 -0700

    third commit

 bak/test.txt | 1 +
 1 file changed, 1 insertion(+)

commit cac0cab538b970a37ea1e769cbbde608743bc96d
Author: Scott Chacon <[email protected]>
Date:   Fri May 22 18:14:29 2009 -0700

    second commit

 new.txt  | 1 +
 test.txt | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

commit fdf4fc3344e67ab068f836878b6c4951e3b15f3d
Author: Scott Chacon <[email protected]>
Date:   Fri May 22 18:09:34 2009 -0700

    first commit

 test.txt | 1 +
 1 file changed, 1 insertion(+)
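
The reason git log can print this history is that each commit records its parent. As a minimal sketch, assuming an in-memory hash in place of the real object database and abbreviated IDs, the walk looks roughly like this (history is a hypothetical helper, not a Git API):

```ruby
# In-memory stand-in for the three commits above: each commit records
# its message and its parent (nil for the root commit).
commits = {
  '1a410e' => { message: 'third commit',  parent: 'cac0ca' },
  'cac0ca' => { message: 'second commit', parent: 'fdf4fc' },
  'fdf4fc' => { message: 'first commit',  parent: nil }
}

# Hypothetical helper: follow parent pointers from a starting commit,
# newest first, until a commit without a parent is reached.
def history(commits, start)
  log = []
  id = start
  while id
    commit = commits.fetch(id)
    log << [id, commit[:message]]
    id = commit[:parent]
  end
  log
end

history(commits, '1a410e').each { |id, message| puts "#{id} #{message}" }
```

Starting from any other commit would list only that commit and its ancestors, which is why the starting SHA-1 matters.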

Amazing. You have just built up a Git history using only low-level operations, without any of the high-level commands. This is essentially what Git does when you run git add and git commit: it stores blobs for the files that changed, updates the staging area, writes out trees, and writes commit objects that reference the top-level trees and the commits that came immediately before them. These three main Git objects (data objects, tree objects, and commit objects) are initially stored as separate files in the .git/objects directory. Here are all the objects in the example directory now, annotated with what they store:

$ find .git/objects -type f
.git/objects/01/55eb4229851634a0f03eb265b69f5a2d56f341 # tree 2
.git/objects/1a/410efbd13591db07496601ebc7a059dd55cfe9 # commit 3
.git/objects/1f/7a7a472abf3dd9643fd615f6da379c4acb3e3a # test.txt v2
.git/objects/3c/4e9cd789d88d8d89c1073707c3585e41b0e614 # tree 3
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30 # test.txt v1
.git/objects/ca/c0cab538b970a37ea1e769cbbde608743bc96d # commit 2
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4 # 'test content'
.git/objects/d8/329fc1cc938780ffdd9f94e0d364e0ea74f579 # tree 1
.git/objects/fa/49b077972391ad58037050f2a75f74e3671e92 # new.txt
.git/objects/fd/f4fc3344e67ab068f836878b6c4951e3b15f3d # commit 1

If you follow all the internal pointers, you get an object graph something like this:

Figure 151. All reachable objects in your Git directory.

Object storage

As mentioned earlier, every object you commit to the Git object database is stored with a header. Let’s take a moment to look at how Git stores its objects. You will see how to store a data object, in this case the string “what is up, doc?”, interactively in the Ruby scripting language.

You can start Ruby’s interactive mode with the irb command:

$ irb
>> content = "what is up, doc?"
=> "what is up, doc?"

Git first constructs a header that starts with the type of object, in this case the string “blob”. To that first part of the header Git adds a space, then the size of the content in bytes, and finally a null byte:

>> header = "blob #{content.bytesize}\0"
=> "blob 16\u0000"

Git concatenates the header and the original content and then calculates the SHA-1 checksum of the result. In Ruby, you can calculate the SHA-1 of a string by loading the SHA-1 digest library with require and then calling Digest::SHA1.hexdigest() on the string:

>> store = header + content
=> "blob 16\u0000what is up, doc?"
>> require 'digest/sha1'
=> true
>> sha1 = Digest::SHA1.hexdigest(store)
=> "bd9dbf5aae1a3862dd1526723246b20206e5fc37"

Let’s compare that with the output of git hash-object. Here we use echo -n so that no newline is appended to the input:

$ echo -n "what is up, doc?" | git hash-object --stdin
bd9dbf5aae1a3862dd1526723246b20206e5fc37

Git compresses the new content with zlib, which you can do in Ruby with the zlib library. First require the library, then call Zlib::Deflate.deflate() on the content:

>> require 'zlib'
=> true
>> zlib_content = Zlib::Deflate.deflate(store)
=> "x\x9CK\xCA\xC9OR04c(\xCFH,Q\xC8,V(-\xD0QH\xC9O\xB6\a\x00_\x1C\a\x9D"

Finally, write the zlib-compressed content to an object on disk. Determine the path of the object to write (the first two characters of the SHA-1 value name the subdirectory, and the last 38 characters name the file inside it). In Ruby, you can use FileUtils.mkdir_p() to create the subdirectory if it doesn’t exist. Then open the file with File.open() and call write() on the resulting file handle to write the compressed content:

>> path = '.git/objects/' + sha1[0,2] + '/' + sha1[2,38]
=> ".git/objects/bd/9dbf5aae1a3862dd1526723246b20206e5fc37"
>> require 'fileutils'
=> true
>> FileUtils.mkdir_p(File.dirname(path))
=> ".git/objects/bd"
>> File.open(path, 'w') { |f| f.write zlib_content }
=> 32

Let’s check the content of the object using git cat-file:

$ git cat-file -p bd9dbf5aae1a3862dd1526723246b20206e5fc37
what is up, doc?

That’s it — you’ve created a valid Git data object.

All Git objects are stored this way, differing only in the type: the header of the other two object types begins with the string “commit” or “tree” instead of “blob”. Also, while the content of a data object can be nearly anything, the content of commit and tree objects is formatted in a specific way.
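
Putting the steps of this section together, the sketch below writes an object of any type and reads it back. It works in a temporary directory rather than a real .git/objects directory, and write_object and read_object are hypothetical helper names chosen for this example; only the header, SHA-1, and zlib steps come from the walkthrough above.

```ruby
require 'digest/sha1'
require 'fileutils'
require 'tmpdir'
require 'zlib'

# Hypothetical helper: write a loose object of the given type under
# objects_dir and return its ID.
def write_object(objects_dir, type, content)
  store = "#{type} #{content.bytesize}\0#{content}"
  sha1 = Digest::SHA1.hexdigest(store)
  path = File.join(objects_dir, sha1[0, 2], sha1[2, 38])
  FileUtils.mkdir_p(File.dirname(path))
  File.binwrite(path, Zlib::Deflate.deflate(store))
  sha1
end

# Hypothetical helper: read an object back as [type, content].
def read_object(objects_dir, sha1)
  path = File.join(objects_dir, sha1[0, 2], sha1[2, 38])
  header, content = Zlib::Inflate.inflate(File.binread(path)).split("\0", 2)
  [header.split(' ').first, content]
end

Dir.mktmpdir do |dir|
  sha1 = write_object(dir, 'blob', 'what is up, doc?')
  type, content = read_object(dir, sha1)
  puts "#{sha1} #{type} #{content}"
end
```

Passing 'tree' or 'commit' as the type changes only the header; the content for those types would still need to follow the formats described earlier.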

2. Advantages of Git’s version storage

Hashing

A hash algorithm (also called a hash function or message digest) reduces input of any size to a short, fixed-size summary. Hash algorithms differ in strength, but they share several properties:

  • For a given hash algorithm, the length of the result is fixed no matter how much data is input.
  • For a given hash algorithm and a given input, the output is always the same.
  • For a given hash algorithm, a change to the input changes the output (barring an extremely unlikely collision), and usually changes it drastically.
  • The hash algorithm is one-way: the input cannot practically be recovered from the output.
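
The first three properties are easy to observe with SHA-1 in Ruby. This is only an illustration using Digest::SHA1 from the standard library; the input strings are arbitrary.

```ruby
require 'digest/sha1'

short = Digest::SHA1.hexdigest('a')
long  = Digest::SHA1.hexdigest('a' * 1_000_000)

# Fixed length: the digest has the same length regardless of input size.
puts short.length
puts long.length

# Deterministic: hashing the same input again yields the same digest.
puts Digest::SHA1.hexdigest('version 1') == Digest::SHA1.hexdigest('version 1')

# Sensitive to change: a one-character edit gives a very different digest.
a = Digest::SHA1.hexdigest('version 1')
b = Digest::SHA1.hexdigest('version 2')
differing = a.chars.zip(b.chars).count { |x, y| x != y }
puts "#{differing} of #{a.length} hex characters differ"
```

The fourth property, one-wayness, cannot be shown by running code; it is a statement about how hard it is to go from a digest back to an input.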

Under the hood Git uses SHA-1, and hashes can also be used to verify files.

Git relies on this mechanism to ensure data integrity at a fundamental level.

Git’s version saving mechanism

1. File management in centralized version control tools

These systems store information as a list of file-based changes: they treat what they hold as a set of base files plus the differences made to each file over time.

2. File management in Git

Git treats its data as a series of snapshots of a miniature filesystem. Every time you commit, Git takes a snapshot of all files at that moment and stores a reference to that snapshot. For efficiency, if a file has not changed, Git does not store it again; it stores only a pointer to the identical file it already has. Git therefore treats its data as a stream of snapshots.
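
The reuse of unchanged files falls out of content addressing. As a minimal sketch, assuming a plain Ruby hash in place of the object database and a hypothetical snapshot helper, two snapshots that share a file end up pointing at the same blob:

```ruby
require 'digest/sha1'

# Hypothetical in-memory object store: each distinct content is kept once.
store = {}

# Hypothetical helper: snapshot a { filename => content } hash and
# return { filename => blob ID }.
def snapshot(store, files)
  files.transform_values do |content|
    id = Digest::SHA1.hexdigest("blob #{content.bytesize}\0#{content}")
    store[id] ||= content # identical content is not stored a second time
    id
  end
end

first  = snapshot(store, { 'README' => "hello\n", 'test.txt' => "version 1\n" })
second = snapshot(store, { 'README' => "hello\n", 'test.txt' => "version 2\n" })

puts first['README'] == second['README'] # the unchanged file shares one blob
puts store.size                           # number of distinct blobs stored
```

Two snapshots with two files each produce four entries but only three stored blobs, because README did not change between them.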

Creating a branch in SVN copies a directory tree, whereas Git only creates a pointer to the current commit, which is very cheap. Switching branches in Git mostly comes down to repointing HEAD (and updating the working directory to match), which is also very efficient. Many Git operations depend on the HEAD pointer.
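
A toy model makes the pointer idea concrete. The snippet below is an assumption-laden sketch, not how Git is implemented: refs is a plain hash from branch name to commit ID, head holds the name of the current branch, and the ID abc123 is made up.

```ruby
# Hypothetical in-memory model: a branch is a name mapped to a commit ID,
# and HEAD names the branch that is currently checked out.
refs = { 'master' => '1a410e' }
head = 'master'

# Creating a branch copies one ID, not any files.
refs['testing'] = refs[head]

# Switching branches repoints HEAD.
head = 'testing'

# Committing on the current branch moves only that branch's pointer.
refs[head] = 'abc123' # made-up ID of a new commit

puts "HEAD -> #{head} (#{refs[head]})"
puts "master still at #{refs['master']}"
```

Nothing about the stored snapshots changes in any of these steps; only the small pointers move.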

Reference: git-scm.com/book/zh/v2/…