Git objects

Preface

ESLint + Git + husky is one way to enforce front-end code quality. This post takes a first look at how Git works internally, starting with Git objects.

The official documentation, for further study: git-scm.com/book/zh/v2/

The underlying concept

We will reproduce, with low-level commands, what Git's high-level commands do under the hood.

Linux commands used here

clear: clears the screen
find <directory> -type f: lists the files under the given directory on the console
cat <file>: shows the content of the given file
vim <file>: edits the given file (with the keyboard in English input mode)
    press i to enter insert mode and edit the file
    press Esc, then : to type a command
    q! force quit without saving
    wq save the changes and quit

Initialization

Command: git init

After initialization, a .git directory appears in the current directory. It holds all the data and resources Git needs. At this point we have only created the skeleton of the repository; no file in the project is tracked yet.

[Figure: contents of the .git folder, with a description of each entry]
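To see the same thing on your own machine, here is a minimal sketch. It assumes git is installed and uses a throwaway directory from mktemp; the variable names repo and object_count are only for illustration.

```shell
# Create a scratch repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q

# List the top-level entries that `git init` created inside .git
ls -F .git

# Count the files in the object database; a fresh repository only has
# the empty info/ and pack/ subdirectories there.
object_count=$(find .git/objects -type f | wc -l | tr -d ' ')
echo "objects stored: $object_count"
```

The exact list of entries can differ a little between Git versions, so treat the listing as a guide rather than a specification.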

Git object

At the heart of Git is a simple key-value database. You can insert any kind of content into it, and it returns a key with which that content can be retrieved at any time.

Here is a simple version-control walkthrough for a single file.

Add: write content to the database and get its key back

Run git hash-object -w fileName to write the file to the database and return its key. As the image shows, the content we wrote is now stored in the objects folder under the .git folder.
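A small sketch of this step, assuming a scratch repository and a made-up file content (test.md and its text are placeholders, as are the variable names):

```shell
# Scratch repository
repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical file content, just for the demo
echo 'test content' > test.md

# -w writes the object into .git/objects and prints its key
key=$(git hash-object -w test.md)
echo "key: $key"

# The object is stored under <first 2 characters>/<remaining characters>
dir=$(echo "$key" | cut -c1-2)
file=$(echo "$key" | cut -c3-)
ls ".git/objects/$dir/$file"
```

Without -w, hash-object only prints the key and stores nothing, which is handy when you just want to know what the key would be.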

Check: how does Git store the data?

1. find .git/objects -type f
   returns: .git/objects/df/fdc195af3f4a6c6182c3b6aa306dc8f4671306
   This is how Git stores content at first: one file per piece of content. The first two characters of the checksum name the subdirectory, and the remaining 38 characters are the file name.
2. git cat-file -t dffdc195af3f4a6c6182c3b6aa306dc8f4671306
   returns: blob
   cat-file -t makes Git report the type of any object in its internal storage.
3. git cat-file -p dffdc195af3f4a6c6182c3b6aa306dc8f4671306
   returns: 1. The test
   cat-file -p shows the content of the object.
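Where does the key come from? As a rough sketch: with the default SHA-1 object format, the key of a blob is the SHA-1 of the header "blob <size>" plus a NUL byte, followed by the content. The example below assumes the sha1sum tool is available (on macOS, shasum would be the equivalent) and uses made-up file content.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

printf 'hello git\n' > test.md
key=$(git hash-object -w test.md)

# Ask Git for the type, size, and content of the stored object
type=$(git cat-file -t "$key")
size=$(git cat-file -s "$key")
content=$(git cat-file -p "$key")

# Recompute the key by hand: SHA-1 over "blob <size>\0" + content
# (assumes the repository uses the default SHA-1 object format)
manual=$( { printf 'blob %s\0' "$size"; cat test.md; } | sha1sum | cut -d' ' -f1 )

echo "git:    $key"
echo "manual: $manual"
```

Because the key depends only on the content, writing the same content twice produces the same object and no extra file.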

Note

Everything so far operates only on the local object database; the staging area is not involved. As I understand it, these steps correspond to the part of git add that stores working-directory content in the database.

The problem

1. Remembering the SHA-1 value of every version of every file is unrealistic.
2. Git has not saved the file name, only the file's content.

Solution: the tree object.

The tree object

Tree objects solve the problem of preserving file names and also let us group multiple files together. Git stores content in a manner similar to a UNIX file system: everything is stored as tree objects, which correspond to UNIX directory entries, and data objects (blobs), which roughly correspond to file contents. A tree object contains one or more records; each record holds a SHA-1 pointer to a blob or a subtree, together with its mode, type, and file name.

Let's run through a simple operation to produce a tree object.

Add: generate a tree object and write it to the database

1. git update-index --add --cacheinfo 100644 dffdc195af3f4a6c6182c3b6aa306dc8f4671306 test.md
   Use update-index to put the first version of test.md into a newly created staging area.
   --add: needed because the file is not yet in the staging area.
   --cacheinfo: needed because the file being added lives in the Git database, not in the current directory.
   100644: the file mode, meaning a normal file; 100755 means an executable file; 120000 means a symbolic link.
   dffdc...: the hash of the target blob. test.md: the file name.
2. git status
   Check the current file status: the target file is now in the staging area. In other words, two low-level commands have reproduced git add. No tree object has been stored yet, though.
3. git write-tree
   Write the staging area out as a tree object.
   returns: 86c635fe406ae22ae4d0dd63361f1207f112ac6c
   The tree object now exists. Next, let's see what it contains and where it is stored.
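The three steps can be sketched in a scratch repository like this. The blob content and the variable names blob and tree are placeholders; the hashes you get will differ from the ones in this post unless the content is identical.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# Store some content directly from stdin; no file exists in the working directory
blob=$(echo 'version 1' | git hash-object -w --stdin)

# Step 1: register the blob in the staging area under the name test.md
git update-index --add --cacheinfo 100644 "$blob" test.md

# Step 2: the entry shows up as staged
git status --short

# Step 3: turn the staging area into a tree object
tree=$(git write-tree)
echo "tree: $tree"
git cat-file -t "$tree"
```

Note that git status may also report test.md as deleted in the working directory, since in this sketch the content was only ever written to the database.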

How does a tree store data

1. find .git/objects -type f
   returns:
   .git/objects/86/c635fe406ae22ae4d0dd63361f1207f112ac6c
   .git/objects/df/fdc195af3f4a6c6182c3b6aa306dc8f4671306
   Here 86 is the subdirectory of the tree object that was just stored, and c635... is the file that holds the tree object.
2. git ls-files -s
   returns: 100644 dffdc195af3f4a6c6182c3b6aa306dc8f4671306 0 test.md
   Shows what is currently in the staging area; the update-index command we just ran updated the staging area file (.git/index).
3. git cat-file -t 86c635fe406ae22ae4d0dd63361f1207f112ac6c
   returns: tree
   cat-file -t makes Git report the type of any object in its internal storage.
4. git cat-file -p 86c635fe406ae22ae4d0dd63361f1207f112ac6c
   returns: 100644 blob dffdc195af3f4a6c6182c3b6aa306dc8f4671306 test.md
   cat-file -p treeHASH shows the content of the tree object: one record for each file currently in the staging area, matching the data from step 2.

Conclusion: the current tree structure is shown below. The tree now holds one entry of type blob, and running cat-file -p on that blob's hash shows the corresponding content.
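To compare the staging area with the tree side by side, something like the following should work. It is a sketch in a scratch repository with placeholder content; the variable names are made up.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

blob=$(echo 'version 1' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644 "$blob" test.md
tree=$(git write-tree)

# Staging area view: <mode> <hash> <stage> <name>
staged=$(git ls-files -s)
echo "$staged"

# Tree view: <mode> <type> <hash> <name>
listing=$(git cat-file -p "$tree")
echo "$listing"

# Both the blob and the tree now exist as files in the object database
find .git/objects -type f
```

The two listings carry the same mode, hash, and name; the tree simply adds the object type and drops the stage number.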

graph TD
tree --> blob1

Modification: How does the tree structure change if we modify the contents of the file? Let’s read on

1. For this scenario, add a new file test2.md and edit its content.
2. git update-index --add --cacheinfo 100644 blobHash fileName
   Stage the new file (blobHash is the key returned when its content was written to the database with git hash-object -w).
3. git write-tree
4. git read-tree --prefix=bak 'old-treehash'
   This reads the old tree into the staging area under bak, merging the old tree into the new one.
5. git write-tree
   Generates a new tree again.
6. git cat-file -p new-treeHash
   Inspect the new tree: it contains the old tree alongside the new blob, as the diagram in the next section shows.
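Here is a sketch of that sequence in a scratch repository. The file contents and variable names are placeholders, not the values used in this post.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# First snapshot: one file
blob1=$(echo 'version 1' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644 "$blob1" test.md
old_tree=$(git write-tree)

# Stage a second file
blob2=$(echo 'new file' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644 "$blob2" test2.md

# Read the old tree into the staging area under the bak prefix
git read-tree --prefix=bak "$old_tree"

# Write the combined staging area as a new tree and inspect it
new_tree=$(git write-tree)
git cat-file -p "$new_tree"
```

In this sketch the staging area still holds test.md from the first snapshot, so the new tree lists three entries: the bak subtree (pointing at the old tree), test.md, and test2.md.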

Parsing the tree object

At this point the tree structure has changed slightly: the new tree stores one entry of type tree and one entry of type blob. Running cat-file -p HASH on either shows its content. Git creates and records a tree object from the state of the staging area (the index) at a given moment, so repeating the process records a sequence of tree objects over time. A tree object is an abstraction of the staging area's state, in effect a snapshot. Once changes in the working directory have been synchronized to the staging area, calling write-tree writes the staging area's content out as a tree object: it creates a new tree object from the current staging state and returns that tree's hash, so each synchronization can produce a tree object. In Git, every file (its data) has a hash (type blob) and every tree object has a hash (type tree).

graph TD
tree --> blob
newTree --> blob2
newTree --> oldTree --> blob1

Conclusion

We can think of a tree object as a snapshot of our project.

There is a problem

There are now three tree objects (write-tree was run three times), each representing a different project snapshot we want to track. The problem remains that to reuse these snapshots you must remember all three HASH values. You also have no idea who saved the snapshots, when they were saved, or why. This is exactly the basic information a commit object stores for you.

Git commit object

We create a commit object by calling the commit-tree command with the SHA-1 value of a tree object and the commit's parent, if it has one (the first snapshot has no parent).

Add: create a commit object

1. echo 'first commit' | git commit-tree treeHASH
   (treeHASH is the hash of the tree object to commit)
   returns: e71c5d4acbff486237b0cda2fa438a5b8bb3315f (the hash of the newly created commit object)
   Since this is the first commit, no parent is given.
2. git cat-file -p e71c5d4acbff486237b0cda2fa438a5b8bb3315f
   View the content of the commit object. It first names a top-level tree object, which represents the current project snapshot, followed by the author and committer information (taken from your user.name and user.email configuration, plus a timestamp).
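A minimal sketch of creating and inspecting a first commit object. The identity below is a placeholder passed through Git's standard environment variables so the example does not depend on your global configuration.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# Placeholder identity for the demo
export GIT_AUTHOR_NAME='Example User' GIT_AUTHOR_EMAIL='user@example.com'
export GIT_COMMITTER_NAME='Example User' GIT_COMMITTER_EMAIL='user@example.com'

# Build a tree to commit
blob=$(echo 'version 1' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644 "$blob" test.md
tree=$(git write-tree)

# The commit message is read from stdin; no -p because this is the first commit
commit=$(echo 'first commit' | git commit-tree "$tree")

git cat-file -t "$commit"
git cat-file -p "$commit"
```

The printed commit starts with a tree line, then the author and committer lines, a blank line, and the message. Because the timestamp is part of the content, the commit hash changes from run to run even though the tree hash does not.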

Change: add a new commit object (we now modify some content in test.md and then add a new commit object)

1. git hash-object -w test.md
   Write a new snapshot of the file.
2. git update-index --cacheinfo 100644 93d63e74a1b991638e48bc289ba7237f1c15b36c test.md
   Put the new snapshot of the file into the staging area.
3. git write-tree
   Create a new tree object.
4. git read-tree --prefix=bak 86c635fe406ae22ae4d0dd63361f1207f112ac6c
   Read the old tree into the staging area under bak.
5. git write-tree
   Create a new tree object again.
   returns: 6420f8cdd2497dec2cd60f0b8989ce77750d1136
   The new tree object now contains the old tree together with the new blob.
6. echo 'second commit' | git commit-tree NewTreeObj -p oldCommitObj
   Create a new commit object whose parent is the previous commit.
7. git cat-file -t newCommitObjHASH
   Check the object type.
   returns: commit
8. git cat-file -p newCommitObjHASH
   View the content. It shows:
   tree (the new tree object, that is, the snapshot)
   parent (the previous commit object)
   the author and committer information

Now, if you run git log on the SHA-1 value of the last commit, you may be surprised to find a genuine Git commit history that git log can display.
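The parent link and the resulting history can be checked with a sketch like this one. Contents, identity, and variable names are placeholders; --add is passed on the second update-index as well, which is harmless when the path is already staged.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME='Example User' GIT_AUTHOR_EMAIL='user@example.com'
export GIT_COMMITTER_NAME='Example User' GIT_COMMITTER_EMAIL='user@example.com'

# First commit
blob1=$(echo 'version 1' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644 "$blob1" test.md
tree1=$(git write-tree)
first=$(echo 'first commit' | git commit-tree "$tree1")

# Second commit: new content for test.md, parent set with -p
blob2=$(echo 'version 2' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644 "$blob2" test.md
tree2=$(git write-tree)
second=$(echo 'second commit' | git commit-tree "$tree2" -p "$first")

# Walk the history starting from the second commit
subjects=$(git --no-pager log --format=%s "$second")
echo "$subjects"
```

No branch points at these commits yet, which is why git log needs the commit hash as an argument here.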

Conclusion

At this point we have created a Git commit history using only a few low-level operations and no high-level commands. This is essentially what Git does every time we run git add and git commit: it saves the changed files as data objects (blobs), updates the staging area, records tree objects, and finally creates a commit object that names the top-level tree object and the parent commit. The three main Git objects (data objects, tree objects, and commit objects) are each initially stored as a separate file in the .git/objects directory.

If you follow all the internal pointers, you get an object graph like the following:

[Figure: object graph linking the commit objects to their tree objects and blobs]

At the end

That gives us a deeper understanding. Here is a comparison of the high-level commands and their low-level counterparts, for reference.

git add ./
    git hash-object -w <file name>
    git update-index ...
git commit -m '...'
    git write-tree
    git commit-tree
git log
    git log commitHASH

Git objects (blobs): you can write content to the database and store file snapshots. Drawbacks: with a lot of content, remembering the HASH of every file is unrealistic, and the file name is not saved, only the file content.

Tree objects: they solve the problem of saving file names and let us organize multiple files together. The remaining problem is that to reuse a snapshot you still have to remember the HASH values of the trees involved, and we do not know who saved it.

Commit objects: they solve the problem of tying together all the content that is ready to be published and annotating it with the relevant information, such as user information.
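One way to convince yourself that the two columns really are equivalent is to build the same snapshot both ways and compare the tree hashes. This is only a sketch: the directory names, file content, and identity are made up for the demo.

```shell
work=$(mktemp -d)
export GIT_AUTHOR_NAME='Example User' GIT_AUTHOR_EMAIL='user@example.com'
export GIT_COMMITTER_NAME='Example User' GIT_COMMITTER_EMAIL='user@example.com'

# Repository 1: high-level commands
git init -q "$work/high"
cd "$work/high"
echo 'hello' > test.md
git add test.md
git commit -q -m 'first commit'
high_tree=$(git rev-parse 'HEAD^{tree}')

# Repository 2: low-level commands
git init -q "$work/low"
cd "$work/low"
echo 'hello' > test.md
blob=$(git hash-object -w test.md)
git update-index --add --cacheinfo 100644 "$blob" test.md
low_tree=$(git write-tree)
low_commit=$(echo 'first commit' | git commit-tree "$low_tree")

echo "high-level tree: $high_tree"
echo "low-level tree:  $low_tree"
```

The tree hashes match because a tree is determined only by the modes, names, and content hashes it records. The commit hashes are not guaranteed to match, since they also include timestamps.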
