Preface
Git is a tool every programmer must be proficient with. In today's impatient climate, and especially with the front-end field growing so fast, everyone seems to care only about how to use tools, not about how they work inside. For most third-party packages we really don't need to care about implementation details, but Git, a tool that is with us day and night, is worth a closer look.
This article explores git’s internal implementation logic step by step, starting with an empty project.
Key concepts
SHA-1
SHA-1 is a one-way cryptographic hash function (not an encryption algorithm: there is no key and nothing to decrypt). It maps data of any size to a 160-bit (20-byte) hash called a message digest, which is usually rendered as 40 hexadecimal digits.
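To make this concrete, here is a minimal Python sketch using the standard `hashlib` module: whatever the input size, the digest is always 20 bytes, shown as 40 hex characters. The helper name `sha1_hex` is just for illustration.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as 40 hexadecimal characters."""
    return hashlib.sha1(data).hexdigest()


if __name__ == "__main__":
    for data in (b"", b"hello git!", b"x" * 1_000_000):
        digest = sha1_hex(data)
        # Input size varies; digest length does not (40 hex chars = 20 bytes)
        print(f"{len(data):>8} bytes -> {digest} ({len(digest)} chars)")
```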
Git areas
Workspace: the working directory
Index/Stage: the staging area
Repository: the local repository
Remote: the remote repository
Git objects
Object types:
blob: saves a snapshot of a committed file's content
tree: saves the list of files (blob objects) included in a commit
commit: saves the commit information and a pointer to a tree object
These are the three most important object types.
The directory structure
A project is usually understood starting from its directory structure, so let's start with the structure of Git's .git directory.
#Create an empty project and initialize it
$ git init
#Enter the .git folder to see its directory structure
.git
├─config       # repository configuration
├─description  # repository description (used by GitWeb)
├─HEAD         # points to the current branch
├─hooks        # stores some shell scripts
├─info         # stores some repository information
│  └─exclude
├─objects      # stores all types of objects
│  ├─info
│  └─pack
└─refs         # stores references (branches and tags)
   ├─heads
   └─tags
The focus here is on HEAD, objects and refs.
Git storage principles
1. File snapshot storage
Create test.txt containing hello git!, then run git add . and you will see that the objects folder has changed:
objects
├─27
│  └─706f8151e4c44bb7a129d64b35fff3422d5e3a # new file
├─info
└─pack
Joining the directory name 27 and the file name 706f8151e4c44bb7a129d64b35fff3422d5e3a gives exactly 40 characters: the 40-digit hexadecimal SHA-1 hash computed from your file's content (plus a short type-and-size header). This object file stores a snapshot of the file's data: the whole content, however large the file is, not just a diff.
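Git does not hash the bare file content: it hashes a header `blob <size>` plus a NUL byte, followed by the content. The sketch below (Python, standard library only; the function names are hypothetical) reproduces that calculation and the `objects/<2 chars>/<38 chars>` path split. Whether it yields exactly the hash above depends on the exact bytes in test.txt; a trailing newline or CRLF line endings change the result.

```python
import hashlib


def blob_object_id(content: bytes) -> str:
    """Object ID Git assigns to a blob: SHA-1 of b"blob <size>\\0" + content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """First 2 hex chars name the directory, the remaining 38 name the file."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


if __name__ == "__main__":
    oid = blob_object_id(b"hello git!\n")  # assumes test.txt ends with a newline
    print(oid)
    print(loose_object_path(oid))
```

Splitting on the first two characters keeps any single directory from holding too many files.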
The object file is zlib-compressed binary data and cannot be read directly, but Git provides commands to inspect it:
$ git cat-file -t 27706f8151e4c44bb7a129d64b35fff3422d5e3a # look at the object type
blob
$ git cat-file -p 27706f8151e4c44bb7a129d64b35fff3422d5e3a # view object data
hello git!
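Under the hood, a loose object file is the zlib-compressed header plus content. This sketch decompresses such bytes and splits off the header, roughly what `git cat-file -t` and `-p` report for a blob. To stay self-contained it builds the compressed bytes in memory instead of reading a real file under .git/objects, and it assumes a trailing newline in the content. Tree objects store binary hashes, so only blobs print as plain text this way.

```python
import zlib


def parse_loose_object(raw: bytes) -> tuple[str, bytes]:
    """Decompress a loose object and split it into (type, content)."""
    data = zlib.decompress(raw)
    header, _, content = data.partition(b"\0")  # header is b"<type> <size>"
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content


if __name__ == "__main__":
    # In-memory stand-in for the bytes of a file under .git/objects/
    raw = zlib.compress(b"blob 11\0hello git!\n")
    obj_type, content = parse_loose_object(raw)
    print(obj_type)                  # what `git cat-file -t` reports
    print(content.decode(), end="")  # what `git cat-file -p` prints for a blob
```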
Git also creates an index file in the .git directory:
$ git ls-files --stage # check the index file
100644 27706f8151e4c44bb7a129d64b35fff3422d5e3a 0 test.txt
The index file holds, for each staged file, the hash of its blob object, in other words a pointer to that object.
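Each line of `git ls-files --stage` has the form `<mode> <object hash> <stage>`, then a tab, then the path. A small parser (hypothetical names; it assumes that tab separator, which is how Git prints it) makes the "pointer" explicit: the object hash field is exactly the blob ID. Stage 0 is the normal case. Stages 1 to 3 only appear while a merge conflict is unresolved.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    mode: str       # e.g. "100644" for a regular, non-executable file
    object_id: str  # hash of the blob holding the staged content
    stage: int      # 0 normally; 1-3 only during an unresolved merge conflict
    path: str


def parse_ls_files_stage(line: str) -> IndexEntry:
    """Parse one line of `git ls-files --stage` output."""
    meta, _, path = line.rstrip("\n").partition("\t")
    mode, object_id, stage = meta.split()
    return IndexEntry(mode, object_id, int(stage), path)


if __name__ == "__main__":
    line = "100644 27706f8151e4c44bb7a129d64b35fff3422d5e3a 0\ttest.txt"
    print(parse_ls_files_stage(line))
```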
$ git reset HEAD -- . # unstage the files
$ git ls-files --stage # check the index again: there is no output
2. Commit information management
$ git commit -m 'First submission'
[master (root-commit) e895c99] First submission
 1 file changed, 1 insertion(+)
 create mode 100644 test.txt
The objects folder has changed again:
objects
├─27
│  └─706f8151e4c44bb7a129d64b35fff3422d5e3a
├─98
│  └─b241e3ee5f307af72f5aaafb154dbfb54c3a30 # new file
├─e8
│  └─95c99416c3adf72869db580b4c729891f27d0d # new file
├─info
└─pack
There are two new object files. As before, let's inspect them:
$ git cat-file -t 98b241e3ee5f307af72f5aaafb154dbfb54c3a30
tree # tree object
$ git cat-file -p 98b241e3ee5f307af72f5aaafb154dbfb54c3a30
100644 blob 27706f8151e4c44bb7a129d64b35fff3422d5e3a test.txt
#The tree object records the file snapshot (blob) objects included in the commit
$ git cat-file -t e895c99416c3adf72869db580b4c729891f27d0d
commit # commit object
$ git cat-file -p e895c99416c3adf72869db580b4c729891f27d0d
tree 98b241e3ee5f307af72f5aaafb154dbfb54c3a30
author username <[email protected]> 1606826893 +0800
committer username <[email protected]> 1606826893 +0800

First submission
#The commit object stores the author, the committer, the commit time, the commit message and, most importantly, the tree object
This gives the chain commit -> tree -> blob.
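The text `git cat-file -p` prints for a commit is a set of header lines, a blank line, and then the message. The parser below is a simplified sketch with hypothetical names and a placeholder identity in the sample. It assumes single-line headers, so it would not handle multi-line headers such as a `gpgsig` signature. Values are kept as lists because a merge commit has more than one `parent` line.

```python
def parse_commit(text: str) -> dict:
    """Split `git cat-file -p <commit>` output into header fields and message.

    Simplified: assumes every header fits on one line (no gpgsig blocks).
    """
    header_block, _, message = text.partition("\n\n")
    headers: dict = {}
    for line in header_block.splitlines():
        key, _, value = line.partition(" ")
        headers.setdefault(key, []).append(value)  # "parent" may repeat
    return {"headers": headers, "message": message.rstrip("\n")}


if __name__ == "__main__":
    sample = (
        "tree 98b241e3ee5f307af72f5aaafb154dbfb54c3a30\n"
        "author username <user@example.com> 1606826893 +0800\n"
        "committer username <user@example.com> 1606826893 +0800\n"
        "\n"
        "First submission\n"
    )
    commit = parse_commit(sample)
    print(commit["headers"]["tree"][0])  # the tree this commit points to
    print(commit["message"])
```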
The refs folder has also changed: refs/heads now contains a master file holding the hash of this commit:
e895c99416c3adf72869db580b4c729891f27d0d
The HEAD file in the .git directory contains:
ref: refs/heads/master
This establishes the complete relationship chain from HEAD -> master -> commit -> tree -> blob.
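Following that chain by hand can be sketched with a toy model. Plain dictionaries stand in for the files under .git and for the object database (hypothetical names; a real repository may also keep branch refs in a packed-refs file, and objects are compressed on disk). The hashes are the ones from this walkthrough, and the blob content is assumed to end with a newline.

```python
# Toy model: dictionaries stand in for files under .git/ and for the object store
FILES = {
    "HEAD": "ref: refs/heads/master\n",
    "refs/heads/master": "e895c99416c3adf72869db580b4c729891f27d0d\n",
}
OBJECTS = {
    "e895c99416c3adf72869db580b4c729891f27d0d": (
        "commit", {"tree": "98b241e3ee5f307af72f5aaafb154dbfb54c3a30"}),
    "98b241e3ee5f307af72f5aaafb154dbfb54c3a30": (
        "tree", {"test.txt": "27706f8151e4c44bb7a129d64b35fff3422d5e3a"}),
    "27706f8151e4c44bb7a129d64b35fff3422d5e3a": ("blob", b"hello git!\n"),
}


def resolve_head(files: dict) -> str:
    """HEAD -> branch ref -> commit hash (or a hash directly if HEAD is detached)."""
    head = files["HEAD"].strip()
    if head.startswith("ref: "):
        return files[head[len("ref: "):]].strip()
    return head


def load(objects: dict, object_id: str, expected_type: str):
    """Fetch an object's payload and check that it has the expected type."""
    obj_type, payload = objects[object_id]
    if obj_type != expected_type:
        raise ValueError(f"{object_id} is a {obj_type}, expected {expected_type}")
    return payload


def read_file_at_head(files: dict, objects: dict, path: str) -> bytes:
    commit = load(objects, resolve_head(files), "commit")  # HEAD -> master -> commit
    tree = load(objects, commit["tree"], "tree")            # commit -> tree
    return load(objects, tree[path], "blob")                # tree -> blob


if __name__ == "__main__":
    print(read_file_at_head(FILES, OBJECTS, "test.txt"))
```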
A new logs folder has also appeared in .git. It records every update to HEAD and to each branch, so its files show Git operations quite intuitively:
logs
├─HEAD
└─refs
   └─heads
      └─master
#logs/HEAD and logs/refs/heads/master currently hold the same content
0000000000000000000000000000000000000000 e895c99416c3adf72869db580b4c729891f27d0d username <[email protected]> 1606826893 +0800 commit (initial): First submission
#The first two hashes are the previous commit (all zeros, since there was none) and the new commit; then come the user name, email address, time, and a description of the operation
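Each reflog line is `<old hash> <new hash> <name> <email> <timestamp> <timezone>`, followed by a tab and the description. Here is a minimal Python parser, assuming that tab separator (hypothetical names; the sample uses a placeholder email). Splitting the identity from the right keeps names that contain spaces intact.

```python
from typing import NamedTuple


class ReflogEntry(NamedTuple):
    old: str        # previous commit (40 zeros if there was none)
    new: str        # commit the ref now points to
    identity: str   # "name <email>"
    timestamp: int  # Unix time in seconds
    tz: str         # e.g. "+0800"
    message: str    # e.g. "commit (initial): ..."


def parse_reflog_line(line: str) -> ReflogEntry:
    meta, _, message = line.rstrip("\n").partition("\t")
    old, new, rest = meta.split(" ", 2)
    identity, timestamp, tz = rest.rsplit(" ", 2)  # the name itself may contain spaces
    return ReflogEntry(old, new, identity, int(timestamp), tz, message)


if __name__ == "__main__":
    line = ("0" * 40 + " e895c99416c3adf72869db580b4c729891f27d0d"
            " username <user@example.com> 1606826893 +0800"
            "\tcommit (initial): First submission")
    entry = parse_reflog_line(line)
    print(entry.old == "0" * 40, entry.new, entry.message)
```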
Next, modify test.txt a second time, then run git add . followed by git commit:
#The logs/HEAD file clearly records our operations
0000000000000000000000000000000000000000 e895c99416c3adf72869db580b4c729891f27d0d username <[email protected]> 1606826893 +0800 commit (initial): First submission
e895c99416c3adf72869db580b4c729891f27d0d 8f15afaae122a0eeaabc04fb1dc3ab36e3ecbb90 username <[email protected]> 1606828104 +0800 commit: Commit the second time
We know that 8f15afaae122a0eeaabc04fb1dc3ab36e3ecbb90 is a commit object:
#View the content of the commit object
$ git cat-file -p 8f15afaae122a0eeaabc04fb1dc3ab36e3ecbb90
tree 910e967c436b0824e4ac0aebd4963c64bdd5f31b
parent e895c99416c3adf72869db580b4c729891f27d0d
author username <[email protected]> 1606828104 +0800
committer username <[email protected]> 1606828104 +0800

Commit the second time
As you can see, this commit records not only its tree but also the previous commit (its parent). Because every commit points to its parent, however many commits there are, they are linked together like a linked list.
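Because each commit only stores its parent's hash, walking history is just following pointers, much like `git log` does for a linear history. This is a toy sketch keyed by abbreviated hashes from this article (hypothetical data). It assumes one parent per commit, whereas merge commits can have several.

```python
from typing import Iterator, Optional

# Hypothetical data keyed by abbreviated hashes from this walkthrough
COMMITS = {
    "8f15afa": {"parent": "e895c99", "message": "Commit the second time"},
    "e895c99": {"parent": None, "message": "First submission"},
}


def walk_history(commits: dict, start: Optional[str]) -> Iterator[tuple]:
    """Yield (hash, message) from start back to the root commit."""
    current = start
    while current is not None:
        commit = commits[current]
        yield current, commit["message"]
        current = commit["parent"]  # the root commit has no parent


if __name__ == "__main__":
    for commit_id, message in walk_history(COMMITS, "8f15afa"):
        print(commit_id, message)
```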
#View the contents of a tree object
$ git cat-file -p 910e967c436b0824e4ac0aebd4963c64bdd5f31b
100644 blob cb7a44021ad5013a4620857a2d67f4db9ca2bccb test.txt
From the tree object we reach the snapshot of the file: a new blob, since the file's content has changed.
3. Branch management
Create a new branch
$ git branch test_branch
#The logs folder has changed
logs
├─HEAD
└─refs
   └─heads
      ├─master
      └─test_branch # new file
#View logs/refs/heads/test_branch
0000000000000000000000000000000000000000 8f15afaae122a0eeaabc04fb1dc3ab36e3ecbb90 username <[email protected]> 1606828668 +0800 branch: Created from master
#It records the branch creation and the latest commit of master
#Similarly, a test_branch file has been added to refs/heads
8f15afaae122a0eeaabc04fb1dc3ab36e3ecbb90
#Its content is the hash of the latest commit object
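Since a branch is just a file under refs/heads holding a commit hash, creating one amounts to writing that file. The sketch below does this in a temporary directory for illustration only (hypothetical helper names). In a real repository use `git branch` or `git update-ref`, which also handle locking and the reflog.

```python
import tempfile
from pathlib import Path


def write_ref(git_dir: Path, branch: str, commit_id: str) -> Path:
    """Store a branch the way Git stores loose refs: one file holding a hash."""
    ref = git_dir / "refs" / "heads" / branch
    ref.parent.mkdir(parents=True, exist_ok=True)
    ref.write_text(commit_id + "\n")
    return ref


def read_ref(git_dir: Path, branch: str) -> str:
    return (git_dir / "refs" / "heads" / branch).read_text().strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        git_dir = Path(tmp)  # scratch directory, NOT a real .git
        write_ref(git_dir, "master", "8f15afaae122a0eeaabc04fb1dc3ab36e3ecbb90")
        # "git branch test_branch": a new ref pointing at master's current commit
        write_ref(git_dir, "test_branch", read_ref(git_dir, "master"))
        print(read_ref(git_dir, "test_branch"))
```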
Switch to the new branch
$ git checkout test_branch
#The HEAD file in the root directory
ref: refs/heads/test_branch
#It points to our current branch
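Checking out a branch therefore rewrites this one line in HEAD (and also updates the index and working directory, which is not modeled here). Here is a sketch of reading and writing that line (hypothetical helpers). If HEAD holds a raw commit hash instead of a `ref:` line, HEAD is detached.

```python
from typing import Optional

BRANCH_PREFIX = "ref: refs/heads/"


def current_branch(head_content: str) -> Optional[str]:
    """Branch named in .git/HEAD, or None when HEAD is detached."""
    head = head_content.strip()
    if head.startswith(BRANCH_PREFIX):
        return head[len(BRANCH_PREFIX):]
    return None  # a detached HEAD stores a commit hash instead


def head_after_checkout(branch: str) -> str:
    """The line .git/HEAD contains after `git checkout <branch>`."""
    return BRANCH_PREFIX + branch + "\n"


if __name__ == "__main__":
    print(current_branch("ref: refs/heads/master\n"))
    print(current_branch(head_after_checkout("test_branch")))
```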
Modify the file on the branch and commit it
#New entry in logs/refs/heads/test_branch
8f15afaae122a0eeaabc04fb1dc3ab36e3ecbb90 67e35930423b61cbb09d90dd47e0e16bf0abbdb3 username <[email protected]> 1606829497 +0800 commit: commit the branch
Then switch back to master and merge:
$ git merge test_branch
Updating 8f15afa..67e3593
Fast-forward
 hello.js | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
#New entry in logs/refs/heads/master
8f15afaae122a0eeaabc04fb1dc3ab36e3ecbb90 67e35930423b61cbb09d90dd47e0e16bf0abbdb3 username <[email protected]> 1606829717 +0800 merge test_branch: Fast-forward
#refs/heads/master
67e35930423b61cbb09d90dd47e0e16bf0abbdb3
#Its content now points to the latest commit on test_branch as well
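A fast-forward is possible when the current branch's commit is an ancestor of the commit being merged. Git can then simply move the branch pointer forward without creating a merge commit. This is a toy sketch with abbreviated hashes from this walkthrough (hypothetical names and data).

```python
# Hypothetical parent links, child -> list of parents (abbreviated hashes)
PARENTS = {"67e3593": ["8f15afa"], "8f15afa": ["e895c99"], "e895c99": []}


def is_ancestor(parents: dict, ancestor: str, descendant: str) -> bool:
    """True if ancestor is reachable from descendant by following parent links."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return False


def fast_forward(refs: dict, parents: dict, current: str, other: str) -> bool:
    """Move refs[current] to refs[other] if that needs no merge commit."""
    if is_ancestor(parents, refs[current], refs[other]):
        refs[current] = refs[other]  # just move the pointer
        return True
    return False  # not a fast-forward (diverged, or other is behind)


if __name__ == "__main__":
    refs = {"master": "8f15afa", "test_branch": "67e3593"}
    print(fast_forward(refs, PARENTS, "master", "test_branch"), refs)
```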
4. Summary
Git stores all commit information through a chain of three object types: commit -> tree -> blob. Commit objects are also linked to one another through their parent pointers. HEAD indicates the current branch, and that branch's file in the refs folder holds the latest commit the branch points to.
Everyday operations such as committing changes and switching branches come down to creating new objects and updating these pointers.
This hash chain also makes history tamper-evident. If you change a file's content, its blob hash changes, so the tree's hash changes, so the commit's hash changes, and so does the hash of every later commit that references it as a parent. Changing one file in history means rewriting the whole chain after it.
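The following Python sketch rebuilds the blob -> tree -> commit chain by hand to show the propagation. Assumptions: a single file in the root tree, a fixed placeholder identity and timestamp, and hypothetical helper names. The tree entry layout (`100644 <name>`, a NUL byte, then the 20 raw hash bytes) follows Git's object format for a regular file.

```python
import hashlib
from typing import List, Optional

IDENT = "username <user@example.com> 1606826893 +0800"  # placeholder identity and time


def hash_object(obj_type: str, body: bytes) -> str:
    """SHA-1 over Git's "<type> <size>\\0" header plus the body."""
    return hashlib.sha1(f"{obj_type} {len(body)}\0".encode() + body).hexdigest()


def tree_with_one_file(name: str, blob_id: str) -> bytes:
    # One regular-file entry: "100644 <name>", NUL, then the 20 raw hash bytes
    return f"100644 {name}\0".encode() + bytes.fromhex(blob_id)


def commit_body(tree_id: str, parent_id: Optional[str], message: str) -> bytes:
    lines = [f"tree {tree_id}"]
    if parent_id is not None:
        lines.append(f"parent {parent_id}")
    lines += [f"author {IDENT}", f"committer {IDENT}", "", message]
    return ("\n".join(lines) + "\n").encode()


def build_history(contents: List[bytes]) -> List[str]:
    """Commit each version of test.txt in turn; return the commit hashes."""
    commit_ids: List[str] = []
    parent = None
    for i, content in enumerate(contents, start=1):
        blob_id = hash_object("blob", content)  # file content -> blob
        tree_id = hash_object("tree", tree_with_one_file("test.txt", blob_id))
        parent = hash_object("commit", commit_body(tree_id, parent, f"commit {i}"))
        commit_ids.append(parent)
    return commit_ids


if __name__ == "__main__":
    original = build_history([b"v1\n", b"v2\n"])
    tampered = build_history([b"v1 (edited)\n", b"v2\n"])
    # Only the first file version differs, yet the last commit hash changes too
    print(original[-1])
    print(tampered[-1])
```

Note that the second commit's file content is identical in both histories; its hash still changes because its `parent` line does.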
Common Git commands
1. Initialize a repository
#Initialize the code repository
$ git init
#Clone a remote repository; the remote's default branch (e.g. master) is checked out
$ git clone [url]
#Clone a remote repository and check out the specified branch
$ git clone [url] -b [branch]
2. Working directory operations
#Check file status in both the working directory and the staging area
$ git status
#Discard unstaged changes to the specified files
$ git restore [file1] [file2] ...
#Discard all unstaged changes in the working directory (untracked files are left alone)
$ git restore .
3. Staging area operations
#Add the specified changed files to the staging area
$ git add [file1] [file2] ...
#Add all changed files to the staging area
$ git add .
#Unstage the specified files (their changes stay in the working directory)
$ git restore --staged [file1] [file2] ...
#Unstage all files
$ git restore --staged .
4. Repository operations
#Commit only the specified files (as they are in the working directory), ignoring other staged changes
$ git commit [file1] [file2] ... -m [message]
#Commit everything in the staging area to the repository
$ git commit -m [message]
#View the commit history, including each commit's hash
$ git log
#Move the branch back one commit; the changes from that commit stay in the staging area
$ git reset --soft HEAD~1
#Move the branch to the specified commit and discard all staged and working-directory changes (use with care)
$ git reset --hard [commit_id]
#Create a new commit that undoes the changes made by the specified commit; the existing history, including that commit, is kept
$ git revert [commit_id]
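The difference between reset and revert is easiest to see on a toy history. In this sketch (hypothetical data: each commit is an ID plus the content of one file), reset shortens the branch, while revert appends a new commit that restores the previous content and keeps everything. It does not model the index or working directory, which is exactly where `--soft` and `--hard` differ, and it only reverts the latest commit.

```python
from typing import List, Tuple

# Toy history, oldest first: (commit id, content of the single tracked file)
History = List[Tuple[str, str]]


def reset_to(history: History, commit_id: str) -> History:
    """Model `git reset <commit>`: the branch now ends at commit_id."""
    ids = [cid for cid, _ in history]
    return history[: ids.index(commit_id) + 1]


def revert_last(history: History, new_id: str) -> History:
    """Model `git revert` of the latest commit: append a commit restoring the
    previous snapshot, keeping every existing commit."""
    if len(history) < 2:
        raise ValueError("this toy model needs an earlier snapshot to restore")
    return history + [(new_id, history[-2][1])]


if __name__ == "__main__":
    history = [("c1", "v1"), ("c2", "v2"), ("c3", "v3")]
    print(reset_to(history, "c2"))     # c3 is no longer on the branch
    print(revert_last(history, "c4"))  # c3 stays; c4 brings back "v2"
```

Commits dropped by a reset are typically still recoverable through the reflog for a while, which is why the logs folder shown earlier matters in practice.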
5. Branch operations
#Lists all local branches, -r lists all remote branches, and -a lists all local and remote branches
$ git branch
#Create a new branch
$ git branch [branch]
#Switch to the specified branch
$ git checkout [branch]
#Create a new branch from the specified branch and switch to it
$ git checkout -b [branch] [source_branch]
#Merges the specified branch into the current branch
$ git merge [branch]
6. Tag operations
#List all tags
$ git tag
#Create a new tag for the current commit
$ git tag [tag]
#View tag information
$ git show [tag]
#Push the specified tag to the remote repository
$ git push origin [tag]
#Delete a local tag
$ git tag -d [tag]
#Delete a remote tag
$ git push origin :refs/tags/[tag]
7. Remote repository operations
#List all remote repositories with their URLs
$ git remote -v
#Fetch the specified branch from the remote repository and merge it into the current local branch
$ git pull origin [branch]
#Upload the specified local branch to the remote repository
$ git push origin [branch]
Conclusion
In today's fast-moving IT industry, most needs can be met with ready-made tools and solutions, but most people stop at using them. They never try to understand how those tools work, and so never come up with ideas for improving them. This is a dangerous sign. Knowing how to use these tools is one thing; mastering the ideas behind their implementation is what matters most.