Git and SVN are the most common version control systems (VCS), though there are many others, such as CVS in the early days. As the name implies, a version control system tracks and coordinates the versions of a set of files, including but not limited to code files and image files. SVN dominated the market in the early days, but since the advent of Git more and more people have chosen Git as their version control tool, and its community has grown steadily stronger.

The core difference from SVN is that Git is a distributed VCS. In short, every Git repository you clone is a complete copy of the main repository, with exactly the same contents. SVN, on the other hand, requires a central repository for centralized control. The advantage of a distributed design is that you no longer depend on the network: when you need to commit changes but cannot connect, you commit them to your local Git repository and synchronize with the remote repository once the network is available. Both approaches have their own advantages and disadvantages, but at present distributed Git is being adopted by more and more people.

This article introduces the basic principles and common commands of Git. It tries to explain how Git works from the bottom up, to help you understand what happens behind each high-level command. For a detailed treatment of the principles, see Pro Git; for everyday commands, see other documents. This article is a summary based on my own understanding, so if anything is wrong, please feel free to comment.

Git fundamentals

At its core, Git is a content-addressable file system, and the Git interface is just an application layer wrapped on top of it. The relationship is somewhat similar to that between the application layer and the lower layers in a computer network. Git’s application-layer commands (git commit, git push, and so on) are known as porcelain commands, while commands such as git hash-object and git update-index, which are rarely needed in everyday use, are called plumbing commands: the tools that connect Git’s user-facing interface to its underlying implementation. To understand Git’s underlying principles, you need to understand how Git uses plumbing commands to implement porcelain commands. Before we do that, let’s look at Git’s directory structure and the role each file plays.

Git directory structure

To the operating system, a repository is just a folder. So what makes a folder a Git repository? When Git initializes a repository, it creates a .git folder in which all the files needed for version control are stored. Create a directory on your desktop and run git init from the command line to initialize a Git repository. If you can’t see the .git directory at this point, your operating system is hiding it, and you need to make hidden files visible in your system settings. Go into the .git directory and you’ll see many files and folders, each with its own purpose, as illustrated in Figure 1 below.

In the figure above, the first row of files and folders is the core of Git, while the second row contains the ones that don’t need special attention. The core items are the config file, the objects folder, the HEAD file, the index file and the refs folder. They are explained in turn.

  • Config file: This file records configuration information for the project, such as whether the repository was initialized in bare mode and its remote information. The remote repository information added by the git remote add command is saved here.

  • Objects folder: This folder contains Git objects; what a Git object is will be explained in more detail in the next section. Files and operations in Git are stored as Git objects, which are classified into blob, tree, and commit types. For example, each git commit creates a commit object, and the versions are organized into a version tree: the current HEAD points to a commit object, which in turn points to a tree, which points to blobs or further trees. The objects folder contains a number of subfolders. Each Git object is stored in a file whose subfolder is named after the first two characters of the object’s SHA-1 value and whose file name is the remaining 38 characters. In addition, Git regularly compresses and packs Git objects to save disk space. The pack folder stores the packed objects, and the info folder holds information used to find Git objects in pack files.

  • HEAD file: This file determines the result of git branch (i.e. the current branch). If the current branch is master, the file points to master, but instead of storing the string master it stores a reference to the branch in refs, such as ref: refs/heads/master.

  • Index file: This file is the staging area. It records, for each staged file, the timestamp, the file name, the SHA-1 value of the object it points to, and so on.

  • Refs folder: This folder stores pointers to commit objects (branches). The heads folder stores, for each local branch, the SHA-1 value of its most recent commit object. The remotes folder records your last communication with each remote repository, that is, the value of each branch as of the last time you synchronized with that remote. The tags folder stores tags, which are named pointers to specific commits; you don’t need to know much about them here.

In addition, the .git directory contains other files and folders. They support additional functionality but are not a core part of Git, so a brief acquaintance is enough.

The hooks folder holds client-side or server-side hook scripts that run before or after specific commands and actions. For example, when you push a local repository to a remote repository on a server, you can define a post-update script in the hooks folder of the server repository and use it to deploy the latest code to the server’s web server, linking version control and code deployment seamlessly. The description file is used only by the GitWeb application, so you don’t need to worry about it here. The logs folder records how the tip of each local and remote-tracking branch has changed over time; this is the information that git reflog displays, whereas git log reads the commit objects themselves (including date, author, and so on) from the objects folder. The info folder holds an exclude file listing ignore patterns that you don’t want to manage in a .gitignore file; it is rarely needed. The COMMIT_EDITMSG file records the message of the most recent commit.

The .git folder thus contains many folders and files with different functions. They are essential to describing the Git repository and must not be changed or deleted at will. Note also that the .git folder grows as the project evolves, because every committed change to a file makes Git store the new content as a new object in the objects folder; if a file is very large, the size of your .git folder grows with every change to it that you commit. The .git folder is therefore like a book in which every change of every version is stored, together with a table of contents that says on which page the changes of each version can be found. This is the basic principle of Git.

Understand Git from the underlying commands

Git commands are divided into porcelain commands and plumbing commands, and porcelain commands are built on plumbing commands. To understand the underlying principles of Git further, this section explores the storage format of Git objects and the plumbing commands in detail. If Git were a Linux operating system, plumbing commands would be a bit like shell commands, and porcelain commands would be the utilities written with those shell commands, such as your own automation scripts. We’ll look at how plumbing commands, rather than porcelain commands, can stage and commit files, and then use the log to view the commit record. First, let’s start with an introduction to Git objects.

Git object

As mentioned earlier, Git is a content-addressable file system. How does Git address content? It works like a hash table: Git simply stores key-value pairs, where the key is a SHA-1 hash (40 hexadecimal characters) and the value is the compressed file content. In the practice that follows, we therefore often use a 40-character hash in our plumbing operations, and almost every plumbing command needs a key to specify what to operate on.

Git objects can be blobs, trees, or commits. A blob object can store almost any type of file; the name stands for binary large object, the same as the BLOB type in a database. A tree object is a data type used to organize blob objects. You can think of it as a node in a tree data structure, except that the tree in Git is not a binary tree. A commit object represents one commit operation and is created from a tree object. When creating a commit object you can specify its parent, so that all commits are linked together into a commit tree, in which a branch is just another line of children. If you understand the commit tree, you are almost halfway to understanding Git.

Git object storage is also very simple and can basically be expressed as follows:

Key = sha1(file_header + file_content)
Value = zlib(file_header + file_content)

In simple terms, Git concatenates the file header with the original content and computes a 40-character SHA-1 checksum of the result. The first two characters of the checksum become the name of a subdirectory in the objects directory, and the remaining 38 characters become the file name inside that subdirectory. Git then compresses the header and content using zlib and writes the result to disk. For a blob, the header has the format “blob #{content.length}\0”, such as “blob 16\0”; this is also the most commonly used format. For tree and commit objects the header has the same format with the type name changed, but the data that follows has a fixed layout. Since this is only a basic introduction to Git principles, it is not detailed here. You can get the idea by asking how you would design the format yourself: a tree object contains references to the blob objects it links together, and a commit object contains a reference to the committed tree object.
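The key formula above can be checked by hand. The following is a minimal sketch, assuming a Linux shell with sha1sum available and a repository that uses the default SHA-1 object format; the temporary directory and the content 'version 1' are made up for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q

content='version 1'                         # hypothetical file content
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')

# Key computed by hand: SHA-1 over the header "blob <size>\0" plus the content.
manual_key=$(printf 'blob %s\0%s\n' "$size" "$content" | sha1sum | cut -d' ' -f1)

# Key computed by Git; -w also writes the zlib-compressed object to disk.
git_key=$(printf '%s\n' "$content" | git hash-object -w --stdin)

# The first 2 characters name the subdirectory, the other 38 name the file.
object_path=".git/objects/$(printf '%s' "$git_key" | cut -c1-2)/$(printf '%s' "$git_key" | cut -c3-)"

echo "manual key: $manual_key"
echo "git key:    $git_key"
ls -l "$object_path"
```

Both keys should be identical, and the object file should exist at the printed path.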

Object staging area

Among the porcelain commands, git add filename adds a modified file to the staging area (also known as the index: it holds the keys of the modified files, and the index file in the .git directory records which files are staged). How does git add index a file using plumbing commands? The git add command corresponds to two basic plumbing commands:

git hash-object    # compute the key of a file and, with -w, store the file as a Git object
git update-index   # add Git objects to the index (staging area)

Therefore, git add can be broken down into two plumbing steps: first, use the hash-object command to turn each file that needs to be staged into a key-value Git object and store it, keeping the key; then use the update-index command to add these objects to the index, which completes the staging. If you want to view a file’s information from the key of a Git object, you need one more plumbing command:

git cat-file -p|-t key   # -p prints the content of the object, -t prints its type

Using this command, you can view details about any key-value Git object.

Example

Next, let’s use the plumbing commands to reproduce git add. First, create a new folder and initialize a Git repository in it using git init, which I won’t detail here, as shown below:

After initialization, a .git directory is generated in the current directory. If you enter it, you will find the directory structure described above. Next, create a file named version.txt and use git hash-object -w to convert it into a Git object and store it, as shown below:

Here the hash-object command returns the key of the Git object. At this point the objects directory under .git contains a new subdirectory named 6c, which holds a file named 58b76a52188643965f3a6704166e8e0424b7fe, that is, the last 38 characters of the key. Write down the key, because we will add the object to the index based on it. Next, we use the update-index command to update the index, as shown below:

Note that the --add option is required because the file is not yet in the index, and the --cacheinfo option is followed by the file mode, the key and the file name. The mode indicates the file type: 100644 is a normal file, and related modes exist for executable files and so on. The file name must be given in addition to the key, indicating which version of which file to add to the index. After the update-index command is executed, you will find a new index file in the .git directory, and its contents change after each update-index command. At this point, the main git add process is complete.
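The two steps above can be sketched as a short script. This is an illustration rather than a reproduction of the original screenshots: the temporary directory and the content of version.txt are assumptions, so the key differs from the 6c58b7… value shown in the article.

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
echo 'version 1' > version.txt              # hypothetical content

# Step 1: store the file as a blob object and remember its key.
key=$(git hash-object -w version.txt)

# Step 2: register that blob in the index under the name version.txt.
git update-index --add --cacheinfo 100644 "$key" version.txt

git cat-file -t "$key"      # object type
git cat-file -p "$key"      # object content
git status --short          # version.txt is now reported as staged
```

At this point the repository is in the same state as after a plain git add version.txt.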

Let’s talk briefly about the index file. It stores information about the entire directory tree in the staging area and holds a timestamp and length for each file in that tree. If you open the index file with a hex-capable editor such as UltraEdit, you will find that it has the following format:

Index file = magic number (DIRC) + version number + number of staged files + timestamp and length of each file

The index records the timestamp and length of every file in the project repository as of its last update, so the index file grows as more files are added to the repository. git add updates the index entry (timestamp and size) of each file you add. git status compares each file in the working directory with its index entry, and the index with the last commit, and displays any differences. The staging operation computes a checksum (the SHA-1 hash described earlier) for each file, saves a snapshot of the current version of the file to the Git repository (Git uses blob objects to store these snapshots), and adds the checksum to the staging area. This means that the key of the current version of each file is added to the index file. I haven’t verified this, but it should be correct in theory.
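The claim that the index stores the key of each staged file can be checked without a hex editor. The sketch below reads the first bytes of the index file and then lists its entries with git ls-files --stage; the temporary directory and file content are assumptions for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
echo 'version 1' > version.txt              # hypothetical content
git add version.txt

magic=$(head -c 4 .git/index)               # the first 4 bytes are the magic number
echo "magic: $magic"

# The next 8 bytes hold the version number and the number of entries (big-endian).
od -An -tx1 -j4 -N8 .git/index

# One line per index entry: mode, blob key, stage number, path.
staged=$(git ls-files --stage)
echo "$staged"
```

The key printed by git ls-files --stage matches the one git hash-object reports for the same file, which supports the description above.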

Creating a tree node

In Git, all content is stored as tree or blob objects. If Git were a UNIX file system, tree objects would correspond to directories, while blob objects would correspond to inodes or file contents. In the Git object section, we made a rough guess at the format of the tree object. A single tree object contains one or more tree records, each of which contains a SHA-1 pointer to a blob or subtree (i.e., a 40-character key), along with the mode, type, and file name of that object. Why create a tree object? In the porcelain workflow, after you add a modified file you commit the contents of the staging area directly to the local repository, and the concept of a tree never appears. Creating a tree object is the intermediate step between add and commit, because a commit object is created from a tree object. So how do you create a tree object? You only need to run the following command:

git write-tree   # create a tree object from the information in the index

This command returns the key of the tree object. You can run git cat-file to view details about the tree object. The creation process is shown below:

As can be seen from the figure, cat-file -t displays the type of the object as Tree, indicating that the tree object is successfully created. At this point, the tree node is created.
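As a sketch of this step, the commands below stage one file and turn the index into a tree object. The temporary directory and file content are assumptions; here git update-index --add is given the file name directly, which hashes, stores and stages it in one go.

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
echo 'version 1' > version.txt              # hypothetical content
git update-index --add version.txt

tree=$(git write-tree)      # returns the key of the new tree object
git cat-file -t "$tree"     # prints the object type
git cat-file -p "$tree"     # one record per entry: mode, type, key, file name
```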

In fact, since the index contains all files in the project repository, the tree object referenced by a commit object is always the root tree object of the working directory: for each commit, the root of the working directory corresponds to one tree object. In Git, each subdirectory corresponds to a tree object and each file corresponds to a blob object, so the entire working directory corresponds to a tree of Git objects whose root node is the tree referenced by the commit object, with a subtree for each subfolder. Any change to a file therefore causes all of its ancestor tree objects to change and be stored again. You can check this with git add and git commit: after each commit, use git log to find the key of the commit object, then use cat-file repeatedly to walk from the commit to its tree and to all the children of that tree. You will see that subfolders correspond to tree nodes and files correspond to blob nodes.

Commit object

In Git, every commit corresponds to a commit object, and a commit object corresponds to a tree object. To create a commit object, use the following command:

git commit-tree key -p key2

This is somewhat similar to adding nodes to a tree data structure: in both cases a child is attached to a parent node. Here key is the key of the tree object to commit, and the -p option specifies key2, the key of the preceding commit object, which becomes the parent node. The two commit nodes are thus joined together, and successive connections form a tree, which we will talk about next. A commit object is created as follows:

In this command, we only need to specify the first six characters of the key, and since this is the first commit, we don’t need the -p option to specify a parent. Using the cat-file command, you can see that a commit object has been created successfully. The commit object contains the key of the tree object associated with it, as well as information about the author and committer. To view the complete commit record, run the git log --stat key command, which prints the specified commit and all commits before it. Now that the commit object has been created, we have implemented Git’s add and commit operations using plumbing commands alone. The relationships of all objects created so far are shown below:
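The first commit can be sketched as follows. The author identity "demo", the temporary directory and the file content are placeholders chosen for this example; the identity is set through environment variables so that commit-tree works without any prior configuration.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo" && git init -q
echo 'version 1' > version.txt              # hypothetical content
git update-index --add version.txt
tree=$(git write-tree)

# First commit: no -p option because there is no parent yet.
commit1=$(echo 'first commit' | git commit-tree "$tree")

git cat-file -t "$commit1"      # prints the object type
git cat-file -p "$commit1"      # tree key, author, committer, message
git log --stat "$commit1"
```

Note that commit-tree only creates the object; it does not move any branch, which is why the key is passed to git log explicitly.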

Commit tree

Next, we complete a second and a third commit on top of the first. For the second commit, we commit the second version of version.txt and add a new file; the third demonstrates building and committing a subtree object inside a tree object. In each of these commits we also need to specify the preceding commit object, so that the commit objects are joined together to form a commit tree. First, we make the changes for the second version: modify version.txt and add a file new.txt as shown below, then use the method above to create the key-value objects:

Then update the index:

We then create a tree object from the staging area and create a commit object from that tree object, as shown in the figure below. Finally, git log is used to print the commit objects:

After this commit, the object relationship in Git is as follows:
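A minimal sketch of the second commit, assuming the same placeholder identity and made-up file contents as before. The only new element is the -p option, which links the second commit to the first.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo" && git init -q

# First commit, built as before.
echo 'version 1' > version.txt
git update-index --add version.txt
commit1=$(echo 'first commit' | git commit-tree "$(git write-tree)")

# Second commit: a new version of version.txt plus a new file.
echo 'version 2' > version.txt
echo 'new file' > new.txt
git update-index --add version.txt new.txt
tree2=$(git write-tree)
commit2=$(echo 'second commit' | git commit-tree "$tree2" -p "$commit1")

git log --pretty=oneline "$commit2"     # lists the second commit, then the first
```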

Next, let’s do the third commit. First, the tree object from the first version is read into the staging area using the read-tree command, as shown below:

Because a file with a given path can appear only once in the index, you need to add the --prefix option to read the first version’s tree object into the index; here it places that version’s version.txt under the folder bak. The tree object is then created and committed a third time, as shown in the figure below:

You can view all commit objects using git log. At this point, run the cat-file command to check the contents of the tree object:

If you export the tree as a working directory, there will be a bak subfolder in the root directory. After the third commit, the relationship of all the objects in Git is shown in Figure 4.

As mentioned in the previous section, the entire working directory corresponds to a tree object, and each subfolder is a tree object. Each commit object corresponds to the root tree object. Any change to an object will cause all the upper tree objects to be re-stored.
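The subtree step can be sketched without the commits themselves, since only the trees matter here. The file contents are assumptions, and the prefix is written as bak/ with a trailing slash, a form that some older Git versions required.

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q

# Tree of the first version: only version.txt.
echo 'version 1' > version.txt
git update-index --add version.txt
tree1=$(git write-tree)

# Stage the second version, then read the first tree back in under bak/.
echo 'version 2' > version.txt
git update-index version.txt
git read-tree --prefix=bak/ "$tree1"
tree3=$(git write-tree)

git ls-files --stage        # the index now lists bak/version.txt and version.txt
git cat-file -p "$tree3"    # the root tree has a tree entry named bak and a blob entry
```

The bak entry in the root tree carries the key of the first tree unchanged, which shows that a subfolder is simply a reference to another tree object.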

The above is the process of making three commits with plumbing commands. We hope these steps help you understand the connection between porcelain and plumbing commands and pave the way for the Git topics that follow.

Git common commands

The purpose of this section is to give a brief introduction to some commands in Git that are important but less frequently used, so that you have a comprehensive understanding of most Git commands. Basic usage is not described here. If you are not familiar with Git’s branches, you are advised to read Pro Git chapter 3 carefully. The basic Git workflow is shown in Figure 5. git pull, git push, git fetch, git remote and other basic commands are not described here, but they are the most important commands, so you must master them. I suggest that, using the explanation of the basic principles above and the descriptions in Pro Git, you think through the changes behind each basic command in detail, so as to deepen your understanding of Git’s application-layer commands. Once you understand them well, you can also work comfortably with a GUI such as TortoiseGit.

This section focuses on the following commands and workflows, with emphasis on their basic usage: git log, forking, git rebase, git reset, git revert, and git stash. On small to medium-sized projects with few team members, a single branch is often enough. In that case, as long as you remember to pull the latest code before each push, serious conflicts or problems rarely occur, and you will hardly need the commands above. With multiple branches, however, you may need them to merge branches or roll back versions, so it is worth taking a brief look at these commands and knowing when to use them.

Git log

After committing several updates, or cloning a project, you can review the commit history using git log. By default, without any parameters, git log lists all commits by commit time, with the most recent at the top. I usually print the commit log using the following command:

git log --pretty=format:"%h %s" --graph

Here, the --pretty option specifies the output format, with %h representing the abbreviated SHA-1 value of each commit object (a short prefix of the 40 characters) and %s the commit subject; the --graph option draws the history as a graph. The printed result is as follows:
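To try the command on a small history, the sketch below creates two commits in a throwaway repository first. The identity "demo", the file name and the commit messages are placeholders.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo" && git init -q
echo 'version 1' > version.txt && git add version.txt && git commit -q -m 'first commit'
echo 'version 2' > version.txt && git add version.txt && git commit -q -m 'second commit'

# One line per commit: a graph marker, the abbreviated key, and the subject.
git log --pretty=format:"%h %s" --graph
```

With a linear history each line starts with an asterisk; branches and merges appear as additional graph columns.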

You can also use Git GUI to display the commit history: open Git GUI from the right-click menu and choose Visualize All Branch History from the Repository menu, as shown in the figure below.

Git fork

Git fork is not a Git command, but a workflow. Instead of using a single server-side repository as the “central” code base, each developer has a server-side repository of their own, which means each contributor has two Git repositories instead of one: a private local one and a public server-side one, as shown in the figure below.

A major advantage of the forking workflow is that contributions can be integrated without everyone pushing to a single central repository. Developers push to their own server-side repositories, and only the project maintainer can push to the official repository. This allows the maintainer to accept commits from any developer without giving them write access to the official code base.
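The forking workflow can be imitated on one machine with bare repositories standing in for the server side. This is only a sketch: the directory names official.git, fork.git and local, the remote name upstream, and the identity are assumptions, and on a real hosting service the fork would be created through the service rather than with git clone --bare.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"

# The official repository, which only the maintainer would be able to push to.
git init -q seed && (cd seed && git commit -q --allow-empty -m 'initial commit')
git clone -q --bare seed official.git

# The contributor's server-side fork and a private local clone of it.
git clone -q --bare official.git fork.git
git clone -q fork.git local && cd local
git remote add upstream ../official.git     # used to stay in sync with the official repository
git fetch -q upstream

# Work locally, then publish to the fork rather than to the official repository.
echo 'fix' > fix.txt && git add fix.txt && git commit -q -m 'contribute a fix'
git push -q origin HEAD
```

After the push the fork contains the new commit and the official repository does not; the maintainer would then pull it from the fork.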

Git rebase

Like git merge, git rebase integrates the changes of one branch into another, but it does so differently. The principle of the command is to go back to the most recent common ancestor of the two branches and generate a series of patches from all subsequent commits (here only C3) of the current branch (experiment). Then, starting from the last commit object (C4) of the base branch (master), the prepared patches are applied one by one. Finally, a new commit object (C3′) is generated, which rewrites the commit history of experiment and makes it a direct descendant of the master branch, as shown below:

The general purpose of a rebase is to obtain a clean patch that applies directly on a remote branch. For example, if you want to contribute to a project that you do not maintain, you should use rebase: first develop on a branch of your own, and when you are ready to submit patches to the main project, rebase onto the latest origin/master before submitting. The maintainer then does not need to do any integration work, because the responsibility for resolving conflicts between the branch patches and the latest trunk code has moved to the contributor; the maintainer only needs to do a fast-forward merge from the repository address you provide, or simply apply the patch you submit.

Conflicts may arise during a rebase. In this case, Git stops the rebase and lets you resolve the conflict. After resolving it, update the index with git add; you don’t need to run git commit, just run git rebase --continue, and Git will continue to apply the remaining patches. To abort the rebase at any point, run git rebase --abort. Remember: never rebase a branch once its commit objects have been published to a public repository.
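The conflict-free case described above (C3 replayed on top of C4) can be sketched in a throwaway repository. The commit messages C2 to C4 follow the article's naming; the file names and identity are placeholders, and the base branch name is read from the repository because it may be master or main depending on configuration.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo" && git init -q
base=$(git symbolic-ref --short HEAD)

echo c2 > base.txt && git add . && git commit -q -m C2
git checkout -q -b experiment
echo c3 > exp.txt && git add . && git commit -q -m C3
git checkout -q "$base"
echo c4 > main.txt && git add . && git commit -q -m C4

# Replay the commits of experiment on top of the base branch, producing C3'.
git checkout -q experiment
git rebase -q "$base"

git log --pretty=format:"%h %s" --graph
```

After the rebase the history of experiment is linear, and its parent is the tip of the base branch, so the base branch can be fast-forwarded to it.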

git pull --rebase works in a similar way: it first takes the local commits of the current branch and temporarily saves them as patches (in a rebase directory under .git), then updates the current local branch to the latest origin branch, and finally applies the saved patches to the current local branch. If you look at TortoiseGit’s log when pulling, you’ll see that TortoiseGit pulls in this way.

Git reset

When using Git, beginners often have to recover from mistakes. When you accidentally change something wrong or commit something by mistake, you need to roll back to an earlier version. The most common commands for this are git reset and git revert, which let you travel through the version history.

Here are a few more classic scenarios to summarize:

  • Scenario 1: If you have messed up the contents of a file in your working directory and want to discard the changes, run git checkout -- filename;

  • Scenario 2: If you have changed a file and also added it to the staging area, and you want to discard the changes, take two steps. First, run git reset HEAD filename to unstage the file, which returns you to scenario 1. Second, follow scenario 1.

  • Scenario 3: If inappropriate changes have been committed to the repository and you want to undo the commit, use git reset --hard commit_id, but only if the commit has not been pushed to the remote repository.

Before travelling through history, use git log to view the commit history and determine which version you want to go back to. To return to a newer version after going back, use git reflog to view the command history and determine which version to return to.
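The three scenarios can be walked through in a throwaway repository. The file name, its contents and the identity are made up for this sketch, and HEAD~1 stands in for the commit_id of the version to return to.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo" && git init -q
echo good > file.txt && git add file.txt && git commit -q -m 'good version'

# Scenario 1: discard an unstaged change in the working directory.
echo bad > file.txt
git checkout -- file.txt

# Scenario 2: unstage the change first, then discard it as in scenario 1.
echo bad > file.txt && git add file.txt
git reset -q HEAD file.txt
git checkout -- file.txt

# Scenario 3: undo a local commit that has not been pushed.
echo bad > file.txt && git commit -q -a -m 'bad commit'
bad=$(git rev-parse HEAD)
git reset -q --hard HEAD~1

git reflog      # still lists the bad commit, so it can be restored if needed
```

Because the reflog keeps the key of the discarded commit, git reset --hard with that key would bring it back.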

Git revert

git revert is used to undo a commit. The commits and history before and after the reverted commit are retained, and the undo itself is recorded as the most recent commit. In other words, git revert commits a new version whose content is the inverse of the commit you want to undo, without affecting the earlier history.

Both git revert and git reset let you return your working directory to a historical state, but they differ in the following ways:

  • The most significant difference is that git revert rolls back a previous commit by adding a new commit, whereas git reset removes the specified commits from the branch history.

  • git reset moves HEAD backward, while git revert moves HEAD forward; the content of the new commit is the opposite of the commit being reverted and cancels it out.

  • For a plain rollback the effect is similar, but there is a difference when you later merge an older branch that still contains the commit. Because git revert “neutralizes” the previous commit with a reverse commit, the change does not reappear when the old branch is merged later. git reset, however, removes the commits from the branch, so the rolled-back commits are reintroduced when merging with the old branch again.
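The first two differences can be observed directly. The sketch below reverts a bad commit and then checks that the history has grown rather than shrunk; the file name, contents and identity are placeholders.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo" && git init -q
echo good > file.txt && git add file.txt && git commit -q -m 'good version'
echo bad > file.txt && git commit -q -a -m 'bad commit'
bad=$(git rev-parse HEAD)

# HEAD moves forward to a new commit that cancels the bad one.
git revert --no-edit HEAD > /dev/null

git log --pretty=format:"%h %s"     # three commits: the revert, the bad commit, the good version
cat file.txt                        # the content is back to the good version
```

The bad commit is still an ancestor of HEAD, whereas after git reset --hard it would no longer be part of the branch.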

Git stash

git stash saves work in progress: it pushes the uncommitted changes in the working directory onto a local Git stack and pops them off when needed. For example, you may want to pull the latest code without adding a new commit first, or you may need to fix an urgent bug: stash your work, return to the last commit to make the fix, and then pop the stash to continue. git stash returns the working directory to the state of the last commit, and the local uncommitted content is pushed onto the Git stack. The basic flow of git stash is as follows:

git stash        # save content in the working directory that has not yet been committed

# ... do your work on top of the last commit ...

git stash pop    # pop the saved content and apply it

When you use git stash many times, the stack fills up with uncommitted code and it becomes unclear which entry to apply. The git stash list command prints the current stack, and you only need to find the entry you want. For example, git stash apply stash@{1} applies the entry named stash@{1}. Once you have applied everything you need, you can use git stash clear to empty the stack. The Stash Save menu item in TortoiseGit corresponds to the git stash command.
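The basic flow can be sketched end to end in a throwaway repository. The file name, its contents and the identity are assumptions made for this example.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo" && git init -q
echo 'committed' > file.txt && git add file.txt && git commit -q -m 'last commit'

echo 'work in progress' > file.txt      # an uncommitted change
git stash -q                            # push it onto the stash stack
during=$(cat file.txt)                  # the file is back at the last commit
git stash list                          # shows the entry stash@{0}

git stash pop -q                        # re-apply the change and drop the entry
after=$(cat file.txt)
echo "during stash: $during / after pop: $after"
```

Unlike git stash pop, git stash apply leaves the entry on the stack, which is why git stash clear is needed afterwards in that case.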

Conclusion

This article introduces the basic principles and common commands of Git: the Git directory structure, porcelain and plumbing commands, a hands-on commit using plumbing commands, and finally some important commands. I hope that after reading it you have an overall understanding of how Git works and can use its commands flexibly. Most of the content comes from the Internet and is a summary of collected knowledge and my own understanding. I hope it really helps you.