How Git works

Git is a distributed version control system. A change moves through four areas:

Workspace: the working directory
Index / Stage: the staging area
Repository: the local repository
Remote: the remote repository

Key differences between SVN and Git

Difference 1 (Important)

1. SVN is a centralized version control system. The repository lives on a central server, and that server alone maintains and controls the code. When working on your own computer, you first fetch the latest version from the central server and, when your work is finished, commit it back to the central server.
2. Git is a distributed version control system. No central server is required: every clone holds the complete repository, so each host can act as a server.

Difference 2

The main difference between Git and other version control systems is how it views data. Most other systems record specific differences in file content, whereas Git cares about whether the file data as a whole has changed and does not model its history as a series of such differences. In practice, Git is more like taking snapshots of the files and recording them in a tiny file system. Each time an update is committed, Git goes through the fingerprint (hash) information of all the files, takes a snapshot, and saves an index pointing to that snapshot. To improve performance, Git does not save a file again if it has not changed; it only links to the previously saved snapshot of that file.
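The snapshot idea can be illustrated with a toy model. This is a minimal sketch, not Git's actual implementation: the function name take_snapshot and the two dictionaries are made up for illustration, and the hash here is a plain SHA-1 of the content without the header Git adds to its objects.

```python
import hashlib


def take_snapshot(files, store):
    """Record a snapshot: map each path to the hash of its content.

    files: dict of path -> bytes (the working files at commit time)
    store: dict of hash -> bytes (content already saved by earlier snapshots)
    """
    snapshot = {}
    for path, content in files.items():
        digest = hashlib.sha1(content).hexdigest()  # the file's "fingerprint"
        if digest not in store:
            store[digest] = content  # new or changed content is saved once
        snapshot[path] = digest  # unchanged content just links to the old entry
    return snapshot


if __name__ == "__main__":
    store = {}
    first = take_snapshot({"a.txt": b"one\n", "b.txt": b"two\n"}, store)
    second = take_snapshot({"a.txt": b"one\n", "b.txt": b"two, edited\n"}, store)
    print(first["a.txt"] == second["a.txt"])  # True: a.txt was not saved again
    print(len(store))  # 3: the two original contents plus the edited b.txt
```

Both snapshots list every file, but the store only grows when content actually changes.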
Almost all operations are performed locally: the vast majority of operations in Git only need local files and resources, not the network. With a centralized version control system (CVCS), by contrast, almost everything requires a network connection. Because Git keeps the complete history of the project on your local disk, these operations are fast.

Git directory

- hooks: To enable a hook script, remove the .sample suffix from the script's file name. Hooks are triggered at specific times. For example, post-commit runs after the whole commit process is complete and can be used to send commit notifications. There are also server-side hooks that run before or after a push. For example, post-receive runs after a push and can be used to notify a build platform to start a packaging task.
- objects: stores the real data as Git objects.
- refs/heads: stores the latest commit hash of every local branch.
- refs/stash: stores the hash that corresponds to the stash.
- refs/tags: stores the tag references.
- HEAD: records the current branch. It normally does not hold a SHA-1 value itself, but a reference such as refs/heads/master.
- refs/heads/master: holds the SHA-1 value of the latest commit on that branch.

Git object storage

Data objects, tree objects, and commit objects are all stored in the .git/objects directory. When a Git object is stored, its 40-character hexadecimal hash is split into two parts: the first two characters become the folder name, and the remaining 38 characters become the file name of the object.

The 40-character hash of a Git object is calculated with the SHA-1 algorithm.
The object is compressed with zlib's Deflate algorithm and stored in the file named by the last 38 characters of its hash.

Git object storage algorithm steps

1. Calculate the length of 'content' and construct 'header'.
2. Prepend 'header' to 'content' to construct the Git object.
3. Use the SHA-1 algorithm to calculate the 40-character hash of the Git object.
4. Use zlib's Deflate algorithm to compress the Git object.
5. Store the compressed Git object at .git/objects/hash[0, 2]/hash[2, 40].
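The five steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not Git's own code: the function names are made up, the header is assumed to have the form "<type> <length>" followed by a NUL byte, and the objects are written to a temporary directory rather than a real repository.

```python
import hashlib
import os
import tempfile
import zlib


def build_object(content: bytes, obj_type: str = "blob") -> bytes:
    """Steps 1-2: build the header and prepend it to the content."""
    header = f"{obj_type} {len(content)}".encode() + b"\x00"
    return header + content


def object_hash(content: bytes, obj_type: str = "blob") -> str:
    """Step 3: SHA-1 of header + content, as 40 hex characters."""
    return hashlib.sha1(build_object(content, obj_type)).hexdigest()


def store_object(git_dir: str, content: bytes, obj_type: str = "blob") -> str:
    """Steps 4-5: compress with zlib and write to objects/<2 chars>/<38 chars>."""
    digest = object_hash(content, obj_type)
    folder = os.path.join(git_dir, "objects", digest[:2])
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, digest[2:])
    if not os.path.exists(path):  # identical content is stored only once
        with open(path, "wb") as f:
            f.write(zlib.compress(build_object(content, obj_type)))
    return digest


if __name__ == "__main__":
    # A temporary directory stands in for a real .git directory.
    with tempfile.TemporaryDirectory() as git_dir:
        digest = store_object(git_dir, b"hello\n")
        print(digest)  # ce013625030ba8dba906f756967f9e9ca394464a
        print(os.path.join("objects", digest[:2], digest[2:]))
```

Because the path is derived only from the hash, storing the same content a second time finds the file already present and writes nothing.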

Why does Git design the directory structure this way instead of just using the 40-character hash of the Git object as the file name? There are two reasons:

1. Some file systems limit the number of files in a directory. For example, FAT32 limits a single directory to 65,535 files, so copying Git files with a USB flash drive could cause problems.
2. Some file systems look up files with a linear search, so the more files a directory holds, the slower access becomes.

Git objects

Git locates files by the hash of their contents, which means that files with the same contents point to the same location in the Git file system and are not stored twice.
This article covers three types of Git object: data objects (blobs), tree objects, and commit objects. The Git file system is designed in a similar way to a Linux file system, where the contents and the properties of a file are stored separately: the contents are stored as a “bag full of bytes”, and file attributes such as file name, owner, and permissions are stored in a separate area. In Git, data objects are file contents, tree objects are file directory trees, and commit objects are snapshots of the file system.

1. Data objects

A data object is the content of a file, excluding its file name and permissions. Git computes a hash value from the file content and uses that hash as the file's index in the Git file system. Because identical content always produces the same hash, Git stores files with the same content only once. The git hash-object command computes the hash of a file's content and, with the -w option, stores the resulting data object in the Git file system.

    # 1. Compute the hash and store the file content as a data object
    git hash-object -w ./gittest.txt
    # -> 6ef4b68add7afe38c244974d1e08d9a40d442528
    # Without -w, only the hash is computed and nothing is stored.

    # 2. Inspect the object: -t prints its type, -p prints its content
    git cat-file -t 6ef4b68add7afe38c244974d1e08d9a40d442528
    git cat-file -p 6ef4b68add7afe38c244974d1e08d9a40d442528
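To show what reading an object back involves, here is a minimal Python sketch that decompresses a stored object and splits it into type and content, roughly what git cat-file -t and -p report for a data object. The function names are made up for illustration, the demo writes to a temporary directory instead of a real repository, and only objects stored as individual files are handled (Git can also pack objects, which this sketch ignores).

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_object(git_dir, obj_type, content):
    """Demo helper: store header + content as a compressed object file."""
    raw = f"{obj_type} {len(content)}".encode() + b"\x00" + content
    digest = hashlib.sha1(raw).hexdigest()
    folder = os.path.join(git_dir, "objects", digest[:2])
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, digest[2:]), "wb") as f:
        f.write(zlib.compress(raw))
    return digest


def read_loose_object(git_dir, digest):
    """Return (type, content) for an object stored under objects/xx/yyyy..."""
    path = os.path.join(git_dir, "objects", digest[:2], digest[2:])
    with open(path, "rb") as f:
        raw = zlib.decompress(f.read())
    header, _, content = raw.partition(b"\x00")  # header ends at the first NUL
    obj_type, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as git_dir:
        digest = write_loose_object(git_dir, "blob", b"test content\n")
        print(digest)
        print(read_loose_object(git_dir, digest))
```

Note that the file name never appears in the object: only the type, the size, and the raw bytes are stored.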

Data objects only solve the problem of storing file content; storing file names is handled by tree objects.

2. Tree objects

A tree object is a file directory tree. It records the name, type, and mode of each entry in a directory. You can use git update-index to give a data object a name and mode in the index, and then use git write-tree to write the tree object to the Git file system.

    # 1. Write an entry to the index (also called the staging area)
    git update-index --add --cacheinfo 100644 6ef4b68add7afe38c244974d1e08d9a40d442528 gittest.txt
    # 2. Write the index out as a tree object
    git write-tree
    # The tree object can still be viewed with git cat-file
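As a rough picture of what a tree object contains, the following Python sketch builds one from (mode, name, hash) entries. It is an illustration under assumptions, not Git's implementation: the function names are made up, each entry is assumed to be laid out as "<mode> <name>", a NUL byte, and the 20 raw bytes of the hash, and entries are sorted by name only, which is a simplification when directories are involved.

```python
import hashlib


def tree_entry(mode, name, hex_hash):
    """One tree entry: '<mode> <name>', a NUL byte, then the 20-byte raw hash."""
    return f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_hash)


def build_tree(entries):
    """Build a tree object from (mode, name, hex_hash) tuples.

    Simplification: entries are sorted by name only, which is enough for
    plain files in a single directory.
    """
    body = b"".join(
        tree_entry(mode, name, hex_hash)
        for mode, name, hex_hash in sorted(entries, key=lambda e: e[1].encode())
    )
    return f"tree {len(body)}".encode() + b"\x00" + body


def tree_hash(entries):
    """SHA-1 of the tree object, as 40 hex characters."""
    return hashlib.sha1(build_tree(entries)).hexdigest()


if __name__ == "__main__":
    # The mode, hash, and file name are the ones used in the commands above.
    entries = [("100644", "gittest.txt", "6ef4b68add7afe38c244974d1e08d9a40d442528")]
    print(build_tree(entries)[:8])  # the header of the tree object
    print(tree_hash(entries))
```

The tree holds the file name and mode, while the content stays in the data object that the entry's hash points to.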

Tree objects solve the problem of file names. Since tree objects are written at successive stages of development, each one can be viewed as a snapshot of the source directory tree at that point, so tree objects can serve as source code versions. For source code versioning we also need to know who committed the code, when it was committed, what the commit message says, and so on, which requires commit objects.

3. Commit objects

git commit-tree can be used to write a commit object to the Git file system. A commit object points to a tree object and records who committed it, when, and with what message.
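A commit object can be pictured as a small text record wrapped in the same header-plus-content form as other objects. The sketch below is illustrative only: the function name build_commit, the author string, the timestamp, and the message are made up, and the tree hash used in the demo is the hash of an empty tree, chosen just to have a valid value.

```python
import hashlib


def build_commit(tree, author, timestamp, tz, message, parents=()):
    """Build a commit object: tree, optional parents, author, committer, message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author} {timestamp} {tz}")
    lines.append(f"committer {author} {timestamp} {tz}")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    return f"commit {len(body)}".encode() + b"\x00" + body


if __name__ == "__main__":
    commit = build_commit(
        tree="4b825dc642cb6eb9a060e54bf8d69288fbee4904",  # empty tree, for the demo
        author="Example Author <author@example.com>",  # hypothetical identity
        timestamp=1700000000,  # arbitrary Unix time
        tz="+0000",
        message="first commit",
    )
    print(commit.split(b"\x00", 1)[1].decode())
    print(hashlib.sha1(commit).hexdigest())
```

Since the tree hash is part of the commit's content, changing any file changes the tree hash and therefore the commit hash.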

Conclusion: data objects solve the problem of storing file content, tree objects solve the problem of storing file names, and commit objects solve the problem of storing commit information. As the design shows, Linus abstracted and decoupled the parts of a source code version control system so that each object solves one specific problem, which makes the system more flexible and maintainable than a single data structure would.

Git references

A Git reference is effectively an alias for a 40-character hash, which makes it easier to identify and read. Git references are stored in the .git/refs directory, which has three subfolders (heads, tags, and remotes) for HEAD, tag, and remote references respectively.

1. HEAD references

A HEAD reference points to the last commit object of a branch, so that when you switch to a branch, Git knows where the “tail” of the branch is.
HEAD references are stored in the .git/refs/heads directory. Each branch has a HEAD reference file of the same name.
The content of a HEAD reference is the hash value of the commit object.

2. Tag references

Tag references label Git objects. They are stored in the .git/refs/tags directory.

3. Remote references

Remote references are also Git references, so in theory they can be maintained manually with git update-ref as well. However, that means synchronizing the code with the remote repository, finding the HEAD of the corresponding branch in the remote repository, and then updating the reference with git update-ref, which is cumbersome. In practice, remote references are updated automatically when we run high-level commands such as git pull or git push.

Conclusion:

That covers all three kinds of Git reference. They are all stored under .git/refs, and the content of each is a 40-character hash pointing to a Git object, which can be any Git object: a data object, a tree object, or a commit object. All three can be maintained manually with git update-ref.
The difference between them is where they are stored: .git/refs/heads, .git/refs/tags, and .git/refs/remotes. The different folders give the references different roles. A HEAD reference records the last commit of a local branch, a tag reference labels any Git object, and a remote reference records the last commit of a remote branch.
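Since a reference is just a small file holding a hash, resolving one can be sketched in a few lines of Python. This is a simplified illustration: the function names, the tag name v1.0, and the placeholder hash are made up, the layout is created in a temporary directory, and only references stored as individual files are handled.

```python
import os
import tempfile


def write_ref(git_dir, name, value):
    """Demo helper: create a reference file such as refs/heads/master."""
    path = os.path.join(git_dir, *name.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(value + "\n")


def resolve_ref(git_dir, name="HEAD", max_depth=5):
    """Follow 'ref: <name>' indirections until a hash is reached."""
    for _ in range(max_depth):
        with open(os.path.join(git_dir, *name.split("/"))) as f:
            value = f.read().strip()
        if not value.startswith("ref: "):
            return value  # a 40-character hash
        name = value[len("ref: "):]  # e.g. refs/heads/master
    raise ValueError("too many levels of symbolic references")


if __name__ == "__main__":
    placeholder = "0123456789abcdef0123456789abcdef01234567"  # not a real object
    with tempfile.TemporaryDirectory() as git_dir:
        write_ref(git_dir, "refs/heads/master", placeholder)
        write_ref(git_dir, "refs/tags/v1.0", placeholder)  # hypothetical tag
        write_ref(git_dir, "HEAD", "ref: refs/heads/master")
        print(resolve_ref(git_dir, "HEAD"))
        print(resolve_ref(git_dir, "refs/tags/v1.0"))
```

Branch, tag, and remote references are read the same way; only the folder they live in differs.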
