• Git 2.31
  • Taylor Blau
  • The Nuggets translation Project
  • Permanent link to this article: github.com/xitu/gold-m…
  • Translator: Badd
  • Proofread: PassionPenguin, PingHGao

Highlights from Git 2.31

The open source Git project recently released version 2.31, with new features and bug fixes from 85 contributors, 23 of whom are new. The last time we caught up with you was when Git 2.29 had just been released. Git has gone through two releases since then, so let’s take a look at some of the most interesting features and changes.

Introducing git maintenance

Imagine this: you open your terminal, make some commits, pull from another repository, and push the result to a remote, when suddenly you run into this unhelpful message:

Auto packing the repository for optimum performance. You may also run "git gc" manually. See "git help gc" for more information.

Next thing you know, you’re stuck. You have to wait for Git to finish running git gc --auto before you can continue.

What’s going on here? In normal use, Git writes a lot of data: objects, packfiles, references, and so on. For some of this data, Git optimizes for write performance. For example, a “loose” object is faster to write, but a packfile is faster to read.

To keep you productive, Git makes a trade-off: in general, it optimizes the write path while you are working, and pauses every so often to make its internal data structures more efficient to read, with the goal of keeping you productive over the long term.

Git has its own heuristics for deciding when this “pause” is appropriate, but sometimes those heuristics trigger a blocking git gc at exactly the wrong time. You could manage these data structures yourself, but you probably don’t want to spend time deciding when and how to do so.

Starting with Git 2.31, you can have your cake and eat it too with background maintenance. This cross-platform feature keeps your repository in good shape without blocking any of your interactions. Notably, it prefetches the latest objects from your remotes once an hour, which can significantly reduce the time git fetch takes.

Getting started with background maintenance couldn’t be easier. In your terminal, change to the repository where you want background maintenance and run the following command:

$ git maintenance start

Git does the rest. In addition to prefetching the latest objects every hour, Git also keeps its own data in order. It updates the commit-graph file hourly and packs loose objects (and incrementally repacks already packed objects) every night.

In the git maintenance documentation, you can read more about this feature and learn how to customize it with the maintenance.* configuration options. If you run into problems, you can refer to the troubleshooting documentation.
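
As a rough illustration of that customization, the sketch below sets two maintenance.* options and runs a single task by hand. It assumes Git 2.31 or later and works in a throwaway repository created with mktemp; the choice of the prefetch and loose-objects tasks is only an example, not a recommendation.

```shell
#!/bin/sh
# Throwaway repository so nothing here touches a real project.
repo=$(mktemp -d)
git init -q "$repo"

# Pick the built-in "incremental" schedule of tasks.
git -C "$repo" config maintenance.strategy incremental

# Override the schedule of a single task: prefetch daily instead of hourly.
git -C "$repo" config maintenance.prefetch.schedule daily

# Run one task once, in the foreground, without waiting for the scheduler.
git -C "$repo" maintenance run --task=loose-objects

# Show the resulting maintenance settings.
git -C "$repo" config --get-regexp '^maintenance\.'
```

If you later want to turn the scheduler off for a repository where you ran git maintenance start, git maintenance stop does that.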

[Source, source, source, source]

On-disk reverse indexes

As you probably already know, Git stores all data as “objects”: commits, trees, and the blobs that hold the contents of each file. For efficiency, Git puts many objects into packfiles, which are essentially a stream of objects (the same stream that git fetch and git push use to transfer objects). To access these objects efficiently, Git generates an index for each packfile. These .idx files let Git quickly convert an object ID into the corresponding byte offset in the packfile.

What about going in the other direction? That is, if Git only knows which byte it is looking at in a packfile, how does it find out which object that byte belongs to?

To do this, Git uses the aptly named reverse index: an opaque mapping from locations in a packfile to the object each location belongs to. Before Git 2.31, there was no on-disk format for reverse indexes (unlike .idx files), so the reverse index had to be regenerated and held in memory each time it was needed. Building it roughly amounts to generating an array of object-location pairs and sorting that array by location (curious readers can find the details here).

But this takes time. If the repository’s packfiles are large, the process can be lengthy. To get a better idea of how size affects time, we can run an experiment that compares the time it takes to print the size and the contents of the same object. When printing only an object’s contents, Git uses the forward index to locate the object in the packfile. But to print the size of an object in a packfile, Git needs to locate not only that object but also the one immediately following it, and subtract the two positions to work out how much space the object takes up. To find the position of the first byte of the adjacent object, Git needs the reverse index.
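
The two steps described above, sorting object-location pairs by location and then subtracting adjacent offsets, can be sketched in a few lines of shell. This is an illustration only, not Git's implementation: the object names, the offsets, and the PACK_END value are all made up, and real .idx and .rev files are binary.

```shell
#!/bin/sh
# Hypothetical forward index: object name and byte offset in the pack,
# ordered by object name. All values are made up for illustration.
forward_index() {
  cat <<'EOF'
1a2b 340
7c9d 12
c3f0 975
e41a 128
EOF
}

# Reverse index: the same pairs, sorted numerically by offset.
reverse_index() {
  forward_index | sort -k2,2n
}

# Assumed offset at which the last object ends.
PACK_END=1200

# On-disk size of each object: the next object's offset minus its own.
disk_sizes() {
  reverse_index | awk -v end="$PACK_END" '
    NR > 1 { print prev_id, $2 - prev_off }
    { prev_id = $1; prev_off = $2 }
    END { print prev_id, end - prev_off }'
}

disk_sizes
```

With these made-up offsets, the object at offset 12 is reported as 116 bytes, because the next object starts at offset 128. The sort is the expensive part that has to be redone every time when the result is not stored on disk.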

In this comparison, printing an object’s size was 62 times slower than printing its entire contents. You can try it yourself with hyperfine:

$ git rev-parse HEAD >tip
$ hyperfine --warmup=3 \
  'git cat-file --batch <tip' \
  'git cat-file --batch-check="%(objectsize:disk)" <tip'

In version 2.31, Git can finally serialize reverse indexes into a new on-disk format with a .rev file extension. After generating the on-disk reverse index, we repeated the experiment, and this time printing the contents and printing the size of the same object took about the same time.

The perceptive reader may wonder why Git goes to the trouble of using reverse indexes at all. After all, if you can already print an object’s contents, printing its size should be no harder than counting the bytes you printed. That is true, but the cost grows with the size of the object. If the object is very large, counting all of its bytes is far more expensive than a single subtraction.

Beyond synthetic experiments like this one, reverse indexes are useful in other ways, such as sending object bytes directly from disk when an object is transferred in a fetch or push. Having the reverse index computed ahead of time makes this process faster.

Git doesn’t generate .rev files by default, but you can try them yourself by running git config pack.writeReverseIndex true and then repacking the repository (with git repack -ad). We’ve been doing this at GitHub over the past few months, and it has significantly improved many Git operations.
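
A minimal sketch of those two steps, run against a throwaway repository rather than a real project. It assumes Git 2.31 or later; the file name, contents, and commit identity are placeholders.

```shell
#!/bin/sh
# Throwaway repository with a single commit; names and contents are made up.
repo=$(mktemp -d)
git init -q "$repo"
echo 'hello' >"$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
  commit -q -m 'initial'

# Ask Git to write a .rev file next to every new packfile.
git -C "$repo" config pack.writeReverseIndex true

# Repack everything into a single pack and drop the redundant ones.
git -C "$repo" repack -q -a -d

# A Git that supports on-disk reverse indexes lists a .rev file here,
# alongside the .pack and .idx files.
ls "$repo/.git/objects/pack"
```

In an existing repository only the git config and git repack lines are needed; the rest just sets up something to repack.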

[source]

Tidbits

  • We’ve already mentioned the commit-graph file in previous articles. It is a very useful serialization of common information about commits, such as which parents a commit has and what its root tree is. For more details, this series of articles provides a good explanation. The commit-graph also stores a generation number for each commit, which helps speed up many kinds of commit walks. Git 2.31 has a new kind of generation number that can further improve performance in certain scenarios. This work was contributed by Abhishek Kumar, a Google Summer of Code student.

    [source]

  • In recent versions of Git, it has become easier to change the default name of the main branch in new repositories with the init.defaultBranch configuration option. Git has always tried to check out the branch that the remote repository’s HEAD points to (for example, if the remote’s default branch is “foo”, git clone will check out foo locally), but this did not work for empty repositories. In Git 2.31, it does. Now, if you clone a newly created repository and start writing the first code locally, your copy follows the remote’s default branch name, even if the remote has no commits yet.

    [source]

  • Speaking of renaming, Git 2.30 also makes it easier to change another default name: the name of a repository’s first remote. When you clone a repository, the initial remote is always called “origin”. Before Git 2.30, you could only rename it afterwards with git remote rename. Git 2.30 lets you configure a custom default name instead of always using “origin”. Try setting the clone.defaultRemoteName configuration option yourself.

    [source]

  • As a repository gets bigger and bigger, it becomes difficult to determine which branches account for most of its size. Git 2.31 adds a --disk-usage option to git rev-list, making it simpler and faster to total up the size of objects than with the existing tools. The examples section of the rev-list manual shows some use cases (see the “traditional” way of doing this in the timing section of the source link below).

    [source]

  • You may have used the -G option to find commits that changed lines matching a particular pattern (e.g. git log -G'foo\(' finds changes, whether added, deleted, or modified, that involve calls to foo()). But you might also want to ignore changes that match a particular pattern. Git 2.30 introduces -I, which lets you ignore changes whose lines all match a given regular expression. For example, git log -p -I'//' omits hunks that only touch comment lines (those containing //).

    [source]

  • Rename detection has also been significantly optimized, as groundwork for a new merge backend. See Optimizing Git’s Merge Machinery, #1 and Optimizing Git’s Merge Machinery, #2 for more details.
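
Several of the options from the list above can be tried in a throwaway repository. The sketch below assumes Git 2.31 or later; the repository layout, file contents, commit identity, and the remote name upstream-demo are all made up for illustration.

```shell
#!/bin/sh
# Throwaway repository with two commits; names and contents are made up.
demo=$(mktemp -d)
git init -q "$demo/upstream"
cd "$demo/upstream" || exit 1
commit() {
  git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"
}

printf 'int main(void) { return 0; }\n' >main.c
git add main.c && commit 'add main'
printf '// entry point\nint main(void) { foo(); return 0; }\n' >main.c
git add main.c && commit 'call foo'

# Bytes used on disk by every object reachable from any ref (Git 2.31+).
git rev-list --disk-usage --objects --all

# Commits whose diff adds or removes a line matching the regex "foo\(".
git log --oneline -G'foo\('

# Patch output that skips hunks whose changed lines all match "//" (Git 2.30+).
git log -p -I'//'

# Clone with a remote called "upstream-demo" instead of "origin" (Git 2.30+).
git -c clone.defaultRemoteName=upstream-demo clone -q \
  "$demo/upstream" "$demo/copy"
git -C "$demo/copy" remote
```

To make the remote name stick for every future clone, the same key can be set once with git config --global clone.defaultRemoteName followed by the name you prefer.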

These are just a few of the latest changes. For more, read the release notes for 2.30, 2.31, or any earlier version in the Git repository.

If you find mistakes in this translation or other areas that need improvement, you are welcome to revise it and open a PR through the Nuggets Translation Project, which also earns reward points. The permanent link at the top of this article is the Markdown link to this article on GitHub.

