Git repositories grow as commits accumulate, especially when large files are pushed. Even if those files are deleted later, Git keeps them in its history so that any commit can be restored at any time.

Take one of the author's Git repositories as an example: the entire directory is 3 GB, of which the .git directory alone accounts for 2.7 GB. A Git repository that is too large can cause the following problems:

  • It takes up storage space on your computer
  • Cloning takes other people a long time
  • Switching branches uses a lot of memory
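
To check how much of a directory is taken up by .git, as in the 3 GB versus 2.7 GB comparison above, a minimal sketch such as the following may help. The helper name repo_sizes is made up for illustration, and sizes are reported in KB as du -sk prints them.

```shell
#!/bin/sh
# Sketch: compare a working copy's total size with the size of its .git directory.
# repo_sizes is a hypothetical helper name; sizes are in KB as reported by du -sk.
repo_sizes() {
    repo=${1:-.}
    total_kb=$(du -sk "$repo" | cut -f1)
    git_kb=$(du -sk "$repo/.git" | cut -f1)
    echo "total: ${total_kb} KB, .git: ${git_kb} KB"
}

# Usage (run against any working copy):
#   repo_sizes /path/to/repo
```

Running git count-objects -vH inside the repository gives a similar picture from Git's own point of view.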

If the large files you committed are no longer needed, consider slimming down your repository. Let’s start with the result: the slimmed-down repository is only 80 MB, a reduction of about 97%. So how does that work?

A tool called BFG is recommended for quickly cleaning up a Git repository. Reassuringly, it does not delete any files that are present in the latest commit; it only cleans up the history. Without further ado, let’s go straight to the procedure:

Step 1: Install the BFG command line tool

BFG requires a Java runtime, so make sure Java is installed on your computer. Then download the jar package from the official website, for example bfg-1.14.0.jar, and create an alias on the command line:

alias bfg='java -jar /XXXX/bfg-1.14.0.jar'

On a Mac, you can install it directly with Homebrew:

brew install bfg

Step 2: Clone a mirror of the repository

The command is:

git clone --mirror git://example.com/your-repo.git

The git clone command clones a Git repository. It has three common usages:

  1. git clone <repository> <directory>
  2. git clone --bare <repository> <directory.git>
  3. git clone --mirror <repository> <directory.git>

Here we use the third usage, which takes the --mirror option. What’s the difference, you might ask?

  • Usage 1 clones the repository that <repository> points to into the <directory> directory. <directory> is the working tree of the clone: files are checked out, and the repository itself lives in the .git directory inside the working tree.
  • The clones created by usages 2 and 3 have no working tree; the directory directly holds the contents of the repository. Such repositories are called bare repositories. By convention, the directory name of a bare repository ends in .git, which is why the cloned directory is written as <directory.git> in the example above.
  • Usage 3 differs from usage 2 in that the bare repository it creates registers the upstream repository, so it can be kept in sync with upstream using git fetch.
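
The differences between the three usages can be checked on a throwaway local repository. The following is a minimal sketch, not part of the cleanup procedure; the directory names (src, plain, bare.git, mirror.git) and the demo identity are assumptions made up for illustration.

```shell
#!/bin/sh
# Sketch: compare the three clone usages on a throwaway local repository.
# All directory names and the demo identity below are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

# Create a small source repository with one commit.
git init -q src
echo hello > src/README
git -C src add README
git -C src -c user.name=demo -c user.email=demo@example.com commit -q -m "initial commit"

git clone -q src plain                 # usage 1: working tree plus .git
git clone -q --bare src bare.git       # usage 2: bare repository
git clone -q --mirror src mirror.git   # usage 3: bare repository registered as a mirror

# Usage 1 is not bare; usages 2 and 3 are.
plain_bare=$(git -C plain rev-parse --is-bare-repository)
bare_bare=$(git -C bare.git rev-parse --is-bare-repository)
mirror_bare=$(git -C mirror.git rev-parse --is-bare-repository)

# Only the mirror clone records a fetch refspec and the mirror flag for origin.
bare_refspec=$(git -C bare.git config --get remote.origin.fetch || true)
mirror_refspec=$(git -C mirror.git config --get remote.origin.fetch)
mirror_flag=$(git -C mirror.git config --get remote.origin.mirror)

echo "bare?   plain=$plain_bare bare.git=$bare_bare mirror.git=$mirror_bare"
echo "fetch:  bare.git='$bare_refspec' mirror.git='$mirror_refspec'"
echo "mirror: $mirror_flag"
```

The recorded fetch refspec is what lets git fetch keep the mirror clone in sync with upstream, as described in the last bullet.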

If the repository is large, it is a good idea to back it up with the following command once the clone has finished:

cp -r your-repo.git your-repo-backup.git

Step 3: Run the cleanup commands

Run the BFG commands you need. For example, to remove all files larger than 10 MB:

bfg --strip-blobs-bigger-than 10M your-repo.git

Remove the 100 largest files:

bfg --strip-biggest-blobs 100 your-repo.git
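
Before picking a size threshold or a count, it can help to see which blobs in the history are actually the largest. The following is a minimal sketch that uses plain Git commands rather than any BFG feature; largest_blobs is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list the largest blobs anywhere in a repository's history.
# largest_blobs is a hypothetical helper; it prints "<size in bytes> <path>" lines.
largest_blobs() {
    repo=$1
    count=${2:-10}
    # rev-list prints "<object id> <path>"; cat-file looks up each object's
    # type and size and passes the path through as %(rest).
    git -C "$repo" rev-list --objects --all |
        git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk '$1 == "blob"' |
        cut -d' ' -f2- |
        sort -n -r |
        head -n "$count"
}

# Usage (works on bare and mirror clones too):
#   largest_blobs your-repo.git 20
```

Files that were deleted in later commits still show up in this list, which is exactly the history that BFG is meant to clean.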

To delete a specific file, or all files matching a wildcard:

bfg --delete-files "node.exe" your-repo.git
bfg --delete-files "*.zip" your-repo.git

Protect specific branches (by default BFG protects HEAD). This option is very useful for stating which branches must keep the files present in their latest commit:

bfg --protect-blobs-from master,feat/latest --delete-files "*.zip" your-repo.git
bfg --protect-blobs-from master,dev,stage --strip-biggest-blobs 100 your-repo.git

Of course, if you don’t want these files to remain on any branch, not even in the latest commit, you can write:

bfg --delete-files "*.zip" --no-blob-protection your-repo.git

The nice thing about BFG is that every run generates a report, stored in the your-repo.git.bfg-report directory, which sits at the same level as the your-repo.git repository. It contains a deleted-files.txt file that clearly shows which files have been deleted, for example:

d2c18a31fef1d60f47 8316352 dog.png
e00f5efc61fc18a37c 8205751 cat.png
c8c27e0758df6d9795 9878146 wix.exe
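
To get a rough idea of how much data a run removed, the sizes in deleted-files.txt can be added up. This is a minimal sketch that assumes each line has the form "blob id, size in bytes, file name", as the sample above suggests; total_deleted_bytes is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: total the sizes listed in a BFG deleted-files.txt report.
# Assumes each line is "<blob id> <size in bytes> <file name>" (an assumption
# based on the sample listing). total_deleted_bytes is a hypothetical helper.
total_deleted_bytes() {
    awk '{ sum += $2 } END { printf "%d\n", sum }' "$1"
}

# Usage (pass the path to the report file):
#   total_deleted_bytes path/to/deleted-files.txt
```

For the three sample lines above, this adds up to 26400249 bytes, or roughly 25 MB.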

Step 4: Update Git history

BFG has rewritten the commits, but the old objects are still stored on disk. Once every cleanup command you need from step 3 has been run, run the following commands to expire the reflog and physically remove the unreferenced data. This can be slow:

cd your-repo.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive
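
To see why this step matters, the effect can be reproduced on a throwaway repository. In the sketch below, deleting a scratch branch stands in for a history rewrite (an assumption for illustration; it is not what BFG does internally), and the names demo, scratch, and big.bin are made up.

```shell
#!/bin/sh
# Sketch: an object dropped from history stays on disk until the reflog is
# expired and git gc prunes it. Repository, branch, and file names are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name demo
git config user.email demo@example.com

echo keep > keep.txt
git add keep.txt
git commit -q -m "keep"
main=$(git symbolic-ref --short HEAD)

# Commit a file on a scratch branch, then delete the branch so that no
# branch references the file any more.
git checkout -q -b scratch
echo "pretend this is a large file" > big.bin
git add big.bin
git commit -q -m "add big file"
blob=$(git rev-parse HEAD:big.bin)
git checkout -q "$main"
git branch -q -D scratch

# The blob is still stored: the HEAD reflog keeps the old commit alive.
if git cat-file -e "$blob" 2>/dev/null; then before=present; else before=gone; fi

git reflog expire --expire=now --all
git gc -q --prune=now --aggressive

# After expiring the reflog and pruning, the blob is no longer in the object store.
if git cat-file -e "$blob" 2>/dev/null; then after=present; else after=gone; fi
echo "before gc: $before, after gc: $after"
```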

Step 5: Push to the remote repository

So far, all operations have been local. The slimmed-down repository still needs to be pushed to the remote:

cd your-repo.git
git push --mirror

At this point the history of the remote repository has really been rewritten, and a fresh clone will give you a much leaner repository.

However, when several people collaborate, you need to tell every collaborator to delete their old local repository and clone the new one. If someone accidentally pushes from an old local repository, your work will be wasted.

To rule this out completely, you can instead create a new repository, point origin at it with the following commands, push to the new repository, and tell everyone to clone the new one:

cd your-repo.git
git remote set-url origin your-new-repo.git
git push --mirror
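
The set-url and push sequence can be rehearsed locally before touching a real remote. The following is a minimal sketch in which a local bare repository stands in for the new remote; the old directory, the demo identity, and the temporary paths are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: rehearse "set-url + push --mirror" against a local stand-in remote.
# The repositories live in a temporary directory; all names are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

# A small source repository and a mirror clone of it.
git init -q old
echo hello > old/README
git -C old add README
git -C old -c user.name=demo -c user.email=demo@example.com commit -q -m "initial commit"
git clone -q --mirror old your-repo.git

# An empty bare repository plays the role of the new remote.
git init -q --bare your-new-repo.git

cd your-repo.git
git remote set-url origin "$work/your-new-repo.git"
git push -q --mirror

# Compare the refs on both sides; they should be identical after the push.
origin_url=$(git config --get remote.origin.url)
src_refs=$(git for-each-ref --format='%(objectname) %(refname)')
dst_refs=$(git -C "$work/your-new-repo.git" for-each-ref --format='%(objectname) %(refname)')
echo "origin now points to: $origin_url"
echo "$dst_refs"
```

Comparing the ref lists on both sides is a cheap way to confirm that the new repository received every branch and tag before you ask collaborators to switch.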