

Implementing the HEAD query

Now we know how the .git/HEAD file and the files under .git/refs/heads work together. To get started, let’s define some related types:

const HASH_BYTES: usize = 20;

// A (commit) hash is a 20-byte identifier.
// We will see that git also gives hashes to other things.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Hash([u8; HASH_BYTES]);

// The head is either at a specific commit or a named branch
enum Head {
  Commit(Hash),
  Branch(String),
}

Next, we’ll want to be able to convert hashes back and forth between a 40-character hexadecimal representation and a compact 20-byte representation.

use std::fmt::{self, Display, Formatter};
use std::io::{self, Error};
use std::str::FromStr;

impl FromStr for Hash {
  type Err = Error;

  fn from_str(hex_hash: &str) -> io::Result<Self> {
    // Parse a hexadecimal string like "af64eba00e3cfccc058403c4a110bb49b938af2f"
    // into [0xaf, 0x64, ..., 0x2f]. Returns an error if the string is invalid.
    // ...
  }
}

impl Display for Hash {
  fn fmt(&self, f: &mut Formatter) -> fmt::Result {
    // Turn the hash back into a hexadecimal string
    for byte in self.0 {
      write!(f, "{:02x}", byte)?;
    }
    Ok(())
  }
}
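
The body of from_str is elided above. One possible way to fill it in is sketched below. The helper name hex_digit_value and the choice of ErrorKind::InvalidData are assumptions made for illustration, not the author’s code, and Hash is redeclared only so that the block compiles on its own.

```rust
use std::io::{self, Error, ErrorKind};
use std::str::FromStr;

const HASH_BYTES: usize = 20;

// Redeclared from the type definitions above so this block stands alone.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Hash([u8; HASH_BYTES]);

// Hypothetical helper: the value (0 to 15) of one ASCII hex digit, if valid.
// Note that to_digit(16) accepts both lowercase and uppercase digits.
fn hex_digit_value(digit: u8) -> Option<u8> {
  (digit as char).to_digit(16).map(|value| value as u8)
}

impl FromStr for Hash {
  type Err = Error;

  fn from_str(hex_hash: &str) -> io::Result<Self> {
    // Assumption: any malformed input is reported as InvalidData.
    let invalid =
      || Error::new(ErrorKind::InvalidData, format!("invalid hash: {:?}", hex_hash));

    // A 20-byte hash needs exactly 40 hexadecimal digits.
    let digits = hex_hash.as_bytes();
    if digits.len() != HASH_BYTES * 2 {
      return Err(invalid());
    }

    let mut bytes = [0u8; HASH_BYTES];
    // Each pair of hex digits becomes one byte: "af" -> 0xaf.
    for (byte, pair) in bytes.iter_mut().zip(digits.chunks_exact(2)) {
      let high = hex_digit_value(pair[0]).ok_or_else(&invalid)?;
      let low = hex_digit_value(pair[1]).ok_or_else(&invalid)?;
      *byte = (high << 4) | low;
    }
    Ok(Hash(bytes))
  }
}
```

With this in place, "af64eba00e3cfccc058403c4a110bb49b938af2f".parse::<Hash>() yields a Hash whose first byte is 0xaf, while a string of the wrong length or with a non-hexadecimal character yields an error.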

Now we can write the core logic: read the .git/HEAD file and determine its corresponding commit hash.

fn get_head() -> io::Result<Head> {
  use Head::*;

  let hash_contents = fs::read_to_string(HEAD_FILE)?;
  // Remove trailing newline
  let hash_contents = hash_contents.trim_end();
  // If .git/HEAD starts with `ref: refs/heads/`, it's a branch name.
  // Otherwise, it should be a commit hash.
  Ok(match hash_contents.strip_prefix(REF_PREFIX) {
    Some(branch) => Branch(branch.to_string()),
    _ => {
      let hash = Hash::from_str(hash_contents)?;
      Commit(hash)
    }
  })
}

impl Head {
  fn get_hash(&self) -> io::Result<Hash> {
    use Head::*;

    match self {
      Commit(hash) => Ok(*hash),
      Branch(branch) => {
        // Copied from get_branch_head()
        let ref_file = Path::new(BRANCH_REFS_DIRECTORY).join(branch);
        let hash_contents = fs::read_to_string(ref_file)?;
        Hash::from_str(hash_contents.trim_end())
      }
    }
  }
}

fn main() -> io::Result<()> {
  let head = get_head()?;
  let head_hash = head.get_hash()?;
  println!("Head hash: {}", head_hash);
  Ok(())
}

Now, whether HEAD points at the main branch or directly at the commit hash, the program prints the same thing:

Head hash: af64eba00e3cfccc058403c4a110bb49b938af2f
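
The decision made in get_head() can also be tried without a repository on disk. The sketch below separates the parsing step from the file read. The names ParsedHead and parse_head are hypothetical, the value of REF_PREFIX is taken from the comment in get_head(), and the commit hash is kept as a plain string to keep the example short.

```rust
// Assumed value, based on the comment in get_head() above.
const REF_PREFIX: &str = "ref: refs/heads/";

// Hypothetical stand-in for Head that stores the hash as text.
#[derive(Debug, PartialEq, Eq)]
enum ParsedHead {
  // HEAD holds a 40-character hexadecimal commit hash.
  Commit(String),
  // HEAD points at the branch with this name.
  Branch(String),
}

// Hypothetical helper: classifies the contents of .git/HEAD
// without touching the file system. Returns None for anything
// that is neither a branch reference nor a 40-digit hex string.
fn parse_head(contents: &str) -> Option<ParsedHead> {
  // Remove the trailing newline, as get_head() does.
  let contents = contents.trim_end();
  if let Some(branch) = contents.strip_prefix(REF_PREFIX) {
    return Some(ParsedHead::Branch(branch.to_string()));
  }
  let is_hash = contents.len() == 40 && contents.bytes().all(|b| b.is_ascii_hexdigit());
  if is_hash {
    Some(ParsedHead::Commit(contents.to_string()))
  } else {
    None
  }
}
```

Passing "ref: refs/heads/main\n" gives a Branch holding "main", and passing a 40-digit hash followed by a newline gives a Commit holding that hash.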

We have successfully determined the hash of the current commit. Now, how do we find out what information is stored for that commit?

What’s in a commit?

When you view a commit on a web interface like GitHub or with a command like git show, you’ll see the changes (the “diff”) that the commit introduces.

So you might think that Git stores every commit as a diff. But it could also store each commit like a backup, containing the full contents of every file at that commit.

Both approaches work: you can compute a diff from two copies of a file, or you can compute the contents of a file by applying each diff in order (starting from an empty repository or from the most recent commit). Which method to use depends on what you’re optimizing for.
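
The equivalence of the two approaches can be illustrated with a toy model. This is only a sketch under simplifying assumptions: a commit is modeled as a map from path to file contents, and a “diff” is a list of whole-file changes rather than a real line-based diff. The names Snapshot, Change, checkout_from_diffs and diff_snapshots are all hypothetical.

```rust
use std::collections::BTreeMap;

// A snapshot maps each file path to its full contents.
type Snapshot = BTreeMap<String, String>;

// A whole-file change (much coarser than a real line-based diff).
#[derive(Clone, Debug, PartialEq, Eq)]
enum Change {
  Write { path: String, contents: String },
  Delete { path: String },
}

// Diff-based storage: rebuild the files at commit `index` by replaying
// every diff up to and including it, starting from an empty repository.
// Panics if `index` is out of range.
fn checkout_from_diffs(diffs: &[Vec<Change>], index: usize) -> Snapshot {
  let mut files = Snapshot::new();
  for diff in &diffs[..=index] {
    for change in diff {
      match change {
        Change::Write { path, contents } => {
          files.insert(path.clone(), contents.clone());
        }
        Change::Delete { path } => {
          files.remove(path);
        }
      }
    }
  }
  files
}

// Snapshot-based storage: derive the diff between two commits on demand.
fn diff_snapshots(old: &Snapshot, new: &Snapshot) -> Vec<Change> {
  let mut changes = Vec::new();
  // Files that are new or whose contents differ.
  for (path, contents) in new {
    if old.get(path) != Some(contents) {
      changes.push(Change::Write { path: path.clone(), contents: contents.clone() });
    }
  }
  // Files that no longer exist.
  for path in old.keys() {
    if !new.contains_key(path) {
      changes.push(Change::Delete { path: path.clone() });
    }
  }
  changes
}
```

Replaying the diffs produced by diff_snapshots reproduces the original snapshots, so either representation can be derived from the other. The cost of checkout_from_diffs grows with the number of commits replayed, whereas a stored snapshot can be read directly.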

Diff-based storage takes less space: it minimizes duplicated information because it stores only the content that changes. Storing full contents, however, makes it much faster to check out the code at a particular commit, because we don’t need to apply potentially thousands of diffs. It also makes it easy to implement git clone --depth 1, which speeds up cloning by downloading only the most recent commit.

Also, if the changes are small, computing the diff between two commits doesn’t take long: diff algorithms are fairly fast, and Git can automatically skip unchanged directories and files, as we’ll see later.
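
To give a rough feel for how skipping unchanged directories can work, here is a sketch. It assumes that every file and directory carries an identifier that changes whenever anything beneath it changes. The Node type, its u64 identifiers and the changed_paths function are hypothetical and are not Git’s actual data structures.

```rust
use std::collections::BTreeMap;

// Hypothetical in-memory tree. The assumption is that a node's id
// changes whenever its contents, or anything beneath it, change.
enum Node {
  File { id: u64 },
  Directory { id: u64, entries: BTreeMap<String, Node> },
}

impl Node {
  fn id(&self) -> u64 {
    match self {
      Node::File { id } | Node::Directory { id, .. } => *id,
    }
  }
}

// Collects the paths that differ between two trees, without descending
// into any subtree whose id is the same on both sides.
fn changed_paths(old: &Node, new: &Node, path: &str, out: &mut Vec<String>) {
  if old.id() == new.id() {
    return; // Identical subtree: skip it entirely.
  }
  let join = |name: &String| {
    if path.is_empty() { name.clone() } else { format!("{}/{}", path, name) }
  };
  match (old, new) {
    (
      Node::Directory { entries: old_entries, .. },
      Node::Directory { entries: new_entries, .. },
    ) => {
      for (name, new_child) in new_entries {
        match old_entries.get(name) {
          // Present on both sides: compare recursively.
          Some(old_child) => changed_paths(old_child, new_child, &join(name), out),
          // Only in the new tree: added.
          None => out.push(join(name)),
        }
      }
      // Only in the old tree: removed.
      for name in old_entries.keys() {
        if !new_entries.contains_key(name) {
          out.push(join(name));
        }
      }
    }
    // A file changed, or a file and a directory swapped places.
    _ => out.push(path.to_string()),
  }
}
```

If only one top-level file differs between two trees, the comparison returns as soon as it sees that the other directories have matching identifiers, no matter how many files they contain.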

For these reasons, Git takes the “store the contents of every file” approach. Git’s implementation manages to store only one copy of identical files, which saves a lot of storage space compared with a naive solution.
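
One way to store only a single copy of identical contents is to key each stored file by a hash of its contents. The sketch below illustrates the idea with an in-memory map. ObjectStore, ContentId and content_id are hypothetical names, and the standard library’s DefaultHasher is used only as a stand-in for a real 20-byte hash; it is not collision-resistant and is not what Git uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Stand-in for a 20-byte hash (assumption for illustration only).
type ContentId = u64;

// Derives an identifier purely from the bytes of the file.
fn content_id(contents: &[u8]) -> ContentId {
  let mut hasher = DefaultHasher::new();
  contents.hash(&mut hasher);
  hasher.finish()
}

// Hypothetical store: each distinct file body is kept exactly once.
#[derive(Default)]
struct ObjectStore {
  objects: HashMap<ContentId, Vec<u8>>,
}

impl ObjectStore {
  // Stores the contents if they are not already present and returns
  // their identifier. Storing the same bytes twice adds nothing new.
  fn put(&mut self, contents: &[u8]) -> ContentId {
    let id = content_id(contents);
    self.objects.entry(id).or_insert_with(|| contents.to_vec());
    id
  }

  // Looks up previously stored contents by identifier.
  fn get(&self, id: ContentId) -> Option<&[u8]> {
    self.objects.get(&id).map(|bytes| bytes.as_slice())
  }
}
```

In this model a commit only needs to record a mapping from path to ContentId. A file that is unchanged across a thousand commits is referenced a thousand times but stored once.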