The preface
As a tool you use every day, you must be familiar with the basic uses of Git, such as:
- with
git-blame
Find out which colleague introduced a particular line of bugs, and let him take the blame; - with
git-merge
Merging someone else’s code into your flawless branch and then discovering that unit tests don’t work; - with
git-push -f
Overwrite all submissions from the rest of the team.
Git is also a key-value database with versioning capabilities:
- All submissions are stored in the directory
.git/objects/
Under; - The contents of files are stored
blob
Object, store file metadatatree
Object, and store the commit recordcommit
Objects, etc.; - Git provides key-value style read and write commands
git-cat-file
andgit-hash-object
.
As those of you who read my previous post on git Merge know, if a merge is not fast-forward, a new commit object with two parent commit objects is created. With well-known Go language Web framework gin warehouse as an example, the hash value for e38955615a14e567811e390c87afe705df957f3a submission is a merger, there are two lines of the parent in the content of the submitted
➜ gin git: (master) git cat - file - p 'e38955615a14e567811e390c87afe705df957f3a' tree 93e5046e502847a6355ed26223a902b4de2de7c7 parent ad087650e9881c93a19fd8db75a86968aa998cac parent ce26751a5a3ed13e9a6aa010d9a7fa767de91b8c author Javier Provecho Fernandez <[email protected]> 1499534953 +0200 committer Javier Provecho Fernandez <[email protected]> 1499535020 +0200 Merge pull request #520 from 178inaba/travis-import_pathCopy the code
All submitted objects form a directed acyclic graph with a submitted parent attribute. But you’re smart enough to realize that the git-log output is linear, so Git uses some sort of graph traversal.
Look at man Git-log, which can be seen in the Commit Ordering section
By default, the commits are shown in reverse chronological order.
You are smart enough to already know how to implement the traversal algorithm for this graph.
Write one yourselfgit-log
parsingcommit
object
To print the commit object’s information in the correct order, you need to parse it first. Instead of opening the file, reading the byte stream, and unpacking the file itself from scratch, we simply call git-cat-file as we did above. Git-cat-file prints some of the contents that need to be extracted for later use:
- In order to
parent
The opening line. The hash of this row is used to locate a node in a directed acyclic graph; - In order to
committer
The opening line. This line of UNIX timestamps will be used to determine who is “next.”
You can write a Python class to parse a COMMIT object
class CommitObject:
The result of parsing an object of type commit in Git. ""
def __init__(self, *, commit_id: str) - >None:
self.commit_id = commit_id
file_content = self._cat_file(commit_id)
self.parents = self._parse_parents(file_content)
self.timestamp = self._parse_commit_timestamp(file_content)
def _cat_file(self, commit_id: str) - >str:
cmd = ['git'.'cat-file'.'-p', commit_id]
return subprocess.check_output(cmd).decode('utf-8')
def _parse_commit_timestamp(self, file_content: str) - >int:
""" Resolve the submitted UNIX timestamp." ""
lines = file_content.split('\n')
for line in lines:
if line.startswith('committer '):
m = re.search('committer .+ <[^ ]+> ([0-9]+)', line.strip())
return int(m.group(1))
def _parse_parents(self, file_content: str) - >List[str] :
lines = file_content.split('\n')
parents: List[str] = []
for line in lines:
if line.startswith('parent '):
m = re.search('parent (.*)', line.strip())
parent_id = m.group(1)
parents.append(parent_id)
return parents
Copy the code
traversecommit
Directed acyclic graph — large root heap
Congratulations, that data structure you’ve learned can come in handy.
Assumption in the above class CommitObject parsing the hash values are submitted e38955615a14e567811e390c87afe705df957f3a gin, then its parents attributes, there are two strings:
ad087650e9881c93a19fd8db75a86968aa998cac
;ce26751a5a3ed13e9a6aa010d9a7fa767de91b8c
.
Among them:
- The hash value is
ad087650e9881c93a19fd8db75a86968aa998cac
The submission time ofSat Jul 8 12:31:44
; - The hash value is
ce26751a5a3ed13e9a6aa010d9a7fa767de91b8c
The submission time ofJan 28 02:32:44
.
Obviously, according to the inversion of time sequence (reverse chronological) print log, is the next print node should be ad087650e9881c93a19fd8db75a86968aa998cac – use git – log command can confirm this.
After printing ad087650e9881c93a19fd8db75a86968aa998cac, and from its parent submission and ce26751a5a3ed13e9a6aa010d9a7fa767de91b8c, pick out the next to print submission object. Obviously, this is a recurring process:
- From the one to print
commit
Object, find the one with the largest commit timestamp; - Prints its messages;
- will
commit
Add all parent commits to the object pool to be printed, go back to step 1;
This process continues until there are no More Commit objects to print, and all commit objects to print form a priority queue — which can be implemented with a large root heap.
However, I’m not going to actually implement a heap data structure in this short demo — I’m going to use insert sort instead.
class MyGitLogPrinter() :
def __init__(self, *, commit_id: str, n: int) - >None:
self.commits: List[CommitObject] = []
self.times = n
commit = CommitObject(commit_id=commit_id)
self._enqueue(commit)
def run(self) :
i = 0
while len(self.commits) > 0 and i < self.times:
commit = self.commits.pop(0)
for parent_id in commit.parents:
parent = CommitObject(commit_id=parent_id)
self._enqueue(parent)
print('{} {}'.format(commit.commit_id, commit.timestamp))
i += 1
def _enqueue(self, commit: CommitObject) :
for comm in self.commits:
if commit.commit_id == comm.commit_id:
return
Insert a new node at the index I by moving it from I to the last element.
i = 0
while i < len(self.commits):
if commit.timestamp > self.commits[i].timestamp:
break
i += 1
self.commits = self.commits[0:i] + [commit] + self.commits[i:]
Copy the code
Finally provide a start function to experience
@click.command()
@click.option('--commit-id', required=True)
@click.option('-n', default=20)
def cli(commit_id: str, n: int) :
MyGitLogPrinter(commit_id=commit_id, n=n).run()
if __name__ == '__main__':
cli()
Copy the code
Real Monkey King vs. fake Monkey King
To see if the above code prints the commit objects in the correct order, I first redirect its output to a file
➜ gin git:(master) python3 ~/SourceCode/python/my_git_log/my_git_log.py --commit-id 'e38955615a14e567811e390c87afe705df957f3a' -n 20 > /tmp/my_git_log.txt
Copy the code
Print it out in the same format with git-log
➜ gin git (master) the git log - pretty = 'format: % H % ct' 'e38955615a14e567811e390c87afe705df957f3a' - n > 20 / TMP/git_log. TXTCopy the code
Finally, let the diff command tell us if the two files are different
➜ gin git diff: (master)/TMP/git_log. TXT/TMP/my_git_log. TXT 20 c20 < 2521 d8246d9813d65700650b29e278a08823e3ae 1499266911 \ No newline at end of file ---> 2521d8246d9813d65700650b29e278a08823e3ae 1499266911
Copy the code
It’s almost the same.
Read the original