Preface

This article grew out of a scenario about force-pushing safely, which I covered in my previous article. In its "Solution -> Use git pull --rebase" section, I noticed that Git produced a very strange conflict: the conflicting lines seemed offset from where I expected them. Since I didn't know much about Git or the principles behind its conflicts, I couldn't find the right keywords to search for an answer, and I wasn't even sure the problem could be reproduced with a demo. In the spirit of digging deeper, while writing that article I started trying to reproduce this strange conflict with a simple demo.

Git version 2.24.3 (Apple Git-128)

Reproducing the scenario

Once boiled down, the scenario is quite simple. First, I create an empty repository, create a branch named base, and add a file base.md with the following contents (I've added a line number to each line for readability; the same applies below):

1:
2:
3: b

After saving the file and committing it, I create a new branch, feat, from that commit and update base.md as follows:

1:
2:
3: b
4:
5: c

After committing that, I switch back to the base branch and update base.md as follows (note that I added an empty line at the end of the file):

1: a
2:
3: b
4:

After saving and committing, run git merge feat. Now stop and think about this for a moment: will there be a conflict? Why or why not? If so, which part will conflict?

The answer is yes, there will be a conflict. But did you guess where it is?

01: a
02:
03: <<<<<<< HEAD (Current Change)
04: b
05: =======
06:
07: b
08:
09: c
10: >>>>>>> feat (Incoming Change)
11:

Take a close look at line 6 above, which is empty. Is anyone else as confused by this as I am? Why is there an empty line here? If I accept the Incoming Change, the file ends up with two empty lines between a and b, and I have to delete one of them, plus the last line, by hand to match branch feat exactly.

This is not what I expected. Intuitively, branch feat adds an empty line after b, followed by a line containing the letter c. Branch base changes the first line from empty to the letter a and adds an empty line after b. In other words, both branches add an empty line after b, and feat additionally adds the c line. My original understanding was that adding the same empty line would not conflict, and feat would simply add c on top of it, so the merge would go through cleanly.

On second thought, that idea is a bit too naive. From an everyday development point of view, even if what developer A appended to the end of a file includes everything developer B appended, A's change cannot simply be applied on top of B's: A's code may contain B's, but it may serve a completely different purpose, so a conflict should be raised and the user left to choose. Of course, this is only my guess; Git obviously knows nothing about what your code does, but it clearly has internal rules for detecting such conflicts.

In fact, if A and B both change the end of the file, Git reports a conflict even when A's changes include B's. That is perfectly reasonable behavior.

Back to the topic: intuitively, I expected the conflict to look something like this:

01: a
02:
03: b
04: <<<<<<< HEAD (Current Change)
05:
06: =======
07:
08: c
09: >>>>>>> feat (Incoming Change)

If your intuition matches mine, read on. If not, please let me know what you think in the comments; my guess is that you either know git merge and diff very well or have some great insights.

How to read Git diff

As the previous section shows, the scenario is easy to reproduce; the next step is to find the answer. At first I didn't know what keywords to search for and didn't understand the principles behind Git. But my guess was that the conflict had to do with diff (it turned out to be, though at the time it was only a guess), so the first step was learning to read a diff.

When it comes to viewing a diff, you might reach for an editor first. I used vscode, for example.

As you can see, the added lines are marked in green, and you can compare the two versions side by side by clicking the "Open Changes with Previous Revision" button in the upper right corner:

Let’s look at the branch base change:

The change on branch feat is easy to understand, but what about base? I was confused when I first saw these markers: I only edited the first line and added a line at the end, so why is there a deletion marker? And the left and right columns of the side-by-side view are misaligned. What does that misalignment mean? I went on to assume that the conflict must have something to do with this odd diff (which turned out to be correct), and tried small variations, such as leaving two empty lines between the letters a and b:

Or just add a and leave the end unchanged:

As you can see, leaving two empty lines between a and b shows up as adding an a line plus an empty line at the end, while adding only a shows up as an edit of the first line. At this point I still couldn't figure out why the base change shows a deletion marker, since intuitively nothing was deleted.

At this point I asked a colleague about these results. He suggested running git diff directly on the command line instead of relying on the editor, since the editor might post-process or optimize the diff. That made sense to me, so I switched to the command line.

First, let's look at branch feat's change again:

diff --git a/base.md b/base.md
index f547db6..0d91235 100644
--- a/base.md
+++ b/base.md
@@ -1,3 +1,5 @@
 
 
-b
\ No newline at end of file
+b
+
+c
\ No newline at end of file
(END)

The first line, diff --git a/base.md b/base.md, says which files are compared: a/ is the old version and b/ the new one. The second line, index f547db6..0d91235 100644, holds Git metadata: the abbreviated blob hashes of the old and new versions, followed by the file mode. Next come --- and +++: --- marks the old version, whose lines are prefixed with - below, and +++ marks the new version, whose lines are prefixed with + (you can also read - as deletion and + as addition). The hunk header @@ -1,3 +1,5 @@ means that this hunk covers 3 lines of the old file starting at line 1, and 5 lines of the new file starting at line 1. Check out this tutorial for more information.
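To double-check the hunk header arithmetic, here is a small Python sketch (the helpers parse_hunk_header and count_hunk_body are my own, hypothetical names) that parses @@ -1,3 +1,5 @@ and recounts the body of the feat hunk. Context lines (prefixed with a space) and - lines belong to the old file, context lines and + lines belong to the new file, and the \ No newline at end of file marker counts toward neither.

```python
import re

# Matches headers such as "@@ -1,3 +1,5 @@"; an omitted count means 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")

def parse_hunk_header(header):
    """Return (old_start, old_count, new_start, new_count)."""
    m = HUNK_RE.match(header)
    if m is None:
        raise ValueError(f"not a hunk header: {header!r}")
    old_start, old_count, new_start, new_count = m.groups()
    return (int(old_start), int(old_count or 1),
            int(new_start), int(new_count or 1))

def count_hunk_body(body):
    """Count the old-side and new-side lines of a hunk body."""
    old = sum(1 for line in body if line[:1] in (" ", "-"))
    new = sum(1 for line in body if line[:1] in (" ", "+"))
    return old, new

# Body of the feat hunk; the two unchanged empty lines are a lone space.
feat_body = [
    " ",
    " ",
    "-b",
    "\\ No newline at end of file",
    "+b",
    "+",
    "+c",
    "\\ No newline at end of file",
]

print(parse_hunk_header("@@ -1,3 +1,5 @@"))  # (1, 3, 1, 5)
print(count_hunk_body(feat_body))            # (3, 5)
```

The counts match the header: 2 context lines plus 1 deleted line on the old side, and 2 context lines plus 3 added lines on the new side.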

Next, the content. The trailing (END) is printed by the command-line pager and has nothing to do with the file's content. The first two empty lines are unchanged, so they appear as context with no operation. Starting at line 3, -b means that line is deleted, and \ No newline at end of file means this b line has no trailing newline. Then come three added lines, b, an empty line, and c, and finally another \ No newline at end of file shows that the new file still ends without a newline.

Why does the b line change at all? Weren't two lines simply added after it? The reason is that pressing Enter after b actually appends a newline character to the b line itself, so "b without a newline" and "b with a newline" are different lines. That is also why, when a line is followed by \ No newline at end of file, that line has no trailing newline. Note that at most one line in a file can lack a newline, and it must be the last line.
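To make the newline visible, here is a small Python sketch that spells out the three versions of base.md from the scenario as exact strings (the dictionary and function names are mine) and splits them with splitlines(keepends=True), so each line keeps its own trailing newline. It only illustrates the idea; it is not how Git reads files internally.

```python
# The three versions of base.md from the scenario, byte for byte.
versions = {
    "ancestor": "\n\nb",     # first commit: no newline after b
    "feat": "\n\nb\n\nc",    # branch feat: no newline after c
    "base": "a\n\nb\n",      # branch base: b now ends with a newline
}

def lines_without_newline(text):
    """Return the 1-based numbers of lines that lack a trailing newline."""
    lines = text.splitlines(keepends=True)
    return [n for n, line in enumerate(lines, start=1) if not line.endswith("\n")]

for name, text in versions.items():
    # keepends=True keeps each newline attached to the line it ends.
    print(name, text.splitlines(keepends=True), lines_without_newline(text))
# Prints:
# ancestor ['\n', '\n', 'b'] [3]
# feat ['\n', '\n', 'b\n', '\n', 'c'] [5]
# base ['a\n', '\n', 'b\n'] []

# "b" and "b\n" are different lines, so the diff deletes one and adds the other.
print("b" == "b\n")  # False
```

Only the last line can ever be reported by lines_without_newline, which matches the rule that at most one line, the last, may lack a newline.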

Comparing the diff above with vscode, careful readers will notice that additions in vscode correspond to +, deletions to -, and edits to a - followed by a +. A diff is really a set of instructions that Git generates to turn a/base.md into b/base.md. For the diff above, the instruction set is:

  1. Keep line 1 unchanged
  2. Keep line 2 unchanged
  3. Delete the letter b on line 3 (no newline)
  4. Add the letter b on line 3, then press Enter (newline)
  5. Line 4 is empty, so just press Enter (newline)
  6. Add the letter c on line 5, without pressing Enter (no newline)

Applying these instructions in order turns the old file into the new one. I suggest simulating Git by hand following these instructions; it is quite different from editing the file directly. Pay particular attention to the fact that a newline belongs to the end of its line. This means that to delete a line together with its newline, you actually work from the beginning of the next line (pressing the Delete key there removes the newline). Similarly, to add a line with a newline, you type the line and then press Enter: to the computer, the line now ends with a newline character, while to a human it looks like a new line has appeared below it.
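The instruction set can be replayed mechanically. Below is a minimal Python sketch of that idea; the keep/delete/add step format and apply_script are my own illustration, not Git's internal representation. Because each line string carries its own newline, "add b, then press Enter" simply becomes adding the string "b\n".

```python
def apply_script(old_lines, script):
    """Replay an edit script against old_lines and return the new lines.

    Each step is ("keep",), ("delete", line) or ("add", line). Lines are
    strings that include their own trailing newline, if they have one.
    """
    new_lines, pos = [], 0  # pos is the index of the next unconsumed old line
    for step in script:
        op = step[0]
        if op == "add":
            new_lines.append(step[1])
            continue
        if op not in ("keep", "delete"):
            raise ValueError(f"unknown op {op!r}")
        if pos >= len(old_lines):
            raise ValueError("script runs past the end of the old file")
        if op == "keep":
            new_lines.append(old_lines[pos])
        elif old_lines[pos] != step[1]:
            raise ValueError(f"cannot delete {step[1]!r}: found {old_lines[pos]!r}")
        pos += 1
    if pos != len(old_lines):
        raise ValueError("script left old lines unconsumed")
    return new_lines

# The instruction set for branch feat, one step per item in the list above.
feat_script = [
    ("keep",),          # 1. keep line 1 (empty)
    ("keep",),          # 2. keep line 2 (empty)
    ("delete", "b"),    # 3. delete b, which has no newline
    ("add", "b\n"),     # 4. add b, then press Enter
    ("add", "\n"),      # 5. line 4 is empty: just press Enter
    ("add", "c"),       # 6. add c without a newline
]

result = apply_script(["\n", "\n", "b"], feat_script)
print(repr("".join(result)))  # '\n\nb\n\nc'
```

Note that step 3 must delete exactly "b" without a newline; the sketch raises an error if the script does not match the old file.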

With that in mind, let’s look at the diff for the branch base:

diff --git a/base.md b/base.md
index f547db6..a1a53b5 100644
--- a/base.md
+++ b/base.md
@@ -1,3 +1,3 @@
+a
 
-
-b
\ No newline at end of file
+b
(END)

The instruction set is:

  1. Add the letter a on line 1, then press Enter (newline)
  2. Keep line 2 unchanged
  3. Delete line 3 (it is empty, so put the cursor at the beginning of line 4 and press Delete to remove the newline)
  4. Delete the letter b on line 3 (no newline)
  5. Add the letter b on line 3, then press Enter (newline)

This explains why vscode shows a deletion marker: the instruction set produced by diff really does contain deletions, namely the b line and the empty line above it. As mentioned, Git only needs some set of instructions that turns the old file into the new one, so two or more different instruction sets can produce exactly the same final change. In this case, Git clearly generated a different instruction set from the one we would follow if we made the edits by hand.
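One way to see that several instruction sets can describe the same change: an instruction set is valid if its keep and delete lines, read in order, spell out the old file, and its keep and add lines spell out the new file. The Python sketch below (is_valid_script and the (op, line) step format are my own, hypothetical) checks Git's instruction set for branch base against a hand-written alternative that replaces the first empty line with a. Both are valid.

```python
def is_valid_script(old_text, new_text, script):
    """Check whether an edit script turns old_text into new_text.

    Each step is (op, line) with op in {"keep", "delete", "add"}; lines
    include their own trailing newline, if any.
    """
    old_side = "".join(line for op, line in script if op != "add")
    new_side = "".join(line for op, line in script if op != "delete")
    return old_side == old_text and new_side == new_text

old_text = "\n\nb"       # common ancestor
new_text = "a\n\nb\n"    # branch base

# The instruction set git diff produced for branch base (the list above).
git_script = [
    ("add", "a\n"),      # 1. add a, then press Enter
    ("keep", "\n"),      # 2. keep line 2
    ("delete", "\n"),    # 3. delete the empty line above b
    ("delete", "b"),     # 4. delete b (no newline)
    ("add", "b\n"),      # 5. add b, then press Enter
]

# A hand-written alternative: replace the first empty line with a.
alt_script = [
    ("delete", "\n"),
    ("add", "a\n"),
    ("keep", "\n"),
    ("delete", "b"),
    ("add", "b\n"),
]

print(is_valid_script(old_text, new_text, git_script))  # True
print(is_valid_script(old_text, new_text, alt_script))  # True
```

Which of the valid scripts a diff tool picks depends on its algorithm; the result of applying them is the same file, but, as the merge below shows, the choice matters once two scripts have to be combined.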

Differences between native diff and editor display

In general, the diff shown by an editor is easier to read, but it doesn't help much with understanding what goes on inside Git, and it is sometimes inconsistent with git diff, as I stumbled across.

Take vscode as an example. When you view a file's changes, there are three colors: blue for edits, red for deletions, and green for additions. In the side-by-side view there is no blue; an edit is shown as a red line on the left plus a green line at the same position on the right. When lines are added, the side-by-side view fills the missing rows on the left with diagonal hatching (//////) so they line up with the new green lines on the right.

These markers are convenient and good enough for humans, but they ignore the newline detail. As mentioned earlier, a newline belongs to the line it ends, so when we add a line after a last line that had no newline, we actually modify that last line too, and this change doesn't show up in vscode's diff.

Beyond that, here is one of the inconsistencies I stumbled upon. Going back to the original state of branch base, base.md reads as follows:

1:
2:
3: b

When changed to:

1: a
2:
3: b

Or:

1: a
2:
3: b
4:

For both changes, vscode and git diff agree on what happened to the a line: an edit in the first case and an addition in the second.

But if I change it to the following:

1: a
2:
3: b
4:
5:

This time, however, vscode and git diff no longer agree about what happened to the a line.

This is a small, easily missed detail, and I don't know whether it is a bug in vscode. If you know, please leave a comment.

Same operation, different Git diff

I also found that git diff does not generate "the same" instruction set for operations that seem "the same". In the previous section, I went from adding only a, to also adding one line after b, to adding two lines after b. The git diff output for each case is as follows:

It's not hard to see that when we only add a, the a line is shown as an edit; with one line after b, we get the diff of branch base shown earlier. With two lines after b, the diff becomes adding a new b and then deleting the original b. With three lines after b, a line is added between the new a and b lines. Only at the end does the diff become regular, adding the expected empty lines after the new b.

It looks somewhat random. This article does not dig into the algorithm behind Git's diff engine; I mention it only to further illustrate that the instruction set git diff generates is often not what we would expect. With this in mind, let's return to the main topic and look at the root cause of the conflict.

From Git diff to merge conflicts

Git generates a different set of instructions than we expected, but what does this have to do with the merge conflict? When merging, Git finds the most recent common ancestor of the two branches (the merge base). Starting from that commit, Git diffs each branch against it to get two sets of diff instructions, then tries to combine the two sets. Whenever instructions from the two sets overlap, Git reports a conflict. So a merge conflict is really a conflict between instruction sets, and where a conflict appears depends closely on which instruction sets diff produced.
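The process can be sketched in a few dozen lines of Python. This is a simplified diff3-style merge, not Git's implementation, and it rests on a few assumptions: it uses difflib.SequenceMatcher rather than Git's default Myers diff (the two happen to produce the same hunks for this example, but not in general), it treats changes from the two sides that overlap or touch in the merge base as a conflict, and the function names changes, side_view and merge3 are hypothetical. Here ours is branch base (HEAD) and theirs is branch feat.

```python
import difflib

def changes(base, side):
    """Diff base against side; return (start, end, replacement) hunks in base coordinates."""
    matcher = difflib.SequenceMatcher(None, base, side, autojunk=False)
    return [(i1, i2, side[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]

def side_view(base, start, end, hunks):
    """Rebuild base[start:end] with one side's hunks applied."""
    view, pos = [], start
    for s, e, replacement in hunks:
        view.extend(base[pos:s])
        view.extend(replacement)
        pos = e
    view.extend(base[pos:end])
    return view

def merge3(base, ours, theirs):
    tagged = sorted(
        [(s, e, "ours", r) for s, e, r in changes(base, ours)]
        + [(s, e, "theirs", r) for s, e, r in changes(base, theirs)],
        key=lambda c: (c[0], c[1]),
    )
    out, pos, k = [], 0, 0
    while k < len(tagged):
        # Start a region, then grow it while the next hunk touches or overlaps it.
        start, end = tagged[k][0], tagged[k][1]
        group = [tagged[k]]
        k += 1
        while k < len(tagged) and tagged[k][0] <= end:
            end = max(end, tagged[k][1])
            group.append(tagged[k])
            k += 1
        out.extend(base[pos:start])  # unchanged lines before the region
        by_side = {who: [(s, e, r) for s, e, w, r in group if w == who]
                   for who in ("ours", "theirs")}
        if not by_side["theirs"]:
            out.extend(side_view(base, start, end, by_side["ours"]))
        elif not by_side["ours"]:
            out.extend(side_view(base, start, end, by_side["theirs"]))
        else:
            # Both sides changed this region: emit a diff3-style conflict.
            out.append("<<<<<<< ours\n")
            out.extend(side_view(base, start, end, by_side["ours"]))
            out.append("||||||| base\n")
            out.extend(base[start:end])
            out.append("=======\n")
            out.extend(side_view(base, start, end, by_side["theirs"]))
            out.append(">>>>>>> theirs\n")
        pos = end
    out.extend(base[pos:])
    return out

base = "\n\nb".splitlines(keepends=True)           # common ancestor
ours = "a\n\nb\n".splitlines(keepends=True)        # branch base (HEAD)
theirs = "\n\nb\n\nc".splitlines(keepends=True)    # branch feat

merged = merge3(base, ours, theirs)
# Add a newline to lines that lack one so every marker prints on its own line.
print("".join(line if line.endswith("\n") else line + "\n" for line in merged), end="")
```

Running it prints the conflict Git reports for this scenario in diff3 style. Branch base's hunks are "insert a before line 1" and "replace lines 2-3 with b plus a newline", while feat's hunk replaces line 3; because the second base hunk overlaps feat's, all of lines 2-3 (the empty line above b, and b) end up inside the conflict.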

To see the conflict in diff3 style, which also shows the merge base, run git checkout --conflict=diff3 base.md while the merge is in progress to re-create the conflict markers for that file, or enable it globally with git config --global merge.conflictstyle diff3. In this form there is an extra base section, which can be read as the content of the most recent common commit of the two changes.

Shown in diff3 style, the conflict looks like this:

01: a
02:
03: <<<<<<< ours
04: b
05: ||||||| base
06:
07: b
08: =======
09:
10: b
11:
12: c
13: >>>>>>> theirs
14:

Seen this way, it's much clearer: branch base wants to delete the empty line above b, while branch feat wants to keep that line and add two lines after b. So why does branch base delete that line here? Because of its instruction set. Let's compare the two instruction sets:

The diff instruction set for branch base is:

  1. Add the letter a on line 1, then press Enter (newline)
  2. Keep line 2 unchanged
  3. Delete line 3 (it is empty, so put the cursor at the beginning of line 4 and press Delete to remove the newline)
  4. Delete the letter b on line 3 (no newline)
  5. Add the letter b on line 3, then press Enter (newline)

The diff instruction set for branch feat is:

  1. Keep line 1 unchanged
  2. Keep line 2 unchanged
  3. Delete the letter b on line 3 (no newline)
  4. Add the letter b on line 3, then press Enter (newline)
  5. Line 4 is empty, so just press Enter (newline)
  6. Add the letter c on line 5, without pressing Enter (no newline)

Notice that instructions 4-5 of the former are identical to instructions 3-4 of the latter. The first two lines do not conflict (when one side leaves a line unchanged, the other side's change is applied), so those changes are applied directly. The conflict lies between instructions 3-5 of the former and 3-6 of the latter. In particular, instruction 3 of the former first deletes line 3 (the empty line above the letter b), which answers both questions: why there is a deletion, and what is deleted.

My questions and opinions

Let's look at the conflict again:

01: a
02:
03: <<<<<<< ours
04: b
05: ||||||| base
06:
07: b
08: =======
09:
10: b
11:
12: c
13: >>>>>>> theirs
14:

Note that the final empty line (line 14) sits outside the conflict. I think it would be more appropriate inside the ours section, as line 5:

01: a
02:
03: <<<<<<< ours
04: b
05:
06: ||||||| base
07:
08: b
09: =======
10:
11: b
12:
13: c
14: >>>>>>> theirs

This is more consistent with the instructions, and when accepting theirs (branch feat's change) there would be no need to delete the last line by hand, which is closer to branch feat's original change. If you know more about merge, please let me know in the comments.

Besides this, my other question is why diff generates this instruction set rather than one closer to what we would expect. I suspect it is a tradeoff made by the algorithm in Git's diff engine that just doesn't work out well in my particular case. For example, suppose git diff produced the following instruction set for the change on branch base:

  1. Delete line 1 (it is empty, so put the cursor at the beginning of line 2 and press Delete to remove the newline)
  2. Add the letter a on line 1, then press Enter (newline)
  3. Keep line 2 unchanged
  4. Delete the letter b on line 3 (no newline)
  5. Add the letter b on line 3, then press Enter (newline)

Now compare it with the diff instruction set of feat:

  1. Keep line 1 unchanged
  2. Keep line 2 unchanged
  3. Delete the letter b on line 3 (no newline)
  4. Add the letter b on line 3, then press Enter (newline)
  5. Line 4 is empty, so just press Enter (newline)
  6. Add the letter c on line 5, without pressing Enter (no newline)

In that case, the conflict would be between instructions 4-5 of the former and 3-6 of the latter, and the result would look like this:

01: a
02:
03: <<<<<<< ours
04: b
05: ||||||| base
06: b
07: =======
08: b
09:
10: c
11: >>>>>>> theirs
12:

Apart from the question about the trailing line I raised above, such a conflict seems much more consistent with how we think about the change (and with the intuition I described in the scenario section above).
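The difference between the two candidate instruction sets for branch base can be made concrete by writing each as hunks over the merge base (0-based, half-open line ranges; my own notation) and listing which hunks collide with feat's. The helper overlaps is hypothetical, and, as in a simple diff3 merge, I assume that hunks which overlap or touch conflict.

```python
def overlaps(ours, theirs):
    """Return the pairs of hunks that overlap or touch in the merge base.

    Each hunk is a half-open range (start, end) of 0-based merge-base lines;
    an insertion has start == end.
    """
    return [(a, b) for a in ours for b in theirs
            if a[0] <= b[1] and b[0] <= a[1]]

# feat: replace base line 3 (b without a newline) with b, an empty line, c.
feat = [(2, 3)]

# What git diff produced for branch base: insert a before base line 1,
# then replace base lines 2-3 (the empty line above b, and b) with b + newline.
git_base = [(0, 0), (1, 3)]

# The alternative instruction set: replace base line 1 with a,
# then replace base line 3 with b + newline.
alt_base = [(0, 1), (2, 3)]

print(overlaps(git_base, feat))  # [((1, 3), (2, 3))]
print(overlaps(alt_base, feat))  # [((2, 3), (2, 3))]
```

With Git's actual instruction set, the colliding range covers merge-base lines 2-3 (the empty line above b, and b itself); with the alternative, only line 3 collides, which is why the base section of the hypothetical conflict above contains just b.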

Conclusion

Apart from the open question above, the problem is solved. Looking back, it was quite hard to work out: I couldn't find relevant material, and because the problem was so detailed, nobody paid attention to it. So I asked two questions on StackOverflow: Why Git diff has different behaviors? and Why git changes the line order when merge conflicts occurs?. As the titles show, I didn't understand the nature of git diff and merge at the time, and phrased the questions in very superficial terms. Fortunately, a kind person answered right away, and I learned a lot: the different instruction sets generated by git diff described in this article, the more intuitive diff3 conflict style, and even the hypothetical view I presented at the end all come from those two questions.

Git's diff engine generates the final instruction set using a diff algorithm, and this behavior can even be tuned with options such as --indent-heuristic, but none of that seemed to help in my case.

If you're interested, go ahead and read those two questions, although I've covered most of their conclusions in this article. I hope you now understand git diff and merge better than before.