git rebase with 'ours' merge strategy prompting to rebase --continue again and again - git-merge-conflict

Here is the issue I am facing
I created a feature branch from the master branch and worked on it extensively; it is now 80-odd commits ahead of master. In these commits, I have edited some files multiple times. After a few days, someone pushed a couple of commits to the master branch, so the pull request for the feature branch can't be merged due to merge conflicts.
I tried to rebase onto master and resolve the merge conflicts, but I keep getting more and more conflicts after git rebase --continue:
govi#falcon:/home/my_user/project/ (feature/xyz): git rebase master
govi#falcon:/home/my_user/project/ (feature/xyz | REBASE 32/85):
Here, for any merge conflict, I want to select my changes, so I tried the ours conflict-resolution strategy in recursive mode. Now Git is not forcing me to resolve any conflicts, but it is asking me to execute git rebase --continue 80-odd times.
govi#falcon:/home/my_user/project/ (feature/xyz): git rebase master -s recursive -X ours
govi#falcon:/home/my_user/project/ (feature/xyz | REBASE-i 1/85)
Last command done (1 command done):
pick db2511c Modify file
Next command to do (1 remaining command):
pick d1c2037 Modify file one more time
Is there a better approach to resolving merge conflicts in the above scenario? Or maybe a better way to rebase?
PS: We are not allowed to reset the master branch. I know an easy way would be to perform {reset, stash, rebase, pop} on the feature branch, but the PR is already in progress.

TL;DR
You really want -X theirs. Why you want that is ... long.
Long
First, an initial side note: be careful with terminology: you are not using the ours strategy but rather the ours strategy option. I find Git's terminology here confusing, and prefer to call these -X options extended options, to avoid repeating the word strategy.
Now, on to the problem itself. When using git rebase, you are, in effect, repeatedly running git cherry-pick. Each cherry-pick operation copies one commit; git rebase works by copying multiple commits. The git rebase command first lists out all the hash IDs of the commits that are to be copied, saving these into internal "to-do" files. These files then get updated as the rebase makes progress.
(The details for these files have changed over the years and there's no real point in describing them. However, your shell prompt settings appear to read these to-do and progress files correctly, based on the "1/85" and "32/85" you're seeing here.)
A cherry-pick operation is, technically, a full-blown three-way merge, and can therefore produce merge conflicts. But one must be quite careful here. You wrote:
git rebase master -s recursive -X ours
The strategy argument to git merge or git rebase is -s or --strategy; you are using recursive here, which is fine (an ours strategy is not). The extended options are -X, and an ours or theirs extended option does make sense—but there's a trap here: you want -X theirs.
What's going on
Before we dive into cherry-pick, let's look at git merge. Without this first look at git merge, some of what cherry-pick does makes no sense at all.
To do a git merge operation, we start with a series of commits where, e.g., two different developers started with the same initial chain of commits:
...--F--G--H <-- main
These two developers, who we'll call Alice and Bob in the usual way, have each made some new commits. I'll work here from Alice's point of view:
       I--J   <-- alice (HEAD)
      /
...--H
      \
       K--L   <-- bob
At this point, Alice might merge Bob's work. She has her commit J checked out, with the special name HEAD attached to the branch name alice; she now runs git merge bob to merge Bob's commit L.
The git merge command—technically, this is the recursive strategy rather than git merge itself—locates commit L using the branch name bob. This commit becomes the third commit. Git locates commit J using the special name HEAD, and this becomes the second commit. Last—which becomes first—it works backwards through the commit graph to locate the best common commit, which in this case is commit H.
Each commit has a full snapshot of every file that Git knew about when whoever made the commit, made the commit. So Git can now easily compare the snapshot in the merge base commit H against the snapshot in Alice's commit J, and then do the same thing with Bob's commit L:
git diff --find-renames <hash-of-H> <hash-of-J> # what Alice changed
git diff --find-renames <hash-of-H> <hash-of-L> # what Bob changed
Note that the three commits in question here are:
commit H, as the merge base;
commit J, as --ours, via HEAD; and
commit L, as --theirs, via the name bob.
The merge command—the merge as a verb part of it, that is—now combines our changes, H-vs-J, with their changes, H-vs-L. It is this combining process that can produce merge conflicts.
To the extent that there aren't merge conflicts, though, Git can automatically apply the combined changes, to the files as seen in the merge base commit H. This keeps our changes while adding their changes, which is of course just what we want from a merge.
When there are merge conflicts, git merge stops in the middle of the merge. It leaves in Git's index all three input files: index slot #1 contains the base commit copy, slot #2 contains the --ours copy from HEAD, and slot #3 contains the --theirs copy from the commit we named with our git merge command.
Git writes, to the work-tree version of the conflicted file, its best effort at doing the combination. Places where Git was able to combine changes on its own already contain that combination. Places where Git found an ours-vs-theirs conflict have conflict markers and two, or even all three, input files' lines, depending on how you set merge.conflictStyle.
I call these kinds of conflicts low level conflicts. (Git calls them that internally, sort of.) There are also what I call high level conflicts, such as when one side—ours or theirs—modifies and/or renames a file, and the other side deletes it.
Using an extended option, -X ours or -X theirs, tells Git: when you hit a low-level conflict, just resolve it by taking ours or theirs respectively. This has no effect on high level conflicts: you must still resolve these manually.
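The three index slots described above can be inspected directly. Here is a minimal sketch that builds a throwaway repository with a deliberate conflict. The branch names alice and bob follow the diagram; the file name f.txt, its one-line contents, and the demo identity are made up for illustration, and git is assumed to be on the PATH.

```shell
# Throwaway repository; f.txt and its contents are illustrative only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "base" > f.txt
git add f.txt
git commit -q -m "H"

git checkout -q -b alice             # Alice's branch, starting at H
echo "alice" > f.txt
git commit -q -am "J"

git checkout -q -b bob HEAD~1        # Bob's branch, also starting at H
echo "bob" > f.txt
git commit -q -am "L"

git checkout -q alice
git merge bob >/dev/null 2>&1 || true    # stops with a conflict in f.txt

# Stage (slot) numbers of the unmerged index entries for f.txt.
stages=$(git ls-files -u | awk '{printf "%s", $3}')
base_copy=$(git show :1:f.txt)       # slot #1: merge base (H)
ours_copy=$(git show :2:f.txt)       # slot #2: --ours (HEAD, commit J)
theirs_copy=$(git show :3:f.txt)     # slot #3: --theirs (bob, commit L)
echo "$stages: $base_copy / $ours_copy / $theirs_copy"
```

The :N:path syntax names the copy of a file in index slot N, which is handy when the conflict markers in the work-tree file are hard to read.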
Note that low-level conflicts can occur even if the two changes don't both change the same line. For instance, if the original input says:
line 1
line 2
line 3
line 4
and Alice changes 2 to two while Bob changes 3 to three, Git will call this a merge conflict. Using -X ours or -X theirs will discard one of the two changes. It's a good idea to actually test such merges before moving on. (Well, it's a good idea to test any merge: just because Git thought that it was OK to combine two different sets of changes, does not mean that it really was OK.)
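The four-line example can be tried without a repository, using git merge-file, which runs a three-way merge on three plain files. This is only a sketch: the file names are made up, and I am using the --ours and --theirs options of git merge-file as a stand-in for -X ours and -X theirs, on the assumption that they resolve a low-level conflict the same way.

```shell
tmp=$(mktemp -d)
cd "$tmp"
printf 'line 1\nline 2\nline 3\nline 4\n'     > base.txt
printf 'line 1\nline two\nline 3\nline 4\n'   > alice.txt   # Alice: 2 -> two
printf 'line 1\nline 2\nline three\nline 4\n' > bob.txt     # Bob: 3 -> three

# -p prints the result instead of overwriting alice.txt.
# A non-zero exit status reports that conflicts were found.
conflicts=0
git merge-file -p alice.txt base.txt bob.txt > merged.txt || conflicts=$?

# Same merge, but resolving the conflict in favor of one side.
git merge-file -p --ours   alice.txt base.txt bob.txt > ours.txt
git merge-file -p --theirs alice.txt base.txt bob.txt > theirs.txt

echo "conflicts reported: $conflicts"
cat ours.txt      # keeps "line two", loses Bob's "line three"
cat theirs.txt    # keeps "line three", loses Alice's "line two"
```

Note how each favored result silently drops the other side's edit, even though the two edits touched different lines.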
Recap
The takeaways from the above—re-read through it if needed—are:
The -s strategy is in charge of all the work; we're talking here about -s recursive (though -s resolve does the same kind of thing).
A merge operation has three inputs: base = #1, ours or HEAD = #2, theirs = #3.
Git will combine unconflicted changes on its own, regardless of -X options.
Git will stop with high-level conflicts, regardless of -X options.
The -X options will favor either "ours" (#1-vs-#2) or "theirs" (#1-vs-#3) to resolve low-level conflicts.
Cherry-pick
We're now ready to look at what git cherry-pick really does. The action for a cherry-pick is often described as repeat the changes from a previous commit. While this captures the goal, it doesn't cover the mechanism. The mechanism is irrelevant up until a merge conflict occurs, and then suddenly it's terribly important.
To talk about the mechanism, let's draw another commit graph fragment. This time, instead of Alice and Bob diverging from some common starting point H, let's just look at one or two programmers working on two different features, for instance:
...--P--C--N--O <-- feature1
...--R--S--T <-- feature2 (HEAD)
Commit C is the child of parent commit P; commit N comes after C, and O comes after N; these are all found through the name feature1.
Commit T is the last commit on feature2, and we have branch feature2 checked out right now. So commit T is the HEAD commit.
We need some new code to apply to T, and we realize: Wait, I just saw that code, or wrote it last week. It was in commit C! So we run git log to find the actual hash ID of commit C, then run:
git cherry-pick <hash-of-C>
to copy that commit.
In order to do the copying—to find out what changed between parent commit P and child commit C—Git will run the same git diff --find-renames that we saw above with git merge. But that just gets their change. In order to apply their change to our commit, Git will first run another git diff --find-renames, this time comparing parent P with our current / HEAD commit T.
In other words, Git runs:
git diff --find-renames <hash-of-P> <hash-of-T> # what we changed
git diff --find-renames <hash-of-P> <hash-of-C> # what they changed
and now Git combines the changes, using the same merge engine as usual (-s recursive), and applies the combined changes to the snapshot in P. This preserves our work, and adds their change. Commit P becomes the merge base, and commit T is the --ours while C is the --theirs.
Merge conflicts, if any occur, are because of these two git diff operations. If they do occur, index slot #1 contains files from the merge base P, slot #2 contains ours from T, and slot #3 contains theirs from C. The --ours option to git checkout makes sense, because T really is our commit. The -X ours option makes sense, because T is our commit.
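To check that the merge base of a cherry-pick really is the parent P of the picked commit, rather than the point where the branches forked, here is a small sketch in a throwaway repository. The branch names follow the diagram; the file f.txt, its one-word contents, the extra "root" commit, and the demo identity are assumptions made for illustration.

```shell
# Throwaway repository; f.txt and its contents are illustrative only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "root" > f.txt
git add f.txt
git commit -q -m "root"

git checkout -q -b feature1
echo "p" > f.txt
git commit -q -am "P"
echo "c" > f.txt
git commit -q -am "C"
pick=$(git rev-parse HEAD)               # hash of commit C

git checkout -q -b feature2 HEAD~2       # fork from the root commit
echo "t" > f.txt
git commit -q -am "T"

git cherry-pick "$pick" >/dev/null 2>&1 || true   # stops with a conflict

base_copy=$(git show :1:f.txt)     # from P, the parent of the picked commit
ours_copy=$(git show :2:f.txt)     # from T, our HEAD commit
theirs_copy=$(git show :3:f.txt)   # from C, the commit being copied
echo "base=$base_copy ours=$ours_copy theirs=$theirs_copy"
```

Slot #1 holds the contents from P even though P is not an ancestor of T at all, which is the sense in which cherry-pick chooses its own merge base.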
Rebase
As mentioned above, the way git rebase works is to list out the commit hash IDs of some series of commits that need to be copied. Then it uses Git's detached HEAD mode to check out one particular commit. For illustration, let's draw a small rebase with just three commits to do:
       C--D--E   <-- branch (HEAD)
      /
...--B--F--G   <-- mainline
Here, the commits we'd like copied are C, D, and E. The old base was commit B. Commits F and G got added to the mainline branch. So we run:
git checkout branch
git rebase mainline
Git uses the current commit E and works backwards to find the three commits to copy, while using the name mainline and working backwards to find that commit B is the shared commit at which the copying stops. Then, Git uses the name mainline to get into detached HEAD mode:
       C--D--E   <-- branch
      /
...--B--F--G   <-- HEAD, mainline
Git is now ready to copy commit C. Internally, at this point, Git runs git cherry-pick <hash-of-C> and git cherry-pick does its thing.
If all goes well, the "merge" that cherry-pick runs works: Git compares base B with "our" commit G, compares base B with "their" commit C, combines the two differences on top of commit B, and makes a new commit that we will call C':
       C--D--E   <-- branch
      /
...--B--F--G   <-- mainline
            \
             C'   <-- HEAD
Git now repeats this with commit D. The "merge" uses commit C as its merge base, C' as --ours, and D as --theirs. Git combines the changes, applies the combined changes to existing commit C', and makes new commit D':
       C--D--E   <-- branch
      /
...--B--F--G   <-- mainline
            \
             C'-D'   <-- HEAD
Git now cherry-picks E: D is the merge base, D' is --ours, and E is --theirs, and the new commit completes the copying process:
       C--D--E   <-- branch
      /
...--B--F--G   <-- mainline
            \
             C'-D'-E'   <-- HEAD
With the copying done, git rebase now only needs to yank the name branch off the old tip commit E, and make it point to the commit that HEAD currently names, i.e., E', and re-attach HEAD to make everything look normal:
       C--D--E   [abandoned]
      /
...--B--F--G   <-- mainline
            \
             C'-D'-E'   <-- branch (HEAD)
Note what --ours means
During the cherry-picking part of a rebase, --ours referred to:
commit G, at first
then commit C',
and then commit D'.
So --ours refers first to their commit G, then to our own commits as built on the new branch.
The --theirs commits were, in order, C, then D, then E. So --theirs refers to our commits, always.
The merge base commits were, in order, B, then C, then D. There's no --base option to refer to these, but the first one was "their" commit and the other two were ours.
If we want to override "their" (mainline) branch changes, then, we need to use --theirs, not --ours, most of the time.
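Putting this together for the situation in the question, here is a hedged sketch of a rebase that keeps the feature branch's side of every low-level conflict. It runs in a throwaway repository; the file name file.txt, its contents, the branch name feature, and the demo identity are all made up, and high-level conflicts (if any) would still stop the rebase.

```shell
# Throwaway repository; file.txt and its contents are illustrative only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"

echo "original" > file.txt
git add file.txt
git commit -q -m "base"

# Two feature commits that edit the same line, as in the question.
git checkout -q -b feature
echo "feature v1" > file.txt
git commit -q -am "feature: first edit"
echo "feature v2" > file.txt
git commit -q -am "feature: second edit"

# Someone pushes a conflicting change to master.
git checkout -q master
echo "master edit" > file.txt
git commit -q -am "master: conflicting edit"

# During the rebase, --theirs is the feature commit being copied,
# so -X theirs favors the feature branch's changes.
git checkout -q feature
git rebase -q -X theirs master

result=$(cat file.txt)
echo "$result"                            # feature v2
```

Both feature commits are copied on top of master without stopping, and the final file holds the feature branch's content. As with any automatically resolved merge, it is worth testing the result before pushing.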

Related

Pentaho "Make the transformation database transactional" plus commit frequency

If I understand correctly, enabling the Make the transformation database transactional property means a single commit is done at the end of the transformation (or a rollback if there is an error or an abort).
However, the Commit size is still available on the Table output step, for example.
Is the Commit size value ignored in this case? How does the Commit size work in combination with Make the transformation database transactional? (Will there be a single commit or multiple commits?)
I'm pretty sure that the end result will be the same.
The execution will still make the batch commits, but if any of them fails, the entire execution will not be committed.
I cannot attest to exactly how this performs, but I can attest to the end result: checking 'Make the transformation Database Transactional' will effectively do what you want.

Vimdiff equivalent for Select Lines

When used as a mergetool for Git, what is the equivalent in vimdiff to kdiff3's "Select Lines(s) From A/B/C"? Is there a shortcut for that like Ctrl+1/2/3 in kdiff3?
Based on the Vim Reference Manual section for vimdiff, there are no built-in commands with the full functionality of Ctrl+1/2/3 in vimdiff. What I mean by "full functionality" is that in kdiff3 you could do the commands Ctrl+2, Ctrl+3, Ctrl+1 in that order, and in the merged version you end up with the diff lines from buffer B followed by the lines from buffer C followed by the lines from buffer A.
There is, however, a command for performing a more limited version of the functionality available in kdiff3. If you only want to use lines from one of your input files, then the command [count]do is available, where [count] is typically 1,2, or 3 depending on which vim buffer you want to pull the lines from. (do stands for "diff obtain".)
For example, if you had the following merge situation:
then you could move your cursor to the merge conflict in the bottom buffer and type 1do if you wanted "monkey", 2do if you wanted "pig", or 3do if you wanted "whale".
If you do need to grab lines from multiple buffers when merging with vimdiff, then my recommendation would be to set the Git config option merge.conflictstyle to diff3 (git config merge.conflictstyle diff3) so that the common ancestor appears in the merged buffer of the file, as shown in the screenshot above. Then just move the lines around to your liking using vim commands and delete the diff notations and any unused lines.

github api to compare commits, response status is diverged

When configuring Jenkins, I want to detect whether feature branches have merge conflicts, so I used the GitHub API v3 to test two intentionally conflicting branches.
After merging branch1 into master, I compared branch2 (b2) like this:
curl -i https://api.github.com/repos/hao1987/myself/compare/hao1987:master...hao1987:b2
and it returns a long json which has an attribute:
"status": "diverged"
I wonder if that means a conflict, and where I can find the possible values of "status".
This isn't documented (sorry!), but status can be one of four things:
"diverged" = commits were introduced on both the head and base branch since the common ancestor
"ahead" = commits were introduced on head after the common ancestor with base
"behind" = commits were introduced on base after the common ancestor with head
"identical" = branches point to same commit
So, "diverged" doesn't tell you whether a merge between the branches would result in merge conflicts.

What are the asterisks for in the output of tf merge

tf merge /recursive /candidate $/foo/ExUI $/bar/ExUI
Changeset Author Date
--------- -------------------------------- ----------
23438 Fred_Bloggs 04/05/2010
23609 Jimmy_jones 11/05/2010
23943* John_doe 25/05/2010
Can anyone explain what the asterisk is for in the above output? I assumed that it indicates changesets that include changes outside the scope of the current query and that are 'partial', but on investigation this is not the case.
Any offers? It would be really helpful if there were a way to identify such changesets. The current plan is that we are having to dump out the contents of each changeset to file and inspect manually for any changes outside the scope.
You should read the developer support team's post, Partial Merges in TFS – A Guide.
This could mean:
In the output you will see an asterisk (*) next to changeset 138 which indicates that it is a partial merge. That means that only part of changeset 138 has been merged into changeset 139. If we take a look at the merge candidates from Dev to Main, we will see that even though we already merged changeset 138 to Main, it is still a merge candidate. This is caused by the fact that the merge engine detected that there are still some changes in changeset 138 which were not propagated from Dev to Main.
I would read the whole post and see if it answers your question.

TFS: Merging back into main branch

We have a Current branch where the main development happens. For a while I have been working on something kind of experimental in a separate branch. In other words I branched what I needed from the Current branch into an Experimental branch. While working I have regularly merged Current into Experimental so that I have the changes others have made, so that I am sure what I make work with their changes.
I now want to merge back into Current. First I merged Current into Experimental, compiled and made sure everything was working. So in my head, Experimental and Current should be "in sync". But when I try to merge Experimental back into Current, I get a whole bunch of conflicts. But I thought I had already kind of solved those when I merged Current into Experimental.
What is going on? Have I totally misunderstood something? How can I do this smoothly? Really don't want to go through all of those conflicts...
When you click Resolve on an individual conflict, what does the summary message say? If your merges from Current -> Experimental were completed without major manual work, it should be something like "X source, 0 target, Y both, 0 conflicting." In other words, there are no content blocks in the target (Current) file that aren't already in the source branch's copy (Experimental). You can safely use the AutoMerge All button.
Note: AutoMerge should be safe regardless. It's optimized to be conservative about early warnings, not for the ability to solve every case. But I recognize that many of us -- myself included -- like to fire up the merge tool when there's any question. In the scenario described, IMO, even the most skittish can rest easy.
Why is there a conflict at all? And what if the summary message isn't so cut & dry? Glad you asked :) Short answer - because the calculation that determines the common ancestor ("base") of related files depends heavily on how prior merge conflicts between them were resolved. Simple example:
set up two branches, A and B.
make edits to A\foo.cs and B\foo.cs in separate parts of the file
merge A -> B
AutoMerge the conflict
merge B -> A
TFS must flag this sequence of events as conflicting. The closest common ancestor between B\foo.cs;4 and A\foo.cs;2 lies all the way back at step 1, and both sides have obviously changed since then.
It's tempting to say that A & B are in sync after step 4. (More precisely: that the common ancestor for step 5's merge is version #2). Surely a successful content merge implies that B\foo.cs contains all the changes made to date? Unfortunately there are a number of reasons you cannot assume this:
Generality: not all conflicts can be AutoMerged. You need criteria that apply to both scenarios.
Correctness: even when AutoMerge succeeds, it doesn't always generate valid code. A classic example arises when two people add the same field to different parts of a class definition.
Flexibility: every source control user has their own favorite merge tools. And they need the ability to continue development/testing between the initial Resolve decision ["need to merge the contents somehow, someday"] and the final Checkin ["here, this works"].
Architecture: in a centralized system like TFS, the server simply can't trust anything but its own database + the API's validation requirements. So long as the input meets spec, the server shouldn't try to distinguish how various types of content merges were performed. (If you think the scenarios so far are easily distinguished, consider: what if the AutoMerge engine has a bug? What if a rogue client calls the webservice directly with arbitrary file contents? Only scratching the surface here...servers have to be skeptical for a reason!) All it can safely calculate is you sent me a resulting file that doesn't match the source or target.
Putting these requirements together, you end up with a design that lumps our actions in step 4 into a fairly broad category that also includes manual merges resulting from overlapping edits, content merges [auto or not] provided by 3rd party tools, and files hand-edited after the fact. In TFS terminology this is an AcceptMerge resolution. Once recorded as such, the Rules of Merge(tm) have to assume the worst in pursuit of historical integrity and the safety of future operations. In the process your semantic intentions for Step 4 ("fully incorporate into B every change that was made to A in #2") were dumbed down to a few bytes of pure logic ("give B the following new contents + credit for handling #2"). While unfortunate, it's "just" a UX / education problem. People get far angrier when the Rules of Merge make bad assumptions that lead to broken code and data loss. By contrast, all you have to do is click a button.
FWIW, there are many other endings to this story. If you chose Copy From Source Branch [aka AcceptTheirs] in step 4, there would be no conflict in step 5. Ditto if you chose an AcceptMerge resolution but happened to commit a file with the same MD5 hash as A\foo.cs;2. If you chose Keep Target [aka AcceptYours] instead, the downstream consequences change yet again, though I can't remember the details right now. All of the above get quite complex when you add other changetypes (especially Rename), merge branches that are far more out of sync than in my example, cherry pick certain version ranges and deal with the orphans later, etc....
EDIT: as fate would have it, someone else just asked the exact same question on the MSDN forum. As tends to be my nature, I wrote them another long answer that came out completely different! (though obviously touching on the same key points) Hope this helps: http://social.msdn.microsoft.com/Forums/en-US/tfsversioncontrol/thread/e567b8ed-fc66-4b2b-a330-7c7d3a93cf1a
This has happened to me before. When TFS merges Experimental into Current, it does so using the workspaces on your hard drive. If your Current workspace is out of date on your local computer, TFS will get merge conflicts.
(Experimental on HD) != (Current in TFS) != (Old Current on HD)
Try doing a forced get of Current to refresh your local copy of Current, then try the merge again.
You probably have lines like this before you start the merge...
Main branch - Contains code A, B, C
Current branch - Contains code A, B, C, D, E
Experimental branch - Contains code A, B, C, D, F, G, H
When you push from Current to Exp, you are merging feature E into the experimental branch.
When you then push from Exp to Current, you still have to merge F, G, and H. This is where your conflicts are likely rooted.
----Response to 1st comment----
Do you auto merge, or use the merge tool?
What is an example of something that is "in conflict"?
