Does the git merge -s ours strategy protect the merged commit from garbage collection?

In our project we have a model comparator that checks the performance of the code and saves the statistics together with the git hash of the commit. The problem is that in many cases this commit becomes orphaned later, and we cannot retrieve the commit just using the git hash.

I've proposed merging the commit into a special shelter branch right after saving the statistics. That would give us a single branch that is responsible for keeping the commits alive (this branch can even be restricted to prevent losing the commits after the merge). However, this raises another problem: possible merge conflicts, which are undesirable when the merge runs automatically from a script. My final idea is to use the ours strategy for merging: git merge -s ours <hash>. This merge strategy doesn't take a single line from the commit we are merging; it just makes a dummy commit that belongs to the shelter branch and has the commit that might otherwise become orphaned as one of its parents.

To convince the team that this strategy would work, I need to provide written evidence from the git reference that clearly states that even the -s ours strategy protects the commit from garbage collection (the fact that this strategy doesn't take a single line from the source commit introduces some uncertainty). So far I've found only a vague statement:

git gc tries very hard not to delete objects that are referenced anywhere in your repository. In particular, it will keep not only objects referenced by your current set of branches and tags, but also objects referenced by the index, remote-tracking branches, notes saved by git notes under refs/notes/, reflogs (which may reference commits in branches that were later amended or rewound), and anything else in the refs/* namespace. If you are expecting some objects to be deleted and they aren’t, check all of those locations and decide whether it makes sense in your case to remove those references.

This quote doesn't say anything about different merge strategies, and I suspect the devil is in the details. For example, git merge --squash, being a "merge", doesn't preserve the dependencies.

My question is whether there is an explicit statement that resolves the ambiguity for the git merge -s ours strategy.

3 answers

  • answered 2020-10-21 21:19 mkrieger1

    Long question, short answer: yes.

    Slightly longer answer: Using the ours strategy only affects the file contents that are used in the merge commit (those from the branch merged into). But the commit from the other branch whose contents are discarded is still one of the parent commits of the newly created merge commit, therefore reachable and not garbage collected.
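    The reachability argument above can be checked in a throwaway repository. The following is a minimal sketch, not the asker's actual script: the branch name shelter comes from the question, while the file name, commit messages, and user identity are made up for illustration. It expires the reflogs and prunes immediately, so the only thing keeping the measured commit alive is the merge commit on shelter.

```shell
set -e

# Throwaway repository (paths and identity are illustrative assumptions).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Base commit, plus a shelter branch starting from it.
echo base > file.txt
git add file.txt
git commit -q -m "base"
git branch shelter

# A commit made on a detached HEAD: no branch or tag points at it.
git checkout -q --detach
echo experiment > file.txt
git commit -q -am "measured commit"
measured=$(git rev-parse HEAD)

# Merge it into shelter with the ours strategy; its contents are discarded.
git checkout -q shelter
git merge -q -s ours -m "shelter $measured" "$measured"

# Remove the reflog safety net and prune unreachable objects right away.
git reflog expire --expire=now --all
git gc -q --prune=now

# The measured commit is the second parent of the merge and still exists.
second_parent=$(git rev-parse shelter^2)
git cat-file -e "$measured" && survived=yes || survived=no
content=$(cat file.txt)

echo "second parent is measured commit: $second_parent"
echo "survived gc: $survived"
echo "file content on shelter: $content"
```

    The file on shelter keeps its own content, yet the merged commit is still listed as a parent, which is what keeps it out of reach of git gc.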

    In fact, this is in my view one of the primary use cases for the ours merge strategy; to keep a reference to a series of commits that did not make it into the final product.

    To convince your team, you can just do the merge and show them the commit history in a graphical repository viewer like gitk. If the merged commits are visible, they will not be garbage collected.

  • answered 2020-10-21 23:03 jthill

    For example, git merge --squash, being a "merge", doesn't preserve the dependencies.

    That's the only option that doesn't preserve ancestry on success, so it's the only option that discusses the resulting (lack of added) ancestry.

    My question is whether there is an explicit statement that resolves the ambiguity for the git merge -s ours strategy.

    In all other cases, the merged history is an ancestor of the merge result. You can verify this by inspection of the docs. There's no ambiguity in the -s/--strategy option docs: that option doesn't affect the resulting ancestry at all, only how the resulting contents are arrived at, so that's all they discuss.

  • answered 2020-10-22 02:52 Mark Adelsberger

    All merges preserve all of their parents. It doesn't matter what strategy is used.

    What you say about merge --squash is misleading. When you say it doesn't preserve commits from the branch being merged, you're right; but when you say "being a 'merge'"... it isn't. Just like a fast-forward isn't a merge. A merge is a commit with more than one parent; not all merge commands produce one.

    Specifically, if you're on master and you git merge --squash dev, what --squash does is to omit the second parent pointer on the resulting commit (which would be the pointer to dev in a real merge). That's why the result doesn't preserve commits from dev if they later become orphaned - since the resulting commit doesn't point to them, they aren't part of its history.
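    The missing second parent can be observed directly. This is a small sketch in a scratch repository; the branch name dev follows the example above, and the file names and identity are assumptions made for the demonstration.

```shell
set -e

# Scratch repository (names below are illustrative assumptions).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo base > file.txt
git add file.txt
git commit -q -m "base"
main_branch=$(git symbolic-ref --short HEAD)

# One commit on dev that the main branch does not have.
git checkout -q -b dev
echo change > dev.txt
git add dev.txt
git commit -q -m "dev work"
dev_tip=$(git rev-parse dev)

# Squash-merge dev: the changes are staged, then committed as a normal commit.
git checkout -q "$main_branch"
git merge -q --squash dev
git commit -q -m "squashed dev"

# rev-list --parents prints the commit followed by its parents.
parents=$(git rev-list --parents -n 1 HEAD)
parent_count=$(( $(echo "$parents" | wc -w) - 1 ))

# Is the dev tip part of the resulting history?
if git merge-base --is-ancestor "$dev_tip" HEAD; then
  reachable=yes
else
  reachable=no
fi

echo "parents of squash commit: $parent_count"
echo "dev tip reachable from result: $reachable"
```

    The squash commit has a single parent, so the dev commit stays alive here only because the dev branch itself still points at it.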

    But changing the merge strategy does not change the parent pointers. At any time you can observe that the merged-in commit is part of the history (regardless of whether any content was taken from it) by looking at the output of git log on the resulting branch (or a gitk graph if you'd like a more visual representation); and that means it has to be preserved.

    That said, any ref will preserve the full history of the commit it points to - not just a branch. It seems to me that tagging the affected commits would be a more straightforward / less confusing way to do it. That is, this is more in line with what a lightweight tag might be expected to be used for, and not so much in line with what a branch would be used for. (If you object to using tags because you use them for something else and don't want to clutter that namespace, you could use a custom ref namespace of some sort and it would still seem more straightforward to me.)
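    As a rough sketch of the custom-namespace variant, the commit can be pinned with git update-ref. The namespace refs/shelter/ is a hypothetical name chosen for this example, not a git convention, and the repository contents are made up. Keep in mind that refs outside refs/heads/ and refs/tags/ are not covered by the default fetch and push refspecs, so they would likely need to be pushed explicitly.

```shell
set -e

# Scratch repository (names below are illustrative assumptions).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo base > file.txt
git add file.txt
git commit -q -m "base"
main_branch=$(git symbolic-ref --short HEAD)

# A commit on a detached HEAD that no branch or tag refers to.
git checkout -q --detach
echo experiment > file.txt
git commit -q -am "measured commit"
measured=$(git rev-parse HEAD)

# Pin it under a custom namespace (refs/shelter/ is hypothetical).
git update-ref "refs/shelter/$measured" "$measured"

# Leave the commit behind, drop reflogs, and prune immediately.
git checkout -q "$main_branch"
git reflog expire --expire=now --all
git gc -q --prune=now

git cat-file -e "$measured" && survived=yes || survived=no
ref_target=$(git rev-parse "refs/shelter/$measured")

# List everything pinned in the namespace.
git for-each-ref refs/shelter/
echo "survived gc: $survived"
```

    No merge commit is involved, so there is nothing that could conflict, and removing the protection later is a single git update-ref -d on the corresponding ref.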