ISO version control file merge tool (merge RCS/CVS ,v files)

Post by Andy Gle » Fri, 24 Dec 2004 10:03:02



---++ VC Merge

So here is what I want: given two version control files,
supposedly for the same file,
merge them.

The merge may range from trivial to fancy.
All merges should preserve all history and comments;
however, some merges may do better than others at recognizing commonality
between ostensibly different versions.
Similarly, some merges may be more space-efficient than others.


---++ Commonalities and Differences between VC Tools

I'll talk about this as if it is a tool that merges CVS/RCS ,v files.
I believe it should generalize fairly well to other VC tools,
but what I need now is CVS. CVS is my legacy VC system.

Pretty much all tools understand
* files
* versions of files
* branches
* check in comments

Some tools handle groups of files together, but I don't think this matters
to the discussion here.

Some tools don't really have a concept of a branch
- notably Subversion, where a branch is just a different directory
in the Subversion "filesystem".
However, in Subversion you can trace ancestry and figure out branching,
although it can be hard to distinguish a branch
from a file that was moved to a different location in the directory hierarchy.

Some tools understand more complicated graphs than simple branching:
they can track cross-links, branches that merge back, etc.
I think this can be handled.

Some tools get confused if a file has different types in different versions
- e.g. in CVS, when a file changes from binary to text.
Other tools handle this - e.g. MetaCVS, which gives each file a unique name at initial checkin.
If you delete and then re-add a file, it gets a new unique name.
Some tools totally separate names from content - e.g. Monotone.

For all tools, it should be possible to write code that traverses the
tree, handling branches depth-first, breadth-first, or whatever.
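
A minimal sketch of such a traversal, in Python. The =Version= class and the hand-built tree are hypothetical stand-ins for whatever a real ,v parser would produce; nothing here reads an actual RCS file.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Version:
    """One revision in a version tree (hypothetical structure, not the RCS format)."""
    rev: str
    children: list = field(default_factory=list)


def depth_first(root):
    """Yield revision ids, following each line of descent to its tip before backtracking."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node.rev
        # Push children in reverse so the first child is visited first.
        stack.extend(reversed(node.children))


def breadth_first(root):
    """Yield revision ids level by level, all children of a revision before grandchildren."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        yield node.rev
        queue.extend(node.children)


# Hand-built tree: 1.1 -> 1.2 -> 1.3 on the main branch,
# plus a branch 1.1.1.1 -> 1.1.1.2 that forks off 1.1.
REPO1 = Version("1.1", [
    Version("1.2", [Version("1.3")]),
    Version("1.1.1.1", [Version("1.1.1.2")]),
])

print(list(depth_first(REPO1)))
print(list(breadth_first(REPO1)))
```

Both functions take the same tree, so a merge tool could pick the visiting order without touching the data structure.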

---+++ Trivial Merge - No Sharing

The most trivial merge replicates all branches, and doesn't try to share anything.

Input:

<verbatim>
repo1
A->v1.1->v1.2->v1.3
     +->v1.1.1.1->v1.1.1.2
</verbatim>

<verbatim>
repo2
A->v1.1->v1.2->v1.3->v1.4
     +->v1.1.1.1
</verbatim>

Output: union of all branches and versions
<verbatim>
A-+->repo1branch->v1.1->v1.2->v1.3
  |                 +->v1.1.1.1->v1.1.1.2
  +->repo2branch->v1.1->v1.2->v1.3->v1.4
                    +->v1.1.1.1
</verbatim>

This exposes some minor issues:
* in a CVS-like system, who gets the main branch?
   * A: make it an option. Default: nobody.
* what about tags and labels?
   * Default: all made unique, e.g. by adding a prefix.
   * Advanced: may want to recognize sharability
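
To make the defaults above concrete, here is a rough Python sketch of the no-sharing merge. The dictionary layout, the =trivial_merge= name, the ="main"= branch label and the =REL1= tag are all invented for illustration; a real tool would get this information from parsed ,v files.

```python
def trivial_merge(repos):
    """Union all branches and tags, prefixing every name with its repo name.

    `repos` maps a repo name to {"branches": {name: [revs]}, "tags": {tag: rev}}.
    This layout is a hypothetical stand-in for parsed ,v files.
    """
    merged = {"branches": {}, "tags": {}}
    for repo_name, repo in repos.items():
        for branch, revs in repo["branches"].items():
            # Nobody gets the unprefixed main branch: every branch is renamed.
            merged["branches"][f"{repo_name}/{branch}"] = list(revs)
        for tag, rev in repo["tags"].items():
            # Tags are made unique by prefix and remember which repo they came from.
            merged["tags"][f"{repo_name}_{tag}"] = (repo_name, rev)
    return merged


# The two example repos, with one made-up tag each.
REPOS = {
    "repo1": {
        "branches": {"main": ["1.1", "1.2", "1.3"], "1.1.1": ["1.1.1.1", "1.1.1.2"]},
        "tags": {"REL1": "1.2"},
    },
    "repo2": {
        "branches": {"main": ["1.1", "1.2", "1.3", "1.4"], "1.1.1": ["1.1.1.1"]},
        "tags": {"REL1": "1.3"},
    },
}

MERGED = trivial_merge(REPOS)
for name, revs in sorted(MERGED["branches"].items()):
    print(name, "->".join(revs))
```

Nothing is lost and nothing is shared: the same tag name in both repos survives as two distinct tags.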

---+++ Types of Sharing

Same data, same version:
In the above example, repo1/A v1.1 and v1.2 may be the same as
repo2/A v1.1 and v1.2.

Same data, different versions:
repo1/A v1.3 may be the same as repo2/A v1.4.
i.e. the main branch of repo1 may have skipped a version
that was on repo2's main branch.
(It's just accidental that I say main branch for this
example.)
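
One way to detect both kinds of sharing is to compare content hashes of fully reconstructed revisions. That is my assumption for this sketch, not something the ,v format gives you directly (it stores deltas, so each revision's text would have to be rebuilt first). The =find_shared= name and the file contents below are made up.

```python
import hashlib


def find_shared(repo1, repo2):
    """Return (same_version, different_version) lists of (rev1, rev2) pairs.

    Each repo maps a revision id to that revision's full content as bytes.
    Two revisions are treated as shared when their SHA-256 digests match.
    """
    by_digest = {}
    for rev, text in repo2.items():
        by_digest.setdefault(hashlib.sha256(text).hexdigest(), []).append(rev)

    same, different = [], []
    for rev1, text in repo1.items():
        for rev2 in by_digest.get(hashlib.sha256(text).hexdigest(), []):
            (same if rev1 == rev2 else different).append((rev1, rev2))
    return same, different


# Made-up contents: repo1 skipped the revision that repo2 recorded as 1.3.
R1 = {"1.1": b"a\n", "1.2": b"a\nb\n", "1.3": b"a\nb\nd\n"}
R2 = {"1.1": b"a\n", "1.2": b"a\nb\n", "1.3": b"a\nb\nc\n", "1.4": b"a\nb\nd\n"}

SAME, DIFFERENT = find_shared(R1, R2)
print("same data, same version:", SAME)
print("same data, different versions:", DIFFERENT)
```

Exact hashing only finds byte-identical revisions; recognizing "nearly the same" versions would need something fancier.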

---+++ Trivial Merge - Recognize Sharing

A trivial merge might recognize sharing, but not change the trivial
concatenation of all branches.
E.g. it would still have the union of all branches and versions,
but might add comments indicating commonality.

* fileA
* repo1branch