Semistructured Merge: Rethinking Merge in Revision Control Systems

Overview

This website contains the results of an empirical study that compares how well semistructured and unstructured merge resolve merge conflicts. The two merge approaches, the tool, and the empirical study are described in detail in a scientific paper.

The study was conducted using the tool FeatureHouse, which implements (amongst others) the two merge approaches. To compare semistructured and unstructured merge, FeatureHouse takes each pair of revisions of a file together with their common base revision and creates two new output files: one containing the result of unstructured merge and one containing the result of semistructured merge.

Distribution

The distribution contains a binary (jar file) with the merge tool, sample projects that we used to evaluate our approach (examples), scripts to analyze the output of the merge tool (evaluation), and a scientific paper describing the approach.

Binary: fse2011_artifact_binary.tar.gz (merge tool)
Examples: fse2011_artifact_examples.tar.gz (examples)
Complete: fse2011_artifact_complete.tar.gz (merge tool, examples, evaluation, and scientific paper)
Virtual Machine (VirtualBox): fse2011_artifact_virtual_machine.tar.gz (merge tool, examples, evaluation, and scientific paper)

How To

FeatureHouse expects the input revisions to be listed in a file that names the directories of the revisions to merge, from top to bottom: first revision, base revision, second revision (three-way merge). Detailed information on how to use the tool and how to analyze the results can be found in the distribution (README.TXT).
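As a minimal sketch of the ordering described above, the following helper writes such a revisions file. It is not part of FeatureHouse: the function name and the one-directory-per-line layout are assumptions made for illustration, and README.TXT in the distribution remains the authoritative description of the format.

```shell
# Hypothetical helper (not part of FeatureHouse): write a revisions file.
# Assumption: one revision directory per line, in the order
# first revision, base revision, second revision.
write_revisions_file() {
    out_file=$1
    first=$2
    base=$3
    second=$4
    # printf reuses the format for every argument, so this emits three lines.
    printf '%s\n' "$first" "$base" "$second" > "$out_file"
}

# Example call with placeholder directory names:
# write_revisions_file scenario.revisions first_rev base_rev second_rev
```

Keeping the base revision in the middle mirrors the top-down order stated above, which makes a swapped entry easy to spot when inspecting the file.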

In a nutshell, the merge algorithm is applied by invoking FeatureHouse as follows:

java -cp featurehouse.jar merger.FSTGenMerger --expression <revisions file>

For example, to merge the revisions 4676 and 4998 of the sample project jEdit, invoke:

java -cp featurehouse.jar merger.FSTGenMerger --expression ../examples/jEdit/rev4676-4998/rev4676-4998.revisions

Note that, as a prerequisite, Linux's merge tool has to be installed on the system. Beware that a single run can take considerable time (hours for large projects).
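Because a run can take hours, it may be worth checking the prerequisites before starting. The following is a small sketch, not part of the distribution; the function name is hypothetical, and it assumes the required programs are reachable under the command names java and merge.

```shell
# Hypothetical pre-flight check: report every named tool that is not on PATH.
# Returns 0 if all tools are found, 1 otherwise.
check_prerequisites() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing prerequisite: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call (tool names are assumptions, see above):
# check_prerequisites java merge || exit 1
```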

After merging, the results are stored in a corresponding folder; for the above example, this is the folder ../examples/jEdit/rev4676-4998/rev4676-4998.
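To merge every scenario of a project in one go, the invocation above can be wrapped in a loop. The sketch below is an illustration under two assumptions: revisions files end in .revisions, as in the jEdit example, and featurehouse.jar is in the current directory unless another path is given. The function name and the DRY_RUN switch are hypothetical.

```shell
# Hypothetical batch runner: invoke the merger once per *.revisions file
# found below a project directory. With DRY_RUN=1 the commands are only
# printed, which is useful given how long a real run can take.
merge_project() {
    project_dir=$1
    jar=${2:-featurehouse.jar}
    find "$project_dir" -name '*.revisions' | sort |
    while IFS= read -r rev_file; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            printf 'java -cp %s merger.FSTGenMerger --expression %s\n' "$jar" "$rev_file"
        else
            java -cp "$jar" merger.FSTGenMerger --expression "$rev_file"
        fi
    done
}

# Example: list the commands for all jEdit scenarios without running them.
# DRY_RUN=1 merge_project ../examples/jEdit
```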

Sample Systems

Here, we list all sample systems that we used to compare unstructured and semistructured merge. In total, we based our study on 24 software projects written in three different languages: Java, C-Sharp, and Python.

Each software project comprises one or more merge scenarios (triples of revisions to merge). Each merge scenario contains three folders, one for each revision (first, base, second) and a revision file listing the merge order.

Analysis

We used a number of scripts to analyze the output of the merge process. First, we sum up all merge conflicts that occur in a project (using ./writeAllResults-JAVA.sh, ./writeResults-PY.sh, and ./writeResults-CS.sh). The scripts create two .csv files (one for unstructured merge and one for semistructured merge):

./writeAllResults-JAVA.sh ../examples/jEdit/
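To illustrate what summing up conflicts can look like, the sketch below counts conflict blocks in a folder of merged files. It is not one of the distribution's scripts and does not reproduce their output format; it assumes that every conflict in the merged output starts with a line beginning with the marker <<<<<<<, which is the usual convention of line-based merge tools. The function name is hypothetical.

```shell
# Hypothetical conflict counter: print the number of lines that open a
# conflict block (lines starting with "<<<<<<<") in all files below a folder.
count_conflicts() {
    result_dir=$1
    # grep -r prints one line per match; wc -l counts them.
    grep -r -E '^<<<<<<<' "$result_dir" 2>/dev/null | wc -l | tr -d ' '
}

# Example, using the result folder of the jEdit scenario above:
# count_conflicts ../examples/jEdit/rev4676-4998/rev4676-4998
```

Counting only the opening marker gives the number of conflicts; the number of conflicting lines or conflicting files would need separate counts.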

Then, all results and diagrams are computed with the statistics tool R. The R scripts (Diagrams.R and Percentages.R) are applied to the .csv files and create corresponding bar diagrams and tables:

R -f Diagrams.R
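The two analysis steps can be chained per project. The following sketch maps a language to the result script named above and then runs both R scripts. It assumes that the Python and C-Sharp scripts take the project directory as their argument in the same way as the Java script, and that Percentages.R is run like Diagrams.R; both function names are hypothetical.

```shell
# Hypothetical mapping from language to the result script named above.
results_script_for() {
    case $1 in
        java)   echo ./writeAllResults-JAVA.sh ;;
        python) echo ./writeResults-PY.sh ;;
        csharp) echo ./writeResults-CS.sh ;;
        *)      echo "unknown language: $1" >&2; return 1 ;;
    esac
}

# Hypothetical driver: create the .csv files, then the diagrams and tables.
# Stops at the first step that fails.
run_analysis() {
    language=$1
    project_dir=$2
    script=$(results_script_for "$language") || return 1
    "$script" "$project_dir" && R -f Diagrams.R && R -f Percentages.R
}

# Example:
# run_analysis java ../examples/jEdit/
```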

Below, we provide a set of diagrams containing information on the number of conflicts, conflicting lines of code, conflicting files, and semantic conflicts of all sample projects. Please click on the project names or diagrams to display the project's data collected during the empirical study.

Project               Language
AutoWikiBrowser       C-Sharp
BitPim                Python
CruiseControl.NET     C-Sharp
DrJava                Java
emesene               Python
Eraser                C-Sharp
eXe                   Python
FireIRC IRC Client    C-Sharp
Freecol               Java
GenealogyJ            Java
iFolder               C-Sharp
iText                 Java
JabRef                Java
jEdit                 Java
JFreeChart            Java
Jmol                  Java
matplotlib            Python
NASA WorldWind        C-Sharp
PMD                   Java
Process Hacker        C-Sharp
RSS Bandit            C-Sharp
SpamBayes             Python
SQuirreL SQL          Java
Wicd                  Python

Contact

For more information about the project, please contact:

Sven Apel (University of Passau, Germany)
Jörg Liebig (University of Passau, Germany)
Christian Kästner (Philipps University Marburg, Germany)