Almost any sufficiently complex software system today is configurable. Conditional compilation is a simple variability-implementation mechanism that is widely used in open-source projects and industry. In particular, the C preprocessor (CPP) is very popular in practice, and it is regaining interest in academia. Although there have been several attempts to understand and improve CPP, there is a lack of understanding of how it is used in open-source and industrial systems and whether different usage patterns have emerged. The background is that much research on configurable systems and product lines concentrates on open-source systems, simply because they are available for study in the first place. This leads to the potentially problematic situation that it is unclear whether the results obtained from these studies are transferable to industrial systems. We aim to narrow this gap by comparing the use of CPP in open-source projects and industry, based on a substantial set of subject systems and well-known variability metrics, including size, scattering, and tangling metrics. A key result of our empirical study is that, regarding almost all aspects we studied, open-source systems and industrial systems are similar, including systems that have been developed in industry and made open source at some point. So, there is evidence that, regarding CPP as a variability-implementation mechanism, insights, methods, and tools developed based on studies of open-source systems are transferable to industrial systems.
cppstats is a tool for analyzing software systems
regarding their variability. It can be used to obtain
information about the usage of CPP
in C software projects. The official homepage of
the tool can be found
under http://fosd.net/cppstats.
For the measurements in the context of our study, we
extended cppstats and applied it to all of the
subject systems. Our modified version is available on
GitHub: https://github.com/clhunsen/cppstats.
Metrics | Description |
---|---|
Size metrics | |
LOC | Lines of normalized code |
LOF | Lines of normalized CPP-annotated code |
PLOF | Relative fraction of CPP-annotated code (LOF/LOC) |
VP | Number of variation points (#ifdef blocks) |
CC | Number of configuration constants |
LOF#ifdef | Average number of LOF within each VP |
Scattering metrics | |
SD#ifdef | Average number of #ifdefs per CC |
SDfile | Average number of files per CC |
Tangling metrics | |
TD#ifdef | Average number of CCs per #ifdef |
TDfile | Average number of CCs per file |
Nesting metrics | |
NDavg | Average nesting depth of #ifdefs |
NDmax | Maximum nesting depth of #ifdefs |
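
To make the definitions in the table concrete, the following minimal sketch computes the metrics from a handful of already-extracted #ifdef blocks. It is not how cppstats itself works: the `IfdefBlock` record, the `variability_metrics` function, and the toy data are hypothetical names introduced only for illustration. The sketch also assumes that each block's `lof` counts only the lines the block owns directly (so nested blocks are not counted twice) and that the file-based averages range over files containing at least one CC; the tool's exact counting rules may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class IfdefBlock:
    """One variation point (hypothetical record, not a cppstats type)."""
    file: str             # file containing the block
    constants: frozenset  # configuration constants in the #ifdef expression
    lof: int              # annotated lines owned directly by this block
    depth: int            # nesting depth (1 = top-level #ifdef)


def variability_metrics(blocks, loc):
    """Compute the table's metrics for a list of blocks and a total LOC."""
    ifdefs_per_cc = defaultdict(int)  # CC -> number of #ifdefs that use it
    files_per_cc = defaultdict(set)   # CC -> files in which it occurs
    ccs_per_file = defaultdict(set)   # file -> CCs that occur in it
    for block in blocks:
        for cc in block.constants:
            ifdefs_per_cc[cc] += 1
            files_per_cc[cc].add(block.file)
            ccs_per_file[block.file].add(cc)

    lof = sum(block.lof for block in blocks)
    return {
        "LOF": lof,
        "PLOF": lof / loc,
        "VP": len(blocks),
        "CC": len(ifdefs_per_cc),
        "LOF#ifdef": lof / len(blocks),
        "SD#ifdef": mean(ifdefs_per_cc.values()),
        "SDfile": mean(len(files) for files in files_per_cc.values()),
        "TD#ifdef": mean(len(block.constants) for block in blocks),
        "TDfile": mean(len(ccs) for ccs in ccs_per_file.values()),
        "NDavg": mean(block.depth for block in blocks),
        "NDmax": max(block.depth for block in blocks),
    }


# Toy data (assumed, not taken from the study): two files, three blocks.
blocks = [
    IfdefBlock("a.c", frozenset({"DEBUG"}), lof=4, depth=1),
    IfdefBlock("a.c", frozenset({"DEBUG", "WIN32"}), lof=2, depth=2),
    IfdefBlock("b.c", frozenset({"WIN32"}), lof=6, depth=1),
]
metrics = variability_metrics(blocks, loc=100)

for name, value in metrics.items():
    print(f"{name:10s} {value}")
```

In this toy input, `DEBUG` and `WIN32` each appear in two #ifdef blocks, which is what the scattering metrics capture, while the second block in `a.c` refers to both constants at once, which is what the tangling metrics capture.
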
A table with the raw data is provided on a subpage.
The data is also available for download below.