Preprocessor-Based Variability in Open-Source and Industrial Software Systems: An Empirical Study


Almost any sufficiently complex software system today is configurable. Conditional compilation is a simple variability-implementation mechanism that is widely used in open-source projects and industry. In particular, the C preprocessor (CPP) is very popular in practice, and it is also regaining interest in academia. Although there have been several attempts to understand and improve CPP, there is a lack of understanding of how it is used in open-source and industrial systems and whether different usage patterns have emerged. The background is that much research on configurable systems and product lines concentrates on open-source systems, simply because they are available for study in the first place. This leads to the potentially problematic situation that it is unclear whether the results obtained from these studies are transferable to industrial systems. We aim to narrow this gap by comparing the use of CPP in open-source projects and industry, based on a substantial set of subject systems and well-known variability metrics, including size, scattering, and tangling metrics. A key result of our empirical study is that, regarding almost all aspects we studied, open-source systems and industrial systems are similar, including systems that have been developed in industry and made open source at some point. So, there is evidence that, regarding CPP as a variability-implementation mechanism, insights, methods, and tools developed based on studies of open-source systems are transferable to industrial systems.


cppstats is a tool for analyzing software systems regarding their variability. It can be used to obtain information about the usage of CPP in C software projects. The official homepage of the tool can be found under
For the measurements in the context of our study, we extended cppstats and applied it to all of the subject systems. Our modified version is available on GitHub:

Sample Systems

Open-source systems (OSS)

Formerly closed-source systems (first versions: FCS1; current versions: FCS)

7 industrial systems (IS)



Size metrics
LOC Lines of normalized code
LOF Lines of normalized CPP-annotated code
PLOF Relative fraction of CPP-annotated code (LOF/LOC)
VP Number of variation points (#ifdef blocks)
CC Number of configuration constants
LOF#ifdef Average number of LOF within each VP
Scattering metrics
SD#ifdef Average number of #ifdefs per CC
SDfile Average number of files per CC
Tangling metrics
TD#ifdef Average number of CCs per #ifdef
TDfile Average number of CCs per file
Nesting metrics
NDavg Average nesting depth of #ifdefs
NDmax Maximum nesting depth of #ifdefs
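
To make the metric definitions concrete, the following minimal Python sketch computes VP, CC, SD#ifdef, TD#ifdef, NDavg, and NDmax for a single source string. It is an illustration only and not how cppstats computes its measurements. The function name `variability_metrics`, the sample input, and the result keys are hypothetical. The sketch makes several simplifying assumptions: only `#if`, `#ifdef`, and `#ifndef` open a variation point, `#elif` and `#else` are ignored, every identifier in the condition other than `defined` counts as a configuration constant, comments on directive lines are not stripped, and a top-level block has nesting depth 1.

```python
import re
from collections import defaultdict

# Matches the conditional directives this sketch handles (assumption:
# #elif and #else are ignored and do not open new variation points).
DIRECTIVE = re.compile(r"^\s*#\s*(ifdef|ifndef|if|endif)\b(.*)$")
IDENT = re.compile(r"[A-Za-z_]\w*")


def variability_metrics(source: str) -> dict:
    """Compute simplified variability metrics for one source string."""
    depth = 0
    depths = []                     # nesting depth of each variation point
    ccs_per_vp = []                 # set of constants used by each variation point
    vps_per_cc = defaultdict(int)   # number of variation points per constant

    for line in source.splitlines():
        match = DIRECTIVE.match(line)
        if not match:
            continue
        kind, condition = match.groups()
        if kind == "endif":
            depth = max(depth - 1, 0)
            continue
        depth += 1                  # assumption: top-level blocks have depth 1
        constants = set(IDENT.findall(condition)) - {"defined"}
        depths.append(depth)
        ccs_per_vp.append(constants)
        for constant in constants:
            vps_per_cc[constant] += 1

    vp = len(ccs_per_vp)
    cc = len(vps_per_cc)
    return {
        "VP": vp,
        "CC": cc,
        "SD_ifdef": sum(vps_per_cc.values()) / cc if cc else 0.0,
        "TD_ifdef": sum(len(s) for s in ccs_per_vp) / vp if vp else 0.0,
        "ND_avg": sum(depths) / vp if vp else 0.0,
        "ND_max": max(depths, default=0),
    }


# Hypothetical sample input with one nested block.
SAMPLE = """\
#ifdef A
int a;
#if defined(B) && defined(A)
int ab;
#endif
#endif
#ifndef C
int c;
#endif
"""

metrics = variability_metrics(SAMPLE)
```

In the sample, constant A appears in two variation points and B and C in one each, so the scattering degree averages 4/3. The file-level metrics (SDfile, TDfile) would follow the same pattern, counting files instead of variation points.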

Data Visualization



A table with the raw data is provided on a subpage.

The data is also available for download below.


