Compiled Corpus
Qualitas.class Corpus contains more than 18 million LOC, 200K compiled classes, and 1.5 million compiled methods.
Note: The compiled Eclipse projects can be found in the Download section.
Qualitas.class Corpus is a compiled version of the Qualitas Corpus. It provides compiled Eclipse projects for the 111 Java systems included in the last release of the corpus.
Although the original Qualitas Corpus has been a valuable contribution to experimentation in software engineering, there are several scenarios, such as experiments that rely on the Abstract Syntax Tree (AST) or on bytecode, in which researchers need to import and compile the source code. Since this task is not trivial for systems with many external dependencies, our goal is to assist researchers by removing the compilation effort when conducting empirical studies.
As another contribution, for the 111 systems, the Qualitas.class Corpus includes the values of 23 source code metrics measured at the class level.
In summary, the figure below shows the distribution of the per-system averages for a subset of the metrics. Each circle represents a system, and the figure also indicates the overall average for each metric. For example, the average MLOC per system ranges from 3.35 (fitlibraryforfitnesse) to 23.4 (jparse), with an overall average of 7.88 ± 2.7.
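The summary described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the system names and metric values are made up (they are not taken from the corpus), the `summarize` helper is hypothetical, and the "±" term is assumed to be a standard deviation computed across the per-system averages.

```python
from statistics import mean, pstdev

# Hypothetical per-method MLOC values for three made-up systems
# (illustrative numbers only, not taken from the corpus).
mloc_by_system = {
    "system_a": [2, 4, 6],
    "system_b": [10, 10, 10],
    "system_c": [5, 15, 25],
}


def summarize(values_by_system):
    """Average a metric per system, then summarize those averages."""
    # One average per system: this is what each circle would represent.
    per_system = {name: mean(vals) for name, vals in values_by_system.items()}
    averages = list(per_system.values())
    return {
        "per_system": per_system,
        "lowest": min(per_system, key=per_system.get),
        "highest": max(per_system, key=per_system.get),
        "overall_mean": mean(averages),
        # Assumption: the "±" term is a (population) standard deviation
        # taken over the per-system averages.
        "overall_std": pstdev(averages),
    }


summary = summarize(mloc_by_system)
print(f"range: {summary['lowest']} to {summary['highest']}")
print(f"overall: {summary['overall_mean']:.2f} ± {summary['overall_std']:.2f}")
```

Note that averaging per system first gives every system equal weight regardless of its size; pooling all methods into one average would instead let the largest systems dominate.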