Qualitas.class Corpus

Qualitas.class Corpus is a compiled version of the Qualitas Corpus. It provides compiled Eclipse projects for the 111 Java systems included in the last release of the corpus.

Although the original Qualitas Corpus has been a valuable resource for experimentation in software engineering, several scenarios, such as experiments that rely on Abstract Syntax Trees (ASTs) or bytecode, require researchers to import and compile the source code. Because this task is not trivial for systems with many external dependencies, our goal is to spare researchers the compilation effort when conducting empirical studies.

Compiled Corpus

Qualitas.class Corpus contains more than 18 million LOC, 200K compiled classes, and 1.5 million compiled methods.

Note: The compiled Eclipse projects can be found in the Download section.

Metrics

As a further contribution, the Qualitas.class Corpus includes, for all 111 systems, the values of the following 23 source code metrics, measured at the class level:

Basic Metrics:
- Number of Lines of Code (LOC)
- Number of Packages (NOP)
- Number of Classes (NOCL)
- Number of Interfaces (NOI)
- Number of Methods (NOM)
- Number of Attributes (NOA)
- Number of Overridden Methods (NORM)
- Number of Parameters (PAR)
- Number of Static Methods (NSM)
- Number of Static Attributes (NSA)

CK Metrics:
- Weighted Methods per Class (WMC)
- Depth of Inheritance Tree (DIT)
- Number of Children (NOC)
- Lack of Cohesion in Methods (LCOM HS)

Complexity Metrics:
- Method Lines of Code (MLOC)
- Specialization Index (SIX)
- McCabe Cyclomatic Complexity (VG)
- Nested Block Depth (NBD)
- Normalized Distance (RMD)

Coupling Metrics:
- Afferent Coupling (CA)
- Efferent Coupling (CE)
- Instability (I)
- Abstractness (A)
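
For readers unfamiliar with the last three coupling-related metrics, the sketch below shows how Instability (I), Abstractness (A), and Normalized Distance (RMD) are commonly defined in Robert C. Martin's package metrics. Whether the tools used here compute them in exactly this way is an assumption, and the function names and the zero-denominator handling are illustrative only.

```python
def instability(ca: int, ce: int) -> float:
    """I = CE / (CA + CE); 0 means maximally stable, 1 maximally unstable."""
    total = ca + ce
    # Assumption: a package with no couplings at all is reported as 0.0.
    return ce / total if total else 0.0


def abstractness(abstract_types: int, total_types: int) -> float:
    """A = abstract classes and interfaces / all types in the package."""
    # Assumption: an empty package is reported as 0.0.
    return abstract_types / total_types if total_types else 0.0


def normalized_distance(a: float, i: float) -> float:
    """RMD = |A + I - 1|, the distance from the 'main sequence' A + I = 1."""
    return abs(a + i - 1)


# Hypothetical package: 3 incoming and 1 outgoing dependencies,
# 1 abstract type out of 4 types in total.
i = instability(ca=3, ce=1)
a = abstractness(abstract_types=1, total_types=4)
rmd = normalized_distance(a, i)
print(i, a, rmd)  # 0.25 0.25 0.5
```

A value of RMD close to 0 indicates a package that balances abstractness and stability; values close to 1 indicate a package far from that balance.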

Note: The metric values, stored in XML files, can be found in the Download section.
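
As a rough illustration of how such XML files could be consumed, the Python sketch below reads the per-class values of one metric. The XML layout shown (the `Metrics`, `Metric`, and `Value` elements and their `id`, `name`, `package`, and `value` attributes) is a hypothetical stand-in, not the documented format of the downloadable files; the element and attribute names should be adjusted to match the actual files.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout and values, for illustration only.
# The real files may use different element and attribute names.
SAMPLE_XML = """
<Metrics>
  <Metric id="NOM">
    <Value name="ExampleClass" package="org.example" value="12"/>
    <Value name="OtherClass" package="org.example" value="3"/>
  </Metric>
</Metrics>
"""


def read_metric(xml_text: str, metric_id: str) -> dict[str, float]:
    """Return {qualified class name: value} for the given metric id."""
    root = ET.fromstring(xml_text)
    result = {}
    for metric in root.iter("Metric"):
        if metric.get("id") != metric_id:
            continue
        for value in metric.iter("Value"):
            qualified = f"{value.get('package')}.{value.get('name')}"
            result[qualified] = float(value.get("value"))
    return result


nom = read_metric(SAMPLE_XML, "NOM")
for class_name, count in nom.items():
    print(class_name, count)
```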

Tools: We used Google CodePro Analytix 7.1.0 and Metrics 1.3.8, both with default parameters, to compute the metrics.

Example

In summary, the figure below shows the distribution of the per-system averages for a subset of the metrics. Each circle represents a system, and the figure also indicates the overall average for each metric. For example, the average MLOC ranged from 3.35 (fitlibraryforfitnesse) to 23.4 (jparse), with an overall average of 7.88 ± 2.7.
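
A summary of this kind (range, overall average, and spread of per-system averages) can be sketched in Python as follows. The input values are hypothetical placeholders, not data from the corpus, and the use of the sample standard deviation for the "±" term is an assumption, since the text does not say which estimator was used.

```python
import statistics


def summarize(per_system_avg: dict[str, float]) -> dict:
    """Summarize one metric given its average value in each system."""
    values = list(per_system_avg.values())
    lowest = min(per_system_avg, key=per_system_avg.get)
    highest = max(per_system_avg, key=per_system_avg.get)
    return {
        "min": (lowest, per_system_avg[lowest]),
        "max": (highest, per_system_avg[highest]),
        "mean": statistics.mean(values),
        # Assumption: sample standard deviation (n - 1 in the denominator).
        "stdev": statistics.stdev(values),
    }


# Hypothetical per-system MLOC averages, NOT values from the corpus.
sample = {"system_a": 4.0, "system_b": 6.0, "system_c": 8.0, "system_d": 10.0}
summary = summarize(sample)
print(f"{summary['mean']:.2f} ± {summary['stdev']:.2f}")  # 7.00 ± 2.58
```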
