Pharo Dataset

The Pharo Dataset is a collection of source code metrics gathered from several systems. It includes measurements for the following Pharo applications:

| System | Description | NOC |
| --- | --- | --- |
| ACD | Event debugger | 12 |
| Announcements | Object dependency framework | 12 |
| Arki | Engine for building reports | 19 |
| Artefact | Framework to generate PDF documents | 106 |
| AsmJit | Assembler to generate machine code for x86/x64 architectures | 40 |
| AST | Refactoring engine | 70 |
| Athens | A vector-based graphics framework | 109 |
| Balloon | Special rendering engine | 25 |
| ClassOrganizer | Module system for Pharo classes | 6 |
| CodeImport | | 11 |
| Collections | Collection library | 83 |
| CommandShell | Command line scripts framework | 16 |
| Compiler | The Pharo legacy compiler | 46 |
| Compression | Compression protocols | 26 |
| Debugger | Pharo debugger | 35 |
| DeepTraverser | Graph tools | 6 |
| Dynamix-Core | Dynamic analysis framework | 10 |
| EyeSee | Visualization framework | 66 |
| Fame | Meta-model used to describe other models | 66 |
| Files | New library for dealing with files | 15 |
| FileSystem | Library for dealing with files | 47 |
| FreeType | Library to read font data and rendering used in layout | 41 |
| Fuel | Binary object graph serializer | 71 |
| Glamour | Engine for scripting interactive browsers | 277 |
| Gofer | A tool to work with groups of Monticello packages | 28 |
| Graph-ET | Charting framework | 29 |
| GraphET2 | Library for creating charts | 53 |
| Graphics | Framework for rendering | |
| Grease | Dialects library to run the Seaside web application | |
| GroupManager | Extension of Nautilus to handle groups | 21 |
| GT | Toolkit for Pharo | 120 |
| Hashtable | Application that uses chaining for collision resolution | 17 |
| HelpSystem | Developer tools | 17 |
| HudsonBuildTools20 | Utility classes for continuous integration with Jenkins | 8 |
| Kernel | The kernel of the Pharo runtime | 137 |
| KeyChain | A user management system | 21 |
| Keymapping | Keyboard shortcut management | 39 |
| Komitter | A tool to commit code with fine granularity | 47 |
| Magritte | Application development frameworks | 103 |
| Manifest | A package documentation library | 13 |
| Merlin | Library to create wizards | 75 |
| Metacello | Graphical tool for versioning | 136 |
| Monticello | Source code versioning system | 142 |
| Moose | Software and data analysis platform | 300 |
| Morphic | Pharo’s graphical interface | 202 |
| Multilingual | Support for Unicode and friends | 37 |
| NativeBoost | A language-side approach to Foreign-Function-Interfaces (FFIs) | 108 |
| Nautilus | The Pharo system code browser | 112 |
| NECompletion | Developer tools | 28 |
| Network | Low level network library | 40 |
| NOCompletion | Developer tools | 8 |
| OpalCompiler | Bytecode compiler for Pharo | 73 |
| PetitParser | Parser framework | 164 |
| Polymorph | Framework for creating UIs in code | 154 |
| ProfStef | Application development frameworks | 12 |
| Refactoring | The source code refactoring engine | 223 |
| Regex | Regular expressions package in Pharo | 30 |
| Ring | Model infrastructure designed for Pharo | 27 |
| Roassal | Open visualization engine | 244 |
| Roassal2 | Open visualization engine | 277 |
| RoelTyper | A single method type inference | 17 |
| Rubric | A deep refactoring of the Pharo text editor | 87 |
| Shout | Developer tools | 11 |
| Slot | Slots for Pharo | 45 |
| SmallDude | Duplication analysis | 25 |
| SmartSuggestions | Developer tools | 34 |
| Spec | A framework for describing and building user interfaces | 182 |
| Spotlight2 | A spotlight-like morph for Pharo | 28 |
| SUnit | Testing tool | 28 |
| System-Announcements | Support for system events (e.g., class creation, deletion) | 23 |
| System-CommandLine | A tool to help chop up command line input | 11 |
| Tabs | Support for UI tabs | 13 |
| Text | A text editor | 37 |
| Tools | Set of developer tools (e.g., debugger) | 62 |
| Trachel | Object model for graphical widgets | 61 |
| Units-Core | A simple package for units | 13 |
| XML | XML parser and writer | 135 |
| Zinc | HTTP networking protocol framework | 96 |
| Zodiac | Regular and secure socket streams framework | 11 |

Data and Tools

Dataset download:

For each system, the dataset includes the following source code metrics measured at the level of classes:

  1. Number of attributes (NOA)
  2. Number of methods (NOM)
  3. FAN-OUT
  4. Weighted Methods per Class (WMC)

For each system S and metric M, the Pharo dataset contains a CSV file whose rows represent the classes of S and whose columns represent M. A cell (C, M) in this file contains the value of metric M measured for class C.
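
As a rough illustration of this layout, the sketch below reads such a file into a nested dictionary so that a cell (C, M) can be looked up directly. The header row, the `class` column name, the metric column names, and the sample values are all assumptions made for the example; the actual files in the dataset may be organized differently.

```python
import csv
import io

# Hypothetical sample content; names and values are placeholders,
# not data taken from the Pharo dataset.
SAMPLE_CSV = """class,NOA,NOM,FAN-OUT,WMC
ClassA,3,10,4,15
ClassB,0,2,1,2
ClassC,7,25,9,40
"""


def load_metrics(stream):
    """Return {class_name: {metric_name: value}} read from a CSV stream."""
    reader = csv.DictReader(stream)
    table = {}
    for row in reader:
        # Assumed: the first column is called "class" and holds the class name.
        name = row.pop("class")
        table[name] = {metric: float(value) for metric, value in row.items()}
    return table


metrics = load_metrics(io.StringIO(SAMPLE_CSV))
# Look up the cell (ClassC, WMC).
print(metrics["ClassC"]["WMC"])
```

To read a real file instead of the inline sample, pass an open file object, for example `open(path, newline="")`, to `load_metrics`.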

To derive relative thresholds, use RTTool.
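
The following sketch is not RTTool and does not reproduce its algorithm. It only illustrates what a relative threshold expresses, assuming the form "p% of the classes should have M <= k" described in the publications listed below. The function name `complies` and the sample values are hypothetical.

```python
def complies(values, k, p):
    """Return True if at least p percent of the metric values are <= k."""
    if not values:
        raise ValueError("no metric values given")
    within = sum(1 for v in values if v <= k)
    return 100.0 * within / len(values) >= p


# Made-up NOM values for five classes of one system.
nom_values = [2, 5, 8, 12, 40]

# 4 of 5 classes have NOM <= 12, so an 80% threshold with k=12 holds.
print(complies(nom_values, k=12, p=80))
# Only 3 of 5 classes have NOM <= 8, so the same 80% threshold fails for k=8.
print(complies(nom_values, k=8, p=80))
```

Deriving suitable values of p and k from a corpus is the part that RTTool handles; the check above only tests a system against values that are already known.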


  1. Paloma Oliveira, Marco Tulio Valente, Alexandre Bergel, and Alexander Serebrenik. Validating Metric Thresholds with Developers – an Early Result. Submitted to 31st International Conference on Software Maintenance and Evolution – Early Research Achievements (ICSME-ERA Track), pages 1-5, 2015.
  2. Paloma Oliveira, Fernando Lima, Marco Tulio Valente, and Alexander Serebrenik. RTTOOL: A Tool for Extracting Relative Thresholds for Source Code Metrics. In 30th International Conference on Software Maintenance and Evolution (ICSME), Tool Demo Track, pages 1-4, 2014.
  3. Paloma Oliveira, Marco Tulio Valente, and Fernando Lima. Extracting Relative Thresholds for Source Code Metrics. In IEEE Conference on Software Maintenance, Reengineering and Reverse Engineering (CSMR-WCRE), pages 254-263, 2014.