
Software cost estimation with incomplete data

The construction of software cost estimation models remains an active topic of research. The basic premise of cost modeling is that a historical database of software project cost data can be used to develop a quantitative model to predict the cost of future projects. One of the difficulties faced by workers in this area is that many of these historical databases contain substantial amounts of missing data. Thus far, the common practice has been to ignore observations with missing data. In principle, such a practice can lead to gross biases, and may be detrimental to the accuracy of cost estimation models. In this paper, we describe an extensive simulation where we evaluate different techniques for dealing with missing data in the context of software cost modeling. Three techniques are evaluated: listwise deletion, mean imputation and eight different types of hot-deck imputation. Our results indicate that all the missing data techniques perform well, with small biases and high precision. This suggests that the simplest technique, listwise deletion, is a reasonable choice. However, this will not necessarily provide the best performance. We provide a decision tree to select the best performing missing data techniques depending upon the pattern, mechanism and percentage of missing data.
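The three families of techniques named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's simulation code. The project records, the field names (size, experience, effort) and the function names are hypothetical, chosen only for illustration. The hot-deck function shows one simple random-donor variant and does not claim to match any of the eight variants the thesis evaluates.

```python
import random
from statistics import mean

# Hypothetical toy cost database; None marks a missing value.
projects = [
    {"size": 10.0, "experience": 3.0, "effort": 24.0},
    {"size": 25.0, "experience": None, "effort": 60.0},
    {"size": 12.0, "experience": 4.0, "effort": 30.0},
    {"size": None, "experience": 2.0, "effort": 45.0},
    {"size": 40.0, "experience": 5.0, "effort": 95.0},
]


def listwise_deletion(rows):
    """Keep only the observations that have no missing values."""
    return [dict(r) for r in rows if all(v is not None for v in r.values())]


def mean_imputation(rows):
    """Replace each missing value with the mean of the observed values
    of the same variable (assumes every variable has at least one)."""
    keys = list(rows[0].keys())
    means = {k: mean(r[k] for r in rows if r[k] is not None) for k in keys}
    return [
        {k: (r[k] if r[k] is not None else means[k]) for k in keys}
        for r in rows
    ]


def random_hot_deck(rows, seed=0):
    """Replace each missing value with an observed value of the same
    variable, drawn at random from the other records (the donors)."""
    rng = random.Random(seed)
    keys = list(rows[0].keys())
    donors = {k: [r[k] for r in rows if r[k] is not None] for k in keys}
    return [
        {k: (r[k] if r[k] is not None else rng.choice(donors[k])) for k in keys}
        for r in rows
    ]


complete_cases = listwise_deletion(projects)
mean_filled = mean_imputation(projects)
hot_deck_filled = random_hot_deck(projects)
```

In this toy data, listwise deletion discards two of the five projects, while both imputation approaches keep all five. That trade-off between lost observations and filled-in values is what the simulation described in the abstract examines.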

Identifier: oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:QMM.30749
Date: January 2000
Creators: Strike, Kevin D.
Contributors: Madhavji, N. H. (advisor)
Publisher: McGill University
Source Sets: Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada
Language: English
Detected Language: English
Type: Electronic Thesis or Dissertation
Format: application/pdf
Coverage: Master of Science (School of Computer Science.)
Rights: All items in eScholarship@McGill are protected by copyright with all rights reserved unless otherwise indicated.
Relation: alephsysno: 001762213, proquestno: MQ64461, Theses scanned by UMI/ProQuest.
