Return to search

Bayesian statistical models for predicting software effort using small datasets

The need of today�s society for new technology has resulted in the development of a growing number of software systems. Developing a software system is a complex endeavour that requires a large amount of time. This amount of time is referred to as software development effort. Software development effort is the sum of hours spent by all individuals involved. Therefore, it is not equal to the duration of the development.
Accurate prediction of the effort at an early stage of development is an important factor in the successful completion of a software system, since it enables the developing organization to allocate and manage their resource effectively. However, for many software systems, accurately predicting the effort is a challenge. Hence, a model that assists in the prediction is of active interest to software practitioners and researchers alike.
Software development effort varies depending on many variables that are specific to the system, its developmental environment and the organization in which it is being developed. An accurate model for predicting software development effort can often be built specifically for the target system and its developmental environment. A local dataset of similar systems to the target system, developed in a similar environment, is then used to calibrate the model.
However, such a dataset often consists of fewer than 10 software systems, causing a serious problem in the prediction, since predictive accuracy of existing models deteriorates as the size of the dataset decreases.
This research addressed this problem with a new approach using Bayesian statistics. This particular approach was chosen, since the predictive accuracy of a Bayesian statistical model is not so dependent on a large dataset as other models. As the size of the dataset decreases to fewer than 10 software systems, the accuracy deterioration of the model is expected to be less than that of existing models. The Bayesian statistical model can also provide additional information useful for predicting software development effort, because it is also capable of selecting important variables from multiple candidates. In addition, it is parametric and produces an uncertainty estimate.
This research developed new Bayesian statistical models for predicting software development effort. Their predictive accuracy was then evaluated in four case studies using different datasets, and compared with other models applicable to the same small dataset.
The results have confirmed that the best new models are not only accurate but also consistently more accurate than their regression counterpart, when calibrated with fewer than 10 systems. They can thus replace the regression model when using small datasets. Furthermore, one case study has shown that the best new models are more accurate than a simple model that predicts the effort by calculating the average value of the calibration data. Two case studies has also indicated that the best new models can be more accurate for some software systems than a case-based reasoning model.
Since the case studies provided sufficient empirical evidence that the new models are generally more accurate than existing models compared, in the case of small datasets, this research has produced a methodology for predicting software development effort using the new models.

Identiferoai:union.ndltd.org:ADTP/217760
Date January 2007
CreatorsVan Koten, Chikako, n/a
PublisherUniversity of Otago. Department of Information Science
Source SetsAustraliasian Digital Theses Program
LanguageEnglish
Detected LanguageEnglish
Rightshttp://policy01.otago.ac.nz/policies/FMPro?-db=policies.fm&-format=viewpolicy.html&-lay=viewpolicy&-sortfield=Title&Type=Academic&-recid=33025&-find), Copyright Chikako Van Koten

Page generated in 0.0017 seconds