Providing statistical inference to case-based software effort estimation

This thesis proposes a novel approach, called Analogy-X, to extend and improve the classical data-intensive analogy approach for software effort estimation. Analogy-X combines the notion of distance matrix correlation found in the ecology literature with statistical analysis techniques to provide useful inferential statistics to support analogy-based systems. Data-intensive analogy for software effort estimation has been proposed as a viable alternative to other prediction methods such as linear regression, and in many cases researchers have found that analogy outperforms algorithmic methods. However, the overall performance of analogy depends on the quality of the dataset (the relevance of its project cases to the target project) and on the feature subset selected in the analogy-based model. Unfortunately, there is no mechanism to assess whether analogy is appropriate for a specific dataset; in most cases analogy will continue to execute regardless of the dataset quality. Analogy-X is a set of procedures that apply the principles of the Mantel correlation randomization test, commonly used in ecology and psychology, to provide inferential statistics for analogy. It uses the strength of the correlation between the distance matrix of project features and the distance matrix of known effort values to assess the suitability of the dataset for analogy, to identify the most appropriate feature subset, and to remove atypical project cases from the dataset. The empirical studies show that Analogy-X is capable of:
-- Detecting extremely outlying project cases that would ultimately distort prediction outcomes, using a sensitivity analysis strategy.
-- Detecting relevant project features that are useful for identifying potential source analogues, in a stepwise fashion similar to that of stepwise regression.
-- Identifying whether an analogy-based approach is appropriate for the dataset.
Analogy-X is thus a robust solution that provides a sound statistical basis for analogy. It removes the need for any form of heuristic search and greatly improves the algorithmic performance of analogy. The studies also show that Analogy-X is capable of removing performance bottlenecks in data-intensive analogy. The overall results also suggest that a fully automated data-intensive analogy for software effort estimation can be implemented using Analogy-X, and that it is an effective front end to analogy-based systems. The contribution of this work is significant since it provides an approach that will have a major impact on the evolution of data-intensive analogy-based and case-based reasoning systems.
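The abstract does not spell out the exact Analogy-X procedure, so the following is only a minimal sketch of a generic Mantel randomization test applied to an effort dataset as described above. It assumes min-max normalised features, Euclidean distance between projects, absolute difference between effort values, Pearson correlation over the upper triangle of the two distance matrices, and a one-sided permutation p-value. All function names (`normalise`, `pairwise`, `mantel_test`, and so on) are hypothetical illustrations, not the thesis's code.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical sketch of a Mantel randomization test for an effort dataset;
# the preprocessing and distance choices below are assumptions.


def normalise(features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Min-max scale each feature column to [0, 1] (assumed preprocessing)."""
    scaled_cols = []
    for col in zip(*features):
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled_cols.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_cols)]


def pairwise(values, dist) -> List[float]:
    """Upper-triangle (i < j) entries of the distance matrix, row by row."""
    n = len(values)
    return [dist(values[i], values[j]) for i in range(n) for j in range(i + 1, n)]


def euclid(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two project feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def mantel_test(features, effort, permutations: int = 999,
                seed: int = 0) -> Tuple[float, float]:
    """Return (Mantel r, one-sided permutation p-value)."""
    dx = pairwise(normalise(features), euclid)      # feature distance matrix
    effort_dist = lambda a, b: abs(a - b)           # effort distance (assumed)
    r_obs = pearson(dx, pairwise(effort, effort_dist))
    rng = random.Random(seed)
    labels = list(effort)
    at_least = 0
    for _ in range(permutations):
        # Shuffling the effort values permutes the rows and columns of the
        # effort distance matrix together, as the Mantel test requires.
        rng.shuffle(labels)
        if pearson(dx, pairwise(labels, effort_dist)) >= r_obs:
            at_least += 1
    # Count the observed arrangement itself, so p is never exactly zero.
    return r_obs, (at_least + 1) / (permutations + 1)


if __name__ == "__main__":
    feats = [[float(i)] for i in range(1, 9)]
    eff = [10.0 * i for i in range(1, 9)]
    r, p = mantel_test(feats, eff)
    print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

In this toy run, effort is exactly proportional to the single feature, so the observed Mantel r is 1 (up to rounding) and very few random relabellings match it. A high r with a small p-value would suggest that projects close in feature space also tend to be close in effort, which is the property analogy relies on. The design choice of permuting only the effort labels lets the feature distance matrix be computed once and reused for every permutation.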

Identifier: oai:union.ndltd.org:ADTP/257339
Date: January 2007
Creators: Keung, Wai, Computer Science & Engineering, Faculty of Engineering, UNSW
Source Sets: Australasian Digital Theses Program
Language: English
Rights: http://unsworks.unsw.edu.au/copyright