Global ETD Search

21	Software defect prediction using maximal information coefficient and fast correlation-based filter feature selection Mpofu, Bongeka 12 1900 Software quality is concerned with ensuring that the applications developed are free of failures. Some modern systems are intricate because of the complexity of their information processes. Software fault prediction is an important quality assurance activity. It predicts the defect proneness of modules and classifies them accordingly, which saves resources, time and developers' effort. In this study, a model was proposed that selects relevant features for use in defect prediction. The literature review revealed that process metrics, which are based on the history of the source code over time, are better predictors of defects in version-controlled systems. These metrics are extracted per source-code module and include, for example, the number of additions to and deletions from the source code, the number of distinct committers and the number of modified lines. In this research, defect prediction was conducted on open source software (OSS) software product lines (SPL), so process metrics were chosen. Data sets used in defect prediction may contain non-significant and redundant attributes that can reduce the accuracy of machine-learning algorithms. To improve the prediction accuracy of classification models, only the features that are significant for defect prediction are used. In machine learning, feature selection techniques are applied to identify the relevant data. Feature selection is a pre-processing step that reduces the dimensionality of the data. Feature selection techniques include information-theoretic methods, which are based on the concept of entropy. This study experimentally evaluated the efficiency of these feature selection techniques.
It was found that software defect prediction using significant attributes improves prediction accuracy. A novel model, MICFastCR, was developed. It uses the Maximal Information Coefficient (MIC) to select significant attributes and the Fast Correlation-Based Filter (FCBF) to eliminate redundant attributes. Machine-learning algorithms were then run to predict software defects. MICFastCR achieved the highest prediction accuracy according to various performance measures. / School of Computing / Ph. D. (Computer Science) Defect prediction Feature selection Software metrics Relevant metrics Redundancy Machine learning algorithms Filter Wrapper Embedded Information theory 005.14 Software measurement Machine learning Embedded computer systems Information theory
22	Data selection for cross-project defect prediction Hosseini, S. (Seyedrebvar) 25 November 2019 Abstract Context: This study contributes to the understanding of the current state of cross-project defect prediction (CPDP) by investigating the topic thematically. It focuses especially on data approaches, including search-based training data selection, by proposing data selection methods and investigating their impact. The empirical evidence for this work was collected through a formal systematic literature review and through experiments on open source projects. Objective: We aim to understand and summarize how various data manipulation approaches are used in CPDP and their potential impact on performance. Further, we aim to use search-based methods to produce evolving training data sets that filter out irrelevant instances from other projects before training. Method: Through a series of studies following the literature review of the current state of CPDP, we propose a search-based method called genetic instance selection (GIS). We validate our initial findings in a subsequent study on a large set of data sets with multiple feature sets. We refine our design decisions through an exploratory study. Finally, we investigate an existing meta-learning approach, provide insights into its design and propose an alternative iterative data selection method. Results: The literature review reveals lower performance of CPDP compared with within-project defect prediction (WPDP) models and provides a set of primary studies to serve as the basis for future research. Our proposed data selection methods make the case for search-based approaches, given their higher effectiveness and performance. Through the exploratory study we identified factors that potentially affect effectiveness, and we proposed methods to create better CPDP models.
Conclusions: The numerous approaches proposed in the literature over the last decade have advanced the area. The knowledge and tools acquired apply to many similar domains and can also form part of academic curricula. Future directions of study include searching for better validation data, better feature selection techniques, tuning the parameters of the search-based models, tuning the hyper-parameters of learners, investigating the effects of multiple sources of optimization (learner, instances and features), and studying the impact of the class imbalance problem. Search-Based Methods cross-project defect prediction data selection meta-analysis systematic literature review
23	Developer-Centric Software Assessment Makedonski, Philip 12 April 2018 No description available. 510 software assessment software mining software evolution model-based software development data mining software analytics meta-modelling software modelling model transformation defect prediction facts extraction origin analysis Informatik (PPN619939052)
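
Entry 21 describes a two-stage filter. It first ranks attributes by their relevance to the defect label, then discards attributes that are redundant with a better-ranked one. The classic FCBF procedure can be sketched as below. This is a minimal illustration, not the thesis's MICFastCR implementation. For brevity, the relevance step uses symmetric uncertainty, an entropy-based measure, instead of MIC, because computing MIC needs a grid search that does not fit a short sketch. The sketch also assumes the process metrics have already been discretised. The function names and the example metrics (`churn`, `churn_copy`, `noise`) are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Dict, List, Sequence

# Hypothetical sketch of FCBF (Fast Correlation-Based Filter). Assumes every
# metric column is already discretised; uses symmetric uncertainty (SU) for
# relevance, whereas MICFastCR uses MIC for that step.

def entropy(values: Sequence) -> float:
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def symmetric_uncertainty(xs: Sequence, ys: Sequence) -> float:
    """SU = 2 * I(X;Y) / (H(X) + H(Y)), which lies in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(xs, ys)))
    return 2.0 * mutual_info / (hx + hy)

def fcbf(features: Dict[str, Sequence], target: Sequence, delta: float = 0.0) -> List[str]:
    # Step 1 (relevance): keep features whose SU with the class exceeds delta,
    # ranked from most to least relevant (stable sort keeps ties in input order).
    scored = [(name, symmetric_uncertainty(col, target)) for name, col in features.items()]
    ranked = sorted((p for p in scored if p[1] > delta), key=lambda p: p[1], reverse=True)
    # Step 2 (redundancy): keep the top remaining feature and drop every
    # lower-ranked feature at least as correlated with it as with the class.
    selected: List[str] = []
    while ranked:
        best, _ = ranked.pop(0)
        selected.append(best)
        ranked = [(name, su_c) for name, su_c in ranked
                  if symmetric_uncertainty(features[best], features[name]) < su_c]
    return selected

target = [0, 0, 1, 1, 0, 1, 1, 0]            # 1 = defective module
features = {
    "churn":      [0, 0, 1, 1, 0, 1, 1, 0],   # perfectly relevant
    "churn_copy": [0, 0, 1, 1, 0, 1, 1, 0],   # redundant duplicate of churn
    "noise":      [0, 1, 0, 1, 0, 1, 0, 1],   # independent of the label
}
print(fcbf(features, target))  # ['churn']
```

The redundancy test compares two SU values, so both stages must use the same measure. Swapping MIC in for relevance alone, as MICFastCR's description suggests, would require checking that the two scores remain comparable.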

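Entry 22 proposes genetic instance selection (GIS), which evolves training sets drawn from other projects so that irrelevant instances are filtered out before training. The sketch below illustrates the general idea under illustrative assumptions that are not taken from the thesis. Each candidate is a bit mask over the cross-project instances. Fitness is the accuracy of a 1-nearest-neighbour classifier trained on the selected instances and scored on a small labelled validation sample from the target project. The operators are binary tournament selection, uniform crossover, bit-flip mutation and elitism. The thesis's actual encoding, learner and fitness function may differ, and all function names and parameter defaults here are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical sketch of genetic instance selection. Assumptions (not from the
# thesis): bit-mask encoding, 1-NN learner, validation accuracy as fitness,
# binary tournament selection, uniform crossover, bit-flip mutation, elitism.

Instance = Tuple[Tuple[float, ...], int]  # (metric vector, defect label)

def predict_1nn(train: Sequence[Instance], x: Tuple[float, ...]) -> int:
    """Label of the nearest training instance (squared Euclidean distance)."""
    _, label = min(train, key=lambda inst: sum((a - b) ** 2 for a, b in zip(inst[0], x)))
    return label

def fitness(mask: List[int], source: Sequence[Instance], validation: Sequence[Instance]) -> float:
    """Validation accuracy of 1-NN trained on the instances the mask selects."""
    train = [inst for bit, inst in zip(mask, source) if bit]
    if not train:
        return 0.0
    hits = sum(predict_1nn(train, x) == y for x, y in validation)
    return hits / len(validation)

def tournament(scored: List[Tuple[float, List[int]]], rng: random.Random) -> List[int]:
    """Binary tournament: the fitter of two randomly drawn candidates wins."""
    a, b = rng.sample(scored, 2)
    return a[1] if a[0] >= b[0] else b[1]

def genetic_instance_selection(source: Sequence[Instance], validation: Sequence[Instance],
                               pop_size: int = 20, generations: int = 30,
                               mutation_rate: float = 0.05,
                               seed: int = 0) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    n = len(source)
    # Start from the "keep every instance" mask plus random masks.
    population = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                              for _ in range(pop_size - 1)]
    best_fit, best_mask = -1.0, population[0]
    for _ in range(generations):
        scored = [(fitness(m, source, validation), m) for m in population]
        gen_fit, gen_mask = max(scored, key=lambda p: p[0])
        if gen_fit > best_fit:
            best_fit, best_mask = gen_fit, gen_mask
        children = [gen_mask]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(scored, rng), tournament(scored, rng)
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            children.append([1 - g if rng.random() < mutation_rate else g for g in child])
        population = children
    return best_mask, best_fit

# Toy cross-project data: the last two source instances are mislabelled.
source = [((0.0, 0.1), 0), ((0.1, 0.0), 0), ((1.0, 0.9), 1), ((0.9, 1.0), 1),
          ((0.05, 0.05), 1), ((0.95, 0.95), 0)]
validation = [((0.0, 0.0), 0), ((1.0, 1.0), 1), ((0.06, 0.04), 0), ((0.94, 0.96), 1)]
print(fitness([1] * 6, source, validation))  # 0.0: the noisy instances mislead 1-NN
mask, score = genetic_instance_selection(source, validation)
print(mask, score)
```

The all-ones mask is part of the initial population, and the best mask of each generation is carried forward unchanged. As a result, the returned subset never scores worse on the validation sample than training on every cross-project instance.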