Global ETD Search

1	An analysis of software defect prediction studies through reproducibility and replication Mahmood, Zaheed January 2018 (has links) Context. Software defect prediction is essential in reducing software development costs and in helping companies save their reputation. Defect prediction uses mathematical models to identify patterns associated with defects within code. Resources spent reviewing the entire code can be minimised by focusing on defective parts of the code. Recent findings suggest many published prediction models may not be reliable. Critical scientific methods for identifying reliable research are Replication and Reproduction. Replication can test the external validity of studies while Reproduction can test their internal validity. Aims. The aims of my dissertation are first to study the use and quality of replications and reproductions in defect prediction. Second, to identify factors that aid or hinder these scientific methods. Methods. My methodology is based on tracking the replication of 208 defect prediction studies identified in a highly cited Systematic Literature Review (SLR) [Hall et al. 2012]. I analyse how often each of these 208 studies has been replicated and determine the type of replication carried out. I use quality, citation counts, publication venue, impact factor, and data availability from all the 208 papers to see if any of these factors are associated with the frequency with which they are replicated. I further reproduce the original studies that have been replicated in order to check their internal validity. Finally, I identify factors that affect reproducibility. Results. Only 13 (6%) of the 208 studies are replicated, most of which fail a quality check. Of the 13 replicated original studies, 62% agree with their replications and 38% disagree. The main feature of a study associated with being replicated is that original papers appear in the Transactions of Software Engineering (TSE) journal. The number of citations an original paper had was also an indicator of the probability of being replicated. In addition, studies conducted using closed source data have more replications than those based on open source data. Of the 4 out of 5 papers I reproduced, their results differed with those of the original by more than 5%. Four factors are likely to have caused these failures: i) lack of a single version of the data initially used by the original; ii) the different dataset versions available have different properties that impact model performance; iii) unreported data preprocessing; and iv) inconsistent results from alternative versions of the same tools. Conclusions. Very few defect prediction studies are replicated. The lack of replication and failure of reproduction means that it remains unclear how reliable defect prediction is. Further investigation into this failure provides key aspects researchers need to consider when designing primary studies, performing replication and reproduction studies. Finally, I provide practical steps for improving the likelihood of replication and the chances of validating a study by reporting key factors.
2	An Exploration of Challenges Limiting Pragmatic Software Defect Prediction Shihab, Emad 09 August 2012 (has links) Software systems continue to play an increasingly important role in our daily lives, making the quality of software systems an extremely important issue. Therefore, a significant amount of recent research focused on the prioritization of software quality assurance efforts. One line of work that has been receiving an increasing amount of attention is Software Defect Prediction (SDP), where predictions are made to determine where future defects might appear. Our survey showed that in the past decade, more than 100 papers were published on SDP. Nevertheless, the adoption of SDP in practice to date is limited. In this thesis, we survey the state-of-the-art in SDP in order to identify the challenges that hinder the adoption of SDP in practice. These challenges include the fact that the majority of SDP research rarely considers the impact of defects when performing their predictions, seldom provides guidance on how to use the SDP results, and is too reactive and defect-centric in nature. We propose approaches that tackle these challenges. First, we present approaches that predict high-impact defects. Our approaches illustrate how SDP research can be tailored to consider the impact of defects when making their predictions. Second, we present approaches that simplify SDP models so they can be easily understood and illustrates how these simple models can be used to assist practitioners in prioritizing the creation of unit tests in large software systems. These approaches illustrate how SDP research can provide guidance to practitioners using SDP. Then, we argue that organizations are interested in proactive risk management, which covers more than just defects. For example, risky changes may not introduce defects but they could delay the release of projects. Therefore, we present an approach that predicts risky changes, illustrating how SDP can be more encompassing (i.e., by predicting risk, not only defects) and proactive (i.e., by predicting changes before they are incorporated into the code base). The presented approaches are empirically validated using data from several large open source and commercial software systems. The presented research highlights how challenges of pragmatic SDP can be tackled, making SDP research more beneficial and applicable in practice. / Thesis (Ph.D, Computing) -- Queen's University, 2012-08-02 13:12:39.707 Software Defect Prediction Software Engineering Software Maintenance
3	Developing and Evaluating Methods for Mitigating Sample Selection Bias in Machine Learning Pelayo Ramirez, Lourdes Unknown Date No description available. Machine Learning Software Reliability Stratification Learning in Imbalanced Datasets Sample Selection Bias Software Defect Prediction
4	Software defect prediction using machine learning on test and source code metrics Liljeson, Mattias, Mohlin, Alexander January 2014 (has links) Context. Software testing is the process of finding faults in software while executing it. The results of the testing are used to find and correct faults. Software defect prediction estimates where faults are likely to occur in source code. The results from the defect prediction can be used to opti- mize testing and ultimately improve software quality. Machine learning, that concerns computer programs learning from data, is used to build pre- diction models which then can be used to classify data. Objectives. In this study we, in collaboration with Ericsson, investigated whether software metrics from source code files combined with metrics from their respective tests predicts faults with better prediction perfor- mance compared to using only metrics from the source code files. Methods. A literature review was conducted to identify inputs for an ex- periment. The experiment was applied on one repository from Ericsson to identify the best performing set of metrics. Results. The prediction performance results of three metric sets are pre- sented and compared with each other. Wilcoxon’s signed rank tests are performed on four different performance measures for each metric set and each machine learning algorithm to demonstrate significant differences of the results. Conclusions. We conclude that metrics from tests can be used to predict faults. However, the combination of source code metrics and test metrics do not outperform using only source code metrics. Moreover, we conclude that models built with metrics from the test metric set with minimal infor- mation of the source code can in fact predict faults in the source code. Software defect prediction Software testing Machine learning Computer Sciences Datavetenskap (datalogi) Software Engineering Programvaruteknik
5	Investigation into predicting unit test failure using syntactic source code features / Undersöking om förutsägelse av enhetstestfel med användande av syntaktiska källkodssärdrag Sundström, Alex January 2018 (has links) In this thesis the application of software defect prediction to predict unit test failure is investigated. Data for this purpose was collected from a Continuous Integration development environment. Experiments were performed using semantic features from the source code. As the data was imbalanced with defective samples being in minority different degrees of oversampling were also evaluated. The data collection process revealed that even though several different code commits were available few ever failed a unit test. Difficulties with linking a failure to a specific file were also encountered. The machine learning model used in the project produced poor results when compared against related work, from which it was based on. In F-measure, it on average achieve 53% of the mean performance of state-of-the-art for software defect prediction on bugs in Java source files. Specifically, it would appear that very little information was available for the model to learn defects in files not present in training data. / I denna avhandling undersöks applikationen av prognos för mjukvarudefekter för att förutse enhetstestfel. Data för detta syfte samlades in från en utvecklingsmiljö med kontinuerlig integration. Experimenten utfördes med användning av semantiska särdrag samlade från källkod. Då data var obalanserat med defekta exempel i minoritet evaluerades olika grader av översampling. Datainsamlingsprocessen visade att även om det fanns många kodinlämningar så misslyckades få någonsin ett enhetstest. Svårigheter med att länka testmisslyckanden till en specifik fil påträffades också. Den använda maskininlärningsmodellen uppvisade också dåliga resultat i jämförelse med relaterade värk. Mätt i F-measure uppnåddes i genomsnitt 53% av genomsnittlig prestandan av bästa möjliga prognos av mjukvarudefekter av buggar i Java källkod. Specifikt så framträdde det att väldigt lite information verkar finnas för modellen att lära sig defekter i filer som ej fanns med i träningsdata. Computer Sciences Datavetenskap (datalogi)
6	Evaluation of Attention Mechanisms for Just-In-Time Software Defect Prediction / En Utvärdering av Attention Mechanisms för Just-In-Time Software Defect Prediction Isunza Navarro, Abgeiba Yaroslava January 2020 (has links) Just-In-Time Software Defect Prediction (JIT-DP) focuses on predicting errors in software at change-level with the objective of helping developers identify defects while the development process is still ongoing, and improving the quality of software applications. This work studies deep learning techniques by applying attention mechanisms that have been successful in, among others, Natural Language Processing (NLP) tasks. We introduce two networks named Convolutional Neural Network with Bidirectional Attention (BACNN) and Bidirectional Attention Code Network (BACoN) that employ a bi-directional attention mechanism between the code and message of a software change. Furthermore, we examine BERT [17] and RoBERTa [57] attention architectures for JIT-DP. More specifically, we study the effectiveness of the aforementioned attention-based models to predict defective commits compared to the current state of the art, DeepJIT [37] and TLEL [101]. Our experiments evaluate the models by using software changes from the OpenStack open source project. The results showed that attention-based networks outperformed the baseline models in terms of accuracy in the different evaluation settings. The attention-based models, particularly BERT and RoBERTa architectures, demonstrated promising results in identifying defective software changes and proved to be effective in predicting defects in changes of new software releases. / Just-In-Time Defect Prediction (JIT-DP) fokuserar på att förutspå fel i mjukvara vid ändringar i koden, med målet att hjälpa utvecklare att identifiera defekter medan utvecklingsprocessen fortfarande är pågående, och att förbättra kvaliteten hos applikationsprogramvara. Detta arbete studerar djupinlärningstekniker genom att tillämpa attentionmekanismer som har varit framgångsrika inom, bland annat, språkteknologi (NLP). Vi introducerar två nätverk vid namn Convolutional Neural Network with Bidirectional Attention (BACNN), och Bidirectional Attention Code Network (BACoN), som använder en tvåriktad attentionmekanism mellan koden och meddelandet om en mjukvaruändring. Dessutom undersöker vi BERT [17] och RoBERTa [57], attentionarkitekturer för JIT-DP. Mer specifikt studerar vi hur effektivt dessa attentionbaserade modeller kan förutspå defekta ändringar, och jämför dem med de bästa tillgängliga arkitekturerna DeePJIT [37] och TLEL [101]. Våra experiment utvärderar modellerna genom att använda mjukvaruändringar från det öppna källkodsprojektet OpenStack. Våra resultat visar att attentionbaserade nätverk överträffar referensmodellen sett till träffsäkerheten i de olika scenarierna. De attentionbaserade modellerna, framför allt BERT och RoBERTa, demonstrerade lovade resultat när det kommer till att identifiera defekta mjukvaruändringar och visade sig vara effektiva på att förutspå defekter i ändringar av nya mjukvaruversioner. Just-in-Time Software Defect Prediction Attention Mechanism Convolutional Neural Network Feature Extraction Just-in-Time Software Defect Prediction Attention Mechanism Convolutional Neural Network Feature Extraction Computer and Information Sciences Data- och informationsvetenskap
7	A Method For Product Defectiveness Prediction With Process Enactment Data In A Small Software Organization Sivrioglu, Damla 01 June 2012 (has links) (PDF) As a part of the quality management, product defectiveness prediction is vital for small software organizations as for instutional ones. Although for defect prediction there have been conducted a lot of studies, process enactment data cannot be used because of the difficulty of collection. Additionally, there is no proposed approach known in general for the analysis of process enactment data in software engineering. In this study, we developed a method to show the applicability of process enactment data for defect prediction and answered &ldquo / Is process enactment data beneficial for defect prediction?&rdquo / , &ldquo / How can we use process enactment data?&rdquo / and &ldquo / Which approaches and analysis methods can our method support?&rdquo / questions. We used multiple case study design and conducted case studies including with and without process enactment data in a small software development company. We preferred machine learning approaches rather than statistical ones, in order to cluster the data which includes process enactment informationsince we believed that they are convenient with the pattern oriented nature of the data. By the case studies performed, we obtained promising results. We evaluated performance values of prediction models to demonstrate the advantage of using process enactment data for the prediction of defect open duration value. When we have enough data points to apply machine learning methods and the data can be clusteredhomogeneously, we observed approximately 3% (ranging from -10% to %17) more accurate results from analyses including with process enactment data than the without ones. Keywords:
8	Cross-project defect prediction with meta-Learning / Predição de defeitos cruzada entre projetos apoiado por meta-aprendizado Porto, Faimison Rodrigues 29 September 2017 (has links) Defect prediction models assist tester practitioners on prioritizing the most defect-prone parts of the software. The approach called Cross-Project Defect Prediction (CPDP) refers to the use of known external projects to compose the training set. This approach is useful when the amount of historical defect data of a company to compose the training set is inappropriate or insufficient. Although the principle is attractive, the predictive performance is a limiting factor. In recent years, several methods were proposed aiming at improving the predictive performance of CPDP models. However, to the best of our knowledge, there is no evidence of which CPDP methods typically perform best. Moreover, there is no evidence on which CPDP methods perform better for a specific application domain. In fact, there is no machine learning algorithm suitable for all domains. The decision task of selecting an appropriate algorithm for a given application domain is investigated in the meta-learning literature. A meta-learning model is characterized by its capacity of learning from previous experiences and adapting its inductive bias dynamically according to the target domain. In this work, we investigate the feasibility of using meta-learning for the recommendation of CPDP methods. In this thesis, three main goals were pursued. First, we provide an experimental analysis to investigate the feasibility of using Feature Selection (FS) methods as an internal procedure to improve the performance of two specific CPDP methods. Second, we investigate which CPDP methods present typically best performances. We also investigate whether the typically best methods perform best for the same project datasets. The results reveal that the most suitable CPDP method for a project can vary according to the project characteristics, which leads to the third investigation of this work. We investigate the several particularities inherent to the CPDP context and propose a meta-learning solution able to learn from previous experiences and recommend a suitable CDPD method according to the characteristics of the project being predicted. We evaluate the learning capacity of the proposed solution and its performance in relation to the typically best CPDP methods. / Modelos de predição de defeitos auxiliam profissionais de teste na priorização de partes do software mais propensas a conter defeitos. A abordagem de predição de defeitos cruzada entre projetos (CPDP) refere-se à utilização de projetos externos já conhecidos para compor o conjunto de treinamento. Essa abordagem é útil quando a quantidade de dados históricos de defeitos é inapropriada ou insuficiente para compor o conjunto de treinamento. Embora o princípio seja atrativo, o desempenho de predição é um fator limitante nessa abordagem. Nos últimos anos, vários métodos foram propostos com o intuito de melhorar o desempenho de predição de modelos CPDP. Contudo, na literatura, existe uma carência de estudos comparativos que apontam quais métodos CPDP apresentam melhores desempenhos. Além disso, não há evidências sobre quais métodos CPDP apresentam melhor desempenho para um domínio de aplicação específico. De fato, não existe um algoritmo de aprendizado de máquina que seja apropriado para todos os domínios de aplicação. A tarefa de decisão sobre qual algoritmo é mais adequado a um determinado domínio de aplicação é investigado na literatura de meta-aprendizado. Um modelo de meta-aprendizado é caracterizado pela sua capacidade de aprender a partir de experiências anteriores e adaptar seu viés de indução dinamicamente de acordo com o domínio alvo. Neste trabalho, nós investigamos a viabilidade de usar meta-aprendizado para a recomendação de métodos CPDP. Nesta tese são almejados três principais objetivos. Primeiro, é conduzida uma análise experimental para investigar a viabilidade de usar métodos de seleção de atributos como procedimento interno de dois métodos CPDP, com o intuito de melhorar o desempenho de predição. Segundo, são investigados quais métodos CPDP apresentam um melhor desempenho em um contexto geral. Nesse contexto, também é investigado se os métodos com melhor desempenho geral apresentam melhor desempenho para os mesmos conjuntos de dados (ou projetos de software). Os resultados revelam que os métodos CPDP mais adequados para um projeto podem variar de acordo com as características do projeto sendo predito. Essa constatação conduz à terceira investigação realizada neste trabalho. Foram investigadas as várias particularidades inerentes ao contexto CPDP a fim de propor uma solução de meta-aprendizado capaz de aprender com experiências anteriores e recomendar métodos CPDP adequados, de acordo com as características do software. Foram avaliados a capacidade de meta-aprendizado da solução proposta e a sua performance em relação aos métodos base que apresentaram melhor desempenho geral. Cross-project defect prediction Engenharia de software experimental Experimental software engineering Meta-aprendizado Meta-learning Predição de defeitos em software Qualidade de software Software defect prediction Software quality assurance
9	Cross-project defect prediction with meta-Learning / Predição de defeitos cruzada entre projetos apoiado por meta-aprendizado Faimison Rodrigues Porto 29 September 2017 (has links) Defect prediction models assist tester practitioners on prioritizing the most defect-prone parts of the software. The approach called Cross-Project Defect Prediction (CPDP) refers to the use of known external projects to compose the training set. This approach is useful when the amount of historical defect data of a company to compose the training set is inappropriate or insufficient. Although the principle is attractive, the predictive performance is a limiting factor. In recent years, several methods were proposed aiming at improving the predictive performance of CPDP models. However, to the best of our knowledge, there is no evidence of which CPDP methods typically perform best. Moreover, there is no evidence on which CPDP methods perform better for a specific application domain. In fact, there is no machine learning algorithm suitable for all domains. The decision task of selecting an appropriate algorithm for a given application domain is investigated in the meta-learning literature. A meta-learning model is characterized by its capacity of learning from previous experiences and adapting its inductive bias dynamically according to the target domain. In this work, we investigate the feasibility of using meta-learning for the recommendation of CPDP methods. In this thesis, three main goals were pursued. First, we provide an experimental analysis to investigate the feasibility of using Feature Selection (FS) methods as an internal procedure to improve the performance of two specific CPDP methods. Second, we investigate which CPDP methods present typically best performances. We also investigate whether the typically best methods perform best for the same project datasets. The results reveal that the most suitable CPDP method for a project can vary according to the project characteristics, which leads to the third investigation of this work. We investigate the several particularities inherent to the CPDP context and propose a meta-learning solution able to learn from previous experiences and recommend a suitable CDPD method according to the characteristics of the project being predicted. We evaluate the learning capacity of the proposed solution and its performance in relation to the typically best CPDP methods. / Modelos de predição de defeitos auxiliam profissionais de teste na priorização de partes do software mais propensas a conter defeitos. A abordagem de predição de defeitos cruzada entre projetos (CPDP) refere-se à utilização de projetos externos já conhecidos para compor o conjunto de treinamento. Essa abordagem é útil quando a quantidade de dados históricos de defeitos é inapropriada ou insuficiente para compor o conjunto de treinamento. Embora o princípio seja atrativo, o desempenho de predição é um fator limitante nessa abordagem. Nos últimos anos, vários métodos foram propostos com o intuito de melhorar o desempenho de predição de modelos CPDP. Contudo, na literatura, existe uma carência de estudos comparativos que apontam quais métodos CPDP apresentam melhores desempenhos. Além disso, não há evidências sobre quais métodos CPDP apresentam melhor desempenho para um domínio de aplicação específico. De fato, não existe um algoritmo de aprendizado de máquina que seja apropriado para todos os domínios de aplicação. A tarefa de decisão sobre qual algoritmo é mais adequado a um determinado domínio de aplicação é investigado na literatura de meta-aprendizado. Um modelo de meta-aprendizado é caracterizado pela sua capacidade de aprender a partir de experiências anteriores e adaptar seu viés de indução dinamicamente de acordo com o domínio alvo. Neste trabalho, nós investigamos a viabilidade de usar meta-aprendizado para a recomendação de métodos CPDP. Nesta tese são almejados três principais objetivos. Primeiro, é conduzida uma análise experimental para investigar a viabilidade de usar métodos de seleção de atributos como procedimento interno de dois métodos CPDP, com o intuito de melhorar o desempenho de predição. Segundo, são investigados quais métodos CPDP apresentam um melhor desempenho em um contexto geral. Nesse contexto, também é investigado se os métodos com melhor desempenho geral apresentam melhor desempenho para os mesmos conjuntos de dados (ou projetos de software). Os resultados revelam que os métodos CPDP mais adequados para um projeto podem variar de acordo com as características do projeto sendo predito. Essa constatação conduz à terceira investigação realizada neste trabalho. Foram investigadas as várias particularidades inerentes ao contexto CPDP a fim de propor uma solução de meta-aprendizado capaz de aprender com experiências anteriores e recomendar métodos CPDP adequados, de acordo com as características do software. Foram avaliados a capacidade de meta-aprendizado da solução proposta e a sua performance em relação aos métodos base que apresentaram melhor desempenho geral. Engenharia de software experimental Meta-aprendizado Predição de defeitos em software Qualidade de software Cross-project defect prediction Experimental software engineering Meta-learning Software defect prediction Software quality assurance

Search results