1

Predicting likelihood of requirement implementation within the planned iteration

Dehghan, Ali 31 May 2017 (has links)
There has been significant interest in estimating the time and effort needed to fix defects among both software practitioners and researchers over the past two decades. However, most of the focus has been on predicting the time and effort to resolve bugs or other low-level tasks, with little regard to predicting the time needed to complete high-level requirements, a critical step in release planning. In this thesis, we describe a mixed-method empirical study on three large IBM projects in which we developed and evaluated a process for training a predictive model, built on a set of 29 features in nine categories, to predict whether a requirement will be completed within its planned iteration. We conducted feature engineering through iterative interviews with IBM software practitioners as well as analysis of the large development and project management repositories of these three projects. Using machine learning techniques, we made predictions on requirement completion time at four different stages of a requirement's lifetime. Given our industrial partner's preference for high precision over recall, we then adopted a cost-sensitive learning method and maximized the precision of predictions (ranging from 0.8 to 0.97) while maintaining acceptable recall. We also ranked the features by their relative importance to the optimized predictive model. We show that although satisfactory predictions can be made at early stages, even on the first day of requirement creation, prediction performance improves over time by taking advantage of requirements' progress data. Furthermore, the feature importance ranking shows that although the importance of features is highly dependent on project and prediction stage, certain features (e.g. requirement creator, time remaining to the end of the iteration, time since the last requirement summary change, and number of times the requirement has been replanned for a new iteration) emerge as important across most projects and stages, suggesting worthwhile future research directions for both researchers and practitioners. / Graduate
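The abstract above describes maximizing precision while keeping recall acceptable. A minimal sketch of that idea, on synthetic data (not the thesis's actual pipeline or features), is to sweep a probability threshold and keep the one that maximizes precision subject to a recall floor:

```python
# Illustrative sketch only: cost-sensitive thresholding that favors
# precision over recall. All data below is synthetic.

def precision_recall(y_true, y_prob, threshold):
    """Compute precision and recall at a given probability threshold."""
    tp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 0)
    fn = sum(1 for t, p in zip(y_true, y_prob) if p < threshold and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def pick_threshold(y_true, y_prob, min_recall=0.5):
    """Choose the threshold maximizing precision while keeping recall
    at or above an acceptable floor."""
    best = (0.0, 0.5)  # (precision, threshold)
    for t in sorted(set(y_prob)):
        p, r = precision_recall(y_true, y_prob, t)
        if r >= min_recall and p > best[0]:
            best = (p, t)
    return best

# Synthetic example: 1 = requirement completed in its planned iteration.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.6, 0.4, 0.85, 0.3, 0.55]
precision, threshold = pick_threshold(y_true, y_prob, min_recall=0.5)
```

In practice a cost-sensitive learner adjusts misclassification costs during training rather than only post-hoc thresholding; the sketch shows only the precision/recall trade-off itself.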
2

Effort Modeling and Programmer Participation in Open Source Software Projects

Koch, Stefan January 2005 (has links) (PDF)
This paper analyses and develops models for programmer participation and effort estimation in open source software projects. This area has not yet been a centre of research, although results would be highly valuable for assessing the efficiency of the open source development model and for various decision-makers. In this paper, a case study is used to generate hypotheses regarding the manpower function and effort modeling; a large data set retrieved from a project repository is then used to test these hypotheses. The main results are that Norden-Rayleigh-based approaches must be complemented to account for the addition of new features during the lifecycle in order to be usable in this context, and that effort models based on programmer participation show significantly less effort than those based on output metrics such as lines-of-code. (author's abstract) / Series: Working Papers on Information Systems, Information Business and Operations
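For readers unfamiliar with the Norden-Rayleigh model mentioned above: it describes cumulative effort as E(t) = K(1 - exp(-a*t^2)), with staffing level as its derivative. A small sketch (parameter values are purely illustrative, not from the paper's data):

```python
import math

def rayleigh_cumulative_effort(t, K, a):
    """Cumulative effort E(t) = K * (1 - exp(-a * t^2)) under the
    Norden-Rayleigh model: K is total lifecycle effort, a shapes the peak."""
    return K * (1.0 - math.exp(-a * t * t))

def rayleigh_staffing(t, K, a):
    """Instantaneous staffing m(t) = dE/dt = 2 * K * a * t * exp(-a * t^2)."""
    return 2.0 * K * a * t * math.exp(-a * t * t)

# Toy parameters for illustration only.
K, a = 1000.0, 0.02
effort_at_10 = rayleigh_cumulative_effort(10.0, K, a)
```

The paper's finding is that this curve alone underfits open source projects, because new features keep being added over the lifecycle; a complemented model would superimpose additional Rayleigh curves for later feature additions.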
3

Extracting Structured Knowledge from Textual Data in Software Repositories

Hasan, Maryam 06 1900 (has links)
Software team members, as they communicate and coordinate their work with others throughout the life-cycle of their projects, generate different kinds of textual artifacts. Despite the variety of work in the area of mining software artifacts, relatively little research has focused on communication artifacts. Software communication artifacts, in addition to source code artifacts, contain useful semantic information that is not fully explored by existing approaches. This thesis presents the development of a text analysis method and tool to extract and represent useful pieces of information from a wide range of textual data sources associated with software projects. Our text analysis system integrates Natural Language Processing techniques and statistical text analysis methods with software domain knowledge. The extracted information is represented as RDF-style triples that capture relations between developers and software products. We applied the developed system to five different types of textual data: source code commits, bug reports, email messages, chat logs, and wiki pages. In evaluating our system, we found its precision to be 82%, its recall 58%, and its F-measure 68%.
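To make the RDF-style triple representation concrete, here is a toy sketch. The commit-message format, the regular expression, and the (developer, action, artifact) vocabulary are all invented for illustration; the thesis's actual system combines NLP techniques with domain knowledge rather than a single pattern:

```python
import re

# Hypothetical commit-log lines; only the triple output form matches
# the representation described in the abstract.
commits = [
    "alice: fixed bug #123 in parser.c",
    "bob: added feature login to auth module",
    "alice: fixed bug #456 in lexer.c",
]

PATTERN = re.compile(r"^(\w+): (fixed|added) (?:bug #(\d+)|feature (\w+))")

def extract_triples(lines):
    """Extract RDF-style (subject, predicate, object) triples of the
    form (developer, action, artifact) from matching lines."""
    triples = []
    for line in lines:
        m = PATTERN.match(line)
        if m:
            dev, action, bug, feature = m.groups()
            obj = f"bug:{bug}" if bug else f"feature:{feature}"
            triples.append((dev, action, obj))
    return triples

triples = extract_triples(commits)
```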
4

DRACA: Decision-support for Root Cause Analysis and Change Impact Analysis

Nadi, Sarah 12 1900 (has links)
Most companies relying on an Information Technology (IT) system for their daily operations heavily invest in its maintenance. Tools that monitor network traffic, record anomalies, and keep track of the changes that occur in the system are commonly used. Root cause analysis and change impact analysis are two main activities involved in the management of IT systems. Currently, there exists no universal model to guide analysts while performing these activities. Although the Information Technology Infrastructure Library (ITIL) provides a guide to the organization and structure of the tools and processes used to manage IT systems, it does not provide any models that can be used to implement the required features. This thesis focuses on providing simple and effective models and processes for root cause analysis and change impact analysis through mining useful artifacts stored in a Configuration Management Database (CMDB). The CMDB contains information about the different components in a system, called Configuration Items (CIs), as well as the relationships between them. Change reports and incident reports are also stored in a CMDB. The result of our work is the Decision support for Root cause Analysis and Change impact Analysis (DRACA) framework, which suggests possible root cause(s) of a problem, as well as possible CIs involved in a change set, based on different proposed models. The contributions of this thesis are as follows: - An exploration of data repositories (CMDBs) that had not previously been attempted in the mining software repositories research community. - A causality model providing decision support for root cause analysis based on this mined data. - A process for mining historical change information to suggest CIs for future change sets based on a ranking model. Support and confidence measures are used to make the suggestions. - Empirical results from applying the proposed change impact analysis process to industrial data. Our results show that the change sets in the CMDB were highly predictive, and that with a confidence threshold of 80% and a half-life of 12 months, an overall recall of 69.8% and a precision of 88.5% were achieved. - An overview of lessons learned from using a CMDB, and the observations we made while working with it.
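The support and confidence measures mentioned above are the standard association-rule statistics. A minimal sketch on synthetic change sets (the CI names and data are invented; the thesis applies this to industrial CMDB data):

```python
# Synthetic change sets: each set lists the CIs changed together.
change_sets = [
    {"web-server", "database"},
    {"web-server", "database", "cache"},
    {"web-server", "load-balancer"},
    {"database", "cache"},
]

def rule_stats(change_sets, antecedent, consequent):
    """support(A -> B) = |sets containing A and B| / |all sets|;
    confidence(A -> B) = |sets containing A and B| / |sets containing A|."""
    n = len(change_sets)
    both = sum(1 for s in change_sets if antecedent in s and consequent in s)
    ante = sum(1 for s in change_sets if antecedent in s)
    support = both / n
    confidence = both / ante if ante else 0.0
    return support, confidence

# If "web-server" is in a change set, how often is "database" too?
support, confidence = rule_stats(change_sets, "web-server", "database")
```

A confidence threshold (80% in the thesis) would then filter which co-change suggestions are shown to the analyst.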
6

Enabling Large-Scale Mining Software Repositories (MSR) Studies Using Web-Scale Platforms

Shang, Weiyi 31 May 2010 (has links)
The Mining Software Repositories (MSR) field analyzes software data to uncover knowledge and assist software development. Software projects and products continue to grow in size and complexity. In-depth analysis of these large systems and their evolution is needed to better understand the characteristics of such large-scale systems and projects. However, classical software analysis platforms (e.g., Prolog-like, SQL-like, or specialized programming scripts) face many challenges when performing large-scale MSR studies. Such software platforms rarely scale easily out of the box. Instead, they often require analysis-specific, one-time ad hoc scaling tricks and designs that are not reusable for other types of analysis and that are costly to maintain. We believe that the web community has already faced many of the scaling challenges now facing the software engineering community, as it copes with the enormous growth of web data. In this thesis, we report on our experience in using MapReduce and Pig, two web-scale platforms, to perform large MSR studies. Through our case studies, we carefully demonstrate the benefits and challenges of using web platforms to prepare (i.e., Extract, Transform, and Load, ETL) software data for further analysis. The results of our studies show that: 1) web-scale platforms provide an effective and efficient platform for large-scale MSR studies; 2) many of the web community's guidelines for using web-scale platforms must be modified to achieve optimal performance for large-scale MSR studies. This thesis will help other software engineering researchers who want to scale their studies. / Thesis (Master, Computing) -- Queen's University, 2010-05-28 00:37:19.443
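The MapReduce model mentioned above splits an ETL job into a map phase emitting key-value pairs and a reduce phase aggregating them. A minimal in-memory imitation on invented commit data (real platforms like Hadoop distribute these phases across a cluster; this only shows the programming model):

```python
from collections import defaultdict

# Synthetic commit records for illustration.
commits = [
    {"author": "alice", "files": ["a.c", "b.c"]},
    {"author": "bob", "files": ["a.c"]},
    {"author": "alice", "files": ["c.c", "a.c"]},
]

def map_phase(commit):
    """Map: emit one (file, 1) key-value pair per changed file."""
    return [(f, 1) for f in commit["files"]]

def reduce_phase(pairs):
    """Reduce: sum the counts for each key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

pairs = [kv for c in commits for kv in map_phase(c)]
change_counts = reduce_phase(pairs)
```

Because map and reduce are independent per key, the framework can run them in parallel over arbitrarily large repositories, which is the scaling property the thesis exploits.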
7

TECHNIQUES FOR IMPROVING SOFTWARE DEVELOPMENT PROCESSES BY MINING SOFTWARE REPOSITORIES

Dhaliwal, Tejinder 08 September 2012 (has links)
Software repositories such as source code repositories and bug repositories record information about the software development process. By analyzing the rich data available in software repositories, we can uncover interesting information. This information can be leveraged to guide software developers, or to automate software development activities. In this thesis we investigate two activities of the development process: selective code integration and grouping of field crash-reports, and use the information available in software repositories to improve each of the two activities. / Thesis (Master, Electrical & Computer Engineering) -- Queen's University, 2012-09-04 12:26:59.388
9

MINING UNSTRUCTURED SOFTWARE REPOSITORIES USING IR MODELS

Thomas, Stephen 12 December 2012 (has links)
Mining Software Repositories, which is the process of analyzing the data related to software development practices, is an emerging field which aims to aid development teams in their day to day tasks. However, data in many software repositories is currently unused because the data is unstructured, and therefore difficult to mine and analyze. Information Retrieval (IR) techniques, which were developed specifically to handle unstructured data, have recently been used by researchers to mine and analyze the unstructured data in software repositories, with some success. The main contribution of this thesis is the idea that the research and practice of using IR models to mine unstructured software repositories can be improved by going beyond the current state of affairs. First, we propose new applications of IR models to existing software engineering tasks. Specifically, we present a technique to prioritize test cases based on their IR similarity, giving highest priority to those test cases that are most dissimilar. In another new application of IR models, we empirically recover how developers use their mailing list while developing software. Next, we show how the use of advanced IR techniques can improve results. Using a framework for combining disparate IR models, we find that bug localization performance can be improved by 14–56% on average, compared to the best individual IR model. In addition, by using topic evolution models on the history of source code, we can uncover the evolution of source code concepts with an accuracy of 87–89%. Finally, we show the risks of current research, which uses IR models as black boxes without fully understanding their assumptions and parameters. We show that data duplication in source code has undesirable effects for IR models, and that by eliminating the duplication, the accuracy of IR models improves. 
Additionally, we find that in the bug localization task, an unwise choice of parameter values results in an accuracy of only 1%, where optimal parameters can achieve an accuracy of 55%. Through empirical case studies on real-world systems, we show that all of our proposed techniques and methodologies significantly improve the state-of-the-art. / Thesis (Ph.D, Computing) -- Queen's University, 2012-12-12 12:34:59.854
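The test-case prioritization idea above (run the most mutually dissimilar tests first) can be sketched with a simple greedy loop. Jaccard similarity over token sets stands in for the thesis's full IR models, and the test names and tokens are invented:

```python
# Sketch of dissimilarity-based test prioritization: greedily pick the
# test least similar to the tests already chosen.

def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def prioritize(tests):
    """Order tests so each next pick minimizes its maximum similarity
    to the tests already selected (most dissimilar first)."""
    names = sorted(tests)
    order = [names.pop(0)]  # seed with the first test alphabetically
    while names:
        pick = min(
            names,
            key=lambda n: max(jaccard(tests[n], tests[s]) for s in order),
        )
        order.append(pick)
        names.remove(pick)
    return order

# Hypothetical token sets extracted from test source code.
tests = {
    "test_login": {"user", "password", "session"},
    "test_logout": {"user", "session", "token"},
    "test_parser": {"ast", "token", "grammar"},
}
order = prioritize(tests)
```

The intuition is coverage-like: dissimilar tests are likelier to exercise different parts of the system, so faults surface earlier in the run.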
10

Information Theoretic Evaluation of Change Prediction Models for Large-Scale Software

Askari, Mina January 2006 (has links)
During software development and maintenance, as a software system evolves, changes are made and bugs are fixed in various files. In large-scale systems, file histories are stored in software repositories, such as CVS, which record modifications. By studying software repositories, we can learn about open source software development processes. Knowing in advance where these changes will happen gives managers and developers the ability to concentrate on those files. Due to the unpredictability of the software development process, proposing an accurate change prediction model is hard. It is even harder to compare different models when the actual model of changes is not available.

In this thesis, we first analyze the information generated during the development process, which can be obtained through mining the software repositories. We observe that the change data follows a Zipf distribution and exhibits self-similarity. Based on the extracted data, we then develop three probabilistic models to predict which files will have changes or bugs. One purpose of creating these models is to rank the files of the software that are most susceptible to having faults.

The first model is Maximum Likelihood Estimation (MLE), which simply counts the number of events, i.e., changes or bugs, that occur in each file, and normalizes the counts to compute a probability distribution. The second model is Reflexive Exponential Decay (RED), in which we postulate that the predictive rate of modification in a file is incremented by any modification to that file and decays exponentially. A new bug occurring in that file adds a new exponential effect on top of the first. The third model is called RED Co-Changes (REDCC). With each modification to a given file, the REDCC model not only increments its predictive rate, but also increments the rate for other files that are related to the given file through previous co-changes.

We then present an information-theoretic approach to evaluate the performance of different prediction models. In this approach, the closeness of the model distribution to the actual unknown probability distribution of the system is measured using cross entropy. We evaluate our prediction models empirically, using the proposed information-theoretic approach, on six large open source systems. Based on this evaluation, we observe that of our three prediction models, the REDCC model predicts the distribution that is closest to the actual distribution for all the studied systems.
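The RED model's core idea (each modification adds a contribution that decays exponentially with age) can be sketched as follows. The half-life parameterization and the time values are illustrative assumptions, not the thesis's fitted parameters:

```python
import math

def red_rate(event_times, now, half_life=12.0):
    """Predictive modification rate at time `now`: the sum of
    exp(-lambda * age) over past modification times, where lambda is
    derived from the chosen half-life (here in months)."""
    lam = math.log(2) / half_life
    return sum(math.exp(-lam * (now - t)) for t in event_times if t <= now)

# A file modified at months 0, 6, and 11; predict its rate at month 12.
rate = red_rate([0.0, 6.0, 11.0], now=12.0)
```

Normalizing these per-file rates over all files yields a probability distribution that can be compared against the actual change distribution via cross entropy, as the evaluation section describes; REDCC additionally propagates each increment to historically co-changed files.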
