1.
Multi-heuristic theory assessment with iterative selection. Ammar, Kareem. January 2004.
Thesis (M.S.)--West Virginia University, 2004. / Title from document title page. Document formatted into pages; contains viii, 106 p. : ill. (some col.). Includes abstract. Includes bibliographical references (p. 105-106).
2.
Predicting likelihood of requirement implementation within the planned iteration. Dehghan, Ali. 31 May 2017.
There has been significant interest in the estimation of time and effort in fixing defects among both software practitioners and researchers over the past two decades. However, most of the focus has been on predicting the time and effort needed to resolve bugs, or other low-level tasks, without much regard to predicting the time needed to complete high-level requirements, a critical step in release planning. In this thesis, we describe a mixed-method empirical study on three large IBM projects in which we developed and evaluated a process for training a predictive model built on a set of 29 features in nine categories in order to predict whether a requirement will be completed within its planned iteration. We conducted feature engineering through iterative interviews with IBM software practitioners as well as analysis of the large development and project management repositories of these three projects. Using machine learning techniques, we were able to predict requirement completion at four different stages of a requirement's lifetime. Given our industrial partner's preference for high precision over recall, we then adopted a cost-sensitive learning method and maximized the precision of predictions (ranging from 0.8 to 0.97) while maintaining an acceptable recall. We also ranked the features based on their relative importance to the optimized predictive model. We show that although satisfactory predictions can be made at early stages, even on the first day of requirement creation, prediction performance improves over time by taking advantage of requirements' progress data. Furthermore, feature importance ranking results show that although the importance of features is highly dependent on project and prediction stage, there are certain features (e.g. requirement creator, time remaining until the end of the iteration, time since the last requirement summary change, and the number of times a requirement has been replanned for a new iteration) that emerge as important across most projects and stages, suggesting worthwhile future research directions for both researchers and practitioners. / Graduate
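The thesis's actual pipeline is not shown in this listing; purely as an illustration of the cost-sensitive, precision-first idea described above, a minimal sketch might look like the following (scikit-learn; the feature matrix, labels, class weights, and 0.8 threshold are all hypothetical stand-ins, not the study's values):

```python
# Sketch of cost-sensitive learning tuned for precision rather than recall.
# X and y are random placeholders, not the IBM project data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((1000, 29))                      # 29 features, as in the thesis
y = (X[:, 0] + 0.3 * rng.random(1000) > 0.9).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Penalize mistakes on the negative class 5x more, which pushes the model
# toward conservative (high-precision) positive predictions.
clf = RandomForestClassifier(class_weight={0: 5, 1: 1}, random_state=0)
clf.fit(X_tr, y_tr)

# Raising the decision threshold above 0.5 trades recall for precision.
pred = (clf.predict_proba(X_te)[:, 1] >= 0.8).astype(int)
print("precision", precision_score(y_te, pred, zero_division=0))
print("recall   ", recall_score(y_te, pred))
# Feature importance ranking, as in the thesis's last research question.
print("top features", np.argsort(clf.feature_importances_)[::-1][:4])
```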
3.
Extracting Structured Knowledge from Textual Data in Software Repositories. Hasan, Maryam. 06 1900.
Software team members, as they communicate and coordinate their work with others throughout the life-cycle of their projects, generate different kinds of textual artifacts. Despite the variety of work in the area of mining software artifacts, relatively little research has focused on communication artifacts. Software communication artifacts, in addition to source code artifacts, contain useful semantic information that is not fully explored by existing approaches.
This thesis presents the development of a text analysis method and tool to extract and represent useful pieces of information from a wide range of textual data sources associated with software projects. Our text analysis system integrates Natural Language Processing techniques and statistical text analysis methods with software domain knowledge. The extracted information is represented as RDF-style triples which constitute interesting relations between developers and software products. We applied the developed system to analyze five different kinds of textual data, i.e., source code commits, bug reports, email messages, chat logs, and wiki pages. In the evaluation of our system, we found its precision to be 82%, its recall 58%, and its F-measure 68%.
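The abstract gives only the shape of the pipeline, so the following is a minimal, hypothetical sketch of triple extraction: a regular-expression "NLP" stands in for the thesis's actual NLP and statistical methods, and the predicates (`fixes`, `modifies`) are invented for illustration:

```python
# Minimal sketch of turning software communication text into RDF-style
# (subject, predicate, object) triples. The regex rules below are a stand-in
# for the thesis's NLP pipeline and software domain knowledge.
import re
from typing import List, Tuple

Triple = Tuple[str, str, str]

def extract_triples(author: str, message: str) -> List[Triple]:
    triples: List[Triple] = []
    # e.g. "Fixed bug #123 ..." -> (author, fixes, bug#123)
    for bug in re.findall(r"(?:bug|issue)\s*#?(\d+)", message, re.I):
        triples.append((author, "fixes", f"bug#{bug}"))
    # e.g. a file path mention -> (author, modifies, path)
    for path in re.findall(r"\b[\w/]+\.(?:c|py|java)\b", message):
        triples.append((author, "modifies", path))
    return triples

print(extract_triples("maryam", "Fixed bug #123 in parser.c"))
# [('maryam', 'fixes', 'bug#123'), ('maryam', 'modifies', 'parser.c')]
```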
4.
DRACA: Decision-support for Root Cause Analysis and Change Impact Analysis. Nadi, Sarah. 12 1900.
Most companies relying on an Information Technology (IT) system for their daily operations invest heavily in its maintenance. Tools that monitor network traffic, record anomalies, and keep track of the changes that occur in the system are commonly used. Root cause analysis and change impact analysis are two main activities involved in the management of IT systems. Currently, there exists no universal model to guide analysts while performing these activities. Although the Information Technology Infrastructure Library (ITIL) provides a guide to the organization and structure of the tools and processes used to manage IT systems, it does not provide any models that can be used to implement the required features.
This thesis focuses on providing simple and effective models and processes for root cause analysis and change impact analysis through mining useful artifacts stored in a Configuration Management Database (CMDB). The CMDB contains information about the different components in a system, called Configuration Items (CIs), as well as the relationships between them. Change reports and incident reports are also stored in a CMDB. The result of our work is the Decision support for Root cause Analysis and Change impact Analysis (DRACA) framework, which suggests possible root cause(s) of a problem, as well as possible CIs involved in a change set, based on different proposed models. The contributions of this thesis are as follows:
- An exploration of data repositories (CMDBs) that have not previously been explored in the mining software repositories research community.
- A causality model providing decision support for root cause analysis based on this mined data.
- A process for mining historical change information to suggest CIs for future change sets based on a ranking model. Support and confidence measures are used to make the suggestions (a rough sketch of such a ranking follows this list).
- Empirical results from applying the proposed change impact analysis process to industrial data. Our results show that the change sets in the CMDB were highly predictive, and that with a confidence threshold of 80% and a half-life of 12 months, an overall recall of 69.8% and a precision of 88.5% were achieved.
- An overview of lessons learned from using a CMDB, and the observations we made while working with the CMDB.
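As a rough sketch of what a support/confidence ranking with a 12-month half-life could look like (the DRACA models themselves are not reproduced in this abstract, and all CI names and data below are hypothetical):

```python
# Sketch of a DRACA-style co-change suggestion: given historical change sets
# over CIs, rank candidate CIs by confidence, down-weighting old change sets
# with a 12-month half-life. All data and names here are hypothetical.
from collections import defaultdict
from datetime import date

HALF_LIFE_MONTHS = 12.0

def weight(change_date: date, today: date) -> float:
    months_old = (today - change_date).days / 30.0
    return 0.5 ** (months_old / HALF_LIFE_MONTHS)

def suggest(history, changed_ci, today, conf_threshold=0.8):
    co_occur = defaultdict(float)    # weighted co-change counts per candidate CI
    base = 0.0                       # weighted occurrences of changed_ci itself
    for change_date, cis in history:
        w = weight(change_date, today)
        if changed_ci in cis:
            base += w
            for ci in cis:
                if ci != changed_ci:
                    co_occur[ci] += w
    if base == 0:
        return []
    # confidence(ci) = weighted co-changes / weighted changes of changed_ci
    ranked = sorted(((c / base, ci) for ci, c in co_occur.items()), reverse=True)
    return [(ci, conf) for conf, ci in ranked if conf >= conf_threshold]

history = [(date(2024, 1, 10), {"web-server", "db", "cache"}),
           (date(2024, 6, 2),  {"web-server", "db"}),
           (date(2023, 3, 5),  {"web-server", "load-balancer"})]
print(suggest(history, "web-server", date(2024, 12, 1)))   # only "db" clears 0.8
```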
5.
Enabling Large-Scale Mining Software Repositories (MSR) Studies Using Web-Scale Platforms. Shang, Weiyi. 31 May 2010.
The Mining Software Repositories (MSR) field analyzes software data to uncover knowledge and assist software development. Software projects and products continue to grow in size and complexity. In-depth analysis of these large systems and their evolution is needed to better understand the characteristics of such large-scale systems and projects. However, classical software analysis platforms (e.g., Prolog-like, SQL-like, or specialized programming scripts) face many challenges when performing large-scale MSR studies. Such software platforms rarely scale easily out of the box. Instead, they often require analysis-specific, one-time, ad hoc scaling tricks and designs that are not reusable for other types of analysis and that are costly to maintain. We believe that the web community has faced many of the scaling challenges facing the software engineering community, as it copes with the enormous growth of web data. In this thesis, we report on our experience in using MapReduce and Pig, two web-scale platforms, to perform large MSR studies. Through our case studies, we carefully demonstrate the benefits and challenges of using web platforms to prepare (i.e., Extract, Transform, and Load, ETL) software data for further analysis. The results of our studies show that: 1) web-scale platforms provide an effective and efficient platform for large-scale MSR studies; 2) many of the web community's guidelines for using web-scale platforms must be modified to achieve optimal performance for large-scale MSR studies. This thesis will help other software engineering researchers who want to scale their studies. / Thesis (Master, Computing) -- Queen's University, 2010-05-28
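The thesis ran its ETL steps on Hadoop MapReduce and Pig; as a toy illustration of the programming model only, the following single-process sketch mirrors the map, shuffle, and reduce phases for one simple MSR question (commits per author), with invented log lines:

```python
# Toy illustration of the MapReduce programming model applied to an MSR-style
# ETL step: counting commits per author from raw log lines. The thesis used
# Hadoop MapReduce and Pig at scale; this sketch only mirrors the
# map -> shuffle -> reduce structure in plain Python.
from itertools import groupby
from operator import itemgetter

log_lines = [
    "2010-03-01|alice|src/core.c|fix null deref",
    "2010-03-02|bob|src/ui.c|add dialog",
    "2010-03-02|alice|src/core.c|refactor",
]

def mapper(line):
    author = line.split("|")[1]
    yield (author, 1)                 # emit a (key, value) pair per commit

def reducer(author, counts):
    yield (author, sum(counts))       # aggregate all values for one key

pairs = [kv for line in log_lines for kv in mapper(line)]   # map phase
pairs.sort(key=itemgetter(0))                               # shuffle: group by key
for author, group in groupby(pairs, key=itemgetter(0)):     # reduce phase
    for result in reducer(author, (count for _, count in group)):
        print(result)                 # ('alice', 2) then ('bob', 1)
```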
6.
Techniques for Improving Software Development Processes by Mining Software Repositories. Dhaliwal, Tejinder. 08 September 2012.
Software repositories such as source code repositories and bug repositories record information about the software development process. By analyzing the rich data available in software repositories, we can uncover interesting information. This information can be leveraged to guide software developers, or to automate software development activities. In this thesis, we investigate two activities of the development process: selective code integration and grouping of field crash-reports, and use the information available in software repositories to improve each of these two activities. / Thesis (Master, Electrical & Computer Engineering) -- Queen's University, 2012-09-04
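The abstract does not detail the grouping technique, so the sketch below is only a plausible illustration: crash reports represented as stack traces, grouped together when their top frames are similar enough (the Jaccard measure, comparison depth, and threshold are all assumptions, not the thesis's method):

```python
# Hypothetical sketch of grouping field crash reports by stack-trace
# similarity: Jaccard overlap over the top few frames of each trace.
def similarity(trace_a, trace_b, depth=5):
    top_a, top_b = set(trace_a[:depth]), set(trace_b[:depth])
    return len(top_a & top_b) / len(top_a | top_b)

def group_reports(reports, threshold=0.6):
    groups = []    # each group keeps its first trace as the exemplar
    for trace in reports:
        for group in groups:
            if similarity(trace, group[0]) >= threshold:
                group.append(trace)
                break
        else:
            groups.append([trace])   # no close group found: start a new one
    return groups

r1 = ["free", "cleanup", "shutdown", "main"]
r2 = ["free", "cleanup", "exit_handler", "main"]
r3 = ["parse", "load_config", "main"]
print(len(group_reports([r1, r2, r3])))   # 2 groups: {r1, r2} and {r3}
```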
7.
Extracting Structured Knowledge from Textual Data in Software Repositories. Hasan, Maryam. Unknown Date.
No description available.
8.
Mining Unstructured Software Repositories Using IR Models. Thomas, Stephen. 12 December 2012.
Mining Software Repositories, which is the process of analyzing the data related to software development practices, is an emerging field which aims to aid development teams in their day-to-day tasks. However, data in many software repositories is currently unused because the data is unstructured, and therefore difficult to mine and analyze. Information Retrieval (IR) techniques, which were developed specifically to handle unstructured data, have recently been used by researchers to mine and analyze the unstructured data in software repositories, with some success.
The main contribution of this thesis is the idea that the research and practice of using IR models to mine unstructured software repositories can be improved by going beyond the current state of affairs. First, we propose new applications of IR models to existing software engineering tasks. Specifically, we present a technique to prioritize test cases based on their IR similarity, giving highest priority to those test cases that are most dissimilar. In another new application of IR models, we empirically recover how developers use their mailing list while developing software.
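As a minimal sketch of the dissimilarity-based prioritization idea (the thesis's concrete IR model and distance measure may differ; the test texts here are invented):

```python
# Rough sketch of IR-based test prioritization: represent each test as a
# TF-IDF vector and greedily pick the test farthest from everything already
# selected, so near-duplicate tests sink to the bottom of the order.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

tests = [
    "open file read header parse records close file",
    "open file read header parse records validate checksum",
    "connect socket send request await response",
]
tfidf = TfidfVectorizer().fit_transform(tests)
sim = cosine_similarity(tfidf)

order = [0]                                   # start from an arbitrary test
while len(order) < len(tests):
    remaining = [i for i in range(len(tests)) if i not in order]
    # choose the test least similar to its closest already-selected test
    order.append(min(remaining, key=lambda i: max(sim[i][j] for j in order)))
print(order)   # [0, 2, 1]: the socket test outranks the near-duplicate file test
```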
Next, we show how the use of advanced IR techniques can improve results. Using a framework for combining disparate IR models, we find that bug localization performance can be improved by 14–56% on average, compared to the best individual IR model. In addition, by using topic evolution models on the history of source code, we can uncover the evolution of source code concepts with an accuracy of 87–89%.
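A toy illustration of combining disparate IR models' scores follows (the thesis's combination framework is more elaborate; the two models and the per-file scores below are hypothetical):

```python
# Illustrative sketch of combining IR models for bug localization: normalize
# each model's relevance scores to [0, 1] and average them, then rank files
# by the combined score. Everything here is a toy stand-in.
import numpy as np

def normalize(scores):
    s = np.asarray(scores, dtype=float)
    span = s.max() - s.min()
    return (s - s.min()) / span if span else np.zeros_like(s)

# hypothetical per-file relevance scores for one bug report, from two models
vsm_scores = [0.10, 0.80, 0.30]    # e.g., a vector space model
lsi_scores = [0.20, 0.50, 0.90]    # e.g., latent semantic indexing
combined = (normalize(vsm_scores) + normalize(lsi_scores)) / 2
print(np.argsort(combined)[::-1])  # file indices ranked by combined relevance
```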
Finally, we show the risks of current research, which uses IR models as black boxes without fully understanding their assumptions and parameters. We show that data duplication in source code has undesirable effects on IR models, and that eliminating the duplication improves their accuracy. Additionally, we find that in the bug localization task, an unwise choice of parameter values results in an accuracy of only 1%, whereas optimal parameters can achieve an accuracy of 55%.
Through empirical case studies on real-world systems, we show that all of our proposed techniques and methodologies significantly improve the state of the art. / Thesis (Ph.D, Computing) -- Queen's University, 2012-12-12
9.
Mining Developer Dynamics for Agent-Based Simulation of Software Evolution. Herbold, Verena. 27 June 2019.
No description available.