11

Mining Developer Dynamics for Agent-Based Simulation of Software Evolution

Herbold, Verena 27 June 2019 (has links)
No description available.
12

An Analysis of the Differences between Unit and Integration Tests

Trautsch, Fabian 08 April 2019 (has links)
No description available.
13

Information Theoretic Evaluation of Change Prediction Models for Large-Scale Software

Askari, Mina January 2006 (has links)
During software development and maintenance, as a software system evolves, changes are made and bugs are fixed in various files. In large-scale systems, file histories are stored in software repositories, such as CVS, which record modifications. By studying software repositories, we can learn about open source software development processes. Knowing in advance where these changes will happen gives managers and developers the power to concentrate on those files. Due to the unpredictability of the software development process, proposing an accurate change prediction model is hard. It is even harder to compare different models against the actual distribution of changes, which is not available.

In this thesis, we first analyze the information generated during the development process, which can be obtained by mining the software repositories. We observe that the change data follows a Zipf distribution and exhibits self-similarity. Based on the extracted data, we then develop three probabilistic models to predict which files will have changes or bugs. One purpose of creating these models is to rank the files of the software that are most susceptible to having faults.

The first model is Maximum Likelihood Estimation (MLE), which simply counts the number of events, i.e., changes or bugs, that occur in each file and normalizes the counts to compute a probability distribution. The second model is Reflexive Exponential Decay (RED), in which we postulate that the predictive rate of modification in a file is incremented by any modification to that file and decays exponentially. A new bug occurring in that file adds another exponential effect on top of the first. The third model is called RED Co-Changes (REDCC). With each modification to a given file, the REDCC model not only increments its predictive rate, but also increments the rate for other files that are related to the given file through previous co-changes.

We then present an information-theoretic approach to evaluate the performance of different prediction models. In this approach, the closeness of a model's distribution to the actual, unknown probability distribution of the system is measured using cross entropy. We evaluate our prediction models empirically using the proposed information-theoretic approach on six large open source systems. Based on this evaluation, we observe that of our three prediction models, the REDCC model predicts the distribution that is closest to the actual distribution for all the studied systems.
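To give a feel for the kind of models this abstract describes, here is a minimal sketch of an exponential-decay prediction model and a cross-entropy comparison. The decay constant, file names, and event log are hypothetical illustrations; the exact formulations used in the thesis may differ.

```python
import math
from collections import defaultdict

def red_scores(events, decay=0.01):
    """Sketch of a Reflexive-Exponential-Decay-style model: each change to a
    file adds an exponentially decaying contribution to that file's predictive
    rate. `events` is a list of (timestamp, filename) pairs; `decay` is assumed."""
    history = defaultdict(list)
    for t, f in events:
        history[f].append(t)
    now = max(t for t, _ in events)
    rates = {f: sum(math.exp(-decay * (now - t)) for t in ts)
             for f, ts in history.items()}
    total = sum(rates.values())
    return {f: r / total for f, r in rates.items()}  # normalize to a distribution

def cross_entropy(actual, predicted, eps=1e-9):
    """Cross entropy of a predicted distribution against the actual one:
    lower values mean the model is closer to the (unknown) true distribution."""
    return -sum(p * math.log2(predicted.get(f, eps) + eps)
                for f, p in actual.items())

# Hypothetical event log: (day, file) pairs mined from a repository.
events = [(1, "core.c"), (2, "core.c"), (2, "ui.c"), (5, "core.c"), (7, "net.c")]
model = red_scores(events)
actual = {"core.c": 0.6, "ui.c": 0.2, "net.c": 0.2}  # assumed for illustration
print(model, cross_entropy(actual, model))
```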
14

Evidence-based Software Process Recovery

Hindle, Abram 20 October 2010 (has links)
Developing a large software system involves many complicated, varied, and inter-dependent tasks, and these tasks are typically implemented using a combination of defined processes, semi-automated tools, and ad hoc practices. Stakeholders in the development process --- including software developers, managers, and customers --- often want to be able to track the actual practices being employed within a project. For example, a customer may wish to be sure that the process is ISO 9000 compliant, a manager may wish to track the amount of testing that has been done in the current iteration, and a developer may wish to determine who has recently been working on a subsystem that has had several major bugs appear in it.

However, extracting the software development processes from an existing project is expensive if one must rely upon manual inspection of artifacts and interviews of developers and their managers. Previously, researchers have suggested live observation and instrumentation of a project to allow for more measurement, but this is costly, invasive, and requires a live running project.

In this work, we propose an approach that we call software process recovery, based on after-the-fact analysis of various kinds of software development artifacts. We use a variety of supervised and unsupervised techniques from machine learning, topic analysis, natural language processing, and statistics on software repositories such as version control systems, bug trackers, and mailing list archives. We show how we can combine all of these methods to recover process signals that we map back to software development processes such as the Unified Process.

The Unified Process has been visualized using a timeline view that shows effort per parallel discipline occurring across time; this visualization is called the Unified Process diagram. We use this diagram as inspiration to produce Recovered Unified Process Views (RUPVs), concrete versions of this theoretical Unified Process diagram. We then validate these methods using case studies of multiple open source software systems.
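One way to picture the recovered-process idea is the minimal sketch below, which tags commit messages with Unified Process disciplines using a simple keyword heuristic and tallies effort per discipline per month. The keyword lists and commit data are illustrative assumptions, not the supervised and topic-analysis classifiers the thesis actually uses.

```python
from collections import Counter, defaultdict

# Illustrative keyword lists; the thesis relies on machine learning and
# topic analysis rather than fixed keywords.
DISCIPLINES = {
    "requirements": ["spec", "requirement", "use case"],
    "implementation": ["add", "implement", "feature"],
    "testing": ["test", "junit", "assert"],
    "deployment": ["release", "package", "version bump"],
}

def recover_signals(commits):
    """Count commits per (month, discipline) from (month, message) pairs."""
    effort = defaultdict(Counter)
    for month, message in commits:
        text = message.lower()
        for discipline, keywords in DISCIPLINES.items():
            if any(k in text for k in keywords):
                effort[month][discipline] += 1
    return effort

# Hypothetical commit log.
commits = [("2010-01", "Add login feature"),
           ("2010-01", "Write unit tests for login"),
           ("2010-02", "Prepare 1.0 release package")]
for month, counts in sorted(recover_signals(commits).items()):
    print(month, dict(counts))
```

Plotting these per-month counts side by side, one per discipline, would approximate the timeline view that the Recovered Unified Process Views are built around.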
16

Supporting Development Decisions with Software Analytics

Baysal, Olga January 2014 (has links)
Software practitioners make technical and business decisions based on the understanding they have of their software systems. This understanding is grounded in their own experiences, but can be augmented by studying various kinds of development artifacts, including source code, bug reports, version control meta-data, test cases, usage logs, etc. Unfortunately, the information contained in these artifacts is typically not organized in a way that is immediately useful to developers' everyday decision-making needs. To handle the large volumes of data, many practitioners and researchers have turned to analytics — that is, the use of analysis, data, and systematic reasoning for making decisions.

The thesis of this dissertation is that by applying software analytics to various development tasks and activities, we can provide software practitioners with better insights into their processes, systems, products, and users, helping them make more informed, data-driven decisions. While quantitative analytics can help project managers understand the big picture of their systems, plan for their future, and monitor trends, qualitative analytics can enable developers to perform their daily tasks and activities more quickly by helping them better manage high volumes of information.

To support this thesis, we provide three different examples of employing software analytics. First, we show how analysis of real-world usage data can be used to assess user dynamic behaviour and adoption trends of a software system by revealing valuable information on how software systems are used in practice. Second, we have created a lifecycle model that synthesizes knowledge from software development artifacts, such as reported issues, source code, discussions, community contributions, etc. Lifecycle models capture the dynamic nature of how various development artifacts change over time in an annotated graphical form that can be easily understood and communicated. We demonstrate how lifecycle models can be generated and present industrial case studies where we apply these models to assess the code review process of three different projects. Third, we present a developer-centric approach to issue tracking that aims to reduce information overload and improve developers' situational awareness. Our approach is motivated by a grounded theory study of developer interviews, which suggests that customized views of a project's repositories, tailored to developer-specific tasks, can help developers better track their progress and understand the surrounding technical context of their working environments. We have created a model of the kinds of information elements that developers feel are essential in completing their daily tasks, and from this model we have developed a prototype tool organized around developer-specific customized dashboards.

The results of these three studies show that software analytics can inform evidence-based decisions related to user adoption of a software project and code review processes, and can improve developers' awareness of their daily tasks and activities.
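To make the lifecycle-model idea concrete, here is a minimal sketch that aggregates per-artifact state histories (for example, the review states of patches) into annotated transition counts that could be rendered as a graph. The state names and review histories are assumptions for illustration, not data from the projects studied in the dissertation.

```python
from collections import Counter

def lifecycle_model(histories):
    """Aggregate per-artifact state sequences into edge counts: the result maps
    (from_state, to_state) to the number of artifacts that made that transition,
    which can then be drawn as an annotated lifecycle graph."""
    edges = Counter()
    for states in histories:
        for src, dst in zip(states, states[1:]):
            edges[(src, dst)] += 1
    return edges

# Hypothetical review histories for three patches.
histories = [
    ["submitted", "review-requested", "accepted", "landed"],
    ["submitted", "review-requested", "rejected", "resubmitted", "accepted", "landed"],
    ["submitted", "review-requested", "rejected", "abandoned"],
]
for (src, dst), n in lifecycle_model(histories).items():
    print(f"{src} -> {dst}: {n}")
```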
17

Validating the Quality of a Big Data Java Corpus

Palmqvist, Simon January 2018 (has links)
Recent research within the field of Software Engineering has used GitHub, the largest hub for open source projects with almost 20 million users and 57 million repositories, to mine large amounts of source code in order to get more trustworthy results when developing machine and deep learning models. Mining GitHub comes with many challenges, since the dataset is large and the data do not only contain quality software projects. In this project, we mine projects from GitHub based on earlier research by others and validate their quality by comparing the projects, with the help of software complexity metrics, against a small subset of known quality projects.
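A minimal sketch of that validation idea follows: compare the distribution of a complexity metric (here, a stand-in per-file cyclomatic-complexity average) in the mined corpus against a small reference set of quality projects. The metric values and the comparison threshold are hypothetical, and the thesis's actual metrics and procedure may differ.

```python
import statistics

def summarize(values):
    """Median and interquartile range of a per-file complexity metric."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return {"median": q2, "iqr": q3 - q1}

# Hypothetical average cyclomatic complexity per file, per corpus.
mined_corpus = [3.1, 4.8, 2.2, 9.5, 3.3, 6.7, 2.9]
reference_set = [2.8, 3.0, 3.4, 2.5, 3.1]

mined, reference = summarize(mined_corpus), summarize(reference_set)
print("mined:", mined, "reference:", reference)
# A much higher median or spread in the mined corpus would flag quality issues.
if mined["median"] > 1.5 * reference["median"]:
    print("Mined corpus looks noticeably more complex than the reference projects.")
```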
18

Inappropriate Software Changes: Rejection and Rework

Souza, Rodrigo Rocha Gomes e 17 July 2015 (has links)
Background: Writing source code changes to fix bugs or implement new features is an important software development task, as it contributes to evolving a software system. Not all changes are accepted on the first attempt, though. Inappropriate changes can be rejected because of problems found during code review, automated testing, or manual testing, possibly resulting in rework. Our objective is to better understand the statistical association between different types of rejection --- negative code reviews, supplementary commits, reverts, and issue reopening --- to characterize their impacts within a project, and to understand how they are affected by certain process changes. To this end, this thesis presents an analysis of three large open source projects developed by the Mozilla Foundation, which underwent significant changes in their process, such as the adoption of rapid releases.

Methods: To pursue our objective, we analyzed issues and source code commits from over four years of the projects' history. We computed metrics on the occurrence of multiple types of change rejection and measured the time it takes both to submit a change and to reject inappropriate changes. Furthermore, we validated our findings by discussing them with Mozilla developers.

Results: We found that techniques used in previous studies to detect inappropriate changes are imprecise; because of that, we proposed an alternative technique. We determined that inappropriate changes are a relevant, daily problem that affects about 18% of all issues in a project. We also discovered that, under rapid releases, although the proportion of reverted commits at Mozilla increased, the reverts were performed earlier in the process.
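As a rough illustration of one of the rejection signals discussed above, here is a minimal sketch that detects revert commits from commit messages and measures the lag between a commit and its revert. The message convention and the commit data are assumptions, and the thesis itself argues that such message-based heuristics alone are imprecise.

```python
import re
from datetime import datetime

REVERT_RE = re.compile(r'This reverts commit ([0-9a-f]{7,40})', re.IGNORECASE)

def revert_stats(commits):
    """commits: list of (sha, iso_date, message). Returns the proportion of
    commits that are reverts and the median days from an original commit to
    its revert."""
    by_sha = {sha: datetime.fromisoformat(date) for sha, date, _ in commits}
    lags = []
    for sha, date, message in commits:
        m = REVERT_RE.search(message)
        if m and m.group(1) in by_sha:
            lags.append((datetime.fromisoformat(date) - by_sha[m.group(1)]).days)
    lags.sort()
    proportion = len(lags) / len(commits) if commits else 0.0
    median_lag = lags[len(lags) // 2] if lags else None
    return proportion, median_lag

# Hypothetical commit history.
commits = [
    ("aaa1111", "2013-03-01", "Bug 1 - Enable new cache"),
    ("bbb2222", "2013-03-04", "Backout. This reverts commit aaa1111."),
    ("ccc3333", "2013-03-05", "Bug 2 - Fix crash on startup"),
]
print(revert_stats(commits))
```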
19

Boa Views: Enabling Modularization and Sharing of Boa Queries

Hung, Che Shian 09 August 2019 (has links)
No description available.
20

Observational Studies of Software Engineering Using Data from Software Repositories

Delorey, Daniel Pierce 06 March 2007 (has links) (PDF)
Data for empirical studies of software engineering can be difficult to obtain. Extrapolations from small controlled experiments to large development environments are tenuous, and observation tends to change the behavior of the subjects. In this thesis we propose the use of data gathered from software repositories in observational studies of software engineering. We present tools we have developed to extract data from CVS repositories and the SourceForge Research Archive, and we use these tools to gather data from 9,999 Open Source projects.

By analyzing these data we are able to provide insights into the structure of Open Source projects. For example, we find that the vast majority of the projects studied have never had more than three contributors and that the vast majority of authors studied have never contributed to more than one project. However, there are projects that have had up to 120 contributors in a single year and authors who have contributed to more than 20 projects, which raises interesting questions about team dynamics in the Open Source community.

We also use these data to empirically test the belief that productivity, in terms of lines of code per programmer per year, is constant regardless of the programming language used. We find that yearly programmer productivity is not constant across programming languages; rather, developers using higher-level languages tend to write fewer lines of code per year than those using lower-level languages.
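As a sketch of the productivity comparison described above, the snippet below computes lines of code added per author-year, grouped by programming language, from hypothetical per-commit records; the field names and data are assumptions, not the actual CVS or SourceForge Research Archive schema used in the thesis.

```python
from collections import defaultdict
from statistics import median

def yearly_productivity(commits):
    """commits: (author, year, language, lines_added) records.
    Returns, per language, the median lines of code per author-year."""
    per_author_year = defaultdict(int)
    for author, year, language, lines in commits:
        per_author_year[(language, author, year)] += lines
    by_language = defaultdict(list)
    for (language, _, _), total in per_author_year.items():
        by_language[language].append(total)
    return {lang: median(totals) for lang, totals in by_language.items()}

# Hypothetical records mined from version control history.
commits = [
    ("alice", 2005, "C", 12000), ("alice", 2006, "C", 9000),
    ("bob", 2005, "Python", 4000), ("carol", 2005, "Python", 5200),
]
print(yearly_productivity(commits))
```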
