11

Evidence-based Software Process Recovery

Hindle, Abram 20 October 2010 (has links)
Developing a large software system involves many complicated, varied, and inter-dependent tasks, and these tasks are typically implemented using a combination of defined processes, semi-automated tools, and ad hoc practices. Stakeholders in the development process --- including software developers, managers, and customers --- often want to be able to track the actual practices being employed within a project. For example, a customer may wish to be sure that the process is ISO 9000 compliant, a manager may wish to track the amount of testing that has been done in the current iteration, and a developer may wish to determine who has recently been working on a subsystem that has had several major bugs appear in it. However, extracting the software development processes from an existing project is expensive if one must rely upon manual inspection of artifacts and interviews of developers and their managers. Previously, researchers have suggested the live observation and instrumentation of a project to allow for more measurement, but this is costly, invasive, and also requires a live running project. In this work, we propose an approach that we call software process recovery that is based on after-the-fact analysis of various kinds of software development artifacts. We use a variety of supervised and unsupervised techniques from machine learning, topic analysis, natural language processing, and statistics on software repositories such as version control systems, bug trackers, and mailing list archives. We show how we can combine all of these methods to recover process signals that we map back to software development processes such as the Unified Process. The Unified Process has been visualized using a time-line view that shows effort per parallel discipline occurring across time. This visualization is called the Unified Process diagram. We use this diagram as inspiration to produce Recovered Unified Process Views (RUPV) that are a concrete version of this theoretical Unified Process diagram. We then validate these methods using case studies of multiple open source software systems.
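
To make the idea of recovering process signals concrete, the following Python sketch bins commit messages by month and counts keyword hits per Unified Process discipline, a crude stand-in for a Recovered Unified Process View. It is an illustration only, not Hindle's actual tooling; the discipline keyword lists and the sample commits are assumptions.

    # Approximate a per-month effort signal per Unified Process discipline
    # from commit messages, using hypothetical keyword lists.
    from collections import Counter, defaultdict
    from datetime import date

    DISCIPLINES = {
        "implementation": ["add", "implement", "feature", "refactor"],
        "testing":        ["test", "junit", "assert", "coverage"],
        "deployment":     ["release", "version", "package", "build"],
        "requirements":   ["spec", "requirement", "use case"],
    }

    def classify(message):
        """Return the set of disciplines whose keywords appear in the message."""
        text = message.lower()
        return {d for d, words in DISCIPLINES.items() if any(w in text for w in words)}

    def recovered_view(commits):
        """commits: iterable of (date, message) -> {(year, month): Counter of discipline hits}."""
        view = defaultdict(Counter)
        for day, msg in commits:
            for discipline in classify(msg):
                view[(day.year, day.month)][discipline] += 1
        return view

    if __name__ == "__main__":
        sample = [
            (date(2010, 1, 5), "Add unit tests for the parser"),
            (date(2010, 1, 9), "Implement new logging feature"),
            (date(2010, 2, 2), "Prepare 1.2 release build"),
        ]
        for month, counts in sorted(recovered_view(sample).items()):
            print(month, dict(counts))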
13

Supporting Development Decisions with Software Analytics

Baysal, Olga January 2014 (has links)
Software practitioners make technical and business decisions based on the understanding they have of their software systems. This understanding is grounded in their own experiences, but can be augmented by studying various kinds of development artifacts, including source code, bug reports, version control meta-data, test cases, usage logs, etc. Unfortunately, the information contained in these artifacts is typically not organized in a way that is immediately useful to developers' everyday decision-making needs. To handle the large volumes of data, many practitioners and researchers have turned to analytics — that is, the use of analysis, data, and systematic reasoning for making decisions. The thesis of this dissertation is that by applying software analytics to various development tasks and activities, we can provide software practitioners with better insights into their processes, systems, products, and users, to help them make more informed data-driven decisions. While quantitative analytics can help project managers understand the big picture of their systems, plan for their future, and monitor trends, qualitative analytics can enable developers to perform their daily tasks and activities more quickly by helping them better manage high volumes of information. To support this thesis, we provide three different examples of employing software analytics. First, we show how analysis of real-world usage data can be used to assess user dynamic behaviour and adoption trends of a software system by revealing valuable information on how software systems are used in practice. Second, we have created a lifecycle model that synthesizes knowledge from software development artifacts, such as reported issues, source code, discussions, community contributions, etc. Lifecycle models capture the dynamic nature of how various development artifacts change over time in an annotated graphical form that can be easily understood and communicated. We demonstrate how lifecycle models can be generated and present industrial case studies where we apply these models to assess the code review process of three different projects. Third, we present a developer-centric approach to issue tracking that aims to reduce information overload and improve developers' situational awareness. Our approach is motivated by a grounded theory study of developer interviews, which suggests that customized views of a project's repositories that are tailored to developer-specific tasks can help developers better track their progress and understand the surrounding technical context of their working environments. We have created a model of the kinds of information elements that developers feel are essential in completing their daily tasks, and from this model we have developed a prototype tool organized around developer-specific customized dashboards. The results of these three studies show that software analytics can inform evidence-based decisions related to user adoption of a software project, code review processes, and developers' awareness of their daily tasks and activities.
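
As an illustration of the lifecycle-model idea, the following sketch counts the state transitions observed in per-patch review histories; such transition frequencies are the kind of annotation a lifecycle model carries. The states and the sample histories are assumptions, not Baysal's actual model.

    # Build a simple lifecycle model by counting state transitions in
    # per-patch event histories (hypothetical states and data).
    from collections import Counter

    def transition_counts(histories):
        """histories: iterable of state sequences, e.g. ["submitted", "reviewed", "accepted"]."""
        counts = Counter()
        for states in histories:
            for src, dst in zip(states, states[1:]):
                counts[(src, dst)] += 1
        return counts

    if __name__ == "__main__":
        patches = [
            ["submitted", "reviewed", "accepted"],
            ["submitted", "reviewed", "resubmitted", "reviewed", "accepted"],
            ["submitted", "reviewed", "abandoned"],
        ]
        counts = transition_counts(patches)
        total = sum(counts.values())
        for (src, dst), n in counts.most_common():
            print(f"{src} -> {dst}: {n} ({n / total:.0%})")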
14

Validating the Quality of a Big Data Java Corpus

Palmqvist, Simon January 2018 (has links)
Recent research within the field of Software Engineering has used GitHub, the largest hub for open source projects with almost 20 million users and 57 million repositories, to mine large amounts of source code in order to get more trustworthy results when developing machine and deep learning models. Mining GitHub comes with many challenges, since the dataset is large and not all of its projects are quality software projects. In this project, we mine projects from GitHub based on earlier research by others and validate their quality by comparing the mined projects against a small subset of known quality projects with the help of software complexity metrics.
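
A minimal sketch of the validation idea, assuming locally cloned repositories: it compares a mined project against a set of reference projects using a crude decision-point count over .java files as a stand-in for a real complexity metric. The directory paths and the metric itself are illustrative only.

    # Compare a mined project with reference "quality" projects by the
    # average number of decision points per .java file (a rough proxy).
    import pathlib
    import re

    DECISION = re.compile(r"\b(?:if|for|while|case|catch)\b|&&|\|\|")

    def avg_decisions_per_file(project_dir):
        """Average decision-point count per .java file in a project tree."""
        files = list(pathlib.Path(project_dir).rglob("*.java"))
        if not files:
            return 0.0
        hits = sum(len(DECISION.findall(f.read_text(errors="ignore"))) for f in files)
        return hits / len(files)

    def compare(candidate_dir, reference_dirs):
        baseline = sum(avg_decisions_per_file(d) for d in reference_dirs) / len(reference_dirs)
        candidate = avg_decisions_per_file(candidate_dir)
        print(f"candidate: {candidate:.1f} decision points/file (reference mean {baseline:.1f})")

    # compare("mined/project-x", ["reference/guava", "reference/junit5"])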
15

Inappropriate Software Changes: Rejection and Rework

Souza, Rodrigo Rocha Gomes e 17 July 2015 (has links)
Background: Writing source code changes to fix bugs or implement new features is an important software development task, as it contributes to evolving a software system. Not all changes are accepted in the first attempt, though. Inappropriate changes can be rejected because of problems found during code review, automated testing, or manual testing, possibly resulting in rework. Our objective is to better understand the statistical association between different types of rejection---negative code reviews, supplementary commits, reverts, and issue reopening---to characterize their impacts within a project, and to understand how they are affected by certain process changes. To this end, this thesis presents an analysis of three large open source projects developed by the Mozilla Foundation, which underwent significant changes in their process, such as the adoption of rapid releases. Methods: To pursue our objective, we analyzed issues and source code commits from over four years of the projects' history. We computed metrics on the occurrence of multiple types of change rejection and measured the time it takes both to submit a change and to reject inappropriate changes. Furthermore, we validated our findings by discussing them with Mozilla developers. Results: We found that techniques used in previous studies to detect inappropriate changes are imprecise; because of that, we proposed an alternative technique. We determined that inappropriate changes are a relevant, daily problem that affects about 18% of all issues in a project. We also discovered that, under rapid releases, although the proportion of reverted commits at Mozilla increased, the reverts were performed earlier in the process.
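
As an illustration of one rejection signal, the following sketch finds reverted commits in a git history via git's default "This reverts commit" message and measures how long each reverted change survived. It assumes a locally cloned repository and is not the detection technique proposed in the thesis.

    # Detect reverted commits and the delay until their revert, from `git log`.
    import re
    import subprocess

    REVERT_RE = re.compile(r"This reverts commit ([0-9a-f]{7,40})")

    def read_commits(repo):
        """Return a list of (sha, unix_time, message) from `git log`, newest first."""
        out = subprocess.run(
            ["git", "-C", repo, "log", "--pretty=format:%H %ct %B%x1e"],
            capture_output=True, text=True, check=True).stdout
        commits = []
        for record in filter(None, (r.strip() for r in out.split("\x1e"))):
            parts = record.split(maxsplit=2)
            commits.append((parts[0], int(parts[1]), parts[2] if len(parts) > 2 else ""))
        return commits

    def revert_delays(repo):
        """Yield (reverted_sha, days_until_revert) for commits undone by a revert."""
        commits = read_commits(repo)
        time_of = {sha: ts for sha, ts, _ in commits}
        for sha, ts, message in commits:
            for target in REVERT_RE.findall(message):
                # resolve possibly abbreviated hashes against the full ones seen
                full = next((s for s in time_of if s.startswith(target)), None)
                if full is not None:
                    yield full, (ts - time_of[full]) / 86400.0

    # Example (path is hypothetical):
    # for sha, days in revert_delays("clones/mozilla-central"):
    #     print(sha[:10], f"reverted after {days:.1f} days")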
16

Boa Views: Enabling Modularization and Sharing of Boa Queries

Hung, Che Shian 09 August 2019 (has links)
No description available.
17

Observational Studies of Software Engineering Using Data from Software Repositories

Delorey, Daniel Pierce 06 March 2007 (has links) (PDF)
Data for empirical studies of software engineering can be difficult to obtain. Extrapolations from small controlled experiments to large development environments are tenuous, and observation tends to change the behavior of the subjects. In this thesis we propose the use of data gathered from software repositories in observational studies of software engineering. We present tools we have developed to extract data from CVS repositories and the SourceForge Research Archive. We use these tools to gather data from 9,999 Open Source projects. By analyzing these data we are able to provide insights into the structure of Open Source projects. For example, we find that the vast majority of the projects studied have never had more than three contributors and that the vast majority of authors studied have never contributed to more than one project. However, there are projects that have had up to 120 contributors in a single year and authors who have contributed to more than 20 projects, which raises interesting questions about team dynamics in the Open Source community. We also use these data to empirically test the belief that productivity is constant in terms of lines of code per programmer per year regardless of the programming language used. We find that yearly programmer productivity is not constant across programming languages, but rather that developers using higher level languages tend to write fewer lines of code per year than those using lower level languages.
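
The productivity comparison can be illustrated with a small sketch that aggregates pre-extracted per-commit records into lines of code per programmer per year and reports a median per language. The record layout and the sample values are assumptions; the thesis works from CVS and SourceForge data.

    # Median yearly lines-of-code per programmer, grouped by language,
    # from pre-extracted (author, year, language, lines_added) records.
    from collections import defaultdict

    def yearly_productivity(records):
        """Return {language: {(author, year): total lines added}}."""
        totals = defaultdict(lambda: defaultdict(int))
        for author, year, lang, lines in records:
            totals[lang][(author, year)] += lines
        return totals

    def median(values):
        ordered = sorted(values)
        mid = len(ordered) // 2
        return ordered[mid] if len(ordered) % 2 else (ordered[mid - 1] + ordered[mid]) / 2

    if __name__ == "__main__":
        sample = [
            ("alice", 2005, "C", 12000), ("bob", 2005, "C", 9000),
            ("carol", 2005, "Python", 4000), ("dave", 2005, "Python", 5200),
        ]
        for lang, per_author in yearly_productivity(sample).items():
            print(lang, "median LOC/programmer/year:", median(per_author.values()))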
18

MiSFIT: Mining Software Fault Information and Types

Kidwell, Billy R 01 January 2015 (has links)
As software becomes more important to society, the number, age, and complexity of systems grow. Software organizations require continuous process improvement to maintain the reliability, security, and quality of these software systems. Software organizations can use data from manual fault classification to meet their process improvement needs, but many organizations lack the expertise or resources to implement it correctly. This dissertation addresses the need for automation of software fault classification. Validation results show that automated fault classification, as implemented in the MiSFIT tool, can group faults of a similar nature. The resulting classifications show good agreement for common software faults with no manual effort. To evaluate the method and tool, I develop and apply an extended change taxonomy to classify the source code changes that repaired software faults in an open source project. MiSFIT clusters the faults based on the changes. I manually inspect a random sample of faults from each cluster to validate the results. The automatically classified faults are used to analyze the evolution of a software application over seven major releases. The contributions of this dissertation are an extended change taxonomy for software fault analysis, a method to cluster faults by the syntax of the repair, empirical evidence that fault distribution varies according to the purpose of the module, and the identification of project-specific trends from the analysis of the changes.
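
As a rough illustration of clustering faults by the syntax of their repair, the following sketch vectorizes the changed lines of a few hypothetical bug fixes and clusters them with k-means. It uses scikit-learn as a stand-in and is not the MiSFIT implementation.

    # Cluster fault-fixing changes by token features of their changed lines.
    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer

    fixes = [  # changed lines of four hypothetical fault-fixing commits
        "if (obj != null) return obj.size();",
        "if (list != null && !list.isEmpty()) process(list);",
        "for (int i = 0; i < items.length; i++) total += items[i];",
        "catch (IOException e) { log.error(e); }",
    ]

    features = TfidfVectorizer(token_pattern=r"[A-Za-z_]+|[{}()<>=!&|]+").fit_transform(fixes)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

    for label, fix in sorted(zip(labels, fixes)):
        print(label, fix)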
19

Multi-version software quality analysis through mining software repositories

Kriukov, Illia January 2018 (has links)
The main objective of this thesis is to identify how software repository features influence software quality during software evolution. To do that, techniques from the mining software repositories field were used. This field analyzes the rich data in software repositories to extract interesting and actionable information about software systems, projects, and software engineering. The ability to measure code quality and analyze the impact of software repository features on software quality allows us to better understand project history, the project's quality state, and development processes, and to conduct future project analysis. Existing work in the area of software quality describes software quality analysis without a connection to software repository features; it therefore loses important information that could be used for preventing bugs, decision-making, and optimizing development processes. To conduct the analysis, a dedicated tool was developed that covers quality measurement and repository feature extraction. During the research, a general procedure for software quality analysis was defined, described, and applied in practice. It was found that there is no single most influential repository feature; correlations between software quality and software repository features exist, but they are too small to have a real influence.
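
The kind of correlation analysis described above can be sketched in a few lines: given one repository feature and one quality metric per analysed release, compute a rank correlation. The values below are invented for illustration and are not taken from the thesis.

    # Rank correlation between a repository feature and a quality metric,
    # one value per analysed release (hypothetical numbers).
    from scipy.stats import spearmanr

    avg_commit_size = [120, 95, 140, 210, 180, 90, 75]     # repository feature
    defect_density  = [0.8, 0.7, 0.9, 1.4, 1.1, 0.6, 0.5]  # quality metric

    rho, p_value = spearmanr(avg_commit_size, defect_density)
    print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")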
20

In Pursuit of Optimal Workflow Within The Apache Software Foundation

January 2017 (has links)
The following is a case study composed of three workflow investigations at the open source software development (OSSD) based Apache Software Foundation (Apache). I start with an examination of workload inequality within Apache, particularly with regard to requirements writing. I established that the stronger a participant's experience indicators are, the more likely they are to propose a requirement that is not a defect and the more likely the requirement is eventually implemented. Requirements at Apache are divided into work tickets (tickets). In the second investigation, I reported many insights into the distribution patterns of these tickets. The participants who created the tickets often had the best track records for determining who should participate in a ticket. Tickets that were at one point volunteered for (self-assigned) had a lower incidence of neglect, but in some cases were also associated with severe delay. When a participant claims a ticket but postpones the work involved, such tickets exist without a solution for five to ten times as long, depending on the circumstances. I make recommendations that may reduce the incidence of tickets that are claimed but not implemented in a timely manner. After giving an in-depth explanation of how I obtained this data set through web crawlers, I describe the pattern mining platform I developed to make my data mining efforts highly scalable and repeatable. Lastly, I used process mining techniques to show that workflow patterns vary greatly within teams at Apache. I investigated a variety of process choices and how they might be influencing the outcomes of OSSD projects. I report a moderately negative association between how often a team updates the specifics of a requirement and how often requirements are completed. I also verified that the prevalence of volunteerism indicators is positively associated with work completion; surprisingly, this correlation is stronger if the very large projects are excluded. I suggest that the largest projects at Apache may benefit from some level of traditional delegation in addition to the volunteerism that OSSD is normally associated with.
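
One of the associations above, between volunteerism and work completion, can be illustrated with a simple contingency-table test over ticket counts. The counts below are invented; the case study derives its data from crawled tickets.

    # Test whether self-assigned tickets are completed more often than
    # other tickets, using a 2x2 contingency table of hypothetical counts.
    from scipy.stats import chi2_contingency

    #                    completed  not completed
    table = [[420, 180],   # self-assigned tickets
             [310, 390]]   # other tickets

    chi2, p, dof, _ = chi2_contingency(table)
    print(f"chi2 = {chi2:.1f}, p = {p:.3g}")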
