1 |
AI Supported Software Development: Moving Beyond Code Completion
Pudari, Rohith, 30 August 2022
AI-supported programming has arrived, as shown by the introduction and successes of large language models for code, such as Copilot/Codex (Github/OpenAI) and AlphaCode (DeepMind). Above-average human performance on programming challenges is now possible. However, software development is much more than solving programming contests. Moving beyond code completion to AI-supported software development will require an AI system that can, among other things, understand how to avoid code smells, follow language idioms, and eventually (maybe!) propose rational software designs.
In this study, we explore the current limitations of Copilot and offer a simple taxonomy for classifying AI-supported code completion tools in this space. We first perform an exploratory study on Copilot’s code suggestions for language idioms and code smells. In most of our test scenarios, Copilot neither follows language idioms nor avoids code smells. We then conduct additional investigation to determine the current boundaries of Copilot by introducing a taxonomy of software abstraction hierarchies in which ‘basic programming functionality’ such as code compilation and syntax checking is at the least abstract level, and software architecture analysis and design are at the most abstract level. We conclude with a discussion of the challenges that future AI-supported code completion tools face in reaching the design level of abstraction in our taxonomy. / Graduate
|
2 |
Investigating student experiences with GitHub and Stack Overflow: an exploratory study
Bhasin, Trishala, 29 July 2021
Programmers who want to improve their skills and background in software development rely heavily on developer social platforms such as GitHub and Stack Overflow to enhance their learning. Stack Overflow provides answers to questions they have about languages or library skills they wish to acquire, while contributing to open-source projects hosted on sites like GitHub gives them valuable experience. Students also use these platforms during their education: most will rely heavily on Stack Overflow at some point in their schooling, while many can benefit from contributing to GitHub projects to build their expertise and professional portfolios. We already know from previous research that developers face barriers participating on these platforms, and therefore we may expect that at least some students will experience similar or possibly even bigger barriers. This research describes a semi-structured interview study followed by a survey with university students to explore how they use the GitHub and Stack Overflow platforms. I identified the benefits the students report from using these tools and the barriers they face. I have concluded with some preliminary recommendations on how to reduce the hurdles students may face with these and other developer social platforms, and I have also suggested future work to mitigate these roadblocks. / Graduate
|
3 |
How Do Java Developers Reuse StackOverflow Answers in Their GitHub Projects?
Chen, Juntong, 09 September 2022
StackOverflow (SO) is a widely used question-and-answer (Q&A) website for software developers and computer scientists. GitHub is a code hosting platform for collaboration and version control. Popular software libraries are open-source and published in repositories on GitHub. Preliminary observation shows that developers cite SO questions in their GitHub repositories. This observation inspired us to explore the relationship between SO posts and GitHub repositories, in order to help software developers better understand the characteristics of SO answers that are reused by GitHub projects.
We conducted an empirical study to investigate the SO answers reused by Java code from public GitHub projects. We used a hybrid approach to ensure precise results: code clone detection, keyword-based search, and manual inspection. This approach helped us identify the answers that developers leveraged.
Based on the identified answers, we further investigated the topics of the discussion threads, answer characteristics (e.g., scores, ages, code lengths, and text lengths), and developers' reuse practices.
We observed both reused and unused answers. Compared with unused answers, the reused answers mostly have higher scores, longer code, and longer plain-text explanations. Most reused answers were related to implementing specific coding tasks. In 9% (40/430) of the scenarios we observed, developers copied code in its entirety from one or more answers in an SO discussion thread. In the other 91% (390/430) of scenarios, developers only partially reused code or created brand new code from scratch.
We investigated 130 SO discussion threads referred to by Java developers in 356 GitHub projects. We then arranged those into five different categories. Our findings can help the SO community achieve a better distribution of programming knowledge and skills, as well as inspire future research related to SO and GitHub. / Master of Science / StackOverflow (SO) is a widely used question-and-answer (Q&A) website for software developers and computer scientists. GitHub is a code hosting platform for collaboration and version control. Popular software libraries are open-source and published in repositories on GitHub. Preliminary observation shows that developers cite SO questions in their GitHub repositories. This observation inspired us to explore the relationship between SO posts and GitHub repositories, in order to help software developers better understand the characteristics of SO answers that are reused by GitHub projects. Our objectives are to guide SO answerers in helping developers more effectively and to help tool builders understand how SO answers shape software products.
Thus, we conducted an empirical study to investigate the SO answers reused by Java code from public GitHub projects. We used a hybrid approach to refine our dataset and to ensure precise results. Our hybrid approach includes three steps. The first step is code clone detection. We compared two code snippets with a code clone detection tool to find the similarity. The second step is a keyword-based search. We created multiple keywords to search within GitHub code to find the referenced answers missed by step one. Lastly, we manually inspected the outputs of both step one and two to ensure zero false positives in our data. This approach helped us identify the leveraged answers from developers. Based on the identified answers, we further investigated the topics of the discussion threads, answer characteristics, and developers' reuse practices.
We observed both reused and unused answers. Compared with unused answers, the reused answers mostly have higher scores, longer code, and longer plain-text explanations.
Most reused answers were related to implementing specific coding tasks. In 9% of the scenarios we observed, developers copied code in its entirety from one or more answers in an SO discussion thread. In the other 91% of scenarios, developers only partially reused code or created brand new code from scratch. Our findings can help the SO community achieve a better distribution of programming knowledge and skills, as well as inspire future research related to SO and GitHub.
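The first two steps of the hybrid approach described in this abstract (code clone detection and keyword-based search) can be illustrated with a rough Python sketch. This is not the thesis's tooling: the abstract does not name the clone detector or the keywords used, so the similarity measure (`difflib.SequenceMatcher`), the link pattern, the 0.8 threshold, and all function names below are assumptions made purely for illustration. The third step, manual inspection, remains a human task.

```python
import re
from difflib import SequenceMatcher

# Assumed keyword pattern: links to Stack Overflow questions or answers.
# The capture group is the numeric post ID.
SO_LINK = re.compile(r"https?://stackoverflow\.com/(?:questions|q|a)/(\d+)")


def find_so_references(java_source: str) -> list[str]:
    """Keyword-based step: return the post IDs of SO links found in the source."""
    return SO_LINK.findall(java_source)


def normalize(code: str) -> str:
    """Drop '//' line comments and collapse whitespace so layout is ignored.

    This is deliberately crude: it also cuts string literals containing '//'.
    """
    without_comments = re.sub(r"//[^\n]*", "", code)
    return " ".join(without_comments.split())


def similarity(snippet_a: str, snippet_b: str) -> float:
    """Stand-in for a clone detector: character-level similarity in [0, 1]."""
    return SequenceMatcher(None, normalize(snippet_a), normalize(snippet_b)).ratio()


def is_reuse_candidate(answer_code: str, project_code: str,
                       threshold: float = 0.8) -> bool:
    """Flag a pair for manual inspection if either automated step fires."""
    return (bool(find_so_references(project_code))
            or similarity(answer_code, project_code) >= threshold)
```

Candidates flagged by `is_reuse_candidate` would still need to be checked by hand, which is how the abstract says false positives were eliminated.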
|
4 |
Impact of using Suggestion Bot while code reviewing
Palvannan, Nivishree, 03 July 2023
Peer code reviews play a critical role in maintaining code quality, and GitHub has introduced several new features to assist with the review process. One of these features is suggested changes, which allows for precise code modifications in pull requests to be suggested in review comments. Despite the availability of such helpful features, many pull requests remain unattended due to lower priority. To address this issue, we developed a bot called "Suggestion Bot" to automatically review the codebase using GitHub's suggested changes functionality. An empirical study was also conducted to compare the effectiveness of this bot with manual reviews. The findings suggest that implementing this bot can expedite response times and improve the quality of pull request comments for pull-based software development projects. In addition to providing automated suggestions, this feature also offers valuable, concise, and targeted feedback. / Master of Science / Code review, often known as peer review, is a process used to ensure the quality of software. Code review is a process in software development that involves one or more individuals examining the source code of a program, either after it has been implemented or during a pause in the development process. The creator of the code cannot be one of the individuals. "Reviewers" refers to the individuals conducting the checking, excluding the author. However, the majority of reviewers won't have the time to examine and validate the peer's code base, so they'll assign it the lowest priority possible. This could cause pull requests to stall out without being reviewed. Therefore, as part of our research, we are creating a bot called SUGGESTION BOT that provides code changes in pull requests. The author can then accept, reject, or alter these ideas as a necessary component of the pull request.
Additionally, we compared the effectiveness of our bot with the manual pull request review procedure, which clearly demonstrated that the incorporation of this bot significantly shortened the turnaround time. Besides giving automated recommendations, this functionality also provides useful, brief, and focused feedback.
|
5 |
Mining energy-aware commits: exploring changes performed by open-source developers to impact the energy consumption of software systems
MOURA, Irineu Martins de Lima, 24 August 2015
Previous issue date: 2015-08-24 / Energy consumption has been gaining traction as yet another major concern that mainstream software developers must be aware of. It used to be mainly the focus of hardware designers and low level software developers, e.g., device driver developers. Nowadays, however, mostly due to the ubiquity of battery-powered devices, any developer in the software stack must be prepared to deal with this concern. Thus, to be able to properly assist them and to provide guidance in future research it is crucial to understand how they have been handling this matter. This thesis aims to aid in this regard by exploring a set of software changes, i.e., commits, to obtain insights into actual solutions implemented by open source developers when dealing with energy consumption. We use as our main data source GITHUB, a source code hosting platform for collaborative development, and extract a sample of the available commits across several different projects. From this sample, we manually curate a set of energy-aware commits, that is, any commit that refers to a source code change where developers intentionally modify, or aim to modify, the energy consumption (or power dissipation) of a system or make it easier for other developers or end users to do so. We then apply a qualitative research method to extract recurring patterns of information and to group the commits that intend to save energy into categories. A small survey was also conducted to assess the quality of our analysis and to further expand our understanding of the changes. During our analysis we also cover different aspects of the commits. We observe that the majority of the changes (~47%) still target lower levels of the software stack, i.e., kernels, drivers and OS-related services, while application level changes encompass ~34% of them. 
We notice that developers may not always be certain of the energy consumption impact of their changes before actually performing them: in our dataset we identify several instances (~12%) of commits where developers show signs of uncertainty about their change’s effectiveness. We also highlight the possible software quality attributes that may be favored over energy efficiency. Notably, we spot a few instances of commits where developers performed a change that would negatively impact the energy consumption of the system in order to fix a bug. We also draw attention to a specific group of changes which we call "energy-aware interfaces". They add tuning knobs that can be used by developers or end users to control the energy consumption of an underlying component. / Controlling energy consumption has been gaining more and more attention as another concern that software developers must be aware of. This kind of concern used to be mainly the focus of hardware designers and low-level developers, for example device driver developers. However, due to the ubiquity of battery-dependent devices, any developer must be prepared to face this issue. Therefore, understanding how they are dealing with energy consumption is crucial for us to be able to assist them and to provide adequate direction for future research. To help in this regard, this thesis explores a set of software changes, that is, commits, to better understand the kinds of solutions that open source developers actually implement when they must deal with energy consumption. We use GITHUB as our main data source, a source code hosting platform for the collaborative development of software projects, and extract a sample of the commits available across several different projects.
From this sample, we manually select a set of "energy-aware" commits, that is, any commit that refers to a code change where the developer intentionally modifies, or intends to modify, the energy consumption (or power dissipation) of a system, or makes it easier for other developers or end users to do so. We then apply a qualitative analysis method to these commits to extract recurring patterns of information and to group the commits that intend to reduce energy consumption into categories. A small survey was also conducted with the authors of the commits to assess the quality of our analysis and to expand our understanding of the changes. We also consider different aspects of the commits during the analysis. We observe that the majority of the changes (~47%) still apply to the lowest software layers, that is, kernels and drivers, while application-level changes make up ~34% of our dataset. We notice that developers are not always sure of the impact of their changes on energy consumption before making them: in our dataset we identify several instances of changes (~12%) in which developers show signs of uncertainty about the effectiveness of their changes. We also point out some of the possible software quality attributes that are favored over energy consumption. Among these, we highlight some commits where developers made a change that would negatively impact energy consumption in order to fix an existing problem in the software. We also find it worth highlighting a specific group of changes that we call "energy-aware interfaces". They add controls to the software in question that allow other developers or end users to adjust the energy consumption of some underlying component.
|
6 |
A comparative assessment of improvements in workflow automation : An analysis based on GitHub Actions in opensource projects
Spångberg, Mattias, Wiklund, Albin, January 2023
The number of people working together in repositories grows every day. With increasing activity and interaction in a repository, the amount of work required to maintain high quality and productivity becomes a problem. Automating workflows is a solution many developers lean towards in order to handle the problem, but the effects of workflow automation are not yet established well enough to say that it actually helps. Based on GitHub’s workflow automation tool, GitHub Actions, this study looks at the effects of workflow automation by analysing the amount and speed of work in repositories on GitHub. To further understand the effects, this study looks at the impact of the number of people interacting with a repository on the speed at which developers work. This study performs a statistical analysis of the difference between repositories that use workflow automation and those that do not, to increase developers' knowledge so that they can make informed decisions. Analysis of the effects of workflow automation shows that repositories that use it have more committed code, more pull requests, more use of issues, faster pull request closure, and faster issue closure. In general, repositories using workflow automation have more stars and contributors than those without. Analysis of the impact of the number of contributors shows that usage of workflow automation increases with the number of contributors. The study concludes that further research is required to determine whether workflow automation causes these differences or whether adopting workflow automation is itself an effect of increased activity in repositories.
|
7 |
Open-Source Software engagement and participation on Github pre and during the covid-19 pandemic
Madyopa, Ellah, January 2021
In my study, I present the mining, collection, and analysis of GitHub project data in an endeavor to understand how activity and engagement on different projects changed before and during the Covid-19 pandemic. Data was collected from 20 repositories via the GitHub API. I then applied statistical analysis to the data, using ANOVA tests and their p-values to understand the level of variance before and during the Covid-19 pandemic. Open-Source software has long been in use, and my study seeks to explore the extent to which Open-Source software participation on different projects has been affected by the different work environments that users have become accustomed to lately. Open-Source software has previously been studied by different authors, e.g., with regard to user participation. No study has yet been done on the impact a global pandemic has on engagement in different projects on Open-Source platforms, analysing the trends of participation along the project’s life cycle before and during the Covid-19 pandemic. In these unpredictable, interesting times, the study aims to highlight how Open-Source engagement has behaved by looking at the trends, the patterns of engagement, and the decrease or increase of activity in certain projects. From the results I identify trends and patterns in some projects, as well as interesting insights into the GitHub OS project lifecycles. The findings of my study indicate that participation and engagement on GitHub Open-Source increased during the pandemic compared with before it, as evidenced by 70% of the 20 repositories I investigated. / The presentation was held via Zoom.
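The ANOVA comparison mentioned in this abstract can be sketched as a one-way ANOVA F statistic over a "before" and a "during" group. The abstract does not say which activity measure or grouping was used, so the function name, the choice of weekly commit counts, and the numbers below are hypothetical and serve only to show the arithmetic.

```python
from statistics import fmean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    Each group is a sequence of numeric observations. The function divides
    by the within-group variance, so it fails if every group is constant.
    """
    k = len(groups)                        # number of groups
    n = sum(len(g) for g in groups)        # total number of observations
    grand_mean = fmean([x for g in groups for x in g])

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical weekly commit counts for one repository (made-up numbers).
commits_before = [12, 15, 11, 14, 13]
commits_during = [18, 17, 21, 16, 20]
f_stat, df_b, df_w = one_way_anova_f(commits_before, commits_during)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

Turning the F statistic into a p-value requires the F distribution, which the standard library does not provide; a statistics package such as SciPy (`scipy.stats.f_oneway`) computes both in one call.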
|
8 |
Lagring av JSON-objekt i MySQL med datatyperna BLOB och JSON : Jämförelse av prestanda vid hämtning av JSON-objekt lagrade som datatypen BLOB och datatypen JSON i MySQL / Storing JSON-objects in MySQL with the datatypes BLOB and JSON : Comparison in performance for retrieving JSON-objects stored as the datatype BLOB and the datatype JSON in MySQL
Larsson, Mikael, January 2018
There is interest in doing research on data from GitHub. Through an API, the GitHub REST API, data can be retrieved from GitHub in JSON format and then stored in a database for research and analysis. As of MySQL 5.7.8 there is a dedicated data type for JSON, but there are also arguments in favor of the BLOB data type. Given that great weight is placed on the database to deliver speed, efficiency, and data processing, each data type is analyzed to determine which one performs best when retrieving data. An experiment measuring the time it takes the database to handle a query indicates which data type is most suitable. The result shows that the BLOB data type performs better than the JSON data type, especially as the JSON objects get larger.
|
9 |
Linking Developer Experience with Lambda and Smart Pointer Usage
Roos, Marcus, Karlsson, Alexander, January 2022
Assessing developers’ experience and proficiency has long been a tough task for recruiters to tackle properly. A functional programming concept known as lambdas has been shown to be an indicator of more experienced developers. Previous studies also showed that GitHub repositories with more experienced developers use lambda functions to a greater extent. Previous research raises the question of whether other functional programming concepts such as smart pointers can be a potential indicator of a developer’s experience. To the best of our knowledge, no attempts have been made to link lambda or smart pointer usage to the individual GitHub developer’s experience level. This thesis aims to address this gap and investigate whether lambda or smart pointer usage can be linked to an individual developer’s experience level. To achieve this, we propose a new metric called User Repository Experience (URE), which ranks the developers within a repository in different percentiles. We also developed a tool that analyzes the commits found in GitHub repositories, locates lambdas and smart pointers, and links them to the URE metric; this data is then saved to a log file. We designed a second tool that parses the log files and prints the data in a readable format. The results from this study showed that lambdas and smart pointers are both valid and promising indicators of experience, and thus the URE is a potential metric for representing more experienced developers.
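The idea of ranking developers within a repository into percentiles can be illustrated with a short Python sketch. The abstract does not define how URE measures experience, so the sketch assumes, purely for illustration, that experience is approximated by each developer's commit count in the repository; the function name and the sample data are hypothetical and are not the thesis's definition of URE.

```python
def percentile_ranks(commit_counts: dict[str, int]) -> dict[str, float]:
    """Map each developer to a percentile rank (0 to 100) within one repository.

    Uses the common percentile-rank definition: the share of developers with
    a strictly lower count, plus half the share with an equal count.
    """
    n = len(commit_counts)
    values = list(commit_counts.values())
    ranks = {}
    for developer, count in commit_counts.items():
        below = sum(1 for v in values if v < count)
        equal = sum(1 for v in values if v == count)  # includes the developer
        ranks[developer] = 100.0 * (below + 0.5 * equal) / n
    return ranks


# Hypothetical commit counts for the contributors of one repository.
repo_commits = {"alice": 240, "bob": 35, "carol": 35, "dave": 3}
for developer, rank in sorted(percentile_ranks(repo_commits).items()):
    print(f"{developer}: {rank:.1f}")
```

Counting ties as half keeps developers with identical activity at the same rank, which matters in small repositories where many contributors have only one or two commits.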
|
10 |
An Investigation of the Viability of Crowdfunded Open Source Software : The Case of Open Source NPM Packages Funded Through GitHub Sponsors
Nihlén, Malcolm, January 2022
Open Source software plays an essential part in modern software development. If not used directly by a project, it may very well be used second or even third hand through its dependencies. Any disruption in the delivery, quality, or reliability of a vital dependency may send large shock waves throughout a project. This raises the question of how much trust we can put in these open source dependencies and their overall viability. In this study we investigate the effects that funding through GitHub Sponsors has on NPM packages and how it affects their viability. Using data queried from GitHub, NPM, and NPMS, we analyze how different factors affect the viability of NPM packages, with the overall aim of determining the impact funding has on a project's viability.
|