71

Monitoramento de métricas de código-fonte em projetos de software livre / Source code metrics tracking on free and open source projects

Paulo Roberto Miranda Meirelles 20 May 2013
In this Ph.D. dissertation, we present an approach to monitoring source code metrics. We study metric distributions and associations, and discuss their causal relationships and practical management implications. Our studies assessed the distributions and correlations of metric values in 38 free software projects, chosen from among those with the most active contributors in their repositories; for this, we collected and analyzed the values of each metric for more than 344,872 classes and modules of the evaluated projects. Additionally, to show the usefulness of metric monitoring, we extended and adapted the causal model of free software project attractiveness to include source code metrics. Our technical attractiveness model indicates a statistical relationship between source code metric values and the number of downloads, contributors, and commits in the analyzed repositories; for this, we conducted empirical studies with 8,450 free software projects. From a practical point of view, we also contribute an innovative toolset for automating the evaluation of free software projects, with emphasis on the study and selection of metrics, which enables source code analysis according to the quality perceptions of free software communities. Among the main contributions of this dissertation is a detailed analysis of the behavior and values of 15 source code metrics, together with case studies, which advances the related literature by broadening the number of metrics evaluated and by proposing an approach that aims to reduce contradictions in metric analyses.
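Below is a minimal sketch of the kind of analysis this abstract describes: summarizing heavy-tailed per-class metric distributions with percentiles and checking rank correlations between metrics. The CSV file name and the metric columns (loc, cbo) are illustrative assumptions, not the dissertation's actual tooling.

```python
# Hedged sketch: percentile summaries and rank correlations of per-class
# metrics. The file name and columns (loc, cbo) are illustrative assumptions.
import csv
import statistics

from scipy.stats import spearmanr

with open("class_metrics.csv") as f:        # one row per class/module
    rows = list(csv.DictReader(f))
loc = [float(r["loc"]) for r in rows]       # lines of code
cbo = [float(r["cbo"]) for r in rows]       # coupling between objects

# Metric values are typically heavy-tailed, so percentiles are more useful
# than means when deriving monitoring thresholds.
q = statistics.quantiles(loc, n=100)
print("LOC p50/p90/p99:", q[49], q[89], q[98])

# Spearman rank correlation between the two metrics across all classes.
rho, p = spearmanr(loc, cbo)
print(f"spearman(loc, cbo) = {rho:.2f} (p = {p:.3g})")
```

Percentile-based thresholds of this kind are what make monitoring actionable: a class above, say, the 90th percentile of its project's LOC distribution is flagged relative to its peers rather than against an arbitrary fixed cutoff.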
72

A comparison of latency for MongoDB and PostgreSQL with a focus on analysis of source code

Lindvall, Josefin, Sturesson, Adam January 2021
The purpose of this paper is to clarify the differences in latency between PostgreSQL and MongoDB as a consequence of their differences in software architecture. This has been achieved through benchmarking of Insert, Read and Update operations with the tool “Yahoo! Cloud Serving Benchmark” (YCSB), and through source code analysis of both database management systems (DBMSs). The overall structure of the architecture was researched using Big O notation as a tool to examine the complexity of the source code. The results from the benchmarking show that the latency for Insert and Update operations was lower for MongoDB, while the latency for Read was lower for PostgreSQL. The results from the source code analysis show that both DBMSs have a complexity of O(n), but that there are multiple differences in their software architecture that affect latency. The most important difference was the length of the parsing process, which was longer for PostgreSQL. The conclusion is that there are significant differences in both latency and source code, and that room exists for further research in the field. The biggest limitation of the experiment consists of factors such as background processes, which affected latency and could not be eliminated, resulting in low validity.
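For readers who want to reproduce the flavor of the experiment without the full YCSB setup, the following sketch times single Insert, Read and Update operations against both DBMSs. It assumes local MongoDB and PostgreSQL servers plus the pymongo and psycopg2 packages, and it is a toy stand-in for YCSB, not the thesis's benchmark.

```python
# Toy latency probe (a stand-in for YCSB, not the thesis's setup); assumes
# local MongoDB and PostgreSQL servers and the pymongo/psycopg2 packages.
import statistics
import time

import psycopg2
import pymongo

N = 1000  # operations per workload

def timed(op):
    """Run op(i) for i in 0..N-1 and return the median latency in ms."""
    samples = []
    for i in range(N):
        start = time.perf_counter()
        op(i)
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)

# MongoDB: document inserts, point reads and updates by _id.
coll = pymongo.MongoClient("mongodb://localhost:27017").bench.usertable
coll.drop()
print("mongo insert:", timed(lambda i: coll.insert_one({"_id": i, "f0": "x" * 100})))
print("mongo read:  ", timed(lambda i: coll.find_one({"_id": i})))
print("mongo update:", timed(lambda i: coll.update_one({"_id": i}, {"$set": {"f0": "y"}})))

# PostgreSQL: equivalent row operations on a two-column table.
pg = psycopg2.connect("dbname=bench user=postgres host=localhost")
pg.autocommit = True
cur = pg.cursor()
cur.execute("DROP TABLE IF EXISTS usertable")
cur.execute("CREATE TABLE usertable (id INT PRIMARY KEY, f0 TEXT)")
print("pg insert:", timed(lambda i: cur.execute(
    "INSERT INTO usertable VALUES (%s, %s)", (i, "x" * 100))))
print("pg read:  ", timed(lambda i: cur.execute(
    "SELECT f0 FROM usertable WHERE id = %s", (i,))))
print("pg update:", timed(lambda i: cur.execute(
    "UPDATE usertable SET f0 = 'y' WHERE id = %s", (i,))))
```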
73

Investigating topic modeling techniques for historical feature location.

Schulte, Lukas January 2021
Software maintenance and understanding where in the source code features are implemented are two strongly coupled tasks that make up a large portion of the effort spent on developing applications. The concept of feature location investigated in this thesis can serve as a supporting factor in those tasks, as it facilitates the automation of otherwise manual searches for source code artifacts. Challenges in this subject area include the aggregation and composition of a training corpus from historical codebase data, as well as the integration and optimization of qualified topic modeling techniques. Building on previous research, this thesis provides a comparison of two different techniques and introduces a toolkit that can be used to reproduce and extend the results discussed. Specifically, this thesis pursues a changeset-based approach to feature location and applies it to a large open-source Java project. The project is used to optimize and evaluate the performance of Latent Dirichlet Allocation models and Pachinko Allocation models, as well as to compare the accuracy of the two models with each other. As discussed at the end of the thesis, the results do not indicate a clear favorite between the models; instead, the outcome of the comparison depends on the metric and viewpoint from which it is assessed.
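As a rough illustration of the changeset-based approach, the sketch below trains an LDA model via gensim (Pachinko Allocation is not covered there) on hypothetical tokenized changesets and scores a feature-description query against the learned topics. All corpus contents are invented for the example.

```python
# Hedged sketch of changeset-based feature location with LDA via gensim
# (Pachinko Allocation omitted); the tokenized changesets are invented.
from gensim import corpora, models

# Each document: tokens from one commit's diff hunks and message.
changesets = [
    ["render", "canvas", "draw", "widget", "paint"],
    ["login", "password", "authenticate", "session", "token"],
    ["query", "index", "search", "rank", "result"],
]

dictionary = corpora.Dictionary(changesets)
corpus = [dictionary.doc2bow(doc) for doc in changesets]
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)

# Feature location query: infer the topic mix of a feature description, then
# rank changesets (and the files they touch) by similarity to that mix.
query = dictionary.doc2bow(["user", "login", "authenticate"])
print(lda[query])  # topic distribution for the query
```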
74

[en] SYNTHESIS OF CODE ANOMALIES: REVEALING DESIGN PROBLEMS IN THE SOURCE CODE / [pt] SÍNTESE DE ANOMALIAS DE CÓDIGO: REVELANDO PROBLEMAS DE PROJETO NO CÓDIGO FONTE

WILLIAN NALEPA OIZUMI 03 February 2016
Design problems affect almost all software projects and make their maintenance expensive and prohibitive. As design documents are rarely available, programmers often need to identify design problems from the source code. However, the identification of design problems is not a trivial task, for several reasons. For instance, the materialization of a design problem tends to be scattered through several anomalous code elements in the implementation. Unfortunately, previous work has wrongly assumed that each individual code anomaly, popularly known as a code smell, can be used as an accurate indicator of a design problem. Recent empirical evidence shows that several types of design problems are often related to a set of inter-related code anomalies, so-called code-anomaly agglomerations, rather than to individual anomalies only. In this context, this dissertation proposes a new technique for the synthesis of code-anomaly agglomerations. The technique is intended to: (i) search for varied forms of agglomeration in a program, and (ii) summarize different types of information about each agglomeration. The evaluation of the synthesis technique was based on the analysis of several industry-strength software projects and on a controlled experiment with professional programmers. Both studies suggest that the use of the synthesis technique helped programmers identify more relevant design problems than conventional techniques did.
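A toy rendering of the agglomeration idea: given anomalous code elements and structural relations between them, connected components with more than one element form candidate agglomerations. The element names, smells, and relations are hypothetical; the dissertation's actual synthesis technique is richer than this sketch.

```python
# Toy code-anomaly agglomeration: group smelly elements linked by structural
# relations into connected components. All names/relations are hypothetical.
from collections import defaultdict

# (element, smell) pairs, e.g. produced by a smell detector.
anomalies = {
    "Order.total": "LongMethod",
    "Order.validate": "LongMethod",
    "OrderManager": "GodClass",
    "ReportPrinter.print": "FeatureEnvy",
}

# Structural relations between anomalous elements.
relations = [
    ("Order.total", "Order.validate"),    # same class
    ("Order.validate", "OrderManager"),   # method called by class
]

graph = defaultdict(set)
for a, b in relations:
    graph[a].add(b)
    graph[b].add(a)

# Components larger than one element are reported as agglomerations,
# summarized here by their mix of smell types.
seen, agglomerations = set(), []
for node in anomalies:
    if node in seen:
        continue
    component, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n in component:
            continue
        component.add(n)
        stack.extend(graph[n] - component)
    seen |= component
    if len(component) > 1:
        agglomerations.append(component)

for group in agglomerations:
    print(sorted(group), {anomalies[e] for e in group})
```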
75

Learning to Edit Code: Towards Building General Purpose Models for Source Code Editing

Chakraborty, Saikat January 2022
The way software developers edit code day-to-day tends to be repetitive, often reusing existing code elements. Many researchers have tried to automate this repetitive editing by mining specific change templates, but such templates are typically implemented manually for automated application, which makes template-based automated code editing tedious to build. In addition, template-based code editing is often narrowly scoped and poorly tolerant of noise. Machine learning, especially deep learning-based techniques, can help solve these problems thanks to its generalization and noise-tolerance capacities. The advancement of deep neural networks and the availability of vast open-source evolutionary data open up the possibility of automatically learning such templates from the wild and applying them in the appropriate context. However, deep neural network-based modeling of code changes, and of code in general, introduces specific problems that need specific attention from the research community. For instance, source code exhibits strictly defined syntax and semantics inherited from the properties of the programming language (PL), and the source code vocabulary (the possible number of tokens) can be arbitrarily large. This dissertation formulates automated code editing as a multi-modal translation problem: given a piece of code, its context, and some guidance, the objective is to generate the edited code. In particular, we divide the problem into two sub-problems: source code understanding and generation. We empirically show that models for these problems should be aware of PL properties (i.e., syntax and semantics). This dissertation investigates two primary directions for endowing models with knowledge of PL properties: (i) explicit encoding, where we design models catering to a specific property, and (ii) implicit encoding, where we train a very large model to learn these properties from a very large corpus of source code in an unsupervised way. With explicit encoding, we custom design the model to cater to the property in question. As an example of such models, we developed CODIT, a tree-based neural model for syntactic correctness. We designed CODIT around the context-free grammar (CFG) of the programming language: instead of generating source code directly, CODIT first generates the tree structure by sampling production rules from the CFG, a mechanism that rules out infeasible production-rule selections. In a later stage, CODIT generates the edited code conditioned on the generated tree, which makes the edited code syntactically correct. CODIT showed promise in learning code edit patterns in the wild and proved effective in automatic program repair. In another empirical study, we showed that a graph-based model is better suited to source code understanding tasks such as vulnerability detection. With implicit encoding, on the other hand, we use a very large (several hundred million parameters) yet generic model, pre-trained on a super-large (usually hundreds of gigabytes) collection of source code and code metadata. We empirically show that, if sufficiently pre-trained, such models can learn PL properties such as syntax and semantics. In this dissertation, we developed two such pre-trained models with two different learning objectives.
First, we developed PLBART, the first pre-trained encoder-decoder model for source code, and showed that such pre-training enables the model to generate syntactically and semantically correct code; we further present an in-depth empirical study on using PLBART for automated code editing. Finally, we developed another pre-trained model, NatGen, to encode into the model the natural coding conventions that developers follow. To design NatGen, we first deliberately modify developer-written code while preserving its original semantics; we call these edits 'de-naturalizing' transformations. Following previous studies on induced unnaturalness in code, we defined several such transformations, applied them to developer-written code, and pre-trained NatGen to reverse their effect. In this way, NatGen learns to generate code similar to what developers write by undoing the unnaturalness induced by our forceful 'de-naturalizing' transformations. NatGen performed well in code editing and other source code generation tasks. The models and empirical studies in this dissertation go beyond the scope of automated code editing and apply to other software engineering automation problems such as code translation, code summarization, code generation, vulnerability detection, and clone detection. We therefore believe this dissertation will influence and contribute to the advancement of AI4SE and PLP.
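To make the 'de-naturalizing' idea concrete, here is a speculative toy transformation in the spirit described above: it renames local variables to opaque placeholders while preserving semantics, producing the kind of input/output pair a model like NatGen could be pre-trained to invert. This is not NatGen's actual transformation code.

```python
# Speculative toy 'de-naturalizing' transformation (not NatGen's actual code):
# rename every simple name to an opaque placeholder, preserving semantics.
import ast

class DeNaturalize(ast.NodeTransformer):
    """Rewrite each distinct Name to VAR_0, VAR_1, ... in order of first use."""
    def __init__(self):
        self.mapping = {}

    def visit_Name(self, node):
        if node.id not in self.mapping:
            self.mapping[node.id] = f"VAR_{len(self.mapping)}"
        node.id = self.mapping[node.id]
        return node

src = "total = price * quantity\ndiscounted = total * 0.9"
tree = DeNaturalize().visit(ast.parse(src))
print(ast.unparse(tree))
# -> VAR_0 = VAR_1 * VAR_2
#    VAR_3 = VAR_0 * 0.9
# A model in NatGen's style would be pre-trained to reverse this edit and
# recover the 'natural' developer-written names.
```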
76

Deep Learning Approaches for Clustering Source Code by Functionality / Djupinlärningsmetoder för gruppering av källkod efter funktionalitet

Hägglund, Marcus January 2021
With the rise of artificial intelligence, applications for machine learning can be found in nearly every aspect of modern life, from healthcare and transportation to software services like recommendation systems. Consequently, there are now more developers engaged in the field than ever, with the number of implementations rapidly increasing by the day. In order to meet the new demands, it would be useful to provide services that allow for an easy orchestration of a large number of repositories. Enabling users to easily share, access and search for source code would be beneficial for both research and industry alike. A first step towards this is to find methods for clustering source code by functionality. The problem of clustering source code has previously been studied in the literature. However, the proposed methods have so far not leveraged the capabilities of deep neural networks (DNN). In this work, we investigate the possibility of using DNNs to learn embeddings of source code for the purpose of clustering by functionality. In particular, we evaluate embeddings from Code2Vec and cuBERT models for this specific purpose. From the results of our work we conclude that both Code2Vec and cuBERT are capable of learning such embeddings. Among the different frameworks that we used to fine-tune cuBERT, we found the best performance for this task when fine-tuning the model under the triplet loss criterion. With this framework, the model was capable of learning embeddings that yielded the most compact and well-separated clusters. We found that a majority of the cluster assignments were semantically coherent with respect to the functionalities implemented by the methods. With these results, we have found evidence indicating that it is possible to learn embeddings of source code that encode the functional similarities among the methods. Future research could therefore aim to further investigate the possible applications of the embeddings learned by the different frameworks.
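The following sketch shows the shape of triplet-loss fine-tuning followed by clustering, with a small feed-forward network as a stand-in for cuBERT and random tensors in place of real method embeddings; dimensions and hyperparameters are illustrative only.

```python
# Hedged sketch of triplet-loss fine-tuning, then clustering by functionality.
# The encoder is a stand-in for cuBERT; all data here is random placeholder.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

encoder = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 128))
loss_fn = nn.TripletMarginLoss(margin=1.0)
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)

# One training step: anchor/positive share a functionality, negative does not.
anchor, positive, negative = (torch.randn(32, 768) for _ in range(3))
optimizer.zero_grad()
loss = loss_fn(encoder(anchor), encoder(positive), encoder(negative))
loss.backward()
optimizer.step()

# After fine-tuning, cluster the method embeddings by functionality.
methods = torch.randn(500, 768)          # placeholder cuBERT embeddings
labels = KMeans(n_clusters=10, n_init=10).fit_predict(
    encoder(methods).detach().numpy())
```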
77

Mining Software Repositories to Support Software Evolution

Kagdi, Huzefa H. 15 July 2008
No description available.
78

An Empirical Study Investigating Source Code Summarization Using Multiple Sources of Information

Sama, Sanjana 30 May 2018
No description available.
79

Un formalisme pour la traçabilité des transformations / A formalism for the traceability of transformations

Lemoine, Mathieu 12 1900
In industrial software development, specification documents play an important role in communication between analysts and developers. However, with time, staff turnover, and ever-shorter deadlines, these documents often become obsolete or inconsistent with the actual state of the system, i.e., its source code. Yet keeping software components up to date and consistent with their specification documents eases their development and maintenance and thus reduces costs. Maintaining consistency between specification and source code requires representing changes to each of them and applying those changes consistently and automatically. We propose a novel mathematical formalism to describe and manipulate the evolution of these representations. The formalism is based on Hoare triples to represent transformations, and on group theory and group homomorphisms to manipulate these transformations and apply them to the different representations of the system. We illustrate our formalism on two representations of the same system: PADL, a high-level architectural representation (similar to UML), and JCT, a Java-based abstract syntax tree. We also define transformations describing the evolution of these representations, and a transposition that carries transformations from one representation onto the other. Finally, we briefly describe an implementation of our illustration: a plugin for the Eclipse IDE that detects the transformations developers make to source code, and a code generator for integrating new representations into the implementation.
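A toy sketch of the formalism's structure, under loose assumptions: a transformation is a sequence of atomic edits (each of which could carry a Hoare-style pre/postcondition), composition is concatenation, and the transposition phi maps PADL atoms onto JCT atoms pointwise, which is what makes it a homomorphism. All atom names are invented for the example.

```python
# Toy transposition of transformations between representations; a hedged
# illustration of the formalism's shape, with invented atomic-edit names.
PADL_TO_JCT = {
    "rename_class(A,B)": ["rename_type(A,B)", "rewrite_references(A,B)"],
    "add_method(A,m)": ["insert_method_node(A,m)"],
}

def compose(t1, t2):
    """Apply t1, then t2; composition is concatenation of atomic edits."""
    return t1 + t2

def phi(t):
    """Transpose a PADL transformation (list of atoms) onto JCT."""
    return [atom for padl_atom in t for atom in PADL_TO_JCT[padl_atom]]

t1 = ["rename_class(A,B)"]
t2 = ["add_method(A,m)"]
# Homomorphism law: transposing a composition equals composing transpositions.
assert phi(compose(t1, t2)) == compose(phi(t1), phi(t2))
```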
80

Génération automatique de configurations et de scénarios d'utilisation d'outils de visualisation à partir de spécifications de tâches d'analyse de logiciels / Automatic generation of configurations and usage scenarios for visualization tools from specifications of software analysis tasks

Sfayhi, Ahmed 04 1900
We propose an approach that derives interactive visualization scenarios from descriptions of code analysis tasks. The scenario derivation is treated as an optimization process: we evaluate different possibilities of using a given visualization tool to perform the analysis task, and select the scenario that requires the least effort from the analyst. Our approach was applied successfully to various analysis tasks, such as design defect detection and feature location.
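A minimal sketch of scenario derivation as optimization: enumerate candidate orderings of the actions a task requires, assign each an analyst-effort cost, and keep the cheapest. The actions, effort values, and ordering penalty are hypothetical placeholders for the thesis's actual effort model.

```python
# Hedged sketch of scenario derivation as optimization; the actions, effort
# values, and ordering penalty are hypothetical placeholders.
from itertools import permutations

# Per-action analyst effort for a hypothetical "find god classes" task.
EFFORT = {"load_view": 1.0, "color_by_loc": 0.5,
          "sort_by_coupling": 0.8, "inspect_outliers": 2.0}
required = list(EFFORT)

def cost(scenario):
    total = sum(EFFORT[a] for a in scenario)
    # Charge extra for awkward orderings, e.g. inspecting outliers before
    # the view has been sorted by coupling.
    if scenario.index("inspect_outliers") < scenario.index("sort_by_coupling"):
        total += 3.0
    return total

# Exhaustively evaluate candidate scenarios and keep the cheapest one.
best = min(permutations(required), key=cost)
print(best, cost(best))
```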
