1 |
Deep Learning for Code Generation using Snippet Level Parallel Data
Jain, Aneesh 05 January 2023
In the last few years, interest in applying deep learning methods to software engineering tasks has surged. A variety of approaches, such as transformer-based methods, statistical machine translation models, and models inspired by natural language settings, have been proposed and shown to be effective at tasks like code summarization, code synthesis, and code translation. Multiple benchmark data sets have also been released, but all suffer from one limitation or another. Some data sets support only a select few programming languages, while others support only certain tasks. These limitations restrict researchers' ability to perform thorough analyses of their proposed methods. In this work, we aim to alleviate some of the limitations faced by researchers who apply deep learning to software engineering tasks. We introduce a large, parallel, multilingual programming language data set that supports tasks like code summarization, code translation, code synthesis, and code search in 7 different languages. We provide benchmark results for current state-of-the-art models on all these tasks, and we also explore some limitations of current evaluation metrics for code-related tasks. We provide a detailed analysis of the compilability of code generated by deep learning models, because compilability is a better measure of the usability of code than scores like BLEU and CodeBLEU. Motivated by our findings about compilability, we also propose a reinforcement learning-based method that incorporates code compilability and syntax-level feedback as rewards, and we demonstrate its effectiveness in generating code that has fewer syntax errors than baselines. In addition, we develop a web portal that hosts the models we have trained for code translation. The portal allows translation between 42 possible language pairs and also allows users to check the compilability of the generated code.
The intent of this website is to give researchers and other audiences a chance to interact with and probe our work in a user-friendly way, without requiring them to write their own code to load the models and run inference. / Master of Science / Deep neural networks have now become ubiquitous and find applications in almost every technology and service we use today. In recent years, researchers have also started applying neural network-based methods to problems in the software engineering domain. Software engineering by its nature requires a lot of documentation, and creating this natural language documentation automatically, using programs as input to the neural networks, has been one of their first applications in this domain. Other applications include translating code between programming languages and searching for code using natural language, as one does on websites like Stack Overflow. All of these tasks now have the potential to be powered by deep neural networks. It is common knowledge that neural networks are data hungry, and in this work we present a large data set containing code in multiple programming languages: Java, C++, Python, C#, JavaScript, PHP, and C. Our data set is intended to foster more research in automating software engineering tasks using neural networks. We provide an analysis of the performance of multiple state-of-the-art models on our data set in terms of compilability, which measures the number of syntax errors in the code, as well as other metrics. In addition, we propose our own deep neural network-based model for code translation, which uses feedback from programming language compilers to reduce the number of syntax errors in the generated code. We also develop and present a website where some of our code translation models are hosted.
The website allows users to interact with our work in an easy manner without any knowledge of deep learning and get a sense of how these technologies are being applied for software engineering tasks.
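To make the compilability idea in the abstract above concrete, here is a minimal sketch of how a syntax-only check could be turned into a metric and a reward signal. It is not the thesis's implementation: it uses Python's built-in `compile` as a stand-in for a real compiler, covers only Python snippets, and the function names `compiles`, `compilability_rate`, and `syntax_reward` (including the +1/-1 reward values) are hypothetical.

```python
from typing import Iterable


def compiles(source: str) -> bool:
    """Return True if `source` is syntactically valid Python (nothing is executed)."""
    try:
        compile(source, "<generated>", "exec")
        return True
    except (SyntaxError, ValueError):
        return False


def compilability_rate(snippets: Iterable[str]) -> float:
    """Fraction of snippets that pass the syntax check (0.0 for empty input)."""
    items = list(snippets)
    if not items:
        return 0.0
    return sum(compiles(s) for s in items) / len(items)


def syntax_reward(source: str) -> float:
    """Hypothetical scalar reward: +1.0 if the snippet compiles, -1.0 otherwise."""
    return 1.0 if compiles(source) else -1.0


# Illustrative "model outputs": two valid snippets and two with syntax errors.
generated = [
    "def add(a, b):\n    return a + b\n",
    "def broken(:\n    pass\n",
    "print('hello')\n",
    "for i in range(3) print(i)\n",
]
rate = compilability_rate(generated)
print(f"compilability rate: {rate:.2f}")
```

Unlike BLEU-style overlap scores, a check of this kind says nothing about how close the output is to a reference translation; it only reports whether the code would be accepted by the language's parser.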
|
2 |
Helping Developers Migrate their Code across Programming Languages
Elarnaoty, Mohammed Elsayed 15 October 2024
Migrating source code from one programming language to another is a common task in software development.
This migration can be done by completely rewriting the code in the target language, or it can be facilitated through code-reuse or automation techniques.
This thesis explores both approaches.
For code-reuse, two new cross-language code search techniques are proposed that enable developers to search for code in one language using code from another.
These techniques address the limitations of existing methods in the context of code migration.
The first technique leverages a Siamese network combined with Word2Vec embeddings, while the second employs transformers.
For code automation, the concept of Translation Types is introduced to categorize code translations.
An empirical study was conducted to analyze the differences between human-translated and machine-translated code.
Based on these findings, two multi-output code translation techniques were developed that produce multiple translations aligned with the different styles that developers use when translating their code.
The first tool employs a denoising autoencoder and a blueprint-guided beam search algorithm to generate translations of specific types.
This algorithm mimics the translation operations that developers apply in similar software projects.
The second tool utilizes GPT-4 with a specialized prompt to generate translations tailored to the requested types.
In the evaluation, these approaches produced automated code translations that better aligned with developer preferences while maintaining correctness compared to existing methods. / Doctor of Philosophy / In the world of software development, it is often necessary to convert code written in one programming language into another. This process can be quite time-consuming, especially if developers have to rewrite everything from scratch. To make this task easier, this thesis explores two approaches: finding reusable code snippets in other languages and using automated tools to translate code.
Firstly, this thesis presents two techniques that help developers search for similar code written in different programming languages. These techniques aim to accurately retrieve potential code snippets, ensuring that developers find what they need quickly, with the most relevant results appearing at the top of the list. The two techniques use machine learning models to understand and match code across languages.
Additionally, this thesis explores ways to automate code translation by recognizing that different developers have their own style when translating code. A taxonomy of "Translation Types" is introduced to capture these differences. After studying how human and machine translations vary, two existing tools were adapted to generate translations. The first tool uses machine learning to create translations based on common developer patterns, while the second employs the powerful GPT-4 model to produce translations tailored to specific developer styles.
Overall, the approaches presented in this thesis enable developers to convert code accurately and efficiently, reducing the time and effort needed for software migration.
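The cross-language code search described in this abstract ranks candidate snippets in one language against a query written in another. The sketch below illustrates only the ranking step, and deliberately swaps in a much simpler technique: averaged token vectors compared by cosine similarity, rather than the Siamese network or transformer models the thesis proposes. The vectors in `TOY_VECTORS` are hand-picked placeholders, not trained Word2Vec embeddings, and all names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical 3-d token vectors, hand-picked for illustration only.
TOY_VECTORS: Dict[str, Tuple[float, float, float]] = {
    "for": (1.0, 0.0, 0.0),
    "foreach": (0.9, 0.1, 0.0),
    "print": (0.0, 1.0, 0.0),
    "println": (0.0, 0.9, 0.1),
    "open": (0.0, 0.0, 1.0),
    "fopen": (0.1, 0.0, 0.9),
}


def embed(tokens: Sequence[str]) -> List[float]:
    """Average the vectors of known tokens; return a zero vector if none are known."""
    known = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not known:
        return [0.0, 0.0, 0.0]
    return [sum(axis) / len(known) for axis in zip(*known)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector has zero length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def search(query: Sequence[str], corpus: Dict[str, List[str]]) -> List[Tuple[str, float]]:
    """Rank corpus snippets by similarity to the query, most similar first."""
    q = embed(query)
    scored = [(name, cosine(q, embed(tokens))) for name, tokens in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Query tokens from one language, candidate snippets tokenized from another.
corpus = {"read_file": ["fopen"], "loop_and_print": ["foreach", "println"]}
results = search(["for", "print"], corpus)
print(results)
```

The design point this is meant to show is that both languages are mapped into one shared vector space, so that relevance reduces to a distance computation and the most relevant results can be sorted to the top of the list.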
|
3 |
Open Code Translation from Executable and Translatable UML Models - Implicit Bridging
Löfqvist, Mikael January 2007
Executable and Translatable UML (xtUML) is the next abstraction level in software development, where both programming language and software architecture have been abstracted away. xtUML is a well-defined UML profile, extended with precise action semantics. This allows developers to define a problem area, or domain, in such detail that it can be executed. By defining the system with xtUML models (domains), the system functionality can be verified early in the development process. Translation to code can be done in different ways; this work was performed in an environment where code is automatically generated by a model compiler.
The goal for a domain is that it should be independent of other domains, reusable without modification, and exchangeable with another domain that solves the same problem. However, a domain can make assumptions that certain functionality is available, and these assumptions become requirements on another domain.
To fulfil these goals, there must be minimal coupling between the domains. This can be achieved with the technique of Implicit Bridging, where the dependency between domains is defined in a bridge. The dependency takes the form of mappings (couplings) between elements in both domains. By defining a bridge interface for a server domain, a client domain can use the resources offered by the server domain.
The work shows how an implementation of Implicit Bridging can be realized by applying the technique to a microwave oven system. From the system design, five different mapping types have been implemented. The applicability and quality of the implementation have been verified by testing the generated system functionality and by verifying that the system meets the goals of exchangeability and reuse of domains.
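As a loose analogy for the bridging idea in this abstract, the Python sketch below keeps the coupling between a client domain and a server domain in a separate mapping object, so that neither domain class refers to the other. This is only an illustration: in xtUML the coupling is generated by a model compiler from bridge mappings rather than written by hand, and the class names, the `start_timer` requirement, and the `Bridge` API here are all hypothetical.

```python
from typing import Callable, Dict


class TimerDomain:
    """Server domain: offers timing and knows nothing about ovens."""

    def __init__(self) -> None:
        self.remaining = 0

    def start(self, seconds: int) -> None:
        self.remaining = seconds


class Bridge:
    """Holds the mappings from client requirements to server operations."""

    def __init__(self) -> None:
        self._mappings: Dict[str, Callable[..., object]] = {}

    def map(self, requirement: str, provider: Callable[..., object]) -> None:
        self._mappings[requirement] = provider

    def call(self, requirement: str, *args: object) -> object:
        return self._mappings[requirement](*args)


class OvenDomain:
    """Client domain: assumes some 'start_timer' capability is available."""

    def __init__(self, bridge: Bridge) -> None:
        self._bridge = bridge

    def cook(self, seconds: int) -> None:
        # The oven states its requirement by name; it never references TimerDomain.
        self._bridge.call("start_timer", seconds)


# The only place where the two domains are coupled is this mapping.
timer = TimerDomain()
bridge = Bridge()
bridge.map("start_timer", timer.start)

oven = OvenDomain(bridge)
oven.cook(30)
print(timer.remaining)
```

Because the mapping is the single point of coupling, a different timer implementation could be substituted by changing only the `bridge.map(...)` line, which mirrors the exchangeability goal stated in the abstract.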
|
5 |
Rekonstrukcijos metodų analizė modernizuojant informacinę sistemą / Analysis of software re-engineering methods for modernization of information system
Malinauskienė, Eglė 27 May 2004
This master's thesis covers re-engineering methods for legacy systems. A legacy system is an old system that is poorly compatible with modern technologies and remains in use only because, over its long period of maintenance, it has become an integral part of the organization's business process support. These systems are large, monolithic, and difficult to modify, and the cost and risk of replacing them are difficult to predict. Software engineering offers incremental modernization of information systems through re-engineering of legacy software. The main goal of software re-engineering is to transform the software so that it becomes easier to understand, maintain, and reuse while preserving its useful, time-tested functions. The main re-engineering methods are source code translation, reverse engineering, and data re-engineering. This thesis analyzes these methods as applied during the re-engineering of a wood production and sales accounting system. The time required to adopt and carry out each method was examined. The influence of the applied re-engineering methods on system reliability, efficiency, usability, and other quality metrics is reported.
|
6 |
Hardware-in-the-Loop Simulation of Aircraft Actuator
Braun, Robert January 2009
Advanced computer simulations will play an increasingly important role in future aircraft development and aeronautic research. Hardware-in-the-loop simulations enable examination of single components without the need for a full-scale model of the system. This project investigates the possibility of conducting hardware-in-the-loop simulations using a hydraulic test rig and modern computer equipment. Controllers and models have been built in Simulink and Hopsan. Most hydraulic and mechanical components used in Hopsan have also been translated from Fortran to C and compiled into shared libraries (.dll). This provides an easy way of importing Hopsan models into LabVIEW, which is used to control the test rig. The results have been compared between Hopsan and LabVIEW, and no major differences could be found. Importing Hopsan components into LabVIEW can potentially enable powerful features not available in Hopsan, such as hardware-in-the-loop simulations, multi-core processing, and advanced plotting tools. It does, however, require fast computer systems to achieve real-time speed. The results of this project can provide interesting starting points for the development of the next generation of Hopsan.
|