381 |
Decompiling Go: Using metadata to improve decompilation readability Grenfeldt, Mattias January 2023 (has links)
Malware written in Go is on the rise, and yet tools for investigating Go programs, such as decompilers, are limited. A decompiler takes a compiled binary and tries to recover its source code. Go is a high-level language that requires runtime metadata to implement many of its features, such as garbage collection and polymorphism. While decompilers have to some degree used this metadata to aid manual reverse engineering, more can be done. To remedy this, we extend the decompiler Ghidra with improvements that use runtime metadata to increase the readability of the decompilation of Go binaries. We make progress towards enabling Ghidra to represent Go's assembly conventions. We implement multiple analyses: some reduce the noise the reverse engineer has to filter through, others enhance the decompilation by adding types, and so on. The analyses are a mix of reimplementations of previous work and novel improvements. They use metadata known beforehand but in new ways: applying data types at polymorphic function call sites, and using function names to import signatures from source code. We also discover previously unused metadata, which points to promising future work. Our experimental evaluation compares our extension against previously existing decompiler extensions using multiple readability metrics. Our extension improves on metrics measuring the amount of code, such as lines of code, and it decreases the number of casts. However, it performs worse on other metrics, producing more variables and glue functions. In conclusion, our extension produces more compact code while also increasing its informativeness for the reverse engineer.
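To make concrete where this runtime metadata lives, the short Python sketch below (an illustration only, not the Ghidra extension described above) uses pyelftools to locate the metadata-bearing sections of a Go ELF binary and to read the pclntab header magic. The input file name is hypothetical, the section list is the commonly seen one for Go ELF builds, and the per-version magic values are assumptions that depend on the Go toolchain used.

```python
# Sketch: locate Go runtime metadata sections in an ELF binary
# (assumes a little-endian target; requires pyelftools).
import struct
from elftools.elf.elffile import ELFFile

GO_METADATA_SECTIONS = [".gopclntab", ".typelink", ".itablink", ".go.buildinfo"]

# Commonly documented pclntab header magics (assumption: exact values depend
# on the Go version that built the binary).
PCLNTAB_MAGICS = {
    0xFFFFFFFB: "Go <= 1.15",
    0xFFFFFFFA: "Go 1.16-1.17",
    0xFFFFFFF0: "Go 1.18-1.19",
    0xFFFFFFF1: "Go >= 1.20",
}

with open("hello_go", "rb") as f:          # "hello_go" is a hypothetical binary
    elf = ELFFile(f)
    for name in GO_METADATA_SECTIONS:
        sec = elf.get_section_by_name(name)
        if sec is None:
            print(f"{name}: not present")
            continue
        print(f"{name}: addr=0x{sec['sh_addr']:x} size={sec['sh_size']} bytes")
        if name == ".gopclntab":
            magic = struct.unpack("<I", sec.data()[:4])[0]
            print(f"  pclntab magic {magic:#x} ({PCLNTAB_MAGICS.get(magic, 'unknown')})")
```

A decompiler extension would go on to parse the function name table inside .gopclntab; the point here is only that such names survive even when DWARF debug information is stripped.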
|
382 |
Automated Adaptive Software Maintenance: A Methodology and Its Applications Tansey, Wesley 11 August 2008 (has links)
In modern software development, maintenance accounts for the majority of the total cost and effort in a software project. Especially burdensome are those tasks which require applying a new technology in order to adapt an application to changed requirements or a different environment.
This research explores methodologies, techniques, and approaches for automating such adaptive maintenance tasks. By combining high-level specifications and generative techniques, a new methodology shapes the design of approaches to automating adaptive maintenance tasks in the application domains of high performance computing (HPC) and enterprise software. Despite the vast differences of these domains and their respective requirements, each approach is shown to be effective at alleviating their adaptive maintenance burden.
This thesis proves that it is possible to effectively automate tedious and error-prone adaptive maintenance tasks in a diverse set of domains by exploiting high-level specifications to synthesize specialized low-level code. The specific contributions of this thesis are as follows: (1) a common methodology for designing automated approaches to adaptive maintenance, (2) a novel approach to automating the generation of efficient marshaling logic for HPC applications from a high-level visual model, and (3) a novel approach to automatically upgrading legacy enterprise applications to use annotation-based frameworks.
The technical contributions of this thesis have been realized in two software tools for automated adaptive maintenance: MPI Serializer, a marshaling logic generator for MPI applications, and Rosemari, an inference and transformation engine for upgrading enterprise applications.
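To give a flavour of what generated marshaling logic looks like, here is a hedged conceptual sketch using Python and mpi4py rather than the C/C++ code MPI Serializer actually emits; the particle record and its flat seven-double buffer layout are hypothetical.

```python
# Conceptual sketch of hand-written (or generated) marshaling for MPI.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# A structured application record (hypothetical layout).
particle = {"pos": np.zeros(3), "vel": np.ones(3), "mass": 1.0}

def marshal(p):
    """Flatten the record into one contiguous buffer -- the kind of code a
    generator would emit from a high-level model of the data structure."""
    return np.concatenate([p["pos"], p["vel"], [p["mass"]]])

def unmarshal(buf):
    return {"pos": buf[0:3], "vel": buf[3:6], "mass": float(buf[6])}

if rank == 0:
    comm.Send([marshal(particle), MPI.DOUBLE], dest=1, tag=7)
elif rank == 1:
    buf = np.empty(7)
    comm.Recv([buf, MPI.DOUBLE], source=0, tag=7)
    print(unmarshal(buf))
```

Run with, e.g., mpiexec -n 2 python sketch.py; writing such flattening code by hand for every data structure is exactly the tedious, error-prone work a generator is meant to remove.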
This thesis is based on research papers accepted to IPDPS '08 and OOPSLA '08. / Master of Science
|
383 |
Exploring school atlases: applying digital tools for visual data analysis and data management Nyamador, Enock Seth, Moser, Jana, Meyer, Philipp 15 November 2024
Digital tools and computer programming are useful in easing and improving the speed and repeatability of outputs in social sciences and humanities research. Data visualisation plays an important role in getting insights into (large) datasets, communicating results and sharing knowledge amongst researchers. There exist several tools and software for data collection and visualisation but they are not always designed to fit all situations. In this rather technical working paper, we present some possibilities and advantages of using computer programming within the scope of a research project: (1) analysing quantitative datasets through means of visualisations produced within our research by (de)coding school atlases, and (2) data management for large sets of source-data, especially optimisation and embedding of coherent metadata in atlas scans to prepare for archiving and reuse. Together, we have developed an effective and efficient technical workflow for the processing, visualisation and management of our research data.
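As a rough illustration of such a workflow, the Python sketch below aggregates coded atlas entries into a frequency chart and writes a sidecar metadata file next to each scan; the column names, file layout and metadata fields are assumptions for illustration, not the project's actual schema.

```python
# Sketch: visualise coded atlas data and attach sidecar metadata to scans.
import json
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd

# One row per coded map element (hypothetical columns: atlas_id, year, category).
df = pd.read_csv("atlas_codes.csv")

counts = df.groupby(["year", "category"]).size().unstack(fill_value=0)
counts.plot(kind="bar", stacked=True, figsize=(10, 5))
plt.ylabel("coded map elements")
plt.tight_layout()
plt.savefig("category_frequencies.png", dpi=150)

# Sidecar metadata kept alongside each scan to prepare for archiving and reuse
# (the directory layout and fields are assumptions).
for scan in Path("scans").glob("*.tif"):
    meta = {"title": scan.stem, "source": "school atlas scan", "license": "tbd"}
    scan.with_suffix(".json").write_text(json.dumps(meta, indent=2))
```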
|
384 |
Hide-Metadata Based Data Integration Environment for Hydrological Datasets Ravindran, Nimmy 30 December 2004
Efficient data integration is one of the most challenging problems in data management, interoperation and analysis. Earth science data, which are inherently heterogeneous, are collected at various geographical locations for scientific studies and operational use. The intrinsic problem of archiving, distributing and searching such huge scientific datasets is compounded by the heterogeneity of data and queries, thus limiting scientific analysis and the generation and validation of hydrologic forecast models. The data models of hydrologic research communities such as the National Weather Service (NWS), the National Oceanic and Atmospheric Administration (NOAA), and the US Geological Survey (USGS) are diverse and complex. A complete derivation of any useful hydrological model from data integrated from all these sources is often a time-consuming process.
One of the current trends in data harvesting in the scientific community is towards distributed digital library initiatives. However, these approaches may not be adequate for data sources or entities that do not want to "upload" their data into a "data pool." In view of this, we present an effective architecture to address the issues of data integration in such a diverse environment for hydrological studies. The heterogeneities in these datasets are addressed based on the autonomy of each data source in terms of design, communication, association and execution, using a hierarchical integration model. A metadata model is also developed for describing the data as well as the data sources, thus providing a uniform view of the data for different kinds of users. An implementation of the model as a web-based system that integrates widely varied hydrology datasets from multiple data sources is also being developed. / Master of Science
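The following minimal Python sketch illustrates the general idea of a uniform metadata layer over autonomous sources; the fields and example sources are illustrative assumptions, not the metadata model developed in the thesis.

```python
# Sketch: a uniform metadata view over heterogeneous, autonomous data sources.
from dataclasses import dataclass

@dataclass
class SourceMetadata:
    name: str
    agency: str                # e.g. NWS, NOAA, USGS
    variables: list
    spatial_ref: str           # coordinate reference system
    temporal_resolution: str
    access: str                # "remote-query" vs "local-copy": sources stay autonomous

registry = [
    SourceMetadata("streamflow-gauges", "USGS", ["discharge", "gauge_height"],
                   "EPSG:4326", "15min", "remote-query"),
    SourceMetadata("precip-forecast", "NWS", ["qpf"], "EPSG:4326", "6h", "remote-query"),
]

def find_sources(variable):
    """Uniform view: query the metadata layer instead of each source's native API."""
    return [s.name for s in registry if variable in s.variables]

print(find_sources("discharge"))
```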
|
385 |
Discipline-Independent Text Information Extraction from Heterogeneous Styled References Using Knowledge from the Web Park, Sung Hee 11 July 2013
In education and research, references play a key role. They give credit to prior work and provide support for reviews, discussions, and arguments. The set of references attached to a publication can help describe that publication, can aid with its categorization and retrieval, can support bibliometric studies, and can guide interested readers and researchers. If suitably analyzed, that set can aid with the analysis of the publication itself, especially regarding all its citing passages. However, extracting and parsing references are difficult problems. One concern is that there are many reference styles, and identifying which style was employed is problematic, especially in heterogeneous collections of theses and dissertations, which cover many fields and disciplines, and in which different styles may be used even within the same publication. We address these problems by drawing upon suitable knowledge found on the WWW. In particular, we use appropriate lists (e.g., of names, cities, and other types of entities). We use available information about the many reference styles encountered, in a form of reverse engineering, and we use available references to guide machine learning. Specifically, we investigate a two-stage classifier approach, performing multi-class classification with respect to reference styles, and partially solve the problem of parsing surface representations of references. We describe empirical evidence for the effectiveness of our approach and plans for improving our method. / Ph. D.
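A hedged sketch of what the first stage (multi-class classification of reference style) might look like with scikit-learn; the toy reference strings and labels are invented for illustration, and the actual approach additionally exploits Web-derived entity lists and a second, style-specific parsing stage.

```python
# Sketch: classify the citation style of a reference string (stage one of two).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy training data: reference strings labelled with their style (invented).
refs = [
    "Smith, J. (2010). Data mining basics. Journal of Data, 5(2), 10-20.",
    'J. Smith, "Data mining basics," Journal of Data, vol. 5, no. 2, pp. 10-20, 2010.',
    "Smith, John. Data Mining Basics. Journal of Data 5.2 (2010): 10-20.",
]
styles = ["APA", "IEEE", "MLA"]

# Character n-grams capture the punctuation patterns that distinguish styles.
style_clf = Pipeline([
    ("tfidf", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
    ("clf", LogisticRegression(max_iter=1000)),
])
style_clf.fit(refs, styles)

print(style_clf.predict(["Doe, A. (2012). Parsing references. Data Journal, 7(1), 1-9."]))
# Stage two would then segment the string into fields (authors, title, venue,
# year) using rules or a sequence model specific to the predicted style.
```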
|
386 |
What do all the numbers mean? Making sure that we have all the pieces of the puzzle. Sparrow, Thomas, Gaffney, Christopher F., Schmidt, Armin R. January 2009 (has links)
No / No Abstract
|
387 |
An Extensible Framework for Annotation-based Parameter Passing in Distributed Object Systems Gopal, Sriram 28 July 2008 (has links)
Modern distributed object systems pass remote parameters based on their runtime type. This design choice limits the expressiveness, readability, and maintainability of distributed applications. While a rich body of research is concerned with middleware extensibility, modern distributed object systems do not offer programming facilities to extend their remote parameter passing semantics. Thus, extending these semantics requires understanding and modifying the underlying middleware implementation.
This thesis addresses these design shortcomings by presenting (i) a declarative and extensible approach to remote parameter passing that decouples parameter passing from parameter types, and (ii) a plugin-based framework, DeXteR, that enables the programmer to extend the native set of remote parameter passing semantics, without having to understand or modify the underlying middleware implementation.
DeXteR treats remote parameter passing as a distributed cross-cutting concern. It uses generative and aspect-oriented techniques, enabling the implementation of different parameter passing semantics as reusable application-level plugins that work with application, system, and third-party library classes. The flexibility and expressiveness of the framework is validated by implementing several non-trivial parameter passing semantics as DeXteR plugins. The material presented in this thesis has been accepted for publication at the ACM/USENIX Middleware 2008 conference. / Master of Science
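As a loose conceptual analogue only (DeXteR itself targets Java middleware and uses aspect-oriented instrumentation; none of the names below come from its API), this Python sketch shows the core idea of declaring a parameter-passing strategy separately from the parameter's type, so that a new semantics can be added as a plugin rather than by editing the middleware.

```python
# Conceptual analogue: pluggable, declared-per-parameter passing strategies.
import copy

STRATEGIES = {}

def strategy(name):
    """Register a parameter-passing semantics under a declarative name."""
    def register(fn):
        STRATEGIES[name] = fn
        return fn
    return register

@strategy("by-value")
def pass_by_value(obj):
    return copy.deepcopy(obj)

@strategy("by-cache")                # a user-added semantics, e.g. cached transfer
def pass_by_cache(obj):
    return obj                       # placeholder: a real plugin would consult a cache

def remote_call(fn, args, passing):
    """`passing` maps argument index -> declared strategy name (default by-value)."""
    prepared = [STRATEGIES[passing.get(i, "by-value")](a) for i, a in enumerate(args)]
    return fn(*prepared)

print(remote_call(lambda xs, ys: (xs, ys), ([1, 2], [3, 4]), {1: "by-cache"}))
```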
|
388 |
Towards a Data Quality Framework for Heterogeneous Data Micic, Natasha, Neagu, Daniel, Campean, Felician, Habib Zadeh, Esmaeil 22 April 2017
Yes / Every industry generates significant data output as a product of its working processes, and with the recent advent of big data mining and integrated data warehousing there is a clear case for a robust methodology for assessing data quality to ensure sustainable and consistent processing. In this paper a review of Data Quality (DQ) across multiple domains is conducted in order to propose connections between their methodologies. This critical review suggests that, during DQ assessment, heterogeneous data sets are rarely treated as separate types of data in need of an alternative quality assessment framework. We discuss the need for such a directed DQ framework and the opportunities foreseen in this research area, and propose to address it through degrees of heterogeneity.
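As a hedged starting point for the kind of assessment discussed here, the Python sketch below profiles a tabular data set column by column; a fuller framework would weight and extend these measures according to the degree of heterogeneity of the data, which is beyond this illustration.

```python
# Sketch: simple per-column data quality profile with pandas.
import pandas as pd

def quality_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Completeness, uniqueness and inferred type per column -- a minimal
    starting point, not a full DQ framework."""
    rows = []
    for col in df.columns:
        s = df[col]
        rows.append({
            "column": col,
            "completeness": 1.0 - s.isna().mean(),
            "uniqueness": s.nunique(dropna=True) / max(len(s), 1),
            "inferred_type": pd.api.types.infer_dtype(s, skipna=True),
        })
    return pd.DataFrame(rows)

example = pd.DataFrame({"sensor": ["a", "a", None], "reading": [1.2, None, 3.4]})
print(quality_profile(example))
```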
|
389 |
Exploring JPEG File Containers Without Metadata: A Machine Learning Approach for Encoder Classification Iko Mattsson, Mattias, Wagner, Raya January 2024
This thesis explores a method for identifying JPEG encoders without relying on metadata by analyzing characteristics inherent to the JPEG file format itself. The approach uses machine learning to differentiate encoders based on features such as quantization tables, Huffman tables, and marker sequences. These features are extracted from the file container and analyzed to identify the source encoder. The random forest classification algorithm was applied to test the efficacy of the approach across different datasets, aiming to validate the model's performance and reliability. The results confirm the model's capability to identify JPEG source encoders, providing a useful approach for digital forensic investigations.
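A rough sketch of the feature-extraction side of such an approach, using Pillow for the quantization tables, a hand-rolled scan of the header markers, and scikit-learn's random forest; the feature layout, file names and encoder labels are assumptions for illustration rather than the thesis's exact pipeline.

```python
# Sketch: JPEG container features (quantization tables + marker sequence)
# fed to a random forest encoder classifier.
from pathlib import Path

import numpy as np
from PIL import Image
from sklearn.ensemble import RandomForestClassifier

def marker_sequence(path, limit=20):
    """First `limit` marker codes from the JPEG header (0xE0=APP0, 0xDB=DQT, ...)."""
    data = Path(path).read_bytes()
    markers, i = [], 2                       # skip the initial SOI marker (FF D8)
    while i + 4 <= len(data) and len(markers) < limit:
        if data[i] != 0xFF:
            break
        marker = data[i + 1]
        markers.append(marker)
        if marker == 0xDA:                   # SOS: compressed scan data follows
            break
        i += 2 + int.from_bytes(data[i + 2:i + 4], "big")   # jump over the segment
    return markers + [0] * (limit - len(markers))

def features(path):
    qt = Image.open(path).quantization       # dict: table id -> 64 coefficients
    flat = []
    for tid in (0, 1):                       # luma/chroma tables, zero-padded if absent
        flat.extend(list(qt.get(tid, [0] * 64)))
    return np.array(flat + marker_sequence(path))

# Hypothetical training set: images produced by known encoders.
paths, labels = ["a.jpg", "b.jpg"], ["libjpeg-turbo", "photoshop"]
X = np.stack([features(p) for p in paths])
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, labels)
print(clf.predict([features("unknown.jpg")]))
```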
|
390 |
The modeling of learning objects for a semantic learning platform Balog-Crisan, Radu 13 December 2011
In order to make Learning Objects (LO) accessible, reusable and adaptable, it is necessary to model them. Besides form and structure, one must also define the semantics associated with a given LO. Thus, we propose a modeling scheme for LOs that respects the LOM (Learning Object Metadata) standard and uses an RDF-based (Resource Description Framework) data model. In order to encode, exchange and reuse such structured metadata for LOs, we have developed the RDF4LOM (RDF for LOM) application. By using Semantic Web tools, we are able to deliver a prototype of a semantic learning platform (SLCMS) that enhances internal resources, LOs modeled with RDF, as well as external resources (semantic wikis, blogs or calendars). The architecture of this SLCMS is based upon a semantic Kernel whose role is to interpret metadata and create intelligent queries. We use ontologies for the description of semantic constraints and reasoning rules concerning the LOs. By means of accurate and complete ontologies, the LOs become machine-interpretable and machine-understandable. For the semantic Quiz module, we have developed the Quiz and LMD ontologies. The semantic learning platform enables searching for relevant LOs, generating personalized learning paths for learners and, as an evolution, adapting to learning styles.
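A minimal rdflib sketch of the general idea of describing a learning object with RDF triples; the namespace URI and property names are placeholders standing in for a LOM binding, not the RDF4LOM vocabulary itself.

```python
# Sketch: an RDF description of a learning object with LOM-style properties.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DC, RDF

LOM = Namespace("http://example.org/lom#")   # placeholder namespace (assumption)

g = Graph()
lo = URIRef("http://example.org/learning-objects/intro-to-rdf")

g.add((lo, RDF.type, LOM.LearningObject))
g.add((lo, DC.title, Literal("Introduction to RDF", lang="en")))
g.add((lo, DC.creator, Literal("Jane Doe")))
g.add((lo, LOM.interactivityType, Literal("expositive")))
g.add((lo, LOM.typicalLearningTime, Literal("PT1H30M")))   # ISO 8601 duration

print(g.serialize(format="turtle"))
```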
|