Global ETD Search

1	Memory Footprint Reduction of Operating System Kernels He, Haifeng January 2009 (has links) As the complexity of embedded systems grows, there is an increasing use of operating systems (OSes) in embedded devices, such as mobile phones, media players and other consumer electronics. Despite their convenience and flexibility, such operating systems can be overly general and contain features and code that are not needed in every application context, which incurs unnecessary performance overheads. In most embedded systems, resources, such as processing power, available memory, and power consumption, are strictly constrained. In particular, the amount of memory on embedded devices is often very limited. This, together with the popular usage of operating systems in embedded devices, makes it important to reduce the memory footprint of operating systems. This dissertation addresses this challenge and presents automated ways to reduce the memory footprint of OS kernels for embedded systems. First, we present kernel code compaction, an automated approach that reduces the code size of an OS kernel statically by removing unused functionality. OS kernel code tends to be different from ordinary application code, including the presence of a significant amount of hand-written assembly code, multiple entry points, implicit control flow paths involving interrupt handlers, and frequent indirect control flow via function pointers. We use a novel "approximated compilation" technique to apply source-level pointer analysis to hand-written assembly code. A prototype implementation of our idea on an Intel x86 platform and a minimally configured Linux kernel obtains a code size reduction of close to 24%.Even though code compaction can remove a portion of the entire OS kernel code, when exercised with typical embedded benchmarks, such as MiBench, most kernel code is executed infrequently if at all. Our second contribution is on-demand code loading, an automated approach that keeps the rarely used code on secondary storage and loads it into main memory only when it is needed. In order to minimize the overhead of code loading, a greedy node-coalescing algorithm is proposed to group closely related code together. The experimental results show that this approach can reduce memory requirements for the Linux kernel code by about 53%with little degradation in performance. Last, we describe dynamic data structure compression, an approach that reduces the runtime memory footprint of dynamic data structures in an OS kernel. A prototype implementation for the Linux kernel reduces the memory consumption of the slab allocators in Linux by 17.5%when running the MediaBench suite while incurring only minimal increases in execution time (1.9%). Binary rewriting Code compaction Memory footprint reduction Operating Systems
2	rave: A Framework for Code and Memory Randomization of Linux Containers Blackburn, Christopher Nogueira 23 July 2021 (has links) Memory corruption continues to plague modern software systems, as it has for decades. With the emergence of code-reuse attacks which take advantage of these vulnerabilities like Return- Oriented Programming (ROP) or non-control data attacks like Data-Oriented programming (DOP), defenses against these are growing thin. These attacks, and more advanced variations of them, are becoming more difficult to detect and to mitigate. In this arms race, it is critical to not only develop mitigation techniques, but also ways we can effectively deploy those techniques. In this work, we present rave - a framework which takes common design features of defenses against memory corruption and code-reuse and puts them in a real-world setting. Rave consists of two components: librave, the library responsible for static binary analysis and instrumentation, and CRIU-rave, an extended version of the battle-tested process migration tool available for Linux. In our prototype of this framework, we have shown that these tools can be used to rewrite live applications, like NGINX, with enough randomization to disrupt memory corruption attacks. This work is supported in part by ONR under grant N00014-18-1-2022 and NAVSEA/NEEC/NSWC Dahlgren under grant N00174-20-1-0009. / Master of Science / Memory corruption attacks continue to be a concrete threat against modern computer systems. Malicious actors can take advantage of related vulnerabilities to carry out more advance, hard-to-detect attacks which give them control of the target or leak critical information. Many works have been developed to defend against these sophisticated attacks and their triggers (memory corruption), but many struggle to be adopted into the real-world for reasons such as instability or difficulty in deployment. In this work, we introduce rave, a framework which seeks to address issues of stability and deployment by designing a way for defenders to coordinate and apply mitigation techniques in a real-world setting. Migration Binary Rewriting Memory Corruption Code-Reuse Randomization
3	Software lock elision for x86 machine code Roy, Amitabha January 2011 (has links) More than a decade after becoming a topic of intense research there is no transactional memory hardware nor any examples of software transactional memory use outside the research community. Using software transactional memory in large pieces of software needs copious source code annotations and often means that standard compilers and debuggers can no longer be used. At the same time, overheads associated with software transactional memory fail to motivate programmers to expend the needed effort to use software transactional memory. The only way around the overheads in the case of general unmanaged code is the anticipated availability of hardware support. On the other hand, architects are unwilling to devote power and area budgets in mainstream microprocessors to hardware transactional memory, pointing to transactional memory being a 'niche' programming construct. A deadlock has thus ensued that is blocking transactional memory use and experimentation in the mainstream. This dissertation covers the design and construction of a software transactional memory runtime system called SLE_x86 that can potentially break this deadlock by decoupling transactional memory from programs using it. Unlike most other STM designs, the core design principle is transparency rather than performance. SLE_x86 operates at the level of x86 machine code, thereby becoming immediately applicable to binaries for the popular x86 architecture. The only requirement is that the binary synchronise using known locking constructs or calls such as those in Pthreads or OpenMPlibraries. SLE_x86 provides speculative lock elision (SLE) entirely in software, executing critical sections in the binary using transactional memory. Optionally, the critical sections can also be executed without using transactions by acquiring the protecting lock. The dissertation makes a careful analysis of the impact on performance due to the demands of the x86 memory consistency model and the need to transparently instrument x86 machine code. It shows that both of these problems can be overcome to reach a reasonable level of performance, where transparent software transactional memory can perform better than a lock. SLE_x86 can ensure that programs are ready for transactional memory in any form, without being explicitly written for it. 005.3
4	Kernel optimization by layout restructuring / Estimation d'efficacité et restructuration automatisées de noyaux de calcul Haine, Christopher 03 July 2017 (has links) Bien penser la structuration de données est primordial pour obtenir de hautes performances, alors que les processeurs actuels perdent un temps considérable à attendre la complétion de transactions mémoires. En particulier les localités spatiales et temporelles de données doivent être optimisées.Cependant, les transformations de structures de données ne sont pas proprement explorées par les compilateurs, en raison de la difficulté que pose l'évaluation de performance des transformations potentielles. De plus,l'optimisation des structures de données est chronophage, sujette à erreur etles transformations à considérer sont trop nombreuses pour être implémentées à la main dans l'optique de trouver une version de code efficace.On propose de guider les programmeurs à travers le processus de restructuration de données grace à un retour utilisateur approfondi, tout d'abord en donnant une description multidimensionnelle de la structure de donnée initiale, faite par une analyse de traces mémoire issues du binaire de l'application de l'utilisateur, dans le but de localiser des problèmes de stride au niveau instruction, indépendemment du langage d'entrée. On choisit de focaliser notre étude sur les transformations de structure de données, traduisibles dans un formalisme proche du C pour favoriser la compréhension de l'utilisateur, que l'on applique et évalue sur deux cas d'étude qui sont des applications réelles,à savoir une simulation d'ondes cardiaques et une simulation de chromodynamique quantique sur réseau, avec différents jeux d'entrées. La prédiction de performance de différentes transformations est conforme à 5% près aux versions réécrites à la main. / Careful data layout design is crucial for achieving high performance, as nowadays processors waste a considerable amount of time being stalled by memory transactions, and in particular spacial and temporal locality have to be optimized. However, data layout transformations is an area left largely unexplored by state-of-the-art compilers, due to the difficulty to evaluate the possible performance gains of transformations. Moreover, optimizing data layout is time-consuming, error-prone, and layout transformations are too numerous tobe experimented by hand in hope to discover a high performance version. We propose to guide application programmers through data layout restructuring with an extensive feedback, firstly by providing a comprehensive multidimensional description of the initial layout, built via analysis of memory traces collected from the application binary textit {in fine} aiming at pinpointing problematic strides at the instruction level, independently of theinput language. We choose to focus on layout transformations,translatable to C-formalism to aid user understanding, that we apply and assesson case study composed of two representative multithreaded real-lifeapplications, a cardiac wave simulation and lattice QCD simulation, with different inputs and parameters. The performance prediction of different transformations matches (within 5%) with hand-optimized layout code. Profilage de performance Restructuration de données Vectorisation Réécriture de binaire Performance profiling Layout restructuring Vectorization Binary rewriting
5	Applying Dynamic Software Updates to Computationally-Intensive Applications Kim, Dong Kwan 22 July 2009 (has links) Dynamic software updates change the code of a computer program while it runs, thus saving the programmer's time and using computing resources more productively. This dissertation establishes the value of and recommends practices for applying dynamic software updates to computationally-intensive applications—a computing domain characterized by long-running computations, expensive computing resources, and a tedious deployment process. This dissertation argues that updating computationally-intensive applications dynamically can reduce their time-to-discovery metrics—the total time it takes from posing a problem to arriving at a solution—and, as such, should become an intrinsic part of their software lifecycle. To support this claim, this dissertation presents the following technical contributions: (1) a distributed consistency algorithm for synchronizing dynamic software updates in a parallel HPC application, (2) an implementation of the Proxy design pattern that is more efficient than the existing implementations, and (3) a dynamic update approach for Java Virtual Machine (JVM)-based applications using the Proxy pattern to offer flexibility and efficiency advantages, making it suitable for computationally-intensive applications. The contributions of this dissertation are validated through performance benchmarks and case studies involving computationally-intensive applications from the bioinformatics and molecular dynamics simulation domains. / Ph. D. Proxy Pattern Java Virtual Machine Binary Rewriting HPC Dynamic Software Updates Computationally-Intensive Applications
6	Cyber-Physical Analysis and Hardening of Robotic Aerial Vehicle Controllers Taegyu Kim (10716420) 06 May 2021 (has links) Robotic aerial vehicles (RAVs) have been increasingly deployed in various areas (e.g., commercial, military, scientific, and entertainment). However, RAVs’ security and safety issues could not only arise from either of the “cyber” domain (e.g., control software) and “physical” domain (e.g., vehicle control model) but also stem in their interplay. Unfortunately, existing work had focused mainly on either the “cyber-centric” or “control-centric” approaches. However, such a single-domain focus could overlook the security threats caused by the interplay between the cyber and physical domains. <br>In this thesis, we present cyber-physical analysis and hardening to secure RAV controllers. Through a combination of program analysis and vehicle control modeling, we first developed novel techniques to (1) connect both cyber and physical domains and then (2) analyze individual domains and their interplay. Specifically, we describe how to detect bugs after RAV accidents using provenance (Mayday), how to proactively find bugs using fuzzing (RVFuzzer), and how to patch vulnerable firmware using binary patching (DisPatch). As a result, we have found 91 new bugs in modern RAV control programs, and their developers confirmed 32 cases and patch 11 cases. Computer System Security Computer Systems Security Drone Embedded Systems Program Analysis Binary Analysis Forensic Analysis Fuzzing Testing Binary Rewriting
7	Généralisation de l’analyse de performance décrémentale vers l’analyse différentielle / Generalization of the decremental performance analysis to differential analysis Bendifallah, Zakaria 17 September 2015 (has links) Une des étapes les plus cruciales dans le processus d’analyse des performances d’une application est la détection des goulets d’étranglement. Un goulet étant tout évènement qui contribue à l’allongement temps d’exécution, la détection de ses causes est importante pour les développeurs d’applications afin de comprendre les défauts de conception et de génération de code. Cependant, la détection de goulets devient un art difficile. Dans le passé, des techniques qui reposaient sur le comptage du nombre d’évènements, arrivaient facilement à trouver les goulets. Maintenant, la complexité accrue des micro-architectures modernes et l’introduction de plusieurs niveaux de parallélisme ont rendu ces techniques beaucoup moins efficaces. Par conséquent, il y a un réel besoin de réflexion sur de nouvelles approches.Notre travail porte sur le développement d’outils d’évaluation de performance des boucles de calculs issues d’applications scientifiques. Nous travaillons sur Decan, un outil d’analyse de performance qui présente une approche intéressante et prometteuse appelée l’Analyse Décrémentale. Decan repose sur l’idée d’effectuer des changements contrôlés sur les boucles du programme et de comparer la version obtenue (appelée variante) avec la version originale, permettant ainsi de détecter la présence ou pas de goulets d’étranglement.Tout d’abord, nous avons enrichi Decan avec de nouvelles variantes, que nous avons conçues, testées et validées. Ces variantes sont, par la suite, intégrées dans une analyse de performance poussée appelée l’Analyse Différentielle. Nous avons intégré l’outil et l’analyse dans une méthodologie d’analyse de performance plus globale appelée Pamda.Nous décrirons aussi les différents apports à l’outil Decan. Sont particulièrement détaillées les techniques de préservation des structures de contrôle du programme,ainsi que l’ajout du support pour les programmes parallèles.Finalement, nous effectuons une étude statistique qui permet de vérifier la possibilité d’utiliser des compteurs d’évènements, autres que le temps d’exécution, comme métriques de comparaison entre les variantes Decan / A crucial step in the process of application performance analysis is the accurate detection of program bottlenecks. A bottleneck is any event which contributes to extend the execution time. Determining their cause is important for application developpers as it enable them to detect code design and generation flaws.Bottleneck detection is becoming a difficult art. Techniques such as event counts,which succeeded to find bottlenecks easily in the past, became less efficient because of the increasing complexity of modern micro-processors, and because of the introduction of parallelism at several levels. Consequently, a real need for new analysis approaches is present in order to face these challenges.Our work focuses on performance analysis and bottleneck detection of computeintensive loops in scientific applications. We work on Decan, a performance analysis and bottleneck detection tool, which offers an interesting and promising approach called Decremental Analysis. The tool, which operates at binary level, is based on the idea of performing controlled modifications on the instructions of a loop, and comparing the new version (called variant) to the original one. The goal is to assess the cost of specific events, and thus the existence or not of bottlenecks.Our first contribution, consists of extending Decan with new variants that we designed, tested and validated. Based on these variants, we developed analysis methods which we used to characterize hot loops and find their bottlenecks. Welater, integrated the tool into a performance analysis methodology (Pamda) which coordinates several analysis tools in order to achieve a more efficient application performance analysis.Second, we introduce several improvements on the Decan tool. Techniquesdeveloped to preserve the control flow of the modified programs, allowed to use thetool on real applications instead of extracted kernels. Support for parallel programs(thread and process based) was also added. Finally, our tool primarily relying on execution time as the main concern for its analysis process, we study the opportunity of also using other hardware generated events, through a study of their stability, precision and overhead Analyse de performances Réécriture binaire Analyse dynamique Analyse statique du code Optimization de code Parallélisme Accès mémoire Performance analysis Binary rewriting Dynamic code analysis Static code analysis Code optimization Parallelism Memory accesses Hardware counters

1

Page generated in 0.0535 seconds