1

Memory Footprint Reduction of Operating System Kernels

He, Haifeng January 2009 (has links)
As the complexity of embedded systems grows, operating systems (OSes) are increasingly used in embedded devices such as mobile phones, media players, and other consumer electronics. Despite their convenience and flexibility, such operating systems can be overly general and contain features and code that are not needed in every application context, which incurs unnecessary performance overheads. In most embedded systems, resources such as processing power, available memory, and power consumption are strictly constrained. In particular, the amount of memory on embedded devices is often very limited. This, together with the popular usage of operating systems in embedded devices, makes it important to reduce the memory footprint of operating systems. This dissertation addresses this challenge and presents automated ways to reduce the memory footprint of OS kernels for embedded systems.

First, we present kernel code compaction, an automated approach that reduces the code size of an OS kernel statically by removing unused functionality. OS kernel code tends to be different from ordinary application code, including the presence of a significant amount of hand-written assembly code, multiple entry points, implicit control flow paths involving interrupt handlers, and frequent indirect control flow via function pointers. We use a novel "approximated compilation" technique to apply source-level pointer analysis to hand-written assembly code. A prototype implementation of our idea on an Intel x86 platform and a minimally configured Linux kernel obtains a code size reduction of close to 24%.

Even though code compaction can remove only a portion of the OS kernel code, most kernel code is executed infrequently, if at all, when the kernel is exercised with typical embedded benchmarks such as MiBench. Our second contribution is on-demand code loading, an automated approach that keeps rarely used code on secondary storage and loads it into main memory only when it is needed. In order to minimize the overhead of code loading, a greedy node-coalescing algorithm is proposed to group closely related code together. The experimental results show that this approach can reduce the memory requirements for Linux kernel code by about 53% with little degradation in performance.

Last, we describe dynamic data structure compression, an approach that reduces the runtime memory footprint of dynamic data structures in an OS kernel. A prototype implementation for the Linux kernel reduces the memory consumption of the slab allocators in Linux by 17.5% when running the MediaBench suite while incurring only minimal increases in execution time (1.9%).
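
To make the on-demand loading idea concrete, the following is a minimal sketch (in Python) of a greedy node-coalescing pass over a weighted call graph, grouping functions that call each other frequently into the same load unit. The names, the page-size limit, and the union-find bookkeeping are illustrative assumptions, not the dissertation's actual implementation.

    PAGE_SIZE = 4096  # assumed load-unit size in bytes (illustrative)

    def coalesce_nodes(sizes, edges):
        """sizes: {function: code size in bytes}
        edges: {(caller, callee): call frequency}
        Returns a mapping from each function to its load-unit representative."""
        unit_of = {f: f for f in sizes}                # each function starts alone
        unit_size = dict(sizes)

        def find(f):                                   # union-find with path halving
            while unit_of[f] != f:
                unit_of[f] = unit_of[unit_of[f]]
                f = unit_of[f]
            return f

        # Greedily merge along the most frequently traversed call edges first,
        # as long as the merged group still fits in one load unit.
        for (caller, callee), _freq in sorted(edges.items(), key=lambda kv: -kv[1]):
            a, b = find(caller), find(callee)
            if a != b and unit_size[a] + unit_size[b] <= PAGE_SIZE:
                unit_of[b] = a
                unit_size[a] += unit_size[b]

        return {f: find(f) for f in sizes}

    # Toy example: two helpers that call each other often share a load unit,
    # while the rarely used ioctl handler stays in its own unit.
    sizes = {"read_inode": 900, "lookup_dentry": 1200, "rare_ioctl": 3000}
    edges = {("lookup_dentry", "read_inode"): 500, ("rare_ioctl", "read_inode"): 2}
    print(coalesce_nodes(sizes, edges))
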
2

A memory profiler for 3D graphics application using binary instrumentation

Deo, Mrinal 25 July 2011 (has links)
This report describes the architecture and implementation of a memory profiler for 3D graphics applications. Memory profiling is done for the parts of the program that run on the graphics processor and are responsible for rendering the image. The shaders are parsed, and every memory instruction is instrumented with additional instructions for profiling. The results are then transferred from video memory to CPU memory. Profiling is done for one frame and completes in less than three minutes. The report also describes various analyses that can be performed using the results obtained from this profiler. It discusses the design of an analytical cache model that can be used to identify, among all the buffers used by an application, candidate memory buffers suitable for caching. The profiler can segregate results for reads and writes and can handle all formats of texture access instructions as well as predicated instructions.
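
As a rough illustration of the instrumentation step, the sketch below (Python, over a toy textual instruction list) inserts a profiling counter update after every memory access. The opcode names and the counters buffer are assumptions made for illustration, not the shader ISA or tooling actually used in the report.

    MEMORY_OPS = {"ld", "st", "tex"}   # assumed load, store and texture-fetch opcodes

    def instrument(shader):
        """shader: list of instruction strings, e.g. 'tex r2, t0, s0'.
        Returns the instrumented instruction list and the counter-slot map."""
        out, counter_slot = [], {}
        for pc, inst in enumerate(shader):
            out.append(inst)
            if inst.split()[0] in MEMORY_OPS:
                slot = counter_slot.setdefault(pc, len(counter_slot))
                # bump a per-instruction counter in a results buffer; in a real
                # profiler this buffer would live in video memory and be copied
                # back to CPU memory once the frame has been rendered
                out.append(f"atomic_add counters[{slot}], 1")
        return out, counter_slot

    shader = ["mul r1, r0, c0", "tex r2, t0, s0", "st out[0], r2"]
    code, slots = instrument(shader)
    print("\n".join(code))
    print("counter slots:", slots)
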
3

Evaluation de l'affectation des tâches sur une architecture à mémoire distribuée pour des modèles flot de données / Efficient evaluation of mappings of dataflow applications onto distributed memory architectures

Lesparre, Youen 02 March 2017 (has links)
With the increasing use of smartphones, connected objects, and automated vehicles, embedded systems have become ubiquitous in our living environment. These systems are often highly constrained in terms of power consumption and size. They are increasingly built around many-core processor arrays, which allow rapid design that meets stringent real-time constraints while operating at relatively low frequency and with reduced power consumption. Running an application on a processor array requires dispatching its tasks onto the processors so as to meet capacity and performance constraints; this mapping problem is known to be NP-complete.

The contributions of this thesis are threefold. First, we extend important notions from the Cyclo-Static Dataflow Graph model to the Phased Computation Graph model, together with two equivalent sufficient conditions of liveness. Second, we present a random dataflow graph generator able to produce Synchronous Dataflow Graphs, Cyclo-Static Dataflow Graphs, and Phased Computation Graphs; the generator can produce live dataflow graphs of up to 10,000 tasks in less than 30 seconds and is compared with SDF3 and PREESM. Third, and most important, we propose a new method for evaluating a mapping using the Synchronous Dataflow Graph and Cyclo-Static Dataflow Graph models. The method efficiently evaluates the memory footprint of the communications of a dataflow graph mapped onto a distributed-memory architecture. The evaluation comes in two versions: the first guarantees a live mapping, while the second also accounts for a throughput constraint. The evaluation method is experimented on dataflow graphs generated by Turbine and on real-life applications.
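
As a rough illustration of what such an evaluation computes, the sketch below (Python) sums, per processor, the buffer memory needed for the edges of an SDF graph whose producer and consumer are mapped to different processors. It uses the classic sufficient per-edge size p + c - gcd(p, c) plus initial tokens as a stand-in for the thesis's own evaluation; all names and the buffer-placement choice are illustrative assumptions.

    from math import gcd

    def mapping_footprint(edges, mapping, token_size=4):
        """edges: list of (src, dst, prod_rate, cons_rate, initial_tokens)
        mapping: {task: processor}
        Returns the bytes of communication buffer memory needed per processor."""
        footprint = {}
        for src, dst, p, c, m0 in edges:
            if mapping[src] == mapping[dst]:
                continue                       # local edge: not counted in this sketch
            tokens = p + c - gcd(p, c) + m0    # sufficient per-edge size (stand-in)
            owner = mapping[dst]               # assume the buffer sits at the consumer
            footprint[owner] = footprint.get(owner, 0) + tokens * token_size
        return footprint

    # Toy example: only the B -> C edge crosses processors.
    edges = [("A", "B", 3, 2, 0), ("B", "C", 4, 6, 2)]
    mapping = {"A": 0, "B": 0, "C": 1}
    print(mapping_footprint(edges, mapping))   # {1: 40}
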
