Global ETD Search

361	Energy efficient branch prediction Hicks, Michael Andrew January 2010 Energy efficiency is of the utmost importance in modern high-performance embedded processor design. As the number of transistors on a chip continues to increase each year, and processor logic becomes ever more complex, the dynamic switching power cost of running such processors increases. The continual progression in fabrication processes brings a reduction in the feature size of the transistor structures on chips with each new technology generation. This reduction in size increases the significance of leakage power (a constant drain that is proportional to the number of transistors). Particularly in embedded devices, the proportion of an electronic product’s power budget accounted for by the CPU is significant (often as much as 50%). Dynamic branch prediction is a hardware mechanism used to forecast the direction, and target address, of branch instructions. This is essential to high-performance pipelined and superscalar processors, where the direction and target of branches are not computed until several stages into the pipeline. Accurate branch prediction also acts to increase energy efficiency by reducing the amount of time spent executing mis-speculated instructions. ‘Stalling’ is no longer a sensible option when the significance of static power dissipation is considered. Dynamic branch prediction logic typically accounts for over 10% of a processor’s global power dissipation, making it an obvious target for energy optimisation. Previous approaches to increasing the energy efficiency of dynamic branch prediction logic have focused on either fully dynamic or fully static techniques.
Dynamic techniques include the introduction of a new cache-like structure that can decide whether branch prediction logic should be accessed for a given branch, and static techniques tend to focus on scheduling around branch instructions so that a prediction is not needed (or the branch is removed completely). This dissertation explores a method of combining static techniques and profiling information with simple hardware support in order to reduce the number of accesses made to a branch predictor. The local delay region is used on unconditional absolute branches to avoid prediction, and, for most other branches, Adaptive Branch Bias Measurement (through profiling) is used to assign a static prediction that is as accurate as a dynamic prediction for that branch. This information is represented as two hint-bits in branch instructions, and then interpreted by simple hardware logic that bypasses both the lookup and update phases for appropriate branches. The global processor power saving that can be achieved by this Combined Algorithm is around 6% on the experimental architectures shown. These architectures are based upon real contemporary embedded architecture specifications. The introduction of the Combined Algorithm also significantly reduces the execution time of programs on Multiple Instruction Issue processors. This is attributed to the increase achieved in global prediction accuracy. 621.31
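The profiling step described in entry 361 can be illustrated with a minimal Python sketch. The abstract only states that two hint bits are used; the specific encoding, the names `Hint`, `assign_hint` and `needs_predictor_access`, and the rule "use a static hint when the profiled bias is at least the dynamic accuracy" are assumptions made for illustration, not the dissertation's actual design.

```python
from enum import IntEnum


class Hint(IntEnum):
    """Hypothetical 2-bit hint encoding (the abstract does not give one)."""
    DYNAMIC = 0b00           # consult the dynamic predictor as usual
    STATIC_TAKEN = 0b01      # bypass the predictor, assume taken
    STATIC_NOT_TAKEN = 0b10  # bypass the predictor, assume not taken
    DELAY_REGION = 0b11      # unconditional absolute branch, no prediction needed


def assign_hint(taken, not_taken, dynamic_accuracy, unconditional_absolute=False):
    """Pick a hint for one branch from profile counts.

    taken / not_taken: how often the branch went each way during profiling.
    dynamic_accuracy: measured accuracy (0..1) of the dynamic predictor
    for this branch (assumed to be available from the same profiling run).
    """
    if unconditional_absolute:
        return Hint.DELAY_REGION
    total = taken + not_taken
    if total == 0:
        return Hint.DYNAMIC  # never executed in the profile: no evidence
    bias = max(taken, not_taken) / total
    if bias >= dynamic_accuracy:
        # A fixed prediction is at least as accurate as the dynamic one.
        return Hint.STATIC_TAKEN if taken >= not_taken else Hint.STATIC_NOT_TAKEN
    return Hint.DYNAMIC


def needs_predictor_access(hint):
    """Only DYNAMIC branches pay for predictor lookup and update."""
    return hint == Hint.DYNAMIC


# Usage with made-up profile data: a loop back-edge taken 990 of 1000 times.
loop_hint = assign_hint(taken=990, not_taken=10, dynamic_accuracy=0.98)
print(loop_hint.name, needs_predictor_access(loop_hint))
```

In this sketch, every branch whose hint is not `DYNAMIC` skips both the lookup and the update of the predictor, which is where the energy saving described in the abstract would come from.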
362	Migrering till Linux för inbyggda system : En förstudie gjord på företag Low Vision International / Migration to Linux for embedded systems: A pre-study conducted at the company Low Vision International Bergman, Johannes, Torsson, Markus January 2017 The use of Linux in embedded systems continues to grow every year. Open source and new tools for developing Linux for embedded systems have made Linux not only a cost-effective choice but also a time-efficient one. The aim of this study was to investigate, on behalf of LVI, a possible migration of the operating system in their embedded systems from Windows XP Embedded to an embedded Linux-based operating system for ARM processors with support for OCR processing. Linux and open source for embedded systems bring a number of advantages. These include low cost, full control over the embedded system, and the ability to test and evaluate software entirely free of charge. To reach a result, we investigated which alternatives exist and whether they support the functions that LVI uses. The result of this study is an account of the choices to be made and of what may suit LVI best. We primarily examined Yocto Project and Buildroot and consider Yocto Project a good choice for LVI. Two simple applications were also written to demonstrate image processing and optical character recognition. The applications were developed in C++ using OpenCV and Tesseract-ocr. data computer engineering computer linux os Embedded Systems
363	Embedded Processor Selection/Performance Estimation using FPGA-based Profiling Obeidat, Fadi 26 July 2010 In embedded systems, modeling the performance of candidate processor architectures is very important because it enables the designer to estimate the capability of each architecture against the target application. Considering the large number of available embedded processors, there is a growing need for an infrastructure that can estimate the performance of a given application on a given processor with a minimum of time and resources. This dissertation presents a framework that employs the softcore MicroBlaze processor as a reference architecture, on which FPGA-based profiling is implemented to extract the functional statistics that characterize the target application. Linear regression analysis is used to map the functional statistics of the target application to the performance of the candidate processor architecture. Hence, this approach does not require running the target application on each candidate processor; instead, it is run only on the reference processor, which allows many processor architectures to be tested in a very short time. Embedded Systems Performance Modeling FPGA-based Profiling Processor Selection Engineering
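The regression step in entry 363 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature choice (instruction-class counts), the sample numbers, and the function names `fit_linear_model` and `predict` are hypothetical, and the dissertation's actual statistics and fitting procedure may differ.

```python
def fit_linear_model(features, targets):
    """Ordinary least squares with an intercept, via the normal equations.

    features: list of rows, each a tuple of functional statistics measured
    on the reference processor (hypothetical example: instruction counts).
    targets: measured performance of the same benchmarks on one candidate.
    Returns [intercept, w1, w2, ...].
    """
    rows = [[1.0] + list(f) for f in features]
    n = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
                b[r] -= factor * b[col]
    return [b[i] / a[i][i] for i in range(n)]


def predict(weights, feature_row):
    """Estimate candidate-processor performance from reference statistics."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature_row))


# Made-up training data: (arithmetic ops, memory ops) in millions per
# benchmark, and the cycle count (millions) observed on a candidate processor.
train_features = [(10.0, 4.0), (20.0, 5.0), (15.0, 9.0), (30.0, 12.0)]
train_cycles = [26.0, 41.0, 47.0, 79.0]
model = fit_linear_model(train_features, train_cycles)

# The target application is profiled only on the reference processor.
print(predict(model, (18.0, 7.0)))
```

One model would be fitted per candidate processor; estimating a new application then needs only its profile from the reference processor.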
364	Design of a Small Form-Factor Flight Control System Ward, Garrett 28 April 2014 This work outlines a design for a small form-factor flight control system intended to fly in a wide variety of airframes. The system was designed with future expansion in mind while providing a complete, all-in-one solution to meet present needs. The system as presented meets most needs while remaining relatively low cost. It has a completely integrated IMU solution as well as on-board GPS, and it is capable of basic waypoint navigation. The solution was tested using software- and hardware-in-the-loop simulation, which demonstrated its functionality. embedded systems flight control systems unmanned aerial vehicles Engineering
365	ENERGY EFFICIENT EMBEDDED SYSTEM DESIGN FOR MEDICAL CARE SYSTEM USING WIRELESS SENSOR NETWORK LI, QI 05 December 2008 Recent surveys on medical service systems show that the cost of patient monitoring has grown significantly. The widespread use of portable digital medical devices makes it possible to track patient conditions more comprehensively. However, the development of a full-scale, distributed health monitoring system has been delayed by the lack of efficient wireless communication in a large distributed network. The research challenge is to provide accurate, real-time patient information to medical experts in a fast, efficient and cost-effective fashion. This thesis proposes a novel solution: a system that links patients and doctors using embedded system technology and a wireless sensor network. It presents the design and implementation of such a system. embedded system wireless sensor network Computer Sciences Physical Sciences and Mathematics
366	Security-driven Design Optimization of Mixed Cryptographic Implementations in Distributed, Reconfigurable, and Heterogeneous Embedded Systems Nam, HyunSuk January 2017 Distributed heterogeneous embedded systems are increasingly prevalent in numerous applications, including automotive, avionics, smart and connected cities, Internet of Things, etc. With pervasive network access within these systems, security is a critical design concern. This dissertation presents a modeling and optimization framework for distributed, reconfigurable, and heterogeneous embedded systems. Distributed embedded systems consist of numerous interconnected embedded devices, each composed of different computing resources, such as single-core processors, asymmetric multicore processors, field-programmable gate arrays (FPGAs), and various combinations thereof. A dataflow-based modeling framework for streaming applications integrates models for computational latency, mixed cryptographic implementations for inter-task and intra-task communication, security levels, communication latency, and power consumption. For the security model, we present a level-based modeling of cryptographic algorithms using mixed cryptographic implementations, including both symmetric and asymmetric implementations. We utilize a multi-objective genetic optimization algorithm to optimize security and energy consumption subject to latency and minimum security level constraints. The presented methodology is evaluated using a video-based object detection and tracking application and several synthetic benchmarks representing various application types. Experimental results for these design and optimization frameworks demonstrate the benefits of the mixed cryptographic algorithm security model compared to single cryptographic algorithm alternatives. We further consider several distributed heterogeneous embedded systems architectures.
Codesign Distributed Embedded Systems Genetic Algorithm Mixed Cryptography
367	GPU-aware Component-based Development for Embedded Systems Campeanu, Gabriel January 2016 Nowadays, more and more embedded systems are equipped with, e.g., various sensors that produce large amounts of data. One of the challenges of traditional (CPU-based) embedded systems is to process this considerable amount of data at the performance level demanded by embedded applications. A solution comes from the use of a specialized processing unit such as a Graphics Processing Unit (GPU). A GPU can process large amounts of data thanks to its parallel processing architecture, delivering improved performance compared to a CPU. A characteristic of the GPU is that it cannot work alone; the CPU must trigger all its activities. Today, taking advantage of the latest technology breakthroughs, we can benefit from GPU technology in the context of embedded systems by using heterogeneous CPU-GPU embedded systems. Component-based development has proven to be a promising methodology for handling software complexity. Through component models, which describe the component specification and their interaction, the methodology has been successfully used in the embedded system domain. The existing component models, designed to handle CPU-based embedded systems, face challenges in developing embedded systems with GPU capabilities. For example, current solutions realize the communication between components with GPU capabilities via the RAM system. This introduces an undesired overhead that negatively affects the system performance. This Licentiate thesis presents methods and techniques that address the component-based development of embedded systems with GPU capabilities. More concretely, we provide means for component models to explicitly address the GPU-aware component-based development by using specific artifacts.
For example, the overhead introduced by the traditional way of communicating via RAM is reduced by inserting automatically generated adapters that facilitate direct component communication over the GPU memory. Another contribution of the thesis is a component allocation method over the system hardware. The proposed solution offers alternative options for optimizing the total system performance and balancing various system properties (e.g., memory usage, GPU load). To validate our proposed solutions, we use an underwater robot demonstrator equipped with GPU hardware. / Ralf 3 GPU component-based development embedded systems GPU development
368	Estimation of Orientation in a Dual-Tag Ultra Wideband Indoor Positioning System Johansson, Oscar, Wassénius, Lucas January 2019 This report investigates the feasibility of using a dual-tag setup in an indoor positioning system. The purpose of the dual-tag setup was to estimate both position and orientation. The system was designed using UWB technology, with a time-of-flight trilateration algorithm to calculate the position. The orientation was then estimated from the relative position of the two tags. The system was tested both with stationary tags and with the tags moving along two paths. These tests were conducted for different separation distances between the tags, namely 20 cm, 30 cm and 40 cm. The mean position error for stationary tags was less than 8 cm for all separations, and the mean orientation error was less than 3° for all separations. For the moving-tag tests, a decrease in orientation error of about 30% was observed for separations of 30 cm and 40 cm compared to 20 cm. However, this difference is small in absolute terms, so more tests are needed to draw any conclusion about whether 30 cm and 40 cm tag separations perform better than 20 cm. The performance of the system could be increased further by optimizing the anchor placement as well as the calibration of the antenna delays of the UWB modules. UWB Ultra Wideband Indoor Positioning Embedded Systems
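How an orientation can be derived from two tag positions, as in entry 368, is shown in the short Python sketch below. It assumes 2D positions in metres already produced by trilateration; the function names and the simplified error model are illustrative assumptions, not the report's implementation.

```python
import math


def estimate_orientation(tag_a, tag_b):
    """Heading of the vector from tag_a to tag_b, in degrees in [0, 360),
    measured counter-clockwise from the positive x-axis."""
    dx = tag_b[0] - tag_a[0]
    dy = tag_b[1] - tag_a[1]
    return math.degrees(math.atan2(dy, dx)) % 360.0


def estimate_position(tag_a, tag_b):
    """Use the midpoint of the two tags as the position of the tracked object."""
    return ((tag_a[0] + tag_b[0]) / 2.0, (tag_a[1] + tag_b[1]) / 2.0)


def angular_sensitivity_deg(perp_error_m, separation_m):
    """Orientation error in a simplified worst case (an assumption): both
    tags are displaced by perp_error_m perpendicular to the baseline, in
    opposite directions, so the baseline tilts by atan(2e / d)."""
    return math.degrees(math.atan2(2.0 * perp_error_m, separation_m))


# Made-up positions for two tags mounted 0.30 m apart.
a, b = (1.00, 2.00), (1.00, 2.30)
print(estimate_orientation(a, b), estimate_position(a, b))
```

Under this simplified model the same position error tilts a longer baseline less, which is consistent with the direction of the trend reported for the larger tag separations.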
369	Acceleration of deep convolutional neural networks on multiprocessor system-on-chip Reiche Myrgård, Martin January 2019 In this master's thesis, some of the most promising existing frameworks and implementations of deep convolutional neural networks on multiprocessor systems-on-chip (MPSoCs) are researched and evaluated. The starting point was a previous thesis, conducted in the spring of 2018, which evaluated possible deep learning models and frameworks for object detection on infra-red images. An existing deep convolutional neural network (DCNN) needs modifications to fit on an MPSoC. Most DCNNs are trained on graphics processing units (GPUs) with a bit width of 32 bits. This is not optimal for a platform with hard memory constraints such as the MPSoC, which means the bit width needs to be shortened. The optimal bit width depends on the network structure and on the requirements in terms of throughput and accuracy, although most of the currently available object detection networks degrade significantly when reduced below a width of 6 bits. After reducing the bit width, the network needs to be quantized and pruned for better memory usage. After quantization it can be implemented using one of many existing frameworks. This thesis focuses on Xilinx CHaiDNN and DNNWeaver V2, though it also touches on revision, HLS4ML and DNNWeaver V1. In conclusion, the implementation of two network models on a Xilinx Zynq UltraScale+ ZCU102 using CHaiDNN was evaluated. Existing networks were converted and quantization was tested, though it was not fully working. The result was an implementation two to six times more power efficient than GPU inference. Neural networks MPSoC FPGA DCNN Embedded Systems
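The bit-width reduction mentioned in entry 369 can be illustrated with a generic quantizer in Python. This is a plain symmetric uniform scheme chosen for illustration; it is an assumption and not the quantization method used by CHaiDNN, DNNWeaver, or the thesis, and the function names are hypothetical.

```python
def quantize_weights(weights, bits):
    """Symmetric uniform quantization of float weights to signed integers.

    Returns (codes, scale) such that weight is approximately code * scale.
    With bits=8 the codes lie in [-127, 127].
    """
    levels = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / levels if max_abs else 1.0
    codes = [round(w / scale) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


def max_abs_error(weights, bits):
    """Largest reconstruction error for a given bit width."""
    codes, scale = quantize_weights(weights, bits)
    restored = dequantize(codes, scale)
    return max(abs(w - r) for w, r in zip(weights, restored))


# Made-up weights: the error grows as the bit width shrinks.
layer = [-0.82, 0.11, 0.47, -0.05, 0.93, 0.30]
for width in (8, 6, 4):
    print(width, max_abs_error(layer, width))
```

The rounding error per weight is bounded by half the step size `scale`, and the step size roughly doubles for every bit removed, which gives an intuition for why accuracy falls off quickly at small bit widths.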
370	Contribution à l’optimisation de densité de code pour Processeur Embarqué / Contribution to the optimization of Embedded processor code density Fahmi, Youssef 13 June 2013 Embedded systems occupy a growing share of the current market, with devices based on Systems on-Chip (SoC). These embedded systems face very strong constraints on cost, size, power consumption, reliability and dimensions. In this context, the code density of a processor becomes an important criterion. The idea of this thesis was to take a RISC processor with good performance for the embedded world, the APS3 from Cortus, and improve its code density. Several methods were tried: – Huffman compression. – Dictionary-based compression. – Instruction set modification. The compression methods showed their limits in this case, either because they were not compatible with our goals or because they did not provide a gain large enough compared with their overhead in size and in additional cycles at run time. This prompted us to modify the instruction set. The result was a 25% code density improvement in the research phase and a 20.8% improvement in the final version of the processor, because a compromise was needed to keep the small size and good performance of the APS3. The APS3CD is the result of this thesis. It has an area of 49605 µm², a maximum frequency of 444 MHz, a score of 2.16 DMIPS/MHz and a consumption of 12 µW/MHz (UMC90). It offers a 20.8% gain over the APS3 and 40% compared with the Cortex-M3 (with gcc), which is a reference for code density in the market. However, the gain can be increased by working on the compiler, because the current compiler (gcc) does not fully use the added complex instructions in some cases. A possible continuation would be to work on a compiler better than gcc, which was not originally designed for embedded systems with code density requirements. An example is the code size difference between gcc and Keil or IAR for ARM processors. Processor Embedded Gnu Compression

Search results