1

Energy efficient branch prediction

Hicks, Michael Andrew January 2010 (has links)
Energy efficiency is of the utmost importance in modern high-performance embedded processor design. As the number of transistors on a chip continues to increase each year, and processor logic becomes ever more complex, the dynamic switching power cost of running such processors increases. The continual progression in fabrication processes brings a reduction in the feature size of the transistor structures on chips with each new technology generation. This reduction in size increases the significance of leakage power (a constant drain that is proportional to the number of transistors). Particularly in embedded devices, the proportion of an electronic product’s power budget accounted for by the CPU is significant (often as much as 50%). Dynamic branch prediction is a hardware mechanism used to forecast the direction, and target address, of branch instructions. This is essential to high-performance pipelined and superscalar processors, where the direction and target of branches are not computed until several stages into the pipeline. Accurate branch prediction also acts to increase energy efficiency by reducing the amount of time spent executing mis-speculated instructions. ‘Stalling’ is no longer a sensible option when the significance of static power dissipation is considered. Dynamic branch prediction logic typically accounts for over 10% of a processor’s global power dissipation, making it an obvious target for energy optimisation. Previous approaches to increasing the energy efficiency of dynamic branch prediction logic have focused on either fully dynamic or fully static techniques. Dynamic techniques include the introduction of a new cache-like structure that decides whether branch prediction logic should be accessed for a given branch, while static techniques tend to focus on scheduling around branch instructions so that a prediction is not needed (or the branch is removed completely). This dissertation explores a method of combining static techniques and profiling information with simple hardware support in order to reduce the number of accesses made to a branch predictor. The local delay region is used on unconditional absolute branches to avoid prediction, and, for most other branches, Adaptive Branch Bias Measurement (through profiling) is used to assign a static prediction that is as accurate as a dynamic prediction for that branch. This information is represented as two hint bits in branch instructions, which are then interpreted by simple hardware logic that bypasses both the lookup and update phases for appropriate branches. The global processor power saving achieved by this Combined Algorithm is around 6% on the experimental architectures shown, which are based upon real contemporary embedded architecture specifications. The introduction of the Combined Algorithm also significantly reduces the execution time of programs on Multiple Instruction Issue processors, which is attributed to the increase achieved in global prediction accuracy.
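
A minimal sketch of how two hint bits might gate predictor accesses (the encoding, names, and interface below are illustrative assumptions, not taken from the thesis):

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical 2-bit hint encoding carried in each branch instruction. */
typedef enum {
    HINT_DYNAMIC      = 0, /* use the dynamic predictor as usual         */
    HINT_STATIC_TAKEN = 1, /* profiled bias: predict taken, skip BP      */
    HINT_STATIC_NOT   = 2, /* profiled bias: predict not-taken, skip BP  */
    HINT_DELAY_REGION = 3  /* unconditional branch: local delay region   */
} branch_hint_t;

/* Returns the predicted direction and reports whether the dynamic
 * predictor's lookup (and later update) phase can be bypassed. */
bool predict_direction(branch_hint_t hint, uint32_t pc,
                       bool (*dynamic_lookup)(uint32_t), bool *bypass)
{
    switch (hint) {
    case HINT_STATIC_TAKEN: *bypass = true;  return true;
    case HINT_STATIC_NOT:   *bypass = true;  return false;
    case HINT_DELAY_REGION: *bypass = true;  return true;  /* always taken */
    default:                *bypass = false; return dynamic_lookup(pc);
    }
}
```

Skipping both phases for strongly biased branches saves the predictor's table reads and writes, which is where the access-count (and hence energy) reduction comes from.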
2

Exploration of non-volatile magnetic memory for processor architecture

Senni, Sophiane 14 December 2015 (has links)
With the downscaling of the complementary metal-oxide-semiconductor (CMOS) technology, designing dense and energy-efficient systems-on-chip (SoC) is becoming a real challenge. Concerning density, reducing the CMOS transistor size faces manufacturing constraints while the cost increases exponentially. Regarding energy, a significant increase of the power density and dissipation obstructs further improvement in performance. This issue is mainly due to the growth of the leakage current of CMOS transistors, which leads to an increase of the static energy consumption. Observing current SoCs, more and more area is occupied by embedded volatile memories, such as static random access memory (SRAM) and dynamic random access memory (DRAM). As a result, a significant proportion of the total power is spent in memory systems. In the past two decades, alternative memory technologies have emerged with attractive characteristics to mitigate the aforementioned issues. Among these technologies, magnetic random access memory (MRAM) is a promising candidate as it combines high density and very low static power consumption, while its performance is competitive compared to SRAM and DRAM. Moreover, MRAM is non-volatile. This capability, if present in embedded memories, has the potential to add new features to SoCs to enhance energy efficiency and reliability. In this thesis, an area, performance and energy exploration of embedding the MRAM technology in the memory hierarchy of a processor architecture is conducted. A first fine-grained exploration was made at cache level for multi-core architectures. A second study evaluated the possibility to design a non-volatile processor integrating MRAM at register level. Within the context of the internet of things, new features and the benefits brought by the non-volatility were investigated.
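
The kind of trade-off such an exploration quantifies can be illustrated with a first-order cache energy model (a sketch; every number and name below is an illustrative assumption, not a figure from the thesis):

```c
#include <stdio.h>

/* First-order model: E = accesses * E_dyn + time * P_leak.
 * The per-access energies and leakage powers are placeholders,
 * not measured SRAM/MRAM figures. */
typedef struct {
    double read_nj;   /* dynamic energy per read  (nJ) */
    double write_nj;  /* dynamic energy per write (nJ) */
    double leak_mw;   /* static leakage power     (mW) */
} mem_tech_t;

static double cache_energy_mj(mem_tech_t t, double reads, double writes,
                              double seconds)
{
    double dynamic = (reads * t.read_nj + writes * t.write_nj) * 1e-6; /* mJ */
    double leakage = t.leak_mw * seconds;                              /* mJ */
    return dynamic + leakage;
}

int main(void)
{
    mem_tech_t sram = { 0.5, 0.5, 30.0 }; /* cheap accesses, high leakage    */
    mem_tech_t mram = { 0.6, 2.5,  0.3 }; /* costly writes, near-zero leakage */
    double reads = 5e8, writes = 1e8, secs = 10.0;

    printf("SRAM cache: %.1f mJ\n", cache_energy_mj(sram, reads, writes, secs));
    printf("MRAM cache: %.1f mJ\n", cache_energy_mj(mram, reads, writes, secs));
    return 0;
}
```

Under such a model, MRAM wins whenever the leakage saved over the run outweighs its higher write energy, which is why the write intensity seen at each cache level matters to the exploration.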
3

Embedded early vision techniques for efficient background modeling and midground detection

Valentine, Brian Evans 26 March 2010 (has links)
An automated vision system performs critical tasks in video surveillance, while decreasing costs and increasing efficiency. It can provide high quality scene monitoring without the limitations of human distraction and fatigue. Advances in embedded processors, wireless networks, and imager technology have enabled computer vision systems to be deployed pervasively in stationary surveillance monitors, hand-held devices, and vehicular sensors. However, the size, weight, power, and cost requirements of these platforms present a great challenge in developing real-time systems. This dissertation explores the development of background modeling algorithms for surveillance on embedded platforms. Our contributions are as follows:

- An efficient pixel-based adaptive background model, called multimodal mean, which produces results comparable to the widely used mixture of Gaussians multimodal approach, at a much reduced computational cost and with greater control of occluded object persistence.
- A novel and efficient chromatic clustering-based background model for embedded vision platforms that leverages the color uniformity of large, permanent background objects to yield significant speedups in execution time.
- A multi-scale temporal model for midground analysis, which provides a means to "tune in" to changes in the scene beyond the standard background/foreground framework, based on user-defined temporal constraints.

Multimodal mean reduces instruction complexity with the use of fixed integer arithmetic and periodic long-term adaptation that occurs once every d frames. When combined with fixed thresholding, it performs 6.2 times faster than the mixture of Gaussians method while using 18% less storage. Furthermore, fixed thresholding compares favorably to standard-deviation thresholding, with a percentage difference in error of less than five percent on scenes with stable lighting conditions and modest multimodal activity. The chromatic clustering-based approach to optimized background modeling takes advantage of the color distributions in large permanent background objects, such as a road, building, or sidewalk, to speed up execution time. It abstracts their colors to a small color palette and suppresses their adaptation during processing. When run on a representative embedded platform, it reduces storage usage by 58% and improves execution speed by 45%. Multiscale temporal modeling for midground analysis presents a unified approach to scene analysis that can be applied to several application domains. It extends scene analysis from the standard background/foreground framework to one that includes a temporal midground object saliency window defined by the user. When applied to stationary object detection, the midground model provides accurate results at low sampling frame rates (~1 fps) while using only 18 Mbytes of storage and 15 Mops/sec of processing throughput.
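
A minimal sketch of a multimodal-mean-style per-pixel update using only integer arithmetic (the number of modes, the threshold, and the field names are assumptions for illustration; the thesis's exact recurrences may differ):

```c
#include <stdint.h>
#include <stdlib.h>

#define NUM_MODES    4   /* background modes kept per pixel (assumed)   */
#define MATCH_THRESH 30  /* fixed per-channel match threshold (assumed) */

typedef struct {
    int32_t  sum[3];     /* running RGB sums, integer arithmetic only */
    uint32_t count;      /* frames folded into this mode              */
} bg_mode_t;

/* Returns 1 if the pixel matches an existing background mode
 * (background), 0 otherwise (foreground). A matching mode absorbs
 * the new observation; otherwise the weakest mode is recycled. */
int update_pixel(bg_mode_t m[NUM_MODES], const uint8_t rgb[3])
{
    for (int i = 0; i < NUM_MODES; i++) {
        if (m[i].count == 0) continue;
        int match = 1;
        for (int c = 0; c < 3 && match; c++) {
            int32_t mean = m[i].sum[c] / (int32_t)m[i].count;
            if (abs((int)rgb[c] - mean) > MATCH_THRESH) match = 0;
        }
        if (match) {                       /* fold the sample into the mode */
            for (int c = 0; c < 3; c++) m[i].sum[c] += rgb[c];
            m[i].count++;
            return 1;                      /* background */
        }
    }
    int victim = 0;                        /* recycle least-observed mode */
    for (int i = 1; i < NUM_MODES; i++)
        if (m[i].count < m[victim].count) victim = i;
    for (int c = 0; c < 3; c++) m[victim].sum[c] = rgb[c];
    m[victim].count = 1;
    return 0;                              /* foreground */
}
```

The periodic long-term adaptation the abstract mentions (once every d frames) would decay each mode's sums and counts so that stale background fades; it is omitted here for brevity.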
4

SCIL processor: a common intermediate language processor for embedded systems

Zhou, Tongyao January 2008 (has links)
Thesis digitized by the Division de la gestion de documents et des archives of the Université de Montréal.
5

Real-Time Traffic Sign Recognition System on FPGA

Irmak, Hasan 01 September 2010 (has links)
In this thesis, a new algorithm is proposed for the recognition of triangular, circular and rectangular traffic signs, and it is implemented on an FPGA platform. The system can recognize 32 different traffic signs with high accuracy. In the proposed method, the image is first segmented into red and blue regions and, according to the area of each segment, the dominant color is decided. Then, Laplacian of Gaussian (LoG) based edge detection is applied to the segmented image, followed by a Hough transform for shape extraction. Finally, recognition based on Informative Pixel Percentage (IPP) matching is performed on the extracted shapes. The Traffic Sign Recognition (TSR) system is implemented on a Virtex-5 FX70T FPGA, which has an embedded PPC440 processor. Some modules of the TSR algorithm are implemented in the FPGA logic, while the remaining modules run on the PPC440 processor. The division of work between the FPGA and the PPC440 takes into account the capabilities and shortcomings of each, and the benefits of using an FPGA with an embedded processor are exploited to optimize the system.
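
A minimal sketch of the first stage, red/blue segmentation with dominant-color selection by segment area (the channel-dominance rule and thresholds below are assumptions; the thesis's color space and limits may differ):

```c
#include <stdint.h>
#include <stddef.h>

typedef enum { COLOR_NONE, COLOR_RED, COLOR_BLUE } sign_color_t;

/* Classify one RGB pixel as red-ish or blue-ish using simple channel
 * dominance tests; real systems often work in HSV, this is a sketch. */
static sign_color_t classify_pixel(uint8_t r, uint8_t g, uint8_t b)
{
    if (r > 100 && r > g + 40 && r > b + 40) return COLOR_RED;
    if (b > 100 && b > g + 40 && b > r + 40) return COLOR_BLUE;
    return COLOR_NONE;
}

/* Segment the image and pick the dominant sign color by region area. */
sign_color_t dominant_color(const uint8_t *rgb, size_t npixels,
                            uint8_t *mask /* out: 0/1 per pixel */)
{
    size_t red = 0, blue = 0;
    for (size_t i = 0; i < npixels; i++) {
        sign_color_t c = classify_pixel(rgb[3*i], rgb[3*i+1], rgb[3*i+2]);
        mask[i] = (c != COLOR_NONE);
        if (c == COLOR_RED)  red++;
        if (c == COLOR_BLUE) blue++;
    }
    if (red == 0 && blue == 0) return COLOR_NONE;
    return (red >= blue) ? COLOR_RED : COLOR_BLUE;
}
```

The resulting mask would then feed the LoG edge detector and Hough transform; a regular per-pixel loop like this is a natural candidate for the FPGA fabric, while the less regular IPP matching can stay on the PPC440.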
6

Soft error analysis with and without operating system

Casagrande, Luiz Gustavo January 2016 (has links)
The complexity of integrated systems-on-chip, as well as of commercial processor architectures, has increased dramatically in recent years. Thus, the effort required to assess the susceptibility of these devices to faults caused by the incidence of charged space particles has grown at the same rate. This work presents a comparative analysis of soft error susceptibility in the ARM Cortex-A9 single core, a large-scale commercial embedded microprocessor widely used in critical applications, running a set of 11 applications developed for both a bare-metal environment and the Linux operating system. The soft error analysis is performed by fault injection on the OVPSim simulation platform together with the OVPSim-FIM fault injector, which randomly selects the time and place at which to inject a fault. The fault injection campaign reproduces thousands of bit-flips in the microprocessor register file during the execution of the benchmark set, whose code behavior ranges from control-flow-dependent to data-intensive applications. The analysis method is based on comparing executions in which faults were injected against a fault-free execution. The results show the error rate classified by effect: masked (UNACE), crash or loss of control flow (HANG), and silent data corruption (SDC). Additionally, errors are classified by register location, with latent errors separated by their location in the results and by exceptions detected by the operating system, providing better observability for a processor of this scale. The proposed method and the results can guide software developers in choosing different code architectures in order to improve the fault tolerance of the embedded system as a whole.
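
A minimal sketch of how each injection run might be classified against the golden run (the outcome labels follow the abstract; the struct fields and the handling of exceptions are assumptions):

```c
#include <string.h>
#include <stdbool.h>

typedef enum { OUTCOME_UNACE, OUTCOME_SDC, OUTCOME_HANG } outcome_t;

typedef struct {
    bool finished;     /* run reached its normal exit point              */
    bool exception;    /* an exception was raised during the run         */
    char output[256];  /* program result captured after the run          */
} run_result_t;

/* Compare a faulty run against the fault-free (golden) run. The thesis
 * counts OS-detected exceptions as a separate category; here they are
 * folded into HANG for simplicity. */
outcome_t classify(const run_result_t *golden, const run_result_t *faulty)
{
    if (!faulty->finished || faulty->exception)
        return OUTCOME_HANG;   /* crash, livelock, or lost control flow */
    if (strcmp(golden->output, faulty->output) != 0)
        return OUTCOME_SDC;    /* finished, but results are corrupted   */
    return OUTCOME_UNACE;      /* fault was masked                      */
}
```

Running this classification over thousands of randomized injections, grouped by the register that was flipped, yields exactly the per-register UNACE/HANG/SDC rates the abstract describes.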
7

Detection and compression of electrical disturbances based on an FPGA platform

Kapisch, Eder Barboza 18 March 2015 (has links)
This dissertation presents the implementation of a System of Detection and Compression of Electrical Disturbances (SDCDE), focusing on implementations based on an FPGA (Field-Programmable Gate Array) platform. The compression and detection algorithms are discussed first; the FPGA synthesis results and a prototype developed for testing are then shown. The proposed system is aimed at applications in Electric Power Systems (SEPs) and provides for the acquisition and storage of the disturbances commonly found in this field. From the stored data, the recorded signal can be fully reconstructed for later oscillographic analysis. The compression process involves three stages: novelty detection, lossy compression using the Discrete Wavelet Transform (DWT), and bit-level compression. These three levels of compression optimize the memory space used and ensure that long periods of records can be stored on a memory card. The FPGA synthesis study evaluates, among other factors, the usage of hardware resources, through the implementation of an embedded processor created and designed for digital signal processing (DSP) applications. Finally, synthesis results and case studies, with tests performed in real environments on the developed prototype, are presented.
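
A minimal sketch of one level of the lossy DWT stage using the Haar wavelet (the abstract does not state which wavelet family or threshold the system uses; both are assumptions here):

```c
#include <math.h>
#include <stddef.h>

/* One Haar DWT level: n input samples -> n/2 approximation and n/2
 * detail coefficients. Detail coefficients below the threshold are
 * zeroed, which is where the loss occurs: the resulting runs of zeros
 * are cheap for the later bit-level compression stage to encode. */
void haar_level(const double *x, size_t n, double *approx, double *detail,
                double threshold)
{
    const double s = 1.0 / sqrt(2.0);
    for (size_t i = 0; i < n / 2; i++) {
        approx[i] = (x[2*i] + x[2*i + 1]) * s;
        detail[i] = (x[2*i] - x[2*i + 1]) * s;
        if (fabs(detail[i]) < threshold)
            detail[i] = 0.0;   /* discard low-energy detail (lossy step) */
    }
}
```

Reconstruction inverts the butterfly (x0 = (a + d)/sqrt(2), x1 = (a - d)/sqrt(2)), so the thresholding step is the only source of loss; recursing on the approximation band gives the multi-level decomposition a full recorder would use.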
