Global ETD Search

41	Méthodologie d'identification et d'évitement des cycles de gel du processeur pour l'optimisation de la performance du logiciel sur le matériel / Avoidance and identification methodology of processor stall cycles for software-on-hardware performance optimization Njoyah ntafam, Perrin 20 April 2018 One of the aims of microelectronics is to design and manufacture small, low-cost SoCs targeting markets such as the Internet of Things. On fixed hardware that leaves no room for manoeuvre, one of the challenges for an embedded software developer is to write programs that, at runtime, make the best use of the SoC's capabilities. However, these programs do not always properly use the processing capabilities available on the SoC, which makes software performance estimation and optimization a crucial activity. At runtime, these programs very often suffer from processor stall cycles caused by data missing from the cache. Several approaches can avoid these stall cycles. One is to use appropriate compilation options to generate the best possible executable code; however, compilers have only an abstract knowledge (in the form of analytical formulas) of the hardware architecture on which the software will run. Another is to use out-of-order processors, but these are very expensive to manufacture because the out-of-order mechanism requires a large silicon area.
In this thesis, we propose an iterative methodology based on cycle-accurate virtual platforms that identifies the instructions of the program under optimization that are responsible, at runtime, for processor stall cycles caused by data missing from the L1 cache. The goal is to give the developer hints about the source-code locations in the high-level-language program (typically C/C++) that are responsible for these stalls. For each of these instructions, we report its contribution to the increase in total program execution time. Finally, we estimate the maximum potential gain achievable if all identified stall cycles are avoided by manually inserting software-directed data prefetch instructions into the source code of the program being optimized. Hardware architecture Embedded software SW performance Modeling / Simulation Memory hierarchy Benchmarking 004
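The last step of this methodology, manually inserting software-directed data prefetch instructions into C source, can be illustrated with a minimal sketch. It assumes GCC or Clang, whose `__builtin_prefetch(addr, rw, locality)` builtin emits a prefetch instruction on targets that have one and otherwise generates no code; the function name and the 16-iteration prefetch distance are hypothetical, not values taken from the thesis. An indirect (gather) access is used because such patterns are typically hard for hardware stride prefetchers to predict.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in loop iterations. The right value depends
 * on memory latency and loop cost and would have to be tuned on the target. */
#define PREFETCH_DISTANCE 16

/* Sums data[idx[0..n-1]]. For each iteration, a software prefetch is issued
 * for the element that will be needed PREFETCH_DISTANCE iterations later,
 * so that it is (ideally) already in the L1 cache when the loop reaches it. */
long sum_indirect_prefetch(const int *data, const size_t *idx, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__)
        if (i + PREFETCH_DISTANCE < n) {
            /* rw = 0: prefetch for reading; locality = 3: keep in all cache levels. */
            __builtin_prefetch(&data[idx[i + PREFETCH_DISTANCE]], 0, 3);
        }
#endif
        sum += data[idx[i]];
    }
    return sum;
}
```

Because a prefetch is only a hint, the result is identical with or without it; whether it actually removes L1 stall cycles has to be measured, for example on a cycle-accurate platform of the kind the abstract describes. Issuing one prefetch per iteration is also a simplification: one per cache line is usually enough.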
42	Design and implementation of a modular controller for robotic machines Atta-Konadu, Rodney Kwaku Chapman 25 September 2006 This research focused on the design and implementation of an Intelligent Modular Controller (IMC) architecture designed to be reconfigurable over a robust network. The design incorporates novel communication, hardware, and software architectures. It was motivated by current industrial needs for distributed control systems, driven by growing demand for lower complexity, more processing power, flexibility, and greater fault tolerance. To this end, three main contributions were made. Most distributed control architectures depend on multi-tier heterogeneous communication networks requiring linking devices and/or complex middleware. First, this study proposed and implemented a communication architecture on a homogeneous network that uses ubiquitous Ethernet for both real-time and non-real-time communication. This was achieved with a producer-consumer coordination model for real-time data communication over a segmented network, and a client-server model for point-to-point transactions. The protocols use a Time-Triggered (TT) approach to schedule real-time tasks on the network. Unlike other TT approaches, the scheduling mechanism does not need to be configured explicitly when controller nodes are added or removed. An implicit clock synchronization technique was also developed to complement the architecture. Second, a reconfiguration mechanism based on an auto-configuration protocol was developed. Modules on the network use this protocol to detect one another automatically, establish communication, and negotiate a desired configuration. Third, the research demonstrated hardware/software co-design as a contribution to the growing discipline of mechatronics. The IMC consists of a motion controller board designed and prototyped in-house, and a Java microcontroller.
An IMC is mapped to each machine/robot axis, and an additional IMC can be configured to serve as a real-time coordinator. The entire architecture was implemented in Java, reinforcing uniformity, simplicity, modularity, and openness. Evaluation results showed the potential of the flexible controller to meet medium- to high-performance machining requirements. software architecture hardware architecture communication architecture distributed control embedded systems real-time network robots Ethernet Zeroconf reconfigurable object-oriented Java modular
44	Une approche à composant pour l'orchestration de services à large échelle / A component-based approach to large-scale service orchestration Legrand Contes, Virginie 15 December 2011 This thesis addresses distributed service orchestration arising either (1) from an explicit approach that splits an orchestration into sub-orchestrations located on remote physical sites, for example for data-protection purposes, or (2) from a constructive approach that groups existing, potentially heterogeneous orchestrations to form a global but distributed orchestration. Service orchestrations reflect business processes that are often long-running and must therefore be dynamically adaptable at runtime. This thesis proposes runtime support for distributed, heterogeneous, dynamically reconfigurable orchestrations that can be administered globally. A service orchestration can be viewed along two dimensions: a temporal one, which reflects the sequencing of services over time, and a spatial one, which reflects the services the orchestration needs to invoke in order to execute. We therefore propose a new component model for service-oriented applications, partly inspired by SCA and SCA/BPEL, that can represent both dimensions. Our approach relies on a model of distributed, dynamically reconfigurable software components and thus inherits its distribution and dynamic reconfiguration properties. We describe an implementation on top of the implementation of the Grid Component Model on ProActive, a distributed programming platform based on active objects. We validate our approach experimentally with a service-based application for installing and administering a fleet of OSGi-based gateways. BPEL Decentralized orchestrations
45	Proposta de uma arquitetura de hardware em FPGA implementada para SLAM com multi-câmeras aplicada à robótica móvel / Proposal of an FPGA hardware architecture for SLAM using multi-cameras and applied to mobile robotics Vanderlei Bonato 30 January 2008 This work presents an FPGA-based (Field-Programmable Gate Array) hardware architecture with multiple cameras for the Simultaneous Localization And Mapping (SLAM) problem applied to embedded robotic systems. The architecture is composed of highly specialized hardware blocks for robot localization and for real-time map building of the navigation environment, using features extracted from images read directly from CMOS cameras at 30 frames per second. The system is fully embedded on an FPGA, and its performance is at least one order of magnitude better than that of software implementations running on high-end personal computers. This result comes from exploiting hardware parallelism together with pipeline processing and from several hardware-oriented optimizations of the algorithms. The main contributions of this work are the architectures for the Extended Kalman Filter (EKF) and for feature detection based on the SIFT (Scale Invariant Feature Transform) algorithm. The implementation is highly complex, as it involves a large number of arithmetic and trigonometric operations in floating- and fixed-point formats, intensive image processing for feature extraction and stability checking, and the development of a real-time image acquisition system for four CMOS cameras. In addition, communication interfaces were created to integrate the software and hardware embedded on the FPGA, to control the mobile robot base, and to read its sensors. Besides the implementation details and results, this work presents basic mapping concepts and the state of the art of SLAM algorithms with monocular and stereo vision. Computer vision Embedded systems FPGA Mobile robotics Parallel hardware architecture SLAM
46	Uma plataforma de hardware para processamento de imagem baseada na transformada imagem-floresta / A hardware platform for image processing based on the image foresting transform Cappabianco, Fabio Augusto Menocci 15 February 2006 Advisors: Guido Costa Souza de Araujo, Alexandre Xavier Falcão / Master's dissertation, Universidade Estadual de Campinas, Instituto de Computação / Abstract: Hardware platforms have achieved excellent results in implementing image processing operators because they can operate in parallel on several regions of the image. At the same time, the IFT (Image Foresting Transform) has proven to be an efficient technique for reducing image processing problems to the computation of a path forest in a graph, whose solution is obtained in time linear in the number of pixels for most applications. This work implements a hardware platform, called SIFT (Silicon Image Foresting Transform), that executes an IFT-based algorithm in parallel, quickly and efficiently. The SIFT processing and storage model serves as a basis for other image processing architectures and broadens the understanding of some of the predecessor-map and label concepts used by the IFT.
/ Master's degree / Master in Computer Science Image processing Field programmable gate arrays Hardware architecture Picture interpretation
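The IFT idea summarized above, reducing an image operator to the computation of an optimum-path forest grown from seed pixels, can be sketched in software. This is a minimal C illustration under stated assumptions, not the SIFT hardware design: it uses the max-arc (f_max) path cost and 4-adjacency, and the function name `ift_fmax` and its array layout are hypothetical. Its two outputs, a predecessor map and a label map, are the structures the abstract refers to.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Computes an optimum-path forest on a width x height image (row-major,
 * 4-adjacency) with the f_max path cost: a path costs the maximum
 * intensity along it. seeds[p] is a positive label for a seed pixel and
 * 0 otherwise. Outputs: label[p] (0 if unreachable) and pred[p] (-1 for
 * roots). Returns 0 on success, -1 on allocation failure. */
int ift_fmax(const int *img, const int *seeds, int width, int height,
             int *label, int *pred)
{
    static const int dx[4] = {1, -1, 0, 0};
    static const int dy[4] = {0, 0, 1, -1};
    int n = width * height;
    int *cost = malloc((size_t)n * sizeof *cost);
    bool *done = calloc((size_t)n, sizeof *done);
    if (n > 0 && (cost == NULL || done == NULL)) {
        free(cost);
        free(done);
        return -1;
    }
    for (int p = 0; p < n; p++) {
        pred[p] = -1;
        label[p] = seeds[p];
        cost[p] = seeds[p] > 0 ? img[p] : INT_MAX;
    }
    for (int iter = 0; iter < n; iter++) {
        /* Pick the unfinished pixel of minimum cost. This linear scan makes
         * the sketch O(n^2); a bucket queue over small integer costs gives
         * the linear-time behaviour reported for the IFT. */
        int p = -1;
        for (int q = 0; q < n; q++)
            if (!done[q] && (p < 0 || cost[q] < cost[p]))
                p = q;
        if (cost[p] == INT_MAX)
            break; /* remaining pixels are not connected to any seed */
        done[p] = true;
        int x = p % width, y = p / width;
        for (int k = 0; k < 4; k++) {
            int nx = x + dx[k], ny = y + dy[k];
            if (nx < 0 || ny < 0 || nx >= width || ny >= height)
                continue;
            int q = ny * width + nx;
            if (done[q])
                continue;
            /* Cost of extending the optimum path to p by the arc (p, q). */
            int c = cost[p] > img[q] ? cost[p] : img[q];
            if (c < cost[q]) {
                cost[q] = c;
                pred[q] = p;
                label[q] = label[p];
            }
        }
    }
    free(cost);
    free(done);
    return 0;
}
```

With seeds placed on two markers, the label map is a watershed-like partition of the image; within the IFT framework, other operators are obtained by choosing other path-cost functions.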
47	Connected component tree construction for embedded systems / Construction d'arbre des composantes connexes pour les systèmes embarqués Matas, Petr 30 June 2014 The aim of this work is to enable the construction of embedded digital image processing systems that are both flexible and powerful. The thesis explores the possibility of using an image representation called the connected component tree (CCT) as the basis for implementing the entire image processing chain. This is possible because the CCT is both simple and general: CCT-based implementations exist for operators ranging from filtering to segmentation and recognition. A typical CCT-based image processing chain consists of CCT construction from an input image, a cascade of CCT transformations that implement the individual operators, and image restitution, which generates the output image from the modified CCT. The most time-demanding step is CCT construction, and this work focuses on it. It introduces the CCT and its possible representations in computer memory, shows some of its applications, and analyzes existing CCT construction algorithms. A new parallel CCT construction algorithm producing the parent point tree representation of the CCT is proposed; its low memory requirements make it suitable for embedded system implementation. The algorithm consists of many building and merging tasks. A building task constructs the CCT of a single image line, treated as a one-dimensional signal; merging tasks progressively fuse these CCTs into the CCT of the whole image. Three different task scheduling strategies are developed and evaluated. The performance of the algorithm is evaluated on several parallel computers: a throughput of 83 Mpx/s at a speedup of 13.3 is achieved on a 16-core machine with Opteron 885 CPUs. The algorithm is then adapted for hardware and implemented as a new parallel hardware architecture. The architecture contains 16 basic blocks, each dedicated to processing one image partition and consisting of execution units and memory. A special interconnection switch allows some execution units to access memory in other basic blocks, which the algorithm requires for the final merging of the CCTs constructed by different basic blocks. The architecture is implemented in VHDL, and its functional simulation shows a performance of 145 Mpx/s at a clock frequency of 120 MHz. Hardware architecture Connected component tree Image processing Parent point tree Attributes FPGA
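A building task, which computes the tree of a single image line treated as a one-dimensional signal, can be sketched with the well-known union-find max-tree algorithm. This is a swapped-in textbook technique, not necessarily the thesis's own building procedure, and the function name `build_line_maxtree` is hypothetical. It produces a parent point tree: every sample points to its parent, the root points to itself, and a sample whose parent has a different value is the canonical element of its component.

```c
#include <stdlib.h>

typedef struct { int value; int index; } sample;

/* Orders samples by decreasing value, breaking ties by increasing index. */
static int by_decreasing_value(const void *a, const void *b)
{
    const sample *sa = a, *sb = b;
    if (sa->value != sb->value)
        return sa->value < sb->value ? 1 : -1;
    return sa->index - sb->index;
}

/* Union-find root lookup with path halving. */
static int find_root(int *zpar, int p)
{
    while (zpar[p] != p) {
        zpar[p] = zpar[zpar[p]];
        p = zpar[p];
    }
    return p;
}

/* Builds the max-tree of the 1-D signal f[0..n-1] in parent point form.
 * Returns 0 on success, -1 on allocation failure. */
int build_line_maxtree(const int *f, int n, int *parent)
{
    sample *order = malloc((size_t)n * sizeof *order);
    int *zpar = malloc((size_t)n * sizeof *zpar);
    if (n > 0 && (order == NULL || zpar == NULL)) {
        free(order);
        free(zpar);
        return -1;
    }
    for (int i = 0; i < n; i++) {
        order[i].value = f[i];
        order[i].index = i;
        zpar[i] = -1; /* -1 marks a sample that is not yet processed */
    }
    qsort(order, (size_t)n, sizeof *order, by_decreasing_value);

    /* Process samples from highest to lowest, attaching the roots of
     * already processed neighbouring components below the current sample. */
    for (int i = 0; i < n; i++) {
        int p = order[i].index;
        parent[p] = p;
        zpar[p] = p;
        for (int q = p - 1; q <= p + 1; q += 2) {
            if (q < 0 || q >= n || zpar[q] == -1)
                continue;
            int r = find_root(zpar, q);
            if (r != p) {
                parent[r] = p;
                zpar[r] = p;
            }
        }
    }
    /* Canonicalize from the root outwards (reverse processing order) so
     * that every sample points to the canonical element of its parent. */
    for (int i = n - 1; i >= 0; i--) {
        int p = order[i].index;
        int q = parent[p];
        if (f[parent[q]] == f[q])
            parent[p] = parent[q];
    }
    free(order);
    free(zpar);
    return 0;
}
```

For the signal {1, 3, 2, 3, 1}, the result is {4, 2, 4, 2, 4}: sample 4 is the root (level 1), sample 2 represents the level-2 component {1, 2, 3}, and samples 1 and 3 are the two level-3 peaks. The merging tasks that fuse line trees into the tree of the whole image are not shown.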
48	Design and Control of a Two-Wheeled Robotic Walker da Silva, Airton R., Jr. 07 November 2014 This thesis presents the design, construction, and control of a two-wheeled inverted pendulum (TWIP) robotic walker prototype that assists mobility-impaired users with balance and fall prevention. A conceptual model of the robotic walker is developed and used to illustrate the purpose of this study. A linearized mathematical model of the two-wheeled system is derived using Newtonian mechanics. A control strategy consisting of a decoupled LQR controller and three state variable controllers is developed to stabilize the platform and regulate its behavior with robust disturbance rejection. Simulation results show that the LQR controller can stabilize the platform and reject external disturbances while the state variable controllers simultaneously regulate the system's position with smooth, minimum-jerk control. A prototype of the two-wheeled system is then fabricated and assembled, and the control algorithms that stabilize it and regulate its position are implemented and tuned for optimal performance. Several experiments confirm the ability of the decoupled LQR controller to robustly balance the platform while the state variable controllers regulate its position with smooth, minimum-jerk control. Two-wheeled inverted pendulum linear quadratic regulator control hardware architecture circuit design software development prototype development Computer-Aided Engineering and Design Electro-Mechanical Systems
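Smooth, minimum-jerk position regulation of this kind is commonly based on the classic minimum-jerk point-to-point profile, x(t) = x0 + (xf - x0)(10s^3 - 15s^4 + 6s^5) with s = t/T, which starts and ends with zero velocity and acceleration. The C sketch below generates such a reference; it is an assumption that the thesis's state variable controllers track a profile of exactly this form, and the function names are hypothetical.

```c
/* Minimum-jerk position from x0 to xf over duration T at time t.
 * s = t / T is clamped to [0, 1], so the set point holds x0 before the
 * move and xf after it. */
double min_jerk_position(double x0, double xf, double T, double t)
{
    double s = t / T;
    if (s < 0.0) s = 0.0;
    if (s > 1.0) s = 1.0;
    double s3 = s * s * s;
    /* 10 s^3 - 15 s^4 + 6 s^5, factored as s^3 (10 - 15 s + 6 s^2). */
    return x0 + (xf - x0) * s3 * (10.0 - 15.0 * s + 6.0 * s * s);
}

/* Time derivative of the profile: (xf - x0) / T * 30 s^2 (1 - s)^2,
 * which is zero at both ends and peaks at s = 0.5. */
double min_jerk_velocity(double x0, double xf, double T, double t)
{
    double s = t / T;
    if (s <= 0.0 || s >= 1.0)
        return 0.0;
    double s2 = s * s;
    return (xf - x0) / T * 30.0 * s2 * (1.0 - 2.0 * s + s2);
}
```

A controller would sample both functions at each control period and use them as position and velocity set points; for a 2 m move over 4 s, the set point passes 1 m at t = 2 s, where the velocity reaches its peak of 0.9375 m/s.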
49	Design Space Exploration and Architecture Design for Inference and Training Deep Neural Networks Qi, Yangjie January 2021 No description available. Electrical Engineering Computer Engineering Artificial Intelligence deep neural network DNN computer architecture DNN accelerator design space exploration edge computing hardware architecture
50	Advances in Modelling, Animation and Rendering Vince, J.A., Earnshaw, Rae A. January 2002 This volume contains the papers presented at Computer Graphics International 2002, held in July at the University of Bradford, UK. These papers represent original research in computer graphics from around the world. Real-time computer animation Image-based rendering Non-photorealistic rendering Virtual reality Avatars Modelling Computational geometry Graphics hardware architecture Data visualisation Data compression
