891 |
Detecção e compressão de distúrbios elétricos baseadas em plataforma FPGA / Detection and compression of electrical disturbances based on an FPGA platform
Kapisch, Eder Barboza 18 March 2015
Previous issue date: 2015-03-18 / CNPq - Conselho Nacional de Desenvolvimento Científico e Tecnológico / This dissertation presents the implementation of a System for the Detection and Compression of Electrical Disturbances (SDCDE), with a focus on implementations based on an FPGA (Field-Programmable Gate Array) platform. The compression and detection algorithms are discussed first, followed by the FPGA syntheses and a prototype developed for testing. The proposed system targets applications in Electric Power Systems (SEPs) and provides for the acquisition and storage of the disturbances commonly found in this field. From the stored data, the recorded signal can be fully reconstructed for later oscillographic analysis. The compression process has three stages: novelty detection, lossy compression using the Discrete Wavelet Transform (DWT), and bit-level compression. Together, these three levels of compression optimize memory usage and ensure that long recording periods can be stored on a memory card. The FPGA syntheses aim to evaluate, among other factors, the consumption of hardware resources through the implementation of an embedded processor created and designed for Digital Signal Processing (DSP) applications. Synthesis results and case studies from tests performed with the prototype in real environments are presented.
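The lossy stage mentioned in the abstract can be illustrated with a minimal sketch. It assumes a one-level Haar wavelet and a fixed hard threshold on the detail coefficients; the abstract does not state which wavelet, decomposition depth or threshold rule the dissertation uses, so the function names and the threshold parameter below are purely illustrative.

```python
import math

def haar_dwt(x):
    """One-level Haar DWT of an even-length sequence.

    Returns (approximation, detail) coefficient lists.
    """
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the signal from both coefficient lists."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / s, (a - d) / s])
    return x

def compress(x, threshold):
    """Lossy step (assumed rule): zero every detail coefficient below threshold."""
    approx, detail = haar_dwt(x)
    kept = [d if abs(d) >= threshold else 0.0 for d in detail]
    return approx, kept
```

With a threshold of zero the transform is perfectly invertible; raising the threshold zeroes more detail coefficients, which is what makes a later bit-level coder effective on the result.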
|
892 |
Técnicas de detecção e implementação em FPGA de modulações QAM de ordem elevada / Detection techniques and FPGA implementation of high-order QAM modulations
Lemos, Gléverson Fabner Condé 12 September 2011
Previous issue date: 2011-09-12 / This dissertation deals with low-cost techniques for the detection, modulation and demodulation of high-order M-QAM (quadrature amplitude modulation) constellations, i.e., M = 2^n, n = {2, 3, ..., 16}. In addition, alternative constellations are proposed for M-QAM, M = 2^(2n), n = {1, 2, ..., 8}, which seek to minimize the PAPR (peak-to-average power ratio) when an OFDM (orthogonal frequency division multiplexing) system is used for data transmission. A low-cost implementation, on an FPGA (field programmable gate array) device, of a constant and adaptive modulation scheme for OFDM systems, when the modulation is M-QAM, M = 2^(2n), n = {1, 2, ..., 8}, is described and analyzed. The performance of the proposed detection techniques is evaluated through computer simulations when the noise is AWGN (additive white Gaussian noise) and AIGN (additive impulsive Gaussian noise). The results in terms of BER × Eb/N0 indicate that the performance losses introduced by the proposed techniques are not significant and, therefore, that these techniques are suitable candidates for the implementation of an OFDM system with high spectral efficiency. The computational results also reveal that the alternative M-QAM constellations reduce the PAPR but, in return, considerably degrade the BER. Finally, the analysis of the computational complexity of the detection and demodulation techniques, which were implemented on an FPGA device, indicates a reduction in computational cost, i.e., a reduction in the use of FPGA hardware resources, when these techniques are applied to the demodulation and detection of high-order M-QAM symbols.
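To show why detection cost matters for high-order constellations, here is a minimal sketch of a per-axis hard-decision slicer for square M-QAM with M = 2^(2n). It is a textbook technique and not necessarily the one proposed in the dissertation; it assumes the constellation points sit on odd integer amplitudes, and the function names are illustrative.

```python
def qam_slice(x, n):
    """Hard decision on one axis of square M-QAM, M = 2^(2n).

    Assumes amplitude levels -(L-1), ..., -1, 1, ..., L-1 with L = 2^n.
    """
    levels = 1 << n                        # L amplitude levels per axis
    idx = round((x + levels - 1) / 2)      # index of the nearest level
    idx = max(0, min(levels - 1, idx))     # clip values outside the grid
    return 2 * idx - (levels - 1)

def qam_detect(r, n):
    """Detect one received complex sample by slicing I and Q independently."""
    return complex(qam_slice(r.real, n), qam_slice(r.imag, n))
```

The slicer needs one scaling, one rounding and one clip per axis regardless of M, whereas an exhaustive search compares the sample against all M points, which is 65536 distances for n = 8.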
|
893 |
Implementação em FPGA de algoritmos de sincronismo para OFDM / FPGA implementation of synchronization algorithms for OFDM
Barragán Guerrero, Diego Orlando, 1984- 23 August 2018
Advisor: Luís Geraldo Pedroso Meloni / Dissertation (master's) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação
Previous issue date: 2013 / Abstract: OFDM systems are intrinsically sensitive to time and frequency synchronization errors. Synchronization is a key step for correct packet reception. This dissertation describes how to implement several OFDM synchronization algorithms on an FPGA using the preamble symbols defined in the IEEE 802.11a standard. In addition, the CORDIC algorithm (required for carrier frequency offset estimation and compensation) is implemented in rotation and vectoring modes for a circular coordinate system. The performance of several architectures is compared in order to optimize the operating frequency and to relate the error of the result to the number of iterations performed. As shown in the results, good estimates are obtained for offsets of 0, 100 and 200 kHz. The results are an important aid in choosing which synchronization algorithm to implement on an FPGA. The different algorithms were found to differ not only in variance, but also in operating frequency and in consumption of FPGA resources. A tapped-delay channel model was considered throughout the project / Master's / Telecommunications and Telematics / Master in Electrical Engineering
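The two CORDIC modes named in the abstract can be sketched in floating point as follows. This is a generic textbook formulation in circular coordinates, not the fixed-point architecture of the dissertation; the function names and the default iteration count are assumptions. Rotation mode rotates a vector by a given angle, and vectoring mode returns the magnitude and angle of a vector, which is what a carrier frequency offset estimator needs.

```python
import math

def cordic_gain(iterations):
    """Accumulated scale factor of the micro-rotations."""
    g = 1.0
    for i in range(iterations):
        g *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return g

def cordic_rotate(x, y, angle, iterations=16):
    """Rotation mode: rotate (x, y) by angle (radians, |angle| < ~1.74)."""
    z = angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # drive residual angle z to zero
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
    k = cordic_gain(iterations)
    return x / k, y / k

def cordic_vector(x, y, iterations=16):
    """Vectoring mode: return (magnitude, angle) of (x, y), assuming x > 0."""
    z = 0.0
    for i in range(iterations):
        d = -1.0 if y >= 0 else 1.0          # drive y to zero
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
    return x / cordic_gain(iterations), z
```

Each iteration uses only shifts, additions and one table lookup in hardware, and each added iteration contributes roughly one more bit of angular precision, which is the error-versus-iterations trade-off the abstract refers to.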
|
894 |
Connected component tree construction for embedded systems / Construction d'arbre des composantes connexes pour les systèmes embarqués
Matas, Petr 30 June 2014
The aim of this work is to enable the construction of embedded digital image processing systems that are both flexible and powerful. The thesis explores the possibility of using an image representation called the connected component tree (CCT) as the basis for implementing the entire image processing chain. This is possible because the CCT is both simple and general: CCT-based implementations exist for operators ranging from filtering to segmentation and recognition. A typical CCT-based image processing chain consists of CCT construction from an input image, a cascade of CCT transformations that implement the individual operators, and image restitution, which generates the output image from the modified CCT. The most time-demanding step is the CCT construction, and this work focuses on it. The thesis introduces the CCT and its possible representations in computer memory, shows some of its applications and analyzes existing CCT construction algorithms. A new parallel CCT construction algorithm producing the parent point tree representation of the CCT is proposed. The algorithm is suitable for implementation in an embedded system because of its low memory requirements. It consists of many building and merging tasks. A building task constructs the CCT of a single image line, which is treated as a one-dimensional signal. Merging tasks progressively fuse these CCTs into the CCT of the whole image. Three different task scheduling strategies are developed and evaluated. The performance of the algorithm is evaluated on several parallel computers. A throughput of 83 Mpx/s at a speedup of 13.3 is achieved on a 16-core machine with Opteron 885 CPUs. Next, the algorithm is adapted for hardware implementation and realized as a new parallel hardware architecture. The architecture contains 16 basic blocks, each dedicated to processing one partition of the image and consisting of execution units and memory. A special interconnection switch is designed to allow some execution units to access memory in other basic blocks. The algorithm requires this for the final merging of the CCTs constructed by the different basic blocks. The architecture is implemented in VHDL, and its functional simulation shows a performance of 145 Mpx/s at a clock frequency of 120 MHz
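To make the "parent point tree" representation concrete, the sketch below builds it for a single image line, which is the job of one building task in the abstract. It uses the well-known sequential union-find approach with pixels processed from brightest to darkest (a max-tree); it is not the parallel algorithm proposed in the thesis, and the function name is illustrative.

```python
def max_tree_1d(f):
    """Parent point tree of the max-tree of a 1-D signal f.

    Returns a list `parent` where parent[p] is the canonical pixel of the
    enclosing component; the root points to itself.
    """
    n = len(f)
    order = sorted(range(n), key=lambda p: f[p])   # increasing gray level
    parent = [None] * n
    zpar = [None] * n                              # union-find forest

    def find(p):
        while zpar[p] != p:
            zpar[p] = zpar[zpar[p]]                # path halving
            p = zpar[p]
        return p

    for p in reversed(order):                      # brightest pixels first
        parent[p] = p
        zpar[p] = p
        for q in (p - 1, p + 1):                   # 1-D neighbourhood
            if 0 <= q < n and zpar[q] is not None:
                r = find(q)
                if r != p:                         # attach processed component
                    parent[r] = p
                    zpar[r] = p

    for p in order:                                # root first: canonicalize
        q = parent[p]
        if f[parent[q]] == f[q]:
            parent[p] = parent[q]
    return parent
```

For the line `[1, 3, 2, 3, 1]` the two peaks at level 3 each point to the level-2 pixel between them, which in turn points to the root at level 1. The representation needs a single index per pixel, which is consistent with the low memory footprint emphasized in the abstract.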
|
895 |
Modélisation et commande d’un système à trois phases indépendantes à double fonctionnalité : Traction Électrique et Chargeur Forte Puissance pour application automobile / Modeling and control of a three-phase open-end winding drive integrating two functionalities: electric traction and fast battery charger for automotive application
Sandulescu, Paul 06 September 2013
For an automotive application, a six-leg voltage source inverter (VSI) connected to a three-phase open-end winding machine can offer a dual function: traction and high-power battery charging. In return, an additional zero-sequence component, which is usually absent when a star connection is used, needs to be controlled. This thesis first proposes a study and a model of multi-leg inverters, and then develops control structures able to handle the presence of zero-sequence components. The conventional control algorithms applied to the inverter are compared, and an original vector control strategy, called Z-SVM, able to cancel the high-frequency zero-sequence current, is developed. Finally, it is shown how managing the average value of the zero-sequence components enhances the performance of the drive at low as well as at high speed, corresponding to the regions of the torque-speed characteristic before and after flux weakening. The proposed solutions are validated on an experimental test bench consisting of a prototype machine specially developed for an automotive application and powered by a six-leg inverter controlled by an FPGA-based device. The proposed strategies are compared in terms of performance and computational complexity.
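The zero-sequence component discussed in the abstract is the part of the three phase quantities that is common to all phases. A minimal sketch of its computation follows, using the standard definition (the mean of the three phase values); the sample currents and the function name are illustrative and not taken from the thesis.

```python
import math

def zero_sequence(ia, ib, ic):
    """Zero-sequence (homopolar) component of three phase quantities."""
    return (ia + ib + ic) / 3.0

def phase_currents(theta, amplitude=1.0, common=0.0):
    """Balanced three-phase currents plus an optional common-mode term."""
    return tuple(
        amplitude * math.cos(theta - k * 2.0 * math.pi / 3.0) + common
        for k in range(3)
    )
```

In a star-connected machine with an isolated neutral the three currents must sum to zero, so this component vanishes by construction. With independent phases fed by a six-leg inverter nothing enforces that sum, which is why the zero-sequence current becomes an extra variable to control.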
|
896 |
Intelligent Gripper
Östberg, Micael, Norgren, Mikael January 2013
The human hand is a great generic gripper, as it can grasp objects of unknown shape, weight and surface. Most robotic grippers in today's industry have to be custom made and tuned for each application by engineers, so many man-hours are required to obtain the desired behavior and repeatability. Adapting some of the capabilities of the human hand to robust industrial robotic grippers would enhance their usability and ease tuning by engineers once installed. This thesis discusses the development of a robust intelligent gripper for industrial use, based on piezo sensors that can both sense slippage and detect initial contact with objects. First, an experimental sensor prototype was successfully developed using an amplification circuit and algorithms implemented in LabView. Second, a final prototype was developed, containing a signal board, an FPGA board, a simple gripper with linear units and more robust sensor modules. The thesis further discusses which parts of the intelligent gripper were successfully implemented within the project time frame and which parts need to be further implemented, tested and improved.
|
897 |
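Systèmes neuromorphiques temps réel : contribution à l’intégration de réseaux de neurones biologiquement réalistes avec fonctions de plasticité / Real-time neuromorphic systems: contribution to the integration of biologically realistic neural networks with plasticity functions
Belhadj-Mohamed, Bilel 22 July 2010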
This thesis is part of the European FACETS project. Within this project, mixed analog-digital hardware systems that simulate spiking neural networks in real time are to be developed, with the goal of improving the understanding of learning phenomena in the neocortex. Neuron behaviours are reproduced by analog integrated circuits, previously designed by the team, that implement Hodgkin-Huxley based models. This work proposes a digital architecture that connects many neuron circuits into a network. The inter-neuron connections are reconfigurable and can be governed by a user-configured plasticity model. The architecture is mapped onto a commercial programmable circuit (FPGA). Several methods are developed to optimize the utilisation of hardware resources and to meet real-time constraints. In particular, a token-passing communication protocol has been designed to guarantee the real-time behaviour of the dialogue between several FPGAs in a multiboard system, allowing the integration of a large number of neurons. The overall system can run neural simulations in biological real time with a high degree of realism, and can be used by neurobiologists and computer scientists to carry out neural experiments.
|
898 |
Conception et exploitation d'un banc d'auto-caractérisation pour la prévision de la fiabilité des circuits numériques programmables / Design and operation of an auto-characterization test bench for predicting the reliability of programmable digital circuits.
Naouss, Mohammad 20 October 2016
Field-Programmable Gate Arrays (FPGAs) benefit from the most advanced CMOS technology nodes in order to meet the increasing demand for high-performance, low-power digital integrated circuits. This makes them sensitive to various aging mechanisms at the nanometre scale. This thesis focuses on the aging degradation of the Look-Up Table (LUT) in FPGAs. The latest scaled technology and the flexibility of the FPGA architecture make it possible to develop a new low-cost test bench to assess reliability as a function of operating conditions. This test bench can be implemented on up to 32 FPGAs and monitored in real time by supervisory software developed in this work. The delay degradation of the LUT is characterized as a function of the duty cycle and the frequency of the stress vectors. The duty cycle is found to affect the fall delay of the LUT strongly and its rise delay moderately, because of the NBTI aging mechanism, while HCI affects both delays. Furthermore, two semi-empirical models of the degradation of LUT timing due to NBTI and HCI are proposed. The influence of the threshold voltage and of the transistor mobility on the timing degradation of the LUT is also analyzed using the transistor simulation model. Finally, a LUT degradation model that takes into account the assumed LUT architecture is proposed. This work is well suited to modeling FPGA degradation at gate level.
|
899 |
Approche de conception haut-niveau pour l'accélération matérielle de calcul haute performance en finance / High-level approach for hardware acceleration of high-performance computing in finance
Mena Morales, Valentin 12 July 2017
The need for resources in High Performance Computing (HPC) is generally met by scaling up server farms, to the detriment of the energy consumption of such a solution. Accelerating HPC applications on heterogeneous platforms, such as FPGAs or GPUs, offers a better architectural compromise, as it can reduce the energy consumption of a deployed system. However, this acceleration requires a change of programming paradigm, and heterogeneous platforms are harder for software experts to master. This is most notably the case for developers in quantitative finance. Applications in this field are constantly evolving and increasing in complexity to stay competitive and comply with legislative changes, which puts even more pressure on the programmability of acceleration solutions. In this context, the use of high-level development and design flows, such as High-Level Synthesis (HLS) for programming FPGAs, is not enough. A domain-specific approach can help to reach performance requirements without impairing the programmability of accelerated applications. This thesis proposes a high-level design approach that relies on OpenCL as a heterogeneous programming standard. More precisely, a recent implementation of OpenCL for Altera FPGAs is used. Four main contributions are made: (1) an initial study of the integration of hardware computing cores into a software library for quantitative finance (QuantLib); (2) an exploration of different architectures and their respective performance, as well as the design of a dedicated architecture for the pricing of American options and the evaluation of implied volatility, based on a high-level design flow; (3) a detailed characterization of an Altera OpenCL platform, covering its elementary operators, memory accesses, control overlays and communication links; (4) a proposed compilation flow specific to the quantitative finance domain, relying on this characterization and on a description of the financial applications considered (option pricing).
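As an illustration of the kind of workload the abstract refers to, here is a minimal software sketch of American put pricing with a Cox-Ross-Rubinstein binomial tree. The abstract does not say which pricing method the accelerated architecture uses, so this textbook scheme, the function name and the default step count are all assumptions.

```python
import math

def american_put_crr(s0, k, r, sigma, t, steps=200):
    """Price an American put with a Cox-Ross-Rubinstein binomial tree.

    s0: spot price, k: strike, r: risk-free rate, sigma: volatility,
    t: time to maturity in years, steps: number of tree levels.
    """
    dt = t / steps
    u = math.exp(sigma * math.sqrt(dt))       # up factor
    d = 1.0 / u                               # down factor
    disc = math.exp(-r * dt)                  # one-step discount factor
    p = (math.exp(r * dt) - d) / (u - d)      # risk-neutral up probability

    # Payoff at maturity for every terminal node (j = number of up moves).
    values = [max(k - s0 * u ** j * d ** (steps - j), 0.0)
              for j in range(steps + 1)]

    # Backward induction with an early-exercise check at each node.
    for i in range(steps - 1, -1, -1):
        values = [
            max(disc * (p * values[j + 1] + (1.0 - p) * values[j]),
                k - s0 * u ** j * d ** (i - j))
            for j in range(i + 1)
        ]
    return values[0]
```

The backward induction is a regular, data-parallel loop nest over tree levels, which is the kind of structure that maps naturally onto FPGA pipelines or GPU threads.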
|
900 |
UMA PROPOSTA DE ARQUITETURA DE PILHA DE COMUNICAÇÃO EM REDE COM UM NÚMERO REDUZIDO DE CAMADAS / A NOVEL NETWORK STACK ARCHITECTURE WITH REDUCED NUMBER OF LAYERS
Freitas, Josué Paulo José de 22 August 2009
This work proposes a network stack architecture with a reduced number of layers. The reduction in the number of layers aims to provide a simple and efficient communication method for embedded systems by allowing the microprocessor, where the application layer is usually implemented, to run only application code and no code related to network communication. The architecture was implemented on an FPGA development board and showed, on average, a throughput about 27 times higher than that of a network stack implemented in software and running on an embedded microprocessor.
|