1581 |
Resilient regular expression matching on FPGAs with fast error repair / Avaliação resiliente de expressões regulares em FPGAs com rápida correção de erros. Leipnitz, Marcos Tomazzoli. January 2017
The Network Function Virtualization (NFV) paradigm promises to make computer networks more scalable and flexible by decoupling network functions (NFs) from dedicated, vendor-specific hardware. However, network- and compute-intensive NFs may be difficult to virtualize without performance degradation. In this context, Field-Programmable Gate Arrays (FPGAs) have been shown to be a good option for hardware acceleration of virtual NFs that require high throughput, without departing from the high flexibility that an NFV infrastructure aims for. Regular expression matching is an important and compute-intensive mechanism used to perform Deep Packet Inspection, and it can be FPGA-accelerated to meet performance constraints. This solution, however, introduces new challenges regarding dependability requirements. Particularly for SRAM-based FPGAs, soft errors in the configuration memory are a significant dependability threat. In this work we present a comprehensive fault tolerance mechanism to deal with configuration faults affecting the functionality of FPGA-based regular expression matching engines. Moreover, a placement-aware scrubbing mechanism is introduced to reduce the system repair time, improving system reliability and availability. Experimental results show that the overall failure rate and the system mean time to repair can be reduced by 95% and 90%, respectively, with manageable area and performance costs.
|
1582 |
Etude et modélisation de stratégies de régulation linéaires découplantes appliquées à un convertisseur multicellulaire parallèle / Study and modeling of decoupling linear control strategies applied to a parallel multicell converter. Garreau, Clement. 01 June 2018
Parallel multilevel converter structures can carry high currents while maintaining a good specific power; they are built by paralleling switching cells. Paralleling reduces the current in each cell, bringing it back into more standard ranges of power components, and, with a suitable control scheme, it improves the converter's output waveforms. This manuscript focuses on one specific parallel multilevel structure, made up of magnetically coupled buck chopper legs connected in parallel. With interleaved control, the output current ripple is reduced; in return, using a separate inductor on each leg increases the leg current ripple relative to the output current ripple, by an amount directly tied to the number of switching cells. To overcome this problem, the inductors are replaced by one or more magnetic couplers, which reduce the current ripple in each leg. However, to guarantee that the couplers do not saturate and that they integrate well, the leg currents must remain balanced despite mismatches between the leg parameters. The manuscript therefore concentrates on several modeling methods that decouple the system and keep the currents equally shared by using differences between duty cycles. These modeling methods were generalized into an algorithm that generates control laws for any number of parallel cells. In the final part, the control laws were implemented on an FPGA and tested on a prototype for experimental verification.
|
1583 |
Novel scalable and real-time embedded transceiver system. Mohammed, Rand Basil. January 2017
Our society increasingly relies on the transmission and reception of vast amounts of data over serial connections with ever-increasing bit rates. In imaging systems, for example, the achievable frame rate is often limited by the serial link between camera and host, even when modern serial buses with the highest bit rates are used. This thesis documents a scalable embedded transceiver system whose bandwidth and interface standard can be adapted to suit a particular application. This new approach to a real-time scalable embedded transceiver system is referred to as a Novel Reference Model (NRM), which connects two or more applications through a transceiver network in order to provide real-time data to a host system. The transceiver interfaces with which the NRM has been tested include LVDS, GIGE, PMA-direct, Rapid-IO and XAUI; each supports a specific range of transceiver speeds suited to a particular type of physical medium. The scalable serial link approach has been extended with lossless data compression, with the aim of further increasing dataflow at a given bit rate. Two lossless compression methods were implemented: one based on Huffman coding and a novel method called the Reduced Lossless Compression Method (RLCM). Both methods are integrated into the scalable transceivers, providing a comprehensive solution for optimal data transmission over a variety of interfaces. The NRM is implemented on a field-programmable gate array (FPGA) using a system architecture that consists of three layers: application, transport and physical. A Terasic DE4 board was used as the main platform for implementing and testing the embedded system, while Quartus-II software and tools were used to design and debug the embedded hardware systems.
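The abstract names Huffman coding as one of the two lossless methods but gives no implementation details. As a minimal software sketch of textbook Huffman coding (not the thesis's FPGA design, and unrelated to RLCM), the following Python builds a prefix code from byte frequencies and round-trips a message. The function names `huffman_code`, `encode` and `decode` are hypothetical and chosen only for this illustration.

```python
import heapq
from collections import Counter


def huffman_code(data):
    """Build a prefix code {byte value: bit string} from byte frequencies."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, unique tie-breaker, {symbol: partial code}).
    # The tie-breaker guarantees the dicts themselves are never compared.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one bit to every code beneath them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(data, code):
    """Concatenate the code words of all bytes into one bit string."""
    return "".join(code[b] for b in data)


def decode(bits, code):
    """Invert encode(); relies on the code being prefix-free."""
    inverse = {c: s for s, c in code.items()}
    out = bytearray()
    current = ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return bytes(out)
```

Frequent symbols receive shorter code words, which is what lets a link carry more payload at a fixed bit rate; a hardware version would typically use fixed tables and a bit-serial or table-driven decoder rather than dictionaries.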
|
1584 |
Proposta e implementação de uma Camada de Integração de Serviços de Segurança (CISS) em SoC e multiplataforma. / Proposal and Implementation of an Security Services Integration Layer (ISSL) in SoC and multiplatform. Fábio Dacêncio Pereira. 09 November 2009
Computer networks are increasingly complex environments, equipped with new services, users and infrastructure. Information security and privacy become fundamental to the evolution of these environments. Anonymity, fragility and other factors often encourage malicious individuals to create tools and techniques for attacking information and computer systems. The consequences range from minor inconveniences to financial and moral damage. Intrusion detection, combined with other security tools, can protect computer systems against malicious attacks and anomalies. Yet, given the complexity and robustness of these systems, security services are not always able to examine and audit the entire information flow, leaving security weak points that can be discovered and exploited. This PhD thesis therefore proposes, designs, implements and analyzes the performance of an Integrated Security Services Layer (ISSL; CISS in the original Portuguese). Several security services were implemented and integrated into the ISSL, such as Firewall, IDS, Antivirus, authentication tools, proprietary tools and cryptography services. Furthermore, the main feature of the ISSL is a common structure for storing information about incidents that occur in a computer system. This information is treated as the source of knowledge that lets the anomaly detection system embedded in the ISSL act effectively in the prevention and protection of computer systems by detecting and classifying anomalous situations early. To this end, behavioral models were created based on the concepts of the Hidden Markov Model (HMM) and on models for the analysis of anomalous sequences. The ISSL was implemented in three versions: (i) System-on-Chip (SoC), (ii) the JCISS software in Java and (iii) a simulator. Results such as time performance, occupancy rates, the impact on anomaly detection and implementation details are presented, compared and analyzed in this thesis. The ISSL obtained significant anomaly detection rates using the MHMM model: above 96% for known attacks, above 80% for partial attacks by time, above 96% for partial attacks by sequence, and above 54% for unknown attacks. The main contributions of the ISSL are a structure for integrating security services, and the correlation and analysis of anomalous occurrences to reduce false positives, detect and classify abnormalities early, and protect computer systems. In addition, solutions were devised to improve detection, such as the sequential model, along with features such as subMHMM for real-time learning. Finally, the SoC and Java implementations allowed the ISSL to be evaluated and used in real environments.
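The abstract says the behavioral models build on Hidden Markov Model concepts without describing how sequences are scored. One common way to use an HMM for anomaly detection, shown here purely as an assumption and not as the thesis's MHMM, is to compute the log-likelihood of an event sequence with the scaled forward algorithm and flag sequences whose per-event likelihood falls below a threshold. The names `forward_log_likelihood` and `is_anomalous`, and the threshold parameter, are hypothetical.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: returns log P(obs | model).

    obs   -- list of observation indices
    start -- start[i] is the probability of beginning in state i
    trans -- trans[i][j] is the probability of moving from state i to j
    emit  -- emit[i][k] is the probability of state i emitting symbol k
    """
    if not obs:
        return 0.0
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_like = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            # The model assigns zero probability to this sequence.
            return float("-inf")
        # Rescaling keeps alpha from underflowing on long sequences.
        alpha = [a / scale for a in alpha]
        log_like += math.log(scale)
    return log_like


def is_anomalous(obs, start, trans, emit, threshold_per_event):
    """Flag a sequence whose average log-likelihood per event is too low."""
    if not obs:
        return False
    score = forward_log_likelihood(obs, start, trans, emit) / len(obs)
    return score < threshold_per_event
```

Normalizing by sequence length makes the threshold comparable across sequences of different lengths; how such a threshold would be chosen is not stated in the abstract.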
|
1585 |
Development of an artificial neural network architecture using programmable logic. Cottens, Pablo Eduardo Pereira de Araujo. 07 March 2016
Artificial Neural Networks (ANNs) normally require a workstation to process their input data because of their complexity. This type of processing architecture requires either that the field device be located in the vicinity of a workstation, when real-time processing is needed, or that the field device have the sole task of collecting data for later processing. This project creates a generic neuron architecture in programmable logic, so that ANNs can use the parallel nature of FPGAs to execute applications quickly, albeit not at the same resolution for their outputs. This work shows that using programmable logic to implement low-bit-resolution ANNs is not only viable, but that neural networks, owing to their parallel nature, benefit greatly from hardware implementation, giving fast and accurate results.
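To make "low bit resolution" concrete, here is a small Python model of a fixed-point neuron of the kind that maps naturally onto FPGA multiply-accumulate logic. It is a sketch under stated assumptions: the 8-bit word with 4 fractional bits, the saturating ReLU activation, and the names `quantize` and `neuron` are all chosen for illustration and are not taken from the thesis.

```python
def quantize(x, frac_bits=4, total_bits=8):
    """Convert a real number to a signed fixed-point integer, with saturation."""
    scaled = round(x * (1 << frac_bits))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))


def neuron(inputs_q, weights_q, bias_q, frac_bits=4, total_bits=8):
    """Integer-only neuron: multiply-accumulate, rescale, saturating ReLU.

    All arguments are fixed-point integers produced by quantize().
    """
    # Align the bias with the products, which carry 2 * frac_bits fraction bits.
    acc = bias_q << frac_bits
    for x, w in zip(inputs_q, weights_q):
        acc += x * w
    # Drop the extra fraction bits (arithmetic shift, rounds toward -infinity).
    acc >>= frac_bits
    # ReLU, clipped to the largest representable positive value.
    hi = (1 << (total_bits - 1)) - 1
    return max(0, min(hi, acc))
```

For example, inputs 0.5 and -0.25 with weights 1.0 and 2.0 and bias 0.25 quantize to 8, -4, 16, 32 and 4; the neuron returns 4, which represents 0.25 and matches the floating-point result. In hardware, each product in the loop could be computed by its own multiplier in parallel, which is the property the abstract credits for the speed of the FPGA implementation.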
|
1586 |
Online scheduling for real-time multitasking on reconfigurable hardware devices. Wassi-Leupi, Guy. January 2011
Nowadays the ever-increasing algorithmic complexity of embedded applications requires designers to turn towards heterogeneous and highly integrated systems denoted as SoCs (Systems-on-a-Chip). These architectures may embed CPU-based processors, dedicated datapaths as well as reconfigurable units. However, embedded SoCs are subject to stringent requirements in terms of speed, size, cost, power consumption, throughput, etc. Therefore, new computing paradigms are required to fulfil the constraints of the applications and the requirements of the architecture. Reconfigurable Computing is a promising paradigm that probably provides the best trade-off between these requirements and constraints. Dynamically reconfigurable architectures are its key enabling technology: they enable the hardware to adapt to the application at runtime. However, these architectures raise new challenges in SoC design. On the one hand, designing a system that takes advantage of dynamic reconfiguration is still very time consuming because of the lack of design methodologies and tools. On the other hand, scheduling hardware tasks differs from classical software task scheduling on microprocessor or multiprocessor systems, as it carries an additional, complicated placement problem. This thesis deals with the problem of scheduling online real-time hardware tasks on Dynamically Reconfigurable Hardware Devices (DRHWs). The problem is addressed from two angles: (i) investigating novel algorithms for online real-time scheduling/placement on DRHWs; (ii) building a library of scheduling/placement algorithms for RTOS-driven Design Space Exploration (DSE). Regarding the first point, the thesis proposes two main runtime-aware scheduling and placement techniques and assesses their suitability for online real-time scenarios. The first technique discusses the impact of synthesizing, at design time, several shapes and/or sizes per hardware task (denoted as a multi-shape task), in order to ease the online scheduling process. The second technique combines a look-ahead scheduling approach with a slot-based management of reconfigurable areas that relies on 1D placement. The results show that in both techniques, the scheduling and placement quality is improved without significantly increasing the algorithm time complexity. Regarding the second point, in the process of designing SoCs that embed reconfigurable parts, new design paradigms tend to explore and validate the architectural design space as early as possible, at system level. The RTOS (Real-Time Operating System) services that manage the reconfigurable parts of the SoC can therefore be refined. In such a context, it is necessary to gather numerous hardware task scheduling and placement algorithms, with various complexity versus performance trade-offs, in a kind of library. In this thesis, the proposed algorithms, in addition to some existing ones, are purposely implemented in C++ in order to ensure compatibility with any C++/SystemC-based SoC design methodology.
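The slot-based 1D placement mentioned above can be pictured with a much simplified Python model. This is only an illustrative sketch, not one of the thesis's algorithms (which are implemented in C++ and include look-ahead): the device is assumed to be a row of equal slots, a task needs a number of contiguous slots for a fixed execution time, and it is accepted only if the earliest feasible start still meets its deadline. `Task` and `place` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    arrival: int    # time at which the task becomes ready
    width: int      # number of contiguous slots required
    exec_time: int  # execution time once configured
    deadline: int   # absolute deadline


def place(task, free_at):
    """Try to place a task on a 1D row of slots.

    free_at[i] is the time at which slot i becomes free. Returns
    (start_time, first_slot) and updates free_at on success, or returns
    None and leaves free_at unchanged if the deadline cannot be met.
    """
    best = None
    for first in range(len(free_at) - task.width + 1):
        window = free_at[first:first + task.width]
        start = max(task.arrival, max(window))
        # Keep the earliest start; ties go to the leftmost window.
        if best is None or start < best[0]:
            best = (start, first)
    if best is None or best[0] + task.exec_time > task.deadline:
        return None
    start, first = best
    for i in range(first, first + task.width):
        free_at[i] = start + task.exec_time
    return best
```

Usage: starting from `free_at = [0, 0, 0, 0]`, placing `Task("a", 0, 2, 5, 10)` occupies slots 0 and 1 until time 5; a later two-slot task with deadline 6 would then be rejected if no pair of adjacent slots frees up in time. Reconfiguration overhead, which matters on real devices, is deliberately ignored here.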
|
1587 |
Architecture générique pour le système de vision sur FPGA - Application à la détection de trait laser / Generic architecture for real time vision system on FPGA – Application to laser line detection. Colak, Seher. 19 April 2018
This thesis is part of an industrial research training agreement (CIFRE) between the Hubert Curien laboratory and the company Pattyn Bakery Division. The goal of this work is to develop a laser line detection system on an FPGA (Field Programmable Gate Array) that outperforms the company's current system. In industry, vision system designers need to be able to create and modify their systems easily in order to adapt them to their customers' needs and to technological developments. The operators developed must therefore be generic, so that designers can change the vision system without necessarily having hardware expertise. Designers must also be able to estimate which resources the operator will use when the system changes: application parameters, sensor, FPGA family, and so on. In this manuscript, the main laser line detection algorithms and their properties are studied. A laser line detection operator was chosen and developed. Implementing this operator on an off-the-shelf FPGA camera produced a first functional prototype. The time performance of the new system is four times that of the system currently used by the company, and the new system can process up to 2500 frames per second. Finally, resource consumption models make it possible to size an architecture quickly from a set of predefined parameters, without running synthesis. The parameter to which designers must pay the most attention is the level of data parallelism. This parameter makes it possible to exploit the parallelism capabilities of the FPGA at the cost of consuming more resources. However, FPGA resources are limited, and increasing the level of parallelism may make it necessary to move to a different FPGA. The system and the data provided will enable the company to adapt the vision system to customers' future needs by guiding the choice of hardware.
|
1588 |
Design of an Adaptable Run-Time Reconfigurable Software-Defined Radio Processing Architecture. Templin, Joshua R. 01 December 2010
Processing power is a key technical challenge holding back the development of a high-performance software-defined radio (SDR). Traditionally, SDR has utilized digital signal processors (DSPs), but increasingly complex algorithms, higher data rates, and multi-tasking needs have exceeded the processing capabilities of modern DSPs. Reconfigurable computers, such as field-programmable gate arrays (FPGAs), are popular alternatives because of their performance gains over software for streaming data applications like SDR. However, FPGAs have not yet realized the ideal SDR because architectures have not fully utilized their partial reconfiguration (PR) capabilities to bring the needed flexibility. A reconfigurable processor architecture is proposed that utilizes PR in reconfigurable computers to achieve a more sophisticated SDR. The proposed processor contains run-time swappable blocks whose parameters and interconnects are programmable. The architecture is analyzed for performance and flexibility and compared with available alternative technologies. For a sample QPSK algorithm, hardware performance gains of at least 44x are seen over modern desktop processors and DSPs, while most of their flexibility and extensibility is maintained.
|
1589 |
Low Power Technology Mapping and Performance Driven Placement for Field Programmable Gate Arrays. Li, Hao. 09 November 2004
As technology geometries have shrunk to the deep sub-micron (DSM) region, the chip density and clock frequency of FPGAs have increased significantly. This makes computer-aided design (CAD) for FPGAs very important and challenging. Due to the increasing demands of portable devices and mobile computing, low power design is crucial in CAD nowadays. In this dissertation, we present a framework to optimize power consumption for technology mapping onto FPGAs. We propose a low-power technology mapping scheme which is able to predict the impact of choosing a subnetwork covering on the ultimate mapping solution. We dynamically update the power estimation for a sequence of options and choose the one that yields the least power consumption. This technique outperforms the best low-power mapping algorithms reported in the literature. We further extend this work to generate mapping solutions with optimal delay.
We also propose placement algorithms to optimize the performance of the placed circuit. A net-cluster-based methodology is designed to ensure that closely connected nets are routed in the same region. Net clusters are obtained by clique partitioning on the net dependency graph. Net positions, and consequently cell positions, are computed with a force-directed approach that pulls connected nets closer together. We further study the performance-driven placement problem for high-level synthesis. We use the Automatic Design Instantiation (AUDI) high-level synthesis system to generate a register-transfer level (RTL) netlist. This RTL netlist is fed into a CAD tool for physical synthesis. We do not necessarily go through the entire physical design process, which is usually quite time-consuming. Instead, we have created an accurate wirelength/timing estimator that works on the floorplan. If the estimated timing does not meet the constraints, guidance is generated and provided to the AUDI system. The guidance consists of the estimated timing information and instructions for producing a new netlist in order to improve performance. Finally, once a design satisfies the constraints, the circuit is placed and routed. This performance-driven placement framework yields better results than a commercial CAD tool.
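The force-directed idea can be illustrated with the generic textbook formulation, in which each movable cell is repeatedly moved to the centroid of the cells it shares a net with while I/O pads stay fixed. The sketch below is that generic scheme, offered as an assumption-laden illustration; it does not reproduce the dissertation's net-cluster method, and the function name `force_directed` and its arguments are hypothetical.

```python
def force_directed(positions, nets, fixed, iterations=100):
    """Relax movable cells toward the centroid of their net neighbours.

    positions -- dict mapping cell name to an (x, y) starting position
    nets      -- iterable of tuples of cell names connected by one net
    fixed     -- set of cell names (for example I/O pads) that must not move
    """
    pos = dict(positions)
    for _ in range(iterations):
        new_pos = dict(pos)
        for cell in pos:
            if cell in fixed:
                continue
            sum_x = sum_y = 0.0
            count = 0
            for net in nets:
                if cell not in net:
                    continue
                for other in net:
                    if other != cell:
                        sum_x += pos[other][0]
                        sum_y += pos[other][1]
                        count += 1
            if count:
                # Zero-force position: the mean of all connected cells.
                new_pos[cell] = (sum_x / count, sum_y / count)
        pos = new_pos
    return pos
```

With pads at (0, 0) and (10, 0) and one cell connected to both, the cell settles midway at (5, 0). The result is a continuous placement; a real FPGA flow would still need to legalize it onto discrete logic block locations, which this sketch leaves out.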
|
1590 |
Traitement parallèle des comparaisons intensives de séquences génomiques / Parallel processing of intensive genomic sequence comparisons. Nguyen, Van Hoa. 12 November 2009
Sequence comparison is one of the fundamental tasks of bioinformatics. New sequencing technologies are accelerating the production of genomic data and increasing the need for fast, efficient tools to perform this task. In this thesis, we propose a new algorithm for intensive sequence comparison, explicitly designed to exploit every form of parallelism available in the latest generation of microprocessors (SIMD instructions, multi-core architectures). The algorithm also adapts to the massive parallelism found on accelerators such as FPGAs or GPUs. It has been implemented in the PLAST software (Parallel Local Alignment Search Tool). Different versions are available depending on the data to be processed (protein and/or DNA). An MPI version was also developed for deployment on a cluster of PCs. Depending on the nature of the data and the technologies used, speedups of 3x to 20x were measured relative to BLAST, the reference software in the field, at an equivalent level of quality.
|