Global ETD Search

61	LALP+ : um framework para o desenvolvimento de aceleradores de hardware em FPGAs / LALP+ : a framework for developing FPGA-based hardware accelerators Cristiano Bacelar de Oliveira 21 December 2015 (has links) Considerando a crescente demanda por desempenho em sistemas computacionais, a implementação de algoritmos diretamente em hardware com o uso de FPGAs (Field-programmable Gate Arrays) é uma alternativa que tem apresentado bons resultados. Porém, os desafios de programação envolvidos no uso de FPGAs, de tal forma a explorar eficientemente seus recursos, limita o número de desenvolvedores em função da predominância do paradigma de programação tradicionalmente sequencial, imposto pelas linguagens imperativas. Assim, este trabalho busca desenvolver mecanismos que facilitem o desenvolvimento com FPGAs, otimizando o uso de memória e explorando o paralelismo das operações. Este documento apresenta a tese de doutorado de título LALP+ : um framework para o desenvolvimento de aceleradores de hardware em FPGAs. Dado que a latência para leitura e escrita de dados têm sido um gargalo para algumas aplicações de alto desempenho, este trabalho trata do desenvolvimento de técnicas para geração de arquiteturas de hardware, considerando aspectos relativos ao mapeamento, gerenciamento e acesso à memória em arquiteturas reconfiguráveis. Para isto, o projeto desenvolvido utiliza como base a linguagem LALP, cujo foco é o tratamento de loops com a técnica de loop pipelining. As técnicas descritas nesta tese são empregadas no desenvolvimento do framework LALP+, o qual estende LALP com a implementação de novas características e funcionalidades, de forma a contribuir para o aumento do seu nível de abstração. As arquiteturas criadas utilizando LALP+ foram comparadas às geradas por ferramentas comerciais e acadêmicas, tendo apresentado, em média, um melhor desempenho, com redução do tempo de execução de 10;01, no melhor caso. Espera-se, por meio das contribuições aqui apresentadas, facilitar a implementação de produtos e projetos relacionados a aplicações de computação de alto desempenho que envolvam o uso de arquiteturas reconfiguráveis, promovendo uma maior absorção desta tecnologia. / Considering the demand for high-performance in computer systems, the implementation of algorithms directly in hardware by using FPGAs (Field-programmable Gate Arrays) is an alternative that has shown good results. However, the number of developers is limited due to the challenges faced for efficiently programming FPGAs. In addition to that, developers are more used to the traditional sequential programming paradigm imposed by the imperative languages. This work seeks to develop mechanisms to facilitate the development with FPGAs, by optimizing memory usage and exploiting the parallelism of operations inside a loop. This document presents the doctoral thesis entitled LALP+ : a framework for developing FPGA-based hardware accelerators. Since the latency for reading and writing data have been a bottleneck for high performance applications, this work deals with the development of techniques for generation of hardware architectures, considering aspects related to mapping, management and memory access in reconfigurable architectures, using as basis the LALP language, which focuses on the treatment of loops with the technique of loop pipelining. The techniques described in this thesis are employed in the development of the LALP+ framework, which extends LALP by implementing new features and functionalities, in order to contribute to increase its abstraction level. LALP+ architectures were compared to ones generated by using academical and commercial tools, having presented, on average, better performance, with a execution time speedup of 10;01 for the best case. Thus, it is expected that the hereby presented contributions facilitate the implementation of products and projects related to high-performance computing applications with reconfigurable architectures, contributing for the use of such technology. Compiladores Computação reconfigurável FPGA HLS Memória Compilers FPGA HLS Memory Reconfigurable Computing
62	Implementação de um módulo Ethernet 10/100Mbps com interface Avalon para o processador Nios II da Altera / Implementation of an Ethernet 10/100Mbps core with Avalon interface for Nios II processor from Altera Ricardo Menotti 06 May 2005 (has links) O presente trabalho apresenta a implementação de um core de rede Ethernet 10/100Mbps com interface para o barramento Avalon para utilização em conjunto com o processador Nios II da Altera. A tecnologia Ethernet foi implementada em computação reconfigurável e utilizou-se como base um módulo disponível na Internet denominado OpenCores MAC 10/100. O projeto foi desenvolvido para ser aplicado em sistemas embarcados, mais especificamente para o uso em um robô móvel em desenvolvimento no Laboratório de Computação Reconfigurável do ICMC/USP. O core foi incorporado à biblioteca da ferramenta SoPC Builder da Altera, visando uma fácil integração do mesmo em outros projetos. Foram utilizadas as ferramentas Quartus II e ModelSim para o desenvolvimento e testes do sistema, além de dois kits Nios versão Stratix para a validação do projeto, sendo as placas interligadas ponto-a-ponto sem a utilizaçao de transceivers analógicos. / This work presents the implementation of a network Ethernet 10/100Mbps core with interfaces to Avalon bus for using with the Nios II processor from Altera. The Ethernet technology was implemented in reconfigurable computing and was based in the OpenCores MAC 10/100 available on Internet. The project was developed for embedded systems applications, more specifically for a mobile robot in development at Reconfigurable Computing Laboratory from ICMC/USP. The core was incorporated to SoPC Builder tools library from Altera, aiming to facilitate the integration with others projects. To development and system tests were used Quartus II and ModelSim, and two Nios Development kit Statix Edition for project validation. The boards were linked peer-to-peer, without use analog transceivers. Computação Reconfigurável Ethernet FPGA Sistemas Embutidos SoC SoPC Embedded Systems Ethernet FPGA Reconfigurable Computing SoC SoPC
63	Co-projeto de hardware e software de um escalonador de processos para arquiteturas multicore heterogêneas baseadas em computação reconfigurável / Hardware and software co-design of a process scheduler for heterogeneous multicore architectures based on reconfigurable computing Maikon Adiles Fernandez Bueno 05 November 2013 (has links) As arquiteturas multiprocessadas heterogêneas têm como objetivo principal a extração de maior desempenho da execução dos processos, por meio da utilização de núcleos apropriados às suas demandas. No entanto, a extração de maior desempenho é dependente de um mecanismo eficiente de escalonamento, capaz de identificar as demandas dos processos em tempo real e, a partir delas, designar o processador mais adequado, de acordo com seus recursos. Este trabalho tem como objetivo propor e implementar o modelo de um escalonador para arquiteturas multiprocessadas heterogêneas, baseado em software e hardware, aplicado ao sistema operacional Linux e ao processador SPARC Leon3, como prova de conceito. Nesse sentido, foram implementados monitores de desempenho dentro dos processadores, os quais identificam as demandas dos processos em tempo real. Para cada processo, sua demanda é projetada para os demais processadores da arquitetura e em seguida é realizado um balanceamento visando maximizar o desempenho total do sistema, distribuindo os processos entre processadores, de modo a diminuir o tempo total de processamento de todos os processos. O algoritmo de maximização Hungarian, utilizado no balanceamento do escalonador, foi desenvolvido em hardware, proporcionando paralelismo e maior desempenho na execução do algoritmo. O escalonador foi validado por meio da execução paralela de diversos benchmarks, resultando na diminuição dos tempos de execução em relação ao escalonador sem suporte à heterogeneidade / Heterogeneous multiprocessor architectures have as main objective the extraction of higher performance from processes through the use of appropriate cores to their demands. However, the extraction of higher performance is dependent on an efficient scheduling mechanism, able to identify in real-time the demands of processes and to designate the most appropriate processor according to their resources. This work aims at design and implementations of a model of a scheduler for heterogeneous multiprocessor architectures based on software and hardware, applied to the Linux operating system and the SPARC Leon3 processor as proof of concept. In this sense, performance monitors have been implemented within the processors, which in real-time identifies the demands of processes. For each process, its demand is projected for the other processors in the architecture and then it is performed a balancing to maximize the total system performance by distributing processes among processors. The Hungarian maximization algorithm, used in balancing scheduler was developed in hardware, providing greater parallelism and performance in the execution of the algorithm. The scheduler has been validated through the parallel execution of several benchmarks, resulting in decreased execution times compared to the scheduler without the heterogeneity support Computação reconfigurável Escalonador Multi-core Multiprocessamento heterogêneo Heterogeneous multiprocessing Multicore Reconfigurable computing Scheduler
64	Projeto de um sistema embarcado de predição de colisão e pedestres baseado em computação reconfigurável / Design of an embedded system of pedestrian collision prediction based on reconfigurable computing Leandro Andrade Martinez 02 December 2011 (has links) Este trabalho apresenta a construção de um sistema embarcado para detectar pedestres, utilizando computação reconfigurável com captura de imagens através de uma única câmera acoplada a um veículo que trafega em ambiente urbano. A principal motivação é a necessidade de reduzir o número vítimas causadas por acidentes de trânsito envolvendo pedestres. Uma das causas está relacionada com a velocidade de resposta do cérebro humano para reconhecer situações de perigo e tomar decisões. Como resultando, há um interesse mundial de cientistas para elaborar soluções economicamente viáveis que venham a contribuir com inovações tecnológicas direcionadas a auxiliar motoristas na condução de veículos. A implementação em hardware deste sistema foi desenvolvida em FPGA e dividida em blocos interconectados. Primeiramente, no pré-tratamento do vídeo, foi construído um bloco para conversão de dados da câmera para escala de cinza, em seguida, um bloco simplificado para a estabilização vertical dinâmica de vídeo. Para a detecção foram construídos dois blocos, um para detecção binária de movimento e um bloco de detecção BLOB. Para fazer a classificação, foi construído um bloco para identificação do tamanho do objeto em movimento e fazendo a seleção pela proporcionalidade. Os testes em ambiente real deste sistema demonstraram ótimos resultados para uma velocidade máxima de 30 km/h / This work proposes an embedded system to detect pedestrians using reconfigurable computing making the image acquisition through a mono-camera attached to a vehicle in an urban environment. This work is motivated by the need to reduce the number of traffic accidents, even with government support, each year hundreds of people become victims thus bringing great damage to the economy. As a result, there is also a global concern of scientists to promote economically viable solutions that will contribute to reducing these accidents. A significant issue is related to the speed of response of the human brain to recognize and or to make decisions in situations of danger. This feature generates a demand for technological solutions aimed at helping people to drive vehicles in several respects. The system hardware was developed in FPGA and divided into interconnected blocks. First, for the pretreatment of the video, was built a block for data conversion from the camera to grayscale, then a simplified block for vertical stabilization dynamic video. To detection, two blocks were built, one for binary motion detection and one for a BLOB detection. To classify, was built one block to identify the size of the object in motion by the proportionality and making the selection. The tests in real environment of this system showed great results for a maximum speed of 30 km / h Circuitos FPGA Computação reconfigurável Sistemas embarcados Circuits FPGA Embedded systems Reconfigurable computing
65	ChipCFlow - Partição e protocolo de comunicação no grafo a fluxo de dados dinâmico / ChipCFlow - partioning and communication protocol in the dynamic dataflow graph Lucas Barbosa Sanches 14 May 2010 (has links) Este trabalho descreve a prova de conceito de uma abordagem que utiliza o modelo de computação a fluxo de dados, inerentemente paralelo, associado ao modelo de computação reconfigurável parcial e dinamicamente, visando à obtenção de sistemas computacionais de alto desempenho. Mais especificamente, trata da obtenção de um modelo para o particionamento dos grafos a fluxo de dados dinâmicos e de um protocolo de comunicação entre suas partes, a fim de permitir a sua implementação em arquiteturas dinamicamente reconfiguráveis, em especial em FGPAs Virtex da Xilinx. Enquadra-se no contexto do projeto ChipCFlow, de escopo mais amplo, que pretende obter uma ferramenta para geração automática de descrição de hardware sintetizável, a partir de código em alto nível, escrito em linguagem C, fazendo uso da abordagem a fluxo de dados para extrair o paralelismo implícito nas aplicações originais. O modelo proposto é aplicado em um grafo a fluxo de dados dinâmico, e através de simulações sua viabilidade é discutida / This work describes the concept of an approach that uses data ow computational model, inherently parallel, associated with de reconfigurable computing model, partial and dynamic, in order to obtain high performance computational systems. More specifically, it is about a model to the partitioning and communication between partitioned sectors of a CDFG (Control Data Flow Graph) in order to map these graphs on a partial reconfiguration FPGA fabric, in special Virtex II/II-Pro from Xilinx. It is part of the ChipCFlow project, that has a bigger scope, and that aims to automatically obtain syntetisable hardware descriptions, from high level code written in C and, by using a data flow approach to extract implicit parallelism in original applications. The model obtained is extensively explained and applied to an example of CDFG, where by means of simulations its feasibility is discussed Computação reconfigurável Máquinas a fluxo de dados particionamento reconfiguração parcial Dataflow machines Partial reconfiguration Partitioning Reconfigurable computing
66	Otimização de memória cache em tempo de execução para o processador embarcado LEON3 / Optimization of cache memory at runtime for embedded processor LEON3 Lucas Albers Cuminato 28 April 2014 (has links) O consumo de energia é uma das questões mais importantes em sistemas embarcados. Estudos demonstram que neste tipo de sistema a cache é responsável por consumir a maior parte da energia fornecida ao processador. Na maioria dos processadores embarcados, os parâmetros de configuração da cache são fixos e não permitem mudanças após sua fabricação/síntese. Entretanto, este não é o cenário ideal, pois a configuração da cache pode não ser adequada para uma determinada aplicação, tendo como consequência menor desempenho na execução e consumo excessivo de energia. Neste contexto, este trabalho apresenta uma implementação em hardware, utilizando computação reconfigurável, capaz de reconfigurar automática, dinâmica e transparentemente a quantidade de ways e por consequência o tamanho da cache de dados do processador embarcado LEON3, de forma que a cache se adeque à aplicação em tempo de execução. Com esta técnica, espera-se melhorar o desempenho das aplicações e reduzir o consumo de energia do sistema. Os resultados dos experimentos demonstram que é possível reduzir em até 5% o consumo de energia das aplicações com degradação de apenas 0.1% de desempenho / Energy consumption is one of the most important issues in embedded systems. Studies have shown that in this type of system the cache consumes most of the power supplied to the processor. In most embedded processors, the cache configuration parameters are fixed and do not allow changes after manufacture/synthesis. However, this is not the ideal scenario, since the configuration of the cache may not be suitable for a particular application, resulting in lower performance and excessive energy consumption. In this context, this project proposes a hardware implementation, using reconfigurable computing, able to reconfigure the parameters of the LEON3 processor\'s cache in run-time improving applications performance and reducing the power consumption of the system. The result of the experiment shows it is possible to reduce the processor\'s power consumption up to 5% with only 0.1% degradation in performance Computação reconfigurável FPGA LEON3 Memória cache Sistemas embarcados Cache memory Embedded systems FPGA LEON3 Reconfigurable computing
67	"Implementação do barramento on-chip AMBA baseada em computação reconfigurável" / Implementation of on-chip AMBA bus based on Reconfigurable Computing Daniel Cruz de Queiroz 04 February 2005 (has links) A computação reconfigurável está se fortalecendo cada vez mais devido ao grande avanço dos dispositivos reprogramáveis e ferramentas de projeto de hardware utilizadas atualmente. Isso possibilita que o desenvolvimento de hardware torne-se bem menos trabalhoso e complicado, facilitando assim a vida do desenvolvedor. A tecnologia utilizada atualmente em projetos de computação reconfigurável é denominada FPGA (Field Programmable Gate Array), que une algumas características tanto de software (flexibilidade), como de hardware (desempenho). Isso fornece um ambiente bastante propício para desenvolvimento de aplicações que precisam de um bom desempenho, sem que estas devam possuir uma configuração definitiva. O objetivo deste trabalho foi implementar um barramento eficiente para possibilitar a comunicação entre diferentes CORES de um robô reconfigurável, que podem estar dispersos em diferentes dispositivos FPGAs. Tal barramento seguirá o padrão AMBA (Advanced Microcontroller Bus Architecture), pertencente à ARM. Todo o desenvolvimento do core completo do AMBA foi realizado utilizando-se a linguagem VHDL (Very High Speed Integrated Circuit Hardware Description Language) e ferramentas EDAs (Electronic Design Automation) apropriadas. É importante notar que, embora o barramento tenha sido projetado para ser utilizado em um robô, o mesmo pode ser usado em qualquer sistema on-chip. / The reconfigurable computing is each time more fortified, what leads to a great advance of reprogrammable devices and hardware design tools. This has become hardware development less laborious and complicated, thus, facilitating the life of the designer. The technology currently used in projects of reconfigurable computing is called FPGA (Field Programmable Gate Array), which combines some characteristics of software (flexibility) and hardware (performance). This technology provides a propitious environment to the development of applications that need a good performance. Those that dont need a definitive configuration. The purpose of this work was to implement an efficient bus to make possible the communication among different modules of a reconfigurable robot. This bus is based on a bus standard called AMBA (Advanced Microcontroller Bus Architecture), which belongs to ARM. All the development of full AMBA core was carried through using VHDL (Very High Speed Integrated Circuit the Hardware Description Language) language and appropriated EDA (Electronic Design Automation) tools. It is important to notice that, even so the bus have been projected to be used in a robot, it could be used in any system on-chip. AMBA barramento on-chip computação reconfigurável FPGA AMBA FPGA on-chip bus reconfigurable computing
68	P2l - Uma ferramenta de profiling a nível de instrução para o processador softcore LEON3 / P2L - A instruction level profiling tool for LEON3 softcore Carlos Roberto Pereira Almeida Júnior 20 May 2016 (has links) A maioria dos sistemas embarcados hoje desenvolvidos utilizam complexos sistemas eletrônicos integrados em um único chip, os Systems-on-a-Chip (SoC). A análise do comportamento de uma aplicação em execução, ou seja, o profiling nesses sistemas não é uma tarefa trivial em virtude da complexidade dos SoCs e pela restrição de ferramentas de profiling adequadas. Neste contexto, este trabalho apresenta o P2L, uma ferramenta de profiling que se baseia em métricas de nível de instrução e função para o processador LEON3. O P2L fornece estatísticas detalhadas de uso do processador, memórias e barramento de programas em execução sem uso de instrumentação. A ferramenta é composta por um componente em hardware e drivers e aplicativos em software. Os resultados mostram que o P2L fornece medidas com erro inferior a 1% e overhead desprezível quando comparado ao tempo de execução nativa do programa e ao do profiler GNU gprof. / Most embedded systems developed today use complex electronic systems integrated into a single chip, the Systems-on-a-Chip (SoC). The analysis of the behavior of a running application or profiling in these systems is not a trivial task due to the complexity of the SoC and the restriction of appropriate profiling tools. In this context, this work presents P2L - a profiling tool that is based on instruction and function level metrics for the LEON3 processor. P2L provides detailed usage statistics of the processor, memories, and bus of running programs without the use of instrumentation. The tool consists of a component in hardware, drivers and applications software. The results show that P2L provides measures with an error less than 1% and negligible overhead compared to native runtime program and the GNU profiler gprof. Computação reconfigurável Desempenho FPGA LEON3 Profiling FPGA LEON3 Performance Profiling Reconfigurable computing
69	High-performance communication infrastructure design on FPGA-centric clusters Yang, Chen 29 September 2019 (has links) FPGA-Centric Clusters (FCCs) with the FPGAs directly linked through their Multi-Gigabit Transceivers (MGTs) have a proven advantage over other commodity architectures for communication bound applications. To date, however, communication infrastructure for such clusters has generally only taken one of two simple approaches: nearest-neighbor-only, which is fast but of limited utility, and processor-based, which is general but slow. The overall problem addressed in this dissertation is the architecture, design, and implementation of communication networks for FCCs. These network designs should take advantage of the decades of design experience in networks for High-Performance Computing (HPC) clusters, but should also account for, and take advantage of, unique characteristics of FCCs, in particular, the configurability of the FPGAs themselves. This dissertation has seven parts. We begin with in-depth implementations of two model applications, Directional Dark Matter (DM) Detection, and Molecular Dynamics (MD). These implementations expose the necessary characteristics of FCC networks from physical through application layers. The second is the systematic exploration of communication microarchitecture for FCCs, as has been done previously for HPC clusters and for Networks on Chips (NoCs) on both FPGAs and ASICs. One outcome of this part is to find the properties of FCCs that substantially influence the router design space. Another outcome is to create a selection of candidate routers and generalize it so that it is parameterized by routing algorithm, arbitration policy, number of virtual channels (VCs), and other parameters. The third part is to use the proposed application-aware framework to evaluate the resulting design space with respect to a number of common communication patterns and packet sizes. The results from this part enable two sets of designs. One is the selection of an optimal router for a given resource budget that accounts for all the workloads. The other is to take advantage of FPGA reconfigurability to select the optimal router accounting for both resource budget and a particular workload. The fourth part is to evaluate the advantages of this approach of adapting the router design to the application. We find that the optimality of the router design varies significantly with workloads. We observe that compared with the router configuration with the best average performance, application-aware router selection can lead to substantial improvement in performance or reduction in resources required. The fifth part is application-specific optimizations in which we develop several modules and functional units that can provide specific optimizations for certain types of communication workloads depending on the application it going to serve. The sixth part explores topology emulation, e.g., when a three-dimensional network is used in the computation of an application that is logically two dimensional. We propose a generalized fold-and-cut mechanism that both preserves the locality in logical mapping, while also making use of the extra links provided by our 3D-torus fixture. The seventh part is a table-based static-scheduled router for applications with a static or persistent communication pattern. The router supports various cases, including unicast, multicast, and reduction. By making routing decisions a priori, we can bring better load-balance to network links and reduce congestion. Computer engineering Communication Dark matter detection FPGA-cluster High-performance computing Molecular dynamics Reconfigurable computing
70	High Performance Applications on Reconfigurable Clusters Nakad, Zahi Samir 14 November 2000 (has links) Many problems faced in the engineering world are computationally intensive. Filtering using FIR (Finite Impulse Response) filters is an example to that. This thesis discusses the implementation of a fast, reconfigurable, and scalable FIR (Finite Impulse Response) digital filter. Constant coefficient multipliers and a Fast FIFO implementation are also discussed in connection with the FIR filter. This filter is used in two of its structures: the direct-form and the lattice structure. The thesis describes several configurations that can be created with the different components available and reports the testing results of these configurations. / Master of Science FIR filters Field programmable gate arrays reconfigurable computing constant coefficient multipliers

Search results