About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Estratégias de teste aplicadas à rede de interconexão de FPGAs / Test strategies applied to the interconnection network of FPGAs

Pereira, Igor Gadelha 26 February 2014
Funding: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).
This work analyses the main existing testing strategies for FPGAs. It proposes a new strategy for the interconnection network of the Xilinx Spartan-3E FPGA that can precisely locate a fault, based on linear feedback shift registers (LFSRs) synthesized with the Berlekamp-Massey algorithm. To this end, software tools from the manufacturer Xilinx (specifically, XDL and FPGA_editor) were used to determine the FPGA configuration precisely, which made it possible to build the proposed strategy and evaluate its applicability. With the proposed strategy, 7 out of a total of 8 WUTs (Wires Under Test) could be routed on the FPGA under investigation. Consequently, 24 test configurations were needed to test all hex lines and double lines and locate any fault on them. The results show that the proposed strategy tests 7 WUTs at a time and needs 24 test configurations to test the interconnection network and diagnose the fault location precisely.
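The abstract does not say how the Berlekamp-Massey algorithm fits into the test flow, so the following is only a minimal sketch of the algorithm itself over GF(2). Given a bit sequence, it returns the length L and the connection polynomial of the shortest LFSR that reproduces that sequence. The function names `berlekamp_massey_gf2` and `lfsr_sequence` are hypothetical, and the 3-bit example recurrence is an arbitrary choice for illustration.

```python
def lfsr_sequence(c, seed, length):
    """Extend `seed` using s[n] = XOR over j=1..L of (c[j] AND s[n-j])."""
    s = list(seed)
    L = len(c) - 1
    while len(s) < length:
        bit = 0
        for j in range(1, L + 1):
            bit ^= c[j] & s[-j]
        s.append(bit)
    return s


def berlekamp_massey_gf2(bits):
    """Return (L, c): the length and connection polynomial (c[0] == 1)
    of the shortest LFSR over GF(2) that generates `bits`."""
    n = len(bits)
    c = [0] * (n + 1)
    c[0] = 1                 # current connection polynomial C(x)
    prev_c = [0] * (n + 1)
    prev_c[0] = 1            # C(x) as it was before the last length change
    L, m = 0, 1              # LFSR length, steps since the last length change
    for i in range(n):
        # discrepancy: 1 if the current LFSR mispredicts bits[i]
        d = bits[i]
        for j in range(1, L + 1):
            d ^= c[j] & bits[i - j]
        if d == 0:
            m += 1
        elif 2 * L <= i:
            # the LFSR must grow: C(x) += x^m * prev_C(x), then update length
            saved = c[:]
            for j in range(m, n + 1):
                c[j] ^= prev_c[j - m]
            L, prev_c, m = i + 1 - L, saved, 1
        else:
            # same length suffices: just correct the polynomial
            for j in range(m, n + 1):
                c[j] ^= prev_c[j - m]
            m += 1
    return L, c[: L + 1]
```

Running it on 14 bits produced by s[n] = s[n-1] XOR s[n-3] recovers L = 3 and the coefficients [1, 1, 0, 1]. In general, once at least 2L bits are observed, the shortest LFSR of length L is determined uniquely.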
12

Flexible Constraint Length Viterbi Decoders On Large Wire-area Interconnection Topologies

Garga, Ganesh 07 1900
To achieve the goal of efficient “anytime, anywhere” communication, it is essential to develop mobile devices that can efficiently support multiple wireless communication standards. To accommodate the further evolution of these standards efficiently, it should also be possible to modify or upgrade the operation of mobile devices without recalling previously deployed devices. This is achievable if as much of the device's functionality as possible is provided through software. A mobile device that fits this description is called a Software Defined Radio (SDR). Reconfigurable hardware-based solutions are an attractive option for realizing SDRs, as they can potentially combine the flexibility of a DSP or a GPP with the efficiency of an ASIC. The work presented in this thesis concerns the development of efficient reconfigurable hardware for one of the most energy-intensive functions in the mobile device, namely Forward Error Correction (FEC). FEC is required to transfer information reliably at minimal transmit power levels. It is achieved by encoding the information in a process called channel coding. Previous studies have shown that the FEC unit accounts for around 40% of the total energy consumption of the mobile unit. In addition, modern wireless standards require the FEC unit to be flexible. The FEC unit therefore represents a considerable amount of computation that must fit into a very small power, area and energy budget. Two channel coding techniques are widely used in modern wireless standards: convolutional coding and turbo coding. The Viterbi algorithm is the most widely used method for decoding convolutionally encoded sequences, and it can also be used iteratively to decode turbo codes.
Hence, this thesis focuses specifically on developing architectures for flexible Viterbi decoders. Chapter 2 describes the Viterbi and turbo decoding techniques. The flexibility requirements that modern standards place on the Viterbi decoder fall into two types: code rate flexibility and constraint length flexibility. The code rate dictates the number of received bits that are handled together as one symbol at the receiver, so code rate flexibility must be built into the basic computing units that implement the Viterbi algorithm. The constraint length dictates the number of computations required per received symbol, as well as how results are transferred between these computations. Assuming that multiple processing units perform the required computations, supporting constraint length flexibility therefore requires changes in the interconnection network that connects the computing units. A Viterbi decoder with constraint length K needs 2^(K-1) computations per received symbol. The results of these computations are exchanged among the computing units to prepare for the next received symbol. The communication pattern of this exchange forms a de Bruijn graph with 2^(K-1) nodes. Providing constraint length flexibility therefore means being able to realize de Bruijn graphs of various sizes on the interconnection network connecting the processing units. This thesis focuses on providing constraint length flexibility efficiently. The topology used to interconnect the processing units strongly affects how efficiently multiple constraint lengths can be supported. This thesis therefore explores the usefulness of interconnection topologies similar to the de Bruijn graph for building constraint length flexible Viterbi decoders.
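The following is a minimal sequential sketch of a hard-decision Viterbi decoder that makes the 2^(K-1)-state de Bruijn structure concrete. It assumes K = 3 and the common textbook rate-1/2 generators (7, 5) in octal. Neither choice comes from the thesis, which targets several constraint lengths on parallel hardware. `next_state` is the de Bruijn edge: each state drops its oldest bit and shifts in the new one, so every state has exactly two successors and two predecessors. The loop over states corresponds to the 2^(K-1) add-compare-select computations per received symbol. All names are illustrative.

```python
K = 3                          # constraint length (assumed for illustration)
N_STATES = 1 << (K - 1)        # 2^(K-1) trellis states
GENERATORS = (0b111, 0b101)    # (7, 5) octal, a common textbook rate-1/2 code


def parity(x):
    return bin(x).count("1") & 1


def branch_output(state, bit):
    # K-bit shift register: new bit in the LSB, K-1 previous bits above it
    reg = (state << 1) | bit
    return tuple(parity(reg & g) for g in GENERATORS)


def next_state(state, bit):
    # de Bruijn edge: drop the oldest bit, shift in the new one
    return ((state << 1) | bit) & (N_STATES - 1)


def encode(bits):
    state, out = 0, []
    for b in bits:
        out.append(branch_output(state, b))
        state = next_state(state, b)
    return out


def viterbi_decode(symbols):
    INF = float("inf")
    metric = [0] + [INF] * (N_STATES - 1)      # encoder starts in state 0
    paths = [[] for _ in range(N_STATES)]
    for sym in symbols:
        new_metric = [INF] * N_STATES
        new_paths = [None] * N_STATES
        for s in range(N_STATES):              # one computation per state
            if metric[s] == INF:
                continue
            for b in (0, 1):
                ns = next_state(s, b)
                expected = branch_output(s, b)
                # Hamming branch metric, then add-compare-select
                m = metric[s] + sum(x != y for x, y in zip(expected, sym))
                if m < new_metric[ns]:
                    new_metric[ns] = m
                    new_paths[ns] = paths[s] + [b]
        metric, paths = new_metric, new_paths
    best = min(range(N_STATES), key=lambda s: metric[s])
    return paths[best]
```

Each state's survivor depends only on its two de Bruijn predecessors, so a parallel implementation can assign subsets of states to processing units. Changing K changes both the number of states and the shift pattern, which is why the interconnect must realize de Bruijn graphs of several sizes.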
Five different topologies are considered in this thesis, discussed below under two headings. De Bruijn network-based architectures: The interconnection network of chief interest in this thesis is the de Bruijn network itself, since it is identical to the communication pattern of a Viterbi decoder of a given constraint length. The problem of realizing flexible constraint length Viterbi decoders on a de Bruijn network is approached in two ways. The first is an embedding-theoretic approach, in which supporting multiple constraint lengths on a de Bruijn network is treated as the problem of embedding smaller de Bruijn graphs in a larger one. Mathematical manipulations show that this embedding can generally be accomplished with a maximum dilation of …, where N is the number of computing nodes in the physical network, while avoiding any congestion of the physical links. In this approach, however, the mapping of decoder states onto processing nodes is assumed to be fixed. A second scheme, based on a variable assignment of decoder states to computing nodes, turns out to be more efficient than the embedding-based approach. For this scheme, the maximum number of cycles per stage is limited to 2, irrespective of the maximum constraint length to be supported. For smaller constraint lengths, multiple smaller decoders can also be executed in parallel on the physical network. Consequently, after logic synthesis, this architecture is found to be more area-efficient than the one based on the embedding-theoretic approach, and it also scales more efficiently.
Alternative architectures: Several interconnection topologies are closely related to the de Bruijn graph and could therefore be attractive alternatives for realizing flexible constraint length Viterbi decoders. Two more topologies from this class are considered: the shuffle-exchange network and the flattened butterfly network. The variable state assignment scheme developed for the de Bruijn network is directly applicable to the shuffle-exchange network. In this case, the average number of clock cycles per stage is limited to 4, again independent of the constraint length to be supported. On the flattened butterfly (which is actually identical to the hypercube), a state scheduling scheme similar to bitonic sorting is used. This architecture offers the ideal throughput of one decoded bit per clock cycle for any constraint length. For comparison with a more general-purpose topology, a flexible constraint length Viterbi decoder architecture based on a 2D mesh is also considered. The mesh is a popular choice for general-purpose applications as well as many signal processing applications, and its state scheduling scheme is likewise similar to bitonic sorting on a mesh. All the alternative architectures can execute multiple smaller decoders in parallel on the larger interconnection network. Inferences: After logic synthesis and power estimation, the de Bruijn network-based architecture with the variable state assignment scheme yields the lowest area × time product, while the flattened butterfly network-based architecture yields the lowest area × time² product. This means that the de Bruijn network-based architecture is the best choice for moderate-throughput applications, while the flattened butterfly network-based architecture is the best choice for high-throughput applications.
However, as the flattened butterfly network is less scalable in terms of size compared to the de Bruijn network, it can be concluded that among the architectures considered in this thesis, the de Bruijn network-based architecture with the variable state assignment scheme is overall an attractive choice for realizing flexible constraint length Viterbi decoders.
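One reason the shuffle-exchange network is so closely related to the de Bruijn graph can be checked directly. The two de Bruijn successors of an n-bit node label are its perfect shuffle (a one-bit left rotation) and that shuffle followed by an exchange (flipping the least-significant bit). The minimal sketch below, with hypothetical helper names, verifies this identity for small n. It does not reproduce the thesis's state assignment scheme or its cycle counts.

```python
def shuffle(s, n):
    """Perfect-shuffle link on n-bit labels: rotate left by one bit."""
    mask = (1 << n) - 1
    return ((s << 1) | (s >> (n - 1))) & mask


def exchange(s):
    """Exchange link: flip the least-significant bit."""
    return s ^ 1


def de_bruijn_successors(s, n):
    """Binary de Bruijn edges: shift left and append 0 or 1."""
    mask = (1 << n) - 1
    return {((s << 1) | b) & mask for b in (0, 1)}


def via_shuffle_exchange(s, n):
    """Nodes reachable by one shuffle, optionally followed by one exchange."""
    t = shuffle(s, n)
    return {t, exchange(t)}
```

In other words, one de Bruijn step costs at most one shuffle hop plus one exchange hop on a shuffle-exchange network. This is consistent with the state assignment carrying over, at some cost in cycles per stage.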
13

Arquitetura de uma rede de interconexão com memória compartilhada baseada na topologia crossbar / Architecture of an interconnection network with shared memory based on the crossbar topology

Fábio Gonçalves Pessanha 22 March 2013
A Multi-Processor System-on-Chip (MPSoC) has multiple processors on a single chip. Multiple applications can be executed in parallel, or a parallelizable application can be partitioned and allocated to the processors to accelerate its execution. One problem in MPSoCs is the communication between processors that these applications require. In this work, we propose the architecture of an interconnection network based on the crossbar topology, with shared memory. The architecture is parameterizable, with N processors and N memory modules, and information is exchanged between processors via shared memory. In this type of implementation, each processor executes its application from its own memory module. Through the network, all processors have full access to their own memory modules simultaneously, allowing the applications to run concurrently. Moreover, a processor can access other memory modules whenever it needs to retrieve data generated by another processor. The proposed architecture is modelled in VHDL. Its performance is analysed by running an application in parallel and comparing it with the sequential execution of the same application. The chosen application optimizes objective functions using Particle Swarm Optimization (PSO). In this method, a swarm of particles is distributed evenly among the processors of the network. At the end of each iteration, a processor accesses the memory module of another processor to obtain the best position found by the swarm allocated to it. Communication between processors follows three strategies: ring, neighbourhood and broadcast. This application was chosen because it is computationally intensive and therefore a strong candidate for parallelization.
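The distributed PSO described above can be sketched as a sequential simulation under several assumptions. Each sub-swarm stands in for one processor and its memory module. Only the ring strategy is shown: after every iteration, processor k reads the best position of processor k - 1. The objective is the sphere function. The inertia and acceleration values are the commonly used constriction settings rather than values from the thesis. All names are hypothetical.

```python
import random


def sphere(x):
    return sum(v * v for v in x)


def island_pso(f, dim=2, islands=4, per_island=5, iters=200, seed=1,
               w=0.729, c1=1.49445, c2=1.49445, lo=-5.0, hi=5.0):
    rng = random.Random(seed)
    # one sub-swarm per simulated processor, kept in its "local memory module"
    swarms = []
    for _ in range(islands):
        pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(per_island)]
        vel = [[0.0] * dim for _ in range(per_island)]
        pbest = [p[:] for p in pos]
        gbest = min(pbest, key=f)[:]
        swarms.append({"pos": pos, "vel": vel, "pbest": pbest, "gbest": gbest})

    history = []
    for _ in range(iters):
        # local PSO step on every processor
        for sw in swarms:
            for i, x in enumerate(sw["pos"]):
                v = sw["vel"][i]
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    v[d] = (w * v[d]
                            + c1 * r1 * (sw["pbest"][i][d] - x[d])
                            + c2 * r2 * (sw["gbest"][d] - x[d]))
                    x[d] += v[d]
                if f(x) < f(sw["pbest"][i]):
                    sw["pbest"][i] = x[:]
                    if f(x) < f(sw["gbest"]):
                        sw["gbest"] = x[:]
        # ring exchange: snapshot first so all reads see the same iteration
        neighbour_best = [swarms[k - 1]["gbest"][:] for k in range(islands)]
        for sw, nb in zip(swarms, neighbour_best):
            if f(nb) < f(sw["gbest"]):
                sw["gbest"] = nb
        history.append(min(f(sw["gbest"]) for sw in swarms))

    best = min((sw["gbest"] for sw in swarms), key=f)
    return best, history
```

Switching to the neighbourhood or broadcast strategy would change only which other sub-swarms' best positions are read in the exchange step.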
15

Evoluční návrh kolektivních komunikací akcelerovaný pomocí GPU / Evolutionary Design of Collective Communications Accelerated by GPUs

Tyrala, Radek January 2012
This thesis analyses an application for the evolutionary scheduling of collective communications and proposes possible ways to accelerate it using general-purpose computing on graphics processing units (GPUs). The work gives a theoretical overview of systems on a chip and collective communication scheduling, followed by a more detailed description of evolutionary algorithms. It then describes the GPU architecture and its memory hierarchy using the OpenCL memory model. Based on profiling, the work defines a concept for parallel execution of the fitness function and presents an estimate of the achievable acceleration. The implementation is described, with a closer look at the optimization process. Another important part is the comparison of the original CPU-based solution with the massively parallel GPU version. Finally, the thesis proposes distributing the computation among different devices supported by the OpenCL standard. The conclusion discusses further advantages, constraints and possibilities of acceleration through distribution on heterogeneous computing systems.
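The abstract mentions estimating the achievable acceleration from profiling but does not name the model. A common first estimate is Amdahl's law. If a fraction p of the runtime (here, fitness evaluation) is accelerated by a factor s, the overall speedup is 1 / ((1 - p) + p / s). The sketch below assumes that model, and the function names are hypothetical. The figures in the usage note are illustrative, not measurements from the thesis.

```python
def amdahl_speedup(parallel_fraction, kernel_speedup):
    """Overall speedup when only `parallel_fraction` of the runtime
    (e.g. fitness evaluation found by profiling) is accelerated by
    `kernel_speedup`, while the rest runs at its original speed."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / kernel_speedup)


def amdahl_limit(parallel_fraction):
    """Upper bound on the speedup as kernel_speedup grows without limit."""
    serial = 1.0 - parallel_fraction
    return float("inf") if serial == 0 else 1.0 / serial
```

Suppose profiling showed that 90% of the runtime is spent in fitness evaluation. A 10× faster GPU kernel would then give an overall speedup of about 5.3×, and no kernel speedup could push the total beyond 10×.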
