  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

VLSI Implementation of a Run-time Configurable Computing Integrated Circuit - The Stallion Chip

He, Yingchun 22 July 1998 (has links)
Reconfigurable computing architectures are gaining popularity as a replacement for general-purpose architectures in many high-performance embedded applications. These machines support parallel computation and direct data from the producers of an intermediate result to its consumers over custom pathways. The Wormhole Run-time Reconfigurable (RTR) computing architecture is a concept developed at Virginia Tech to address the weaknesses of contemporary FPGAs for configurable computing. The Stallion chip is a full-custom, coarse-grained, FPGA-like configurable computing integrated circuit, and a follow-up to the first-generation device, the Colt chip. This thesis focuses on the VLSI layout implementation of the Stallion chip. It explains the features and advantages of the Wormhole Configurable Computing Machine (CCM) and illustrates design techniques, strategies, circuit characterization, performance estimation, and ways to solve problems encountered with CAD layout design tools. / Master of Science
2

The Design and Implementation of a Spatial Partitioner for use in a Runtime Reconfigurable System

Moye, Charles David 12 August 1999 (has links)
Microprocessors have difficulty addressing the demands of today's high-performance embedded applications. ASICs are a good solution to the speed concerns, but their cost and time to market can make them impractical for some needs. Configurable Computing Machines (CCMs) provide a cost-effective way of creating custom components; however, it is often desirable to change the configuration of the CCM while a program is executing. An efficient way of doing this is with Runtime Reconfigurable (RTR) computing architectures. In an RTR system, one challenging problem is the assignment of operators onto the array of processing elements (PEs) so as to simultaneously minimize both the number of PEs used and the number of interconnections between them for each configuration. This job is automated by a software program referred to as the Spatial Partitioner, whose design and implementation are the subject of this work. The Spatial Partitioner developed herein uses an iterative, recursive algorithm along with cluster refinement to find a reasonably efficient allocation of operators onto the target platform in a reasonable amount of time. Information about the topology of the target platform is used throughout the execution of the algorithm to ensure that the resulting solution is legal in terms of layout. / Master of Science
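The recursive-partitioning idea behind such a tool can be illustrated with a minimal sketch. The greedy weight-balancing split and the `capacity` parameter below are illustrative assumptions, not the thesis's actual algorithm, which also performs cluster refinement and topology-aware legality checks:

```python
def recursive_bisect(ops, capacity):
    """Recursively split a list of (name, weight) operators until every
    cluster fits within the PE capacity of a single partition."""
    total = sum(w for _, w in ops)
    if total <= capacity or len(ops) <= 1:
        return [ops]
    # Greedy bisection: fill one side up to roughly half the total weight,
    # taking the heaviest operators first.
    ordered = sorted(ops, key=lambda t: t[1], reverse=True)
    left, right, lw = [], [], 0
    for op in ordered:
        if lw + op[1] <= total / 2:
            left.append(op)
            lw += op[1]
        else:
            right.append(op)
    return recursive_bisect(left, capacity) + recursive_bisect(right, capacity)

clusters = recursive_bisect([("a", 3), ("b", 2), ("c", 2), ("d", 1)], 4)
```

A real partitioner would then refine cluster boundaries to reduce inter-cluster interconnections, which this sketch omits.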
3

Using Workload Characterization to Guide High Performance Graph Processing

Hassan, Mohamed Wasfy Abdelfattah 24 May 2021 (has links)
Graph analytics represent an important application domain widely used in many fields such as web graphs, social networks, and Bayesian networks. The sheer size of graph data sets combined with the irregular nature of the underlying problem poses a significant challenge for the performance, scalability, and power efficiency of graph processing. With the exponential growth in the size of graph datasets, there is an ever-growing need for faster, more power-efficient graph solvers. The computational needs of graph processing can take advantage of FPGAs' power efficiency and customizable architecture paired with CPUs' general-purpose processing power and sophisticated cache policies. CPU-FPGA hybrid systems have the potential to support performant and scalable graph solvers if both devices can work coherently to make up for each other's deficits. This study aims to optimize graph processing on heterogeneous systems through interdisciplinary research that would impact both the graph processing community and the FPGA/heterogeneous computing community. On one hand, this research explores how to harness the computational power of FPGAs and how to make them work cooperatively in a CPU-FPGA hybrid system. On the other hand, graph applications have a data-driven execution profile; hence, this study explores how to take advantage of information about the graph input properties to optimize the performance of graph solvers. The introduction of High-Level Synthesis (HLS) tools made FPGAs accessible to the masses, but they have yet to be performant and efficient, especially in the case of irregular graph applications. Therefore, this dissertation proposes automated frameworks to help integrate FPGAs into mainstream computing. This is achieved by first exploring the optimization space of HLS-FPGA designs, then devising a domain-specific performance model that is used to build an automated framework to guide the optimization process.
Moreover, the architectural strengths of both CPUs and FPGAs are exploited to maximize graph processing performance via an automated framework for workload distribution on the available hardware resources. / Doctor of Philosophy / Graph processing is a very important application domain, which is emphasized by the fact that many real-world problems can be represented as graph applications. For instance, on the internet, web pages can be represented as graph vertices while the hyperlinks between them represent the edges. Analyzing these types of graphs is used for web search engines, ranking websites, and network analysis, among other uses. However, graph processing is computationally demanding and very challenging to optimize. This is due to the irregular nature of graph problems, which can be characterized by frequent indirect memory accesses. Such a memory access pattern depends on the input data and is impossible to predict, which renders CPUs' sophisticated caching policies useless for performance. With the rise of heterogeneous computing that enabled the use of hardware accelerators, a new research area was born, attempting to maximize performance by utilizing the available hardware devices in a heterogeneous ecosystem. This dissertation aims to improve the efficiency of utilizing such heterogeneous systems when targeting graph applications. More specifically, this research focuses on the collaboration of CPUs and FPGAs (Field-Programmable Gate Arrays) in a CPU-FPGA hybrid system. Innovative ideas are presented to exploit the strengths of each available device in such a heterogeneous system, as well as to address some of the inherent challenges of graph processing. Automated frameworks are introduced to efficiently utilize the FPGA devices, in addition to distributing and scheduling the workload across multiple devices to maximize the performance of graph applications.
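The irregular, data-dependent memory accesses the abstract describes can be seen in a minimal level-synchronous BFS over a graph in compressed sparse row (CSR) form. This is a generic illustration, not code from the dissertation; each `col_idx[k]` lookup is an indirect access whose target depends on the input graph, which is exactly what defeats CPU cache prediction:

```python
def bfs_levels(row_ptr, col_idx, src):
    """Level-synchronous BFS over a CSR graph; returns each vertex's depth
    from src, or -1 if unreachable."""
    n = len(row_ptr) - 1
    level = [-1] * n
    level[src] = 0
    frontier = [src]
    depth = 0
    while frontier:
        depth += 1
        nxt = []
        for u in frontier:
            # Indirect, data-dependent accesses: neighbors of u are wherever
            # col_idx points, so the access pattern cannot be predicted.
            for k in range(row_ptr[u], row_ptr[u + 1]):
                v = col_idx[k]
                if level[v] == -1:
                    level[v] = depth
                    nxt.append(v)
        frontier = nxt
    return level
```

For example, a graph with edges 0→1, 0→2, 1→3 stored as `row_ptr=[0, 2, 3, 3, 3]`, `col_idx=[1, 2, 3]` yields levels `[0, 1, 1, 2]` from vertex 0.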
4

Unstructured Finite Element Computations on Configurable Computers

Ramachandran, Karthik 18 August 1998 (has links)
Scientific solutions to physical problems are computationally intensive. With the increasing emphasis on Custom Computing Machines, many physical problems are being solved using configurable computers. The Finite Element Method (FEM) is an efficient way of solving physical problems such as heat equations, stress analysis, and two- and three-dimensional Poisson's equations. This thesis presents the solution of physical problems using the FEM on a configurable platform. The core computational unit in an iterative FEM solution, the matrix-by-vector multiplication, is developed in this thesis along with the framework necessary for implementing the FEM solution. The solutions for 2-D and 3-D Poisson's equations are implemented with the use of an adaptive mesh refinement method. The dominant computation in the method is matrix-by-vector multiplication, which is performed on the Wildforce board, a configurable platform. The matrix-by-vector multiplication units developed in this thesis are basic mathematical units implemented on a configurable platform and can be used to accelerate any mathematical solution that involves such an operation. / Master of Science
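The matrix-by-vector kernel at the heart of such iterative FEM solvers can be sketched in software for a sparse matrix in compressed sparse row (CSR) form. This is a generic illustration of the operation, not the thesis's Wildforce hardware design:

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form:
    values[k] is a nonzero, col_idx[k] its column, and row_ptr[r]
    marks where row r's nonzeros begin in values."""
    y = []
    for r in range(len(row_ptr) - 1):
        s = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            s += values[k] * x[col_idx[k]]
        y.append(s)
    return y
```

For instance, the matrix [[1, 0, 2], [0, 3, 0]] is stored as `values=[1, 2, 3]`, `col_idx=[0, 2, 1]`, `row_ptr=[0, 2, 3]`; multiplying by `x=[1, 1, 1]` gives `[3.0, 3.0]`.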
5

Implementation of a Turbo Decoder on a Configurable Computing Platform

Hess, Jason Richard 22 September 1999 (has links)
Turbo codes are a new class of codes that can achieve exceptional error performance and energy efficiency at low signal-to-noise ratios. Decoding turbo codes is a complicated procedure that often requires custom hardware if it is to be performed at acceptable speeds. Configurable computing machines are able to provide the performance advantages of custom hardware while maintaining the flexibility of general-purpose microprocessors and DSPs. This thesis presents an implementation of a turbo decoder on an FPGA-based configurable computing platform. Portability and flexibility are emphasized in the implementation so that the decoder can be used as part of a configurable software radio. The system presented performs turbo decoding for a variable block size with a variable number of decoding iterations while using only a single FPGA. When six iterations are performed, the decoder operates at an information bit rate greater than 32 kbps. / Master of Science
6

A Genetic Algorithm-Based Place-and-Route Compiler For A Run-time Reconfigurable Computing System

Kahne, Brian C. 14 May 1997 (has links)
Configurable Computing is a technology which attempts to increase computational power by customizing the computational platform to the specific problem at hand. An experimental computing model known as wormhole run-time reconfiguration allows for partial reconfiguration and is highly scalable. In this approach, configuration information and data are grouped together in a computing unit called a stream, which can tunnel through the chip, creating a series of interconnected pipelines. The Colt/Stallion project at Virginia Tech implements this computing model in integrated circuits. In order to create applications for this platform, a compiler is needed which can convert a human-readable description of an algorithm into the sequences of configuration information understood by the chip itself. This thesis covers two compilers which perform this task. The first compiler, Tier1, requires a programmer to explicitly describe placement and routing inside the chip; it could be considered equivalent to an assembler for a traditional microprocessor. The second compiler, Tier2, allows the user to express a problem as a dataflow graph. Actual placement and routing of this graph onto the physical hardware is handled by a genetic algorithm. A description of the two languages is presented, followed by example applications. In addition, experimental results are included which examine the behavior of the genetic algorithm and how alterations to various genetic operator probabilities affect performance. / Master of Science
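The genetic-algorithm machinery described above (selection, crossover, mutation, and tunable operator probabilities) can be shown in miniature. The bitstring encoding, the OneMax fitness, and all parameter values below are illustrative stand-ins, not the Tier2 compiler's actual placement encoding or cost function:

```python
import random

def genetic_search(fitness, length, pop_size=20, generations=60,
                   p_cross=0.7, p_mut=0.02, seed=1):
    """Toy generational GA over bitstrings: tournament selection,
    one-point crossover, and per-bit mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]

    def tourney():
        a, b = rng.sample(pop, 2)
        return max(a, b, key=fitness)

    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            child = tourney()[:]       # copy the tournament winner
            mate = tourney()
            if rng.random() < p_cross:
                cut = rng.randrange(1, length)
                child = child[:cut] + mate[cut:]   # one-point crossover
            for i in range(length):
                if rng.random() < p_mut:
                    child[i] ^= 1                  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# OneMax stand-in for a placement cost: fitness is the number of ones.
best = genetic_search(sum, 16)
```

Varying `p_cross` and `p_mut` and re-running is the miniature analogue of the thesis's experiments on how operator probabilities affect performance.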
7

Matching Genetic Sequences in Distributed Adaptive Computing Systems

Worek, William J. 22 August 2002 (has links)
Distributed adaptive computing systems (ACS) allow developers to design applications using multiple programmable devices. The ACS API, an API created for distributed adaptive computing, gives developers the ability to design scalable ACS systems in a cluster networking environment for large applications. One such application, found in the field of bioinformatics, is the DNA sequence alignment problem. This thesis presents a runtime reconfigurable FPGA implementation of the Smith-Waterman similarity comparison algorithm. Additionally, this thesis presents tools designed for the ACS API that assist developers in creating applications in a heterogeneous distributed adaptive computing environment. / Master of Science
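The Smith-Waterman recurrence itself is standard; a score-only software version with a linear gap penalty (a reference sketch, not the thesis's runtime reconfigurable FPGA implementation) looks like this:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Score-only Smith-Waterman local alignment with a linear gap penalty.
    H[i][j] is the best local alignment score ending at a[i-1], b[j-1];
    the floor of 0 is what makes the alignment local."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

The cell updates along each anti-diagonal are independent, which is what makes the algorithm attractive for a systolic FPGA implementation.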
8

Design and Verification of SOPC FDP2009 and Research of Reconfigurable Applications

Zhang, Fanjiong January 2011 (has links)
In recent years, reconfigurable devices have developed rapidly because of their flexibility and lower development cost. However, intrinsic shortcomings of reconfigurable devices, such as high power consumption and low speed, make complex designs difficult to realize. This has motivated combining an ASIC (Application-Specific Integrated Circuit) and a reconfigurable device on a single chip, known as an SOPC (System on Programmable Chip). An SOPC can not only decrease development risk and time to market, but can also serve different applications, especially products that keep evolving, such as communication and network products. Dynamic reconfiguration means the reconfigurable portion of the chip can be reconfigured repeatedly, performing different functions at different times. Compared with static reconfiguration, dynamic reconfiguration uses the reconfigurable device more thoroughly, and it is an active research topic worldwide, especially in reconfigurable computing. This paper presents research work on a reconfigurable SOPC in three major parts: hardware, software, and applications. The following work was completed: 1. SOPC hardware system architecture design and discussion, helping to define the system architecture and design goals; the design of the EBI controller used in the SOPC; and the integration of the blocks in the system. 2. The construction of system-level and block-level verification environments for the SOPC; the set-up of the hardware-software co-simulation environment; and the post-layout simulation and formal verification tasks. An innovative automated regression system is proposed that achieves the same simulation coverage (95%) while reducing total simulation time by approximately 30%. 3. SOPC software design, including OS kernel porting, driver design, and application design: the PowerPC initialization program, the UART (Universal Asynchronous Receiver/Transmitter) and reconfiguration communication drivers, and test cases specialized for system verification and hardware testing. 4. Co-design of a novel bus macro based on the FDP reconfigurable logic core, on which the whole reconfigurable system is realized. 5. Reconfigurable application research based on the Reconfigurable Logic Core: a reconfigurable image filter implemented on the FDP300K Reconfigurable Logic Core device, using the self-designed internal bus macro to implement the partially reconfigurable system. Test results show that the reconfigurable filter offers fast configuration speed and good output image quality.
9

Developing an Automated Explosives Detection Prototype Based on the AS&E 101ZZ System

Arvanitis, Panagiotis Jason 07 October 1997 (has links)
This thesis describes the development of a multi-sensor, multi-energy x-ray prototype for automated explosives detection. The system is based on the American Science and Engineering model 101ZZ x-ray system. The 101ZZ unit received was an early model and lacked documentation for its many specialized electronic components, and x-ray image quality was poor. The system was significantly modified and almost all AS&E system electronics were bypassed: the x-ray source controller and conveyor belt motor were made computer-controllable; the x-ray detectors were re-positioned to provide forward-scatter detection capabilities; and new hardware was developed to interface to the AS&E pre-amplifier boards, to collect image data from all three x-ray detectors, and to transfer the data to a personal computer. This hardware, the Differential Pair Interface Board (DPIB), is based on a Field Programmable Gate Array (FPGA) and can be dynamically re-configured to serve as a general-purpose data collection device in a variety of applications. Software was also developed for the prototype system. A Windows NT device driver was written for the DPIB and a custom bus-master DMA collection device. These drivers are portable and can be used as a basis for the development of other Windows NT drivers. A graphical user interface (GUI) was also developed. The GUI automates the data collection tasks and controls all the prototype system components. It interfaces with the image processing software for explosives detection and displays the results. Suspicious areas are color-coded and presented to the operator for further examination. / Master of Science
10

A Design Assembly Technique for FPGA Back-End Acceleration

Frangieh, Tannous 19 October 2012 (has links)
Long wait times constitute a bottleneck that limits the number of compilation runs performed in a day, threatening to restrict Field-Programmable Gate Array (FPGA) adoption in modern computing platforms. This work presents an FPGA development paradigm that exploits logic variance and hierarchy as a means to increase FPGA productivity. The practical tasks of logic partitioning, placement, and routing are examined, and a resulting assembly framework, Quick Flow (qFlow), is implemented. Experiments show up to 10x speed-ups using the proposed paradigm compared to vendor tool flows. / Ph. D.
