• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 15
  • 8
  • 4
  • 3
  • 3
  • Tagged with
  • 33
  • 33
  • 10
  • 9
  • 9
  • 7
  • 7
  • 6
  • 6
  • 6
  • 5
  • 5
  • 5
  • 5
  • 5
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Adaptive Coarse-grain Reconfigurable Protocol Processing Architecture

Badawi, Mohammad January 2016 (has links)
Digital signal processors and their variants have provided significant benefit to efficient implementation of Physical Layer (PHY) of Open Systems Interconnection (OSI) model’s seven-layer protocol processing stack compared to the general purpose processors. Protocol processors promise to provide a similar advantage for implementing higher layers in the (OSI)'s seven-layer model. This thesis addresses the problem of designing customizable coarse-grain reconfigurable protocol processing fabrics as a solution to achieving high performance and computational efficiency. A key requirement that this thesis addresses is the ability to not only adapt to varying applications and standards, and different modes in each standard but also to time varying load and performance demands while maintaining quality of service.This thesis presents a tile-based multicore protocol processing architecture that can be customized at design time to meet the requirements of the target application. The architecture can then be reconfigured at boot time and tuned to suit the desired use-case. This architecture includes a packet-oriented memory system that has deterministic access time and access energy costs, and hence can be accurately dimensioned to fulfill the requirements of the desired use-case. Moreover, to maintain quality of service as predicted, while minimizing the use of energy and resources, this architecture encompasses an elastic management scheme that controls run-time configuration to deploy processing resources based on use-case and traffic demands.To evaluate the architecture presented in this thesis, different case studies were conducted while quantitative and qualitative metrics were used for assessment. Energy-delay product, energy efficiency, area efficiency and throughput show the improvements that were achieved using the processing cores and the memory of the presented architecture, compared with other solutions. Furthermore, the results show the reduction in latency and power consumption required to evaluate controlling states when using the elastic management scheme. The elasticity of the scheme also resulted in reducing the total area required for the controllers that serve multiple processing cores in comparison with other designs. Finally, the results validate the ability of the presented architecture to support quality of service without misutilizing available energy during a real-life case study of a multi-participant Voice Over Internet Protocol (VOIP) call. / <p>QC 20161028</p>
2

A High-performance Architecture for Training Viola-Jones Object Detectors

Lo, Charles 20 November 2012 (has links)
The object detection framework developed by Viola and Jones has become very popular due to its high quality and detection speed. However, the complexity of the computation required to train a detector makes it difficult to develop and test potential improvements to this algorithm or train detectors in the field. In this thesis, a configurable, high-performance FPGA architecture is presented to accelerate this training process. The architecture, structured as a systolic array of pipelined compute engines, is constructed to provide high throughput and make efficient use of the available external memory bandwidth. Extensions to the Viola-Jones detection framework are implemented to demonstrate the flexibility of the architecture. The design is implemented on a Xilinx ML605 development platform running at 200~MHz and obtains a 15-fold speed-up over a multi-threaded OpenCV implementation running on a high-end processor.
3

A High-performance Architecture for Training Viola-Jones Object Detectors

Lo, Charles 20 November 2012 (has links)
The object detection framework developed by Viola and Jones has become very popular due to its high quality and detection speed. However, the complexity of the computation required to train a detector makes it difficult to develop and test potential improvements to this algorithm or train detectors in the field. In this thesis, a configurable, high-performance FPGA architecture is presented to accelerate this training process. The architecture, structured as a systolic array of pipelined compute engines, is constructed to provide high throughput and make efficient use of the available external memory bandwidth. Extensions to the Viola-Jones detection framework are implemented to demonstrate the flexibility of the architecture. The design is implemented on a Xilinx ML605 development platform running at 200~MHz and obtains a 15-fold speed-up over a multi-threaded OpenCV implementation running on a high-end processor.
4

FleXilicon: a New Coarse-grained Reconfigurable Architecture for Multimedia and Wireless Communications

Lee, Jong-Suk Mark 23 March 2010 (has links)
High computing power and flexibility are important design factors for multimedia and wireless communication applications due to the demand for high quality services and frequent evolution of standards. The ASIC (Application Specific Integrated Circuit) approach provides an area efficient, high performance solution, but is inflexible. In contrast, the general purpose processor approach is flexible, but often fails to provide sufficient computing power. Reconfigurable architectures, which have been introduced as a compromise between the two extreme solutions, have been applied successfully for multimedia and wireless communication applications. In this thesis, we investigated a new coarse-grained reconfigurable architecture called FleXilicon which is designed to execute critical loops efficiently, and is embedded in an SOC with a host processor. FleXilicon improves resource utilization and achieves a high degree of loop level parallelism (LLP). The proposed architecture aims to mitigate major shortcomings with existing architectures through adoption of three schemes, (i) wider memory bandwidth, (ii) adoption of a reconfigurable controller, and (iii) flexible wordlength support. Increased memory bandwidth satisfies memory access requirement in LLP execution. New design of reconfigurable controller minimizes overhead in reconfiguration and improves area efficiency and reconfiguration overhead. Flexible word-length support improves LLP by increasing the number of processing elements executable. The simulation results indicate that FleXilicon reduces the number of clock cycles and increases the speed for all five applications simulated. The speedup ratios compared with conventional architectures are as large as two orders of magnitude for some applications. VLSI implementation of FleXilicon in 65 nm CMOS process indicates that the proposed architecture can operate at a high frequency up to 1 GHz with moderate silicon area. / Ph. D.
5

A High-performance, Reconfigurable Architecture for Restricted Boltzmann Machines

Ly, Daniel Le 15 February 2010 (has links)
Despite the popularity and success of neural networks in research, the number of resulting commercial or industrial applications have been limited. A primary cause of this lack of adoption is due to the fact that neural networks are usually implemented as software running on general-purpose processors. Hence, a hardware implementation that can take advantage of the inherent parallelism in neural networks is desired. This thesis investigates how the Restricted Boltzmann machine, a popular type of neural network, can be effectively mapped to a high-performance hardware architecture on FPGA platforms. The proposed, modular framework is designed to reduce the time complexity of the computations through heavily customized hardware engines. The framework is tested on a platform of four Xilinx Virtex II-Pro XC2VP70 FPGAs running at 100MHz through a variety of different configurations. The maximum performance was obtained by instantiating a Restricted Boltzmann Machine of 256x256 nodes distributed across four FPGAs, which results in a computational speed of 3.13 billion connection-updates-per-second and a speed-up of 145-fold over an optimized C program running on a 2.8GHz Intel processor.
6

A High-performance, Reconfigurable Architecture for Restricted Boltzmann Machines

Ly, Daniel Le 15 February 2010 (has links)
Despite the popularity and success of neural networks in research, the number of resulting commercial or industrial applications have been limited. A primary cause of this lack of adoption is due to the fact that neural networks are usually implemented as software running on general-purpose processors. Hence, a hardware implementation that can take advantage of the inherent parallelism in neural networks is desired. This thesis investigates how the Restricted Boltzmann machine, a popular type of neural network, can be effectively mapped to a high-performance hardware architecture on FPGA platforms. The proposed, modular framework is designed to reduce the time complexity of the computations through heavily customized hardware engines. The framework is tested on a platform of four Xilinx Virtex II-Pro XC2VP70 FPGAs running at 100MHz through a variety of different configurations. The maximum performance was obtained by instantiating a Restricted Boltzmann Machine of 256x256 nodes distributed across four FPGAs, which results in a computational speed of 3.13 billion connection-updates-per-second and a speed-up of 145-fold over an optimized C program running on a 2.8GHz Intel processor.
7

Data Path Implementation for a Spatially Programmable Architecture Customized for Image Processing Applications

January 2016 (has links)
abstract: The last decade has witnessed a paradigm shift in computing platforms, from laptops and servers to mobile devices like smartphones and tablets. These devices host an immense variety of applications many of which are computationally expensive and thus are power hungry. As most of these mobile platforms are powered by batteries, energy efficiency has become one of the most critical aspects of such devices. Thus, the energy cost of the fundamental arithmetic operations executed in these applications has to be reduced. As voltage scaling has effectively ended, the energy efficiency of integrated circuits has ceased to improve within successive generations of transistors. This resulted in widespread use of Application Specific Integrated Circuits (ASIC), which provide incredible energy efficiency. However, these are not flexible and have high non-recurring engineering (NRE) cost. Alternatively, Field Programmable Gate Arrays (FPGA) offer flexibility to implement any application, but at the cost of higher area and energy compared to ASIC. In this work, a spatially programmable architecture customized for image processing applications is proposed. The intent is to bridge the efficiency gap between ASICs and FPGAs, by offering FPGA-like flexibility and ASIC-like energy efficiency. This architecture minimizes the energy overheads in FPGAs, which result from the use of fine-grained programming style and global interconnect. It is flexible compared to an ASIC and can accommodate multiple applications. The main contribution of the thesis is the feasibility analysis of the data path of this architecture, customized for image processing applications. The data path is implemented at the register transfer level (RTL), and the synthesis results are obtained in 45nm technology cell library from a leading foundry. The results of image-processing applications demonstrate that this architecture is within a factor of 10x of the energy and area efficiency of ASIC implementations. / Dissertation/Thesis / Masters Thesis Computer Science 2016
8

Reducing Cache Access Time in Multicore Architectures Using Hardware and Software Techniques

Avakian, Annie 27 September 2012 (has links)
No description available.
9

Dynamically reconfigurable architecture for third generation mobile systems

Alsolaim, Ahmad M. January 2002 (has links)
No description available.
10

A New N-way Reconfigurable Data Cache Architecture for Embedded Systems

Bani, Ruchi Rastogi 12 1900 (has links)
Performance and power consumption are most important issues while designing embedded systems. Several studies have shown that cache memory consumes about 50% of the total power in these systems. Thus, the architecture of the cache governs both performance and power usage of embedded systems. A new N-way reconfigurable data cache is proposed especially for embedded systems. This thesis explores the issues and design considerations involved in designing a reconfigurable cache. The proposed reconfigurable data cache architecture can be configured as direct-mapped, two-way, or four-way set associative using a mode selector. The module has been designed and simulated in Xilinx ISE 9.1i and ModelSim SE 6.3e using the Verilog hardware description language.

Page generated in 0.0872 seconds