1481 |
Energy-efficient Spatio-temporal Computing Framework / Qian, Wenchao / 31 May 2016
No description available.
|
1482 |
FPGA Design of a Multicore Neuromorphic Processing System / Zhang, Bin / 18 May 2016
No description available.
|
1483 |
A Verilog Description and Efficient Hardware Implementation of the Baillie-PSW Primality Test / Kasarabada, Yasaswy / 20 October 2016
No description available.
|
1484 |
Sparsity Analysis of Deep Learning Models and Corresponding Accelerator Design on FPGA / You, Yantian / January 2016
Machine learning has achieved great success in recent years, especially deep learning algorithms based on artificial neural networks (ANNs). However, these models need high performance and large memories, which makes them unsuitable for IoT devices, since such devices have limited performance and must be low-cost and energy-efficient. It is therefore necessary to optimize deep learning models to fit resource-constrained IoT devices. This thesis seeks a way to optimize ANN models for IoT devices and provides a hardware implementation of an ANN accelerator on an FPGA. The contribution of this thesis lies mainly in two aspects. 1) Analysis of the sparsity in two mainstream deep learning models, DBN and CNN. The DBN model consists of two hidden layers with Restricted Boltzmann Machines, while the CNN model consists of 2 convolutional layers and 2 sub-sampling layers. Experiments were done on the MNIST data set with a sparsity of 75%, and the ratio of multiplications resulting in near-zero values was measured. 2) FPGA implementation of an ANN accelerator. This thesis designed a hardware accelerator for the inference process of ANN models on an FPGA (Stratix IV: EP4SGX530KH40C2). The main part of the hardware design is a processing array of 256 multiply-accumulate units, which can perform the multiply-accumulate operations of 256 synaptic connections simultaneously. 16-bit fixed-point computation is used to reduce hardware complexity, saving power and area. The evaluation results show that the ratio of multiplications under the threshold of 2^-5 is 75% for the CNN with ReLU activation function and 83% for the DBN with sigmoid activation function. There is therefore still considerable room to optimize complex ANN models if the sparsity of the data is fully exploited. Meanwhile, the implemented hardware accelerator is verified to produce correct results with 16-bit fixed-point computation and can be used as a hardware testing platform for evaluating ANN models.
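The near-zero multiplication ratio described in this abstract can be illustrated with a small sketch. It is not the thesis's code: the function name `near_zero_ratio`, the random test data, and the layer sizes are hypothetical, and the threshold is taken to be 2^-5 = 0.03125, which is an interpretation of the abstract's figure.

```python
import random

THRESHOLD = 2 ** -5  # assumed near-zero cutoff (0.03125)


def near_zero_ratio(weights, activations, threshold=THRESHOLD):
    """Fraction of weight*activation products with magnitude below threshold.

    weights holds one row of weights per output neuron; activations is the
    input vector shared by every row.
    """
    total = 0
    small = 0
    for row in weights:
        for w, a in zip(row, activations):
            total += 1
            if abs(w * a) < threshold:
                small += 1
    return small / total if total else 0.0


def relu(x):
    """Rectified linear unit: negative inputs become exactly zero."""
    return x if x > 0.0 else 0.0


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    # Hypothetical layer: 64 ReLU activations feeding 32 output neurons.
    activations = [relu(rng.gauss(0.0, 1.0)) for _ in range(64)]
    weights = [[rng.gauss(0.0, 0.1) for _ in range(64)] for _ in range(32)]
    print(f"near-zero products: {near_zero_ratio(weights, activations):.2%}")
```

Because ReLU maps every negative input to exactly zero, a large share of products fall under the cutoff regardless of the weights, which is the kind of sparsity an accelerator could skip.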
|
1485 |
A High-end Reconfigurable Computation Platform for Particle Physics Experiments / Liu, Ming / January 2008
Modern nuclear and particle physics experiments run at very high reaction rates and can deliver data rates of up to a hundred GBytes/s. This data rate is far beyond storage and on-line analysis capabilities. Fortunately, physicists are interested in only a very small proportion of this huge amount of data. Therefore, in order to select the interesting data and reject the background by sophisticated pattern recognition processing, it is essential to realize an efficient data acquisition and trigger system that reduces the data rate by several orders of magnitude. Motivated by the requirements of multiple experiment applications, we are developing a high-end reconfigurable computation platform for data acquisition and triggering. The system consists of a scalable number of compute nodes, which are fully interconnected by high-speed communication channels. Each compute node features 5 Xilinx Virtex-4 FX60 FPGAs and up to 10 GBytes of DDR2 memory. A hardware/software co-design approach is proposed to develop custom applications on the platform, partitioning performance-critical calculation to the FPGA hardware fabric while leaving flexible and slow controls to the embedded CPU plus the operating system. The system is expected to be high-performance and general-purpose for various applications, especially in the physics experiment domain. As a case study, the particle track reconstruction algorithm for HADES has been developed and implemented on the computation platform in the form of processing engines. The Tracking Processing Unit (TPU) recognizes peak bins on the projection plane and reconstructs particle tracks in real time. Implementation results demonstrate acceptable resource utilization and the feasibility of implementing the module together with the system design on the FPGA. Experimental results show that the online track reconstruction computation achieves a 10.8 to 24.3 times performance acceleration per TPU module compared to the software solution on a Xeon 2.4 GHz commodity server. / QC 20101118
|
1486 |
Formal Verification of FPGA Based Systems / Deng, Honghan / 10 1900
<p>In design verification, although simulation is still a widely used verification technique in FPGA design, formal verification is gaining greater acceptance as the complexity of designs increases. In the simulation method, for a circuit with n inputs and m registers, an exhaustive test vector will have as many as 2<sup>(m+n)</sup> elements, making it impractical for many modern circuits. Therefore this method is incomplete, i.e., it may fail to catch some design errors due to the lack of complete test coverage. Formal verification can be introduced as a complement to traditional verification techniques.</p> <p>The primary objectives of this thesis are determining: (i) how to formalize FPGA implementations at different levels of abstraction, and (ii) how to prove their functional correctness. This thesis explores two variations of a formal verification framework by proving the functional correctness of several FPGA implementations of commonly used safety subsystem components using the theorem prover PVS. We formalize components at the netlist level and the Verilog Register Transfer HDL level, preserving their functional semantics. Based on these formal models, we prove correctness conditions for the components using PVS. Finally, we present some techniques which can facilitate the proving process and describe some general strategies which can be used to prove properties of a synchronous circuit design.</p> / Master of Applied Science (MASc)
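The growth of the exhaustive test set mentioned in this abstract can be made concrete with a short sketch. It is only an illustration of the 2^(m+n) count; the function names are hypothetical and nothing here comes from the thesis itself.

```python
from itertools import product


def exhaustive_vector_count(n_inputs, n_registers):
    """Number of (input, state) combinations an exhaustive simulation must cover."""
    return 2 ** (n_inputs + n_registers)


def enumerate_vectors(n_inputs, n_registers):
    """Yield every assignment as a pair (input bits, register bits)."""
    for bits in product((0, 1), repeat=n_inputs + n_registers):
        yield bits[:n_inputs], bits[n_inputs:]


if __name__ == "__main__":
    # A toy circuit is easy to enumerate completely.
    print(len(list(enumerate_vectors(2, 1))))
    # A modest circuit with 32 inputs and 32 registers is already out of reach.
    print(exhaustive_vector_count(32, 32))
```

Enumeration is practical only for very small circuits, which is the gap that theorem-prover-based verification is meant to close.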
|
1487 |
Real Time Sorting of Plastic Recyclables Using an FPGA based SVM / House, Bryan W. / 10 1900
<p>The amount of recyclable material being processed worldwide is increasing. There is a demand for new technologies that can quickly sort these materials for maximum purity while maintaining high throughput. This thesis proposes a method to automatically sort two materials: Polycoat containers and Polyethylene terephthalate (PET) bottles. This method utilizes a visible light camera and does not rely on Near-Infrared spectrometry. A high-speed method to automatically locate regions that likely contain these materials within the image and remove them from the background is presented. These regions are merged into whole containers and are classified as either a Polycoat container or PET bottle. This is accomplished using a linear support vector machine (SVM) trained on the histogram of pixel intensities. A novel graph theoretic based region growing technique is proposed and experimental results are provided to characterize the system. The proposed method obtained a 93% recognition rate while running in real-time on an FPGA.</p> / Master of Applied Science (MASc)
|
1488 |
A Scalable Framework for Monte Carlo Simulation Using FPGA-based Hardware Accelerators with Application to SPECT Imaging / Kinsman, Phillip J. / 04 1900
<p>As the number of transistors that are integrated onto a silicon die continues to increase, compute power is becoming a commodity. This has enabled a whole host of new applications that rely on high-throughput computations. Recently, the need for faster and cost-effective applications in form-factor constrained environments has driven an interest in on-chip acceleration of algorithms based on Monte Carlo simulations. Though Field Programmable Gate Arrays (FPGAs), with hundreds of on-chip arithmetic units, show significant promise for accelerating these embarrassingly parallel simulations, a challenge exists in sharing access to simulation data amongst many concurrent experiments. This thesis presents a compute architecture for accelerating Monte Carlo simulations based on the Network-on-Chip (NoC) paradigm for on-chip communication. We demonstrate through the complete implementation of a Monte Carlo-based image reconstruction algorithm for Single-Photon Emission Computed Tomography (SPECT) imaging that this complex problem can be accelerated by two orders of magnitude on even a modestly-sized FPGA over a 2GHz Intel Core 2 Duo Processor. Furthermore, we have created a framework for further increasing parallelism by scaling our architecture across multiple compute devices. By extending our original design to a multi-FPGA system, a nearly linear increase in acceleration with logic resources was achieved.</p> / Master of Applied Science (MASc)
|
1489 |
Performance of the Xilinx Zynq System-on-Chip Interconnect with Asymmetric Multiprocessing / Powell, Andrew Andre / January 2014
For many applications, embedded designers need to construct systems that meet real-time constraints and thus require complete information on a processor's performance under specified parameters. An important and limiting factor in any processor's performance is how quickly components are able to intercommunicate over the system's bus. However, another important constraint, specific to real-time systems, is knowing precisely how long the data communication will take. A highly integrated system composed of multiple processing cores, referred to as a System-on-Chip (SoC) device, contains a bus known as an on-chip interconnect. Specifically, this thesis presents how rapidly the AMBA AXI on-chip interconnect of the Xilinx Zynq-7000 Extensible Processing Platform (EPP) SoC device functions by measuring the time required to communicate between memory and the two major components of the SoC device. The memory is either internal or external. The two major components are the processing system (PS) and the programmable logic (PL). The PS contains a dual-core ARM Cortex-A9 processor that executes FreeRTOS in an asymmetric multiprocessing configuration. Communication between the PL and memory is through the PS-PL interfaces: the Accelerator Coherency Port AXI interface, the High Performance AXI interface, and the General Purpose AXI interface. The benchmarking is performed under several varying parameters, such as the payload size and the number of devices executing in the PL. The embedded design is implemented with the Xilinx Vivado Design Suite, which includes the Vivado IDE and the SDK, and is executed on the Avnet ZedBoard and the Xilinx ZC702 Evaluation Kit. / Electrical and Computer Engineering
|
1490 |
FPGA Platform for Real-Time Simulation of Tissue Deformation / Ajagunmo, Samson / January 2008
<p>The simulation of soft tissue deformations has many practical uses in the medical field, such as diagnosing medical conditions, training medical professionals and surgical planning. While there are many good computational models that are used in these simulations, carrying out the simulations is time consuming, especially for large systems. This is because most simulators are based on software that runs on general-purpose computers (GPC), which are not optimized to carry out the operations needed for simulation. In order to improve the performance of these simulators, field-programmable gate array (FPGA) based accelerators for carrying out Matrix-by-Vector multiplications (MVM) were implemented by Ramachandran in 1998 and Zhuo et al. in 2005. Zhuo et al. also looked at the best ways to store a matrix in memory, and how this is affected by certain properties of the matrix.</p> <p>A better approach is to implement an accelerator that carries out all operations required for simulation in hardware. In this study we propose a hardware accelerator for simulating soft-tissue deformation using a finite-difference approximation of the elastodynamics equations based on conjugate-gradient inversion of sparse matrices. We designed and implemented the accelerator, which is optimized for use with sparse matrices, on an FPGA. We also conducted performance and resource requirements analysis for the accelerator. Our results show this approach is capable of achieving a sufficiently high computational rate for carrying out real-time simulation, even with large grids or meshes. Finally, we developed computational models for carrying out real-time simulation of tissue deformation.</p> / Thesis / Master of Applied Science (MASc)
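The conjugate-gradient step named in this abstract can be sketched in software as follows. This is the generic textbook algorithm, not the thesis's hardware design; the row-list sparse format, the function names, and the tridiagonal test matrix are all assumptions made for illustration.

```python
def spmv(rows, x):
    """Sparse matrix-vector product; rows[i] is a list of (column, value) pairs."""
    return [sum(v * x[j] for j, v in row) for row in rows]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def conjugate_gradient(rows, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite sparse matrix A."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x, with x starting at zero
    p = list(r)              # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:  # residual norm small enough: converged
            break
        ap = spmv(rows, p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def tridiagonal(n):
    """Hypothetical test matrix: 2 on the diagonal, -1 on both off-diagonals."""
    rows = []
    for i in range(n):
        row = [(i, 2.0)]
        if i > 0:
            row.append((i - 1, -1.0))
        if i < n - 1:
            row.append((i + 1, -1.0))
        rows.append(row)
    return rows


if __name__ == "__main__":
    a = tridiagonal(5)
    target = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(conjugate_gradient(a, spmv(a, target)))
```

Each iteration is dominated by one sparse matrix-vector product and a few vector updates, which is why earlier matrix-by-vector accelerators are a natural starting point for a full hardware solver.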
|