31

Design and Implementation of Single Issue DSP Processor Core

Ravinath, Vinodh January 2007 (has links)
DSP processors are microprocessors built specifically for digital signal processing, one of the core technologies in rapidly growing applications such as communications and audio processing. The estimated growth of DSP processors over the last six years is above 40%, and the variety of DSP-capable processors for various applications has increased with their rising popularity. The design flow and architecture of such processors, however, are not commonly available to students for learning. This report is a structured approach to the design and implementation of an embedded DSP processor core for voice, audio, and video codecs. It focuses on the design requirement specification, the instruction set and assembly manual release, and the micro-architecture design and implementation of the core; details of the core's verification are also included. The instruction set of this processor supports running the basic kernels of the BDTI benchmarks.
32

System-on-a-Chip (SoC) based Hardware Acceleration in Register Transfer Level (RTL) Design

Niu, Xinwei 08 November 2012 (has links)
Today, modern System-on-a-Chip (SoC) systems have grown rapidly in processing power while maintaining the size of the hardware circuit. The number of transistors on a chip continues to increase, but current SoC designs may not be able to exploit the potential performance, especially with energy consumption and chip area becoming two major concerns. Traditional SoC designs usually separate software and hardware, so improving system performance is a complicated task for both software and hardware designers. The aim of this research is to develop a hardware acceleration workflow for software applications, so that system performance can be improved under constraints on energy consumption and on-chip resource costs. The characteristics of a software application can be identified using profiling tools. Hardware acceleration can yield significant performance improvements for highly mathematical calculations or repeated functions, so the performance of an SoC system can be improved by applying hardware acceleration to the elements that incur performance overheads. The concepts mentioned in this study can be readily applied to a variety of sophisticated software applications. The contributions of SoC-based hardware acceleration in the hardware-software co-design platform include the following:

(1) Software profiling methods are applied to an H.264 Coder-Decoder (CODEC) core. The hotspot function of the target application is identified using critical attributes such as cycles per loop and loop rounds.

(2) A hardware acceleration method based on Field-Programmable Gate Arrays (FPGAs) is used to resolve system bottlenecks and improve system performance. The identified hotspot function is converted to a hardware accelerator and mapped onto the hardware platform. Two types of hardware acceleration methods, central bus design and co-processor design, are implemented for comparison in the proposed architecture.

(3) System specifications, such as performance, energy consumption, and resource costs, are measured and analyzed, and the trade-off between these three factors is compared and balanced. Different hardware accelerators are implemented and evaluated based on system requirements.

(4) A system verification platform is designed based on the Integrated Circuit (IC) workflow, with hardware optimization techniques applied for higher performance and lower resource costs.

Experimental results show that the proposed hardware acceleration workflow for software applications is an efficient technique. The system reaches a 2.8X performance improvement and saves 31.84% of energy consumption with the Bus-IP design, while the co-processor design achieves a 7.9X performance improvement and saves 75.85% of energy consumption.
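The workflow above starts by locating the hotspot through profiling. As a rough illustration only (the thesis uses dedicated profiling tools, and the function below is an invented stand-in rather than part of the H.264 CODEC), a minimal C++ timing harness for a candidate function might look like this:

    #include <chrono>
    #include <iostream>
    #include <vector>

    // Invented stand-in for a compute-heavy CODEC stage: repeated arithmetic
    // over a block, the kind of kernel worth offloading to an FPGA accelerator.
    long transformBlock(const std::vector<int>& block) {
      long acc = 0;
      for (size_t i = 0; i < block.size(); ++i)
        for (size_t j = 0; j < block.size(); ++j)
          acc += block[i] * block[j];
      return acc;
    }

    int main() {
      std::vector<int> block(256, 3);
      const int rounds = 1000;        // "loop rounds" in the abstract's terms
      auto t0 = std::chrono::steady_clock::now();
      long r = 0;
      for (int i = 0; i < rounds; ++i) r += transformBlock(block);
      auto t1 = std::chrono::steady_clock::now();
      auto us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
      std::cout << "transformBlock: " << us << " us over " << rounds
                << " rounds (checksum " << r << ")\n";
      // A function that dominates total runtime this way is a hotspot candidate
      // for conversion into a hardware accelerator.
    }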
33

Approximate Computing: From Circuits to Software

Younghoon Kim (10184063) 01 March 2021 (has links)
Many modern workloads such as multimedia, recognition, mining, search, vision, etc. possess the characteristic of intrinsic application resilience: The ability to produce acceptable-quality outputs despite their underlying computations being performed in an approximate manner. Approximate computing has emerged as a paradigm that exploits intrinsic application resilience to design systems that produce outputs of acceptable quality with significant performance/energy improvement. The research community has proposed a range of approximate computing techniques spanning across circuits, architecture, and software over the last decade. Nevertheless, approximate computing is yet to be incorporated into mainstream HW/SW design processes largely due to the deviation from the conventional design flow and the lack of runtime approximation controllability by the user.

The primary objective of this thesis is to provide approximate computing techniques across different layers of abstraction that possess the two following characteristics: (i) They can be applied with minimal change to the conventional design flow, and (ii) the approximation is controllable at runtime by the user with minimal overhead. To this end, this thesis proposes three novel approximate computing techniques: Clock overgating which targets HW design at the Register Transfer Level (RTL), value similarity extensions which enhance general-purpose processors with a set of microarchitectural and ISA extensions, and data subsetting which targets SW executing on commodity platforms.

The thesis first explores clock overgating, which extends the concept of clock gating: A conventional low-power technique that turns off the clock to a Flip-Flop (FF) when the value remains unchanged. In contrast to traditional clock gating, in clock overgating the clock signals to selected FFs in the circuit are gated even when the circuit functionality is sensitive to their state. This saves additional power in the clock tree, the gated FFs and in their downstream logic, while a quality loss occurs if the erroneous FF states propagate to the circuit outputs. This thesis develops a systematic methodology to identify an energy-efficient clock overgating configuration for any given circuit and quality constraint. Towards this end, three key strategies for efficiently pruning the large space of possible overgating configurations are proposed: Significance-based overgating, grouping FFs into overgating islands, and utilizing internal signals of the circuit as triggers for overgating. Across a suite of 6 machine learning accelerators, energy benefits of 1.36X on average are achieved at the cost of a very small (<0.5%) loss in classification accuracy.

The thesis also explores value similarity extensions, a set of lightweight micro-architectural and ISA extensions for general-purpose processors that provide performance improvements for computations on data structures with value similarity. The key idea is that programs often contain repeated instructions that are performed on very similar inputs (e.g., neighboring pixels within a homogeneous region of an image). In such cases, it may be possible to skip an instruction that operates on data similar to a previously executed instruction, and approximate the skipped instruction's result with the saved result of the previous one. The thesis provides three key strategies for realizing this approach: Identifying potentially skippable instructions from user annotations in SW, obtaining similarity information for future load values from the data cache line currently being accessed, and a mechanism for saving and reusing results of potentially skippable instructions. As a further optimization, the thesis proposes to replace multiple loop iterations that produce similar results with a specialized instruction sequence. The proposed extensions are modeled on the gem5 architectural simulator, achieving speedup of 1.81X on average across 6 machine-learning benchmarks running on a microcontroller-class in-order processor.

Finally, the thesis explores a data-centric approach to approximate computing called data subsetting that shifts the focus of approximation from computations to data. The key idea is to restrict the application's data accesses to a subset of its elements so that the overall memory footprint becomes smaller. Constraining the accesses to lie within a smaller memory footprint renders the memory accesses more cache-friendly, thereby improving performance. This thesis presents a C++ data structure template called SubsettableTensor, which embodies mechanisms to define an accessible subset of data and redirect accesses away from non-subset elements, for realizing data subsetting in SW. The proposed concept is evaluated on parallel SW implementations of 7 machine learning applications on a 48-core AMD Opteron server. Experimental results indicate that 1.33X-4.44X performance improvement can be achieved within a <0.5% loss in classification accuracy.

In summary, the proposed approximation techniques have shown significant efficiency improvements for various machine learning applications in circuits, architecture and SW, underscoring their promise as designer-friendly approaches to approximate computing.
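To make the data subsetting idea concrete, here is a minimal C++ sketch of a SubsettableTensor-like wrapper. The class name echoes the thesis, but the interface and the modulo redirection policy are illustrative assumptions, not the published implementation:

    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Only the first `subset` elements are physically stored; any logical index
    // is redirected into that subset, shrinking the working set so that
    // accesses become more cache-friendly.
    template <typename T>
    class SubsettableTensor {
      std::vector<T> data_;
      std::size_t subset_;
     public:
      SubsettableTensor(std::size_t logicalSize, std::size_t subset)
          : data_(subset), subset_(subset) { (void)logicalSize; }
      T& operator[](std::size_t i) { return data_[i % subset_]; }
    };

    int main() {
      // Logically one million weights, but only 65,536 distinct elements are
      // ever touched; out-of-subset accesses reuse a representative element.
      SubsettableTensor<float> weights(1u << 20, 1u << 16);
      weights[123456] = 0.5f;
      std::cout << weights[123456 % (1u << 16)] << "\n";  // same physical element
    }

Whether the redirected value is an acceptable approximation of the original element is exactly the output-quality question the thesis evaluates empirically.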
34

Analysis with qPCR for Assessment of Telomere Length

Johansson, Alva January 2022 (has links)
No description available.
35

Improving Bio-Inspired Frameworks

Varadarajan, Aravind Krishnan 05 October 2018 (has links)
In this thesis, we provide solutions to two different bio-inspired algorithms. The first is enhancing the performance of bio-inspired test generation for circuits described in RTL Verilog, specifically for branch coverage. We seek to improve the performance of an existing framework, BEACON, an Ant Colony Optimization (ACO) based test generation framework. Like other ACO frameworks, BEACON has good scope for performance improvement through parallel computing. We exploit the available parallelism using both multi-core Central Processing Units (CPUs) and Graphics Processing Units (GPUs). Using our new multithreaded approach, we can reduce test generation time by a factor of 25 compared to the original implementation for a wide variety of circuits. We also provide a 2-dimensional factoring method for BEACON that improves the available parallelism to yield additional speedup. The second bio-inspired algorithm we address is for Deep Neural Networks (DNNs). With the increasing prevalence of neural nets in artificial intelligence and in mission-critical applications such as self-driving cars, questions arise about their reliability and robustness. We have developed a test-generation-based technique and metric to evaluate the robustness of a neural net's outputs based on its sensitivity to its inputs. This is done by generating inputs which the neural net finds difficult to classify but which are relatively apparent to human perception. We measure the degree of difficulty of generating such inputs to calculate our metric. / MS / High-level Hardware Design Languages (HDLs) have allowed designers to implement complicated hardware designs with considerably less effort. Unfortunately, design verification for the same circuits has failed to scale gracefully in terms of time and effort. Not only has verification become more difficult for formal methods, owing to the exponential complexity of increasing path explosion, but concrete test generation frameworks also face new issues such as an increased volume of required simulations. The advent of parallel computing using General Purpose Graphics Processing Units (GPGPUs) has led to improved performance for various applications. We propose to leverage both the multi-core CPU and the GPGPU for RTL test generation. This is achieved by implementing a test generation framework that can utilize the SIMD-type parallelism available on GPGPUs and the task-level parallelism available on CPUs. The speedup achieved is extracted both from the test generation framework itself and from refactoring the hardware model for multithreaded test generation. For this purpose, we translate the RTL Verilog into a C++ and a CUDA compilable program. Experimental results show that considerable speedup can be achieved for test generation without loss of coverage. In recent years, machine learning and artificial intelligence have taken a substantial leap forward with the discovery of Deep Neural Networks (DNNs). Unfortunately, apart from accuracy and F-test numbers, very few metrics exist to qualify a DNN. This becomes a reliability issue, as DNNs are quite frequently used in safety-critical applications. It is difficult to interpret how the parameters of a trained DNN store the knowledge from the training inputs, and therefore also difficult to infer whether a DNN has learned parameters that might cause an output neuron to misfire wrongly (a bug). An exhaustive search of the input space of the DNN is not only infeasible but also misleading. Thus, in our work, we apply test generation techniques to generate new test inputs, based on the existing training and testing sets, to qualify the underlying robustness. The generation of these inputs is guided only by the prediction probability values at the final output layer. We observe that, depending on the amount of perturbation and the time needed to generate these inputs, we can differentiate between DNNs of varying quality.
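For readers unfamiliar with Ant Colony Optimization in this setting, the following self-contained C++ sketch conveys the general idea: pheromone values bias the bits of candidate test vectors, and vectors that cover new branches reinforce the pheromones. The toy design-under-test and all parameters are invented for illustration; this is not BEACON's implementation:

    #include <array>
    #include <iostream>
    #include <random>
    #include <vector>

    constexpr int BITS = 8;    // width of the toy DUT input
    constexpr int ANTS = 32;   // candidate vectors per iteration

    // Toy "DUT": reports which branches an input value exercises.
    std::vector<int> branchesHit(unsigned in) {
      std::vector<int> hit;
      if (in & 0x80) hit.push_back(0); else hit.push_back(1);
      if ((in & 0x0F) == 0x0A) hit.push_back(2);   // a hard-to-reach branch
      return hit;
    }

    int main() {
      std::array<double, BITS> pher;   // pheromone = probability that bit i is 1
      pher.fill(0.5);
      std::vector<bool> covered(3, false);
      std::mt19937 rng(1);
      std::uniform_real_distribution<double> u(0.0, 1.0);

      for (int iter = 0; iter < 100; ++iter) {
        for (int a = 0; a < ANTS; ++a) {
          unsigned in = 0;
          for (int b = 0; b < BITS; ++b)        // sample a vector from pheromones
            if (u(rng) < pher[b]) in |= 1u << b;
          bool newCoverage = false;
          for (int br : branchesHit(in))
            if (!covered[br]) { covered[br] = true; newCoverage = true; }
          if (newCoverage)                      // reinforce bits of useful vectors
            for (int b = 0; b < BITS; ++b)
              pher[b] = 0.8 * pher[b] + 0.2 * ((in >> b) & 1);
        }
        for (double& p : pher) p = 0.95 * p + 0.05 * 0.5;   // evaporation
      }
      int n = 0; for (bool c : covered) n += c;
      std::cout << "covered " << n << " of 3 branches\n";
    }

The inner loop over ants is embarrassingly parallel, which is exactly the structure the thesis maps onto multi-core CPUs and GPGPUs.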
36

Improving Branch Coverage in RTL Circuits with Signal Domain Analysis and Restrictive Symbolic Execution

Bagri, Sharad 18 March 2015 (has links)
Considerable research has been directed towards efficient test stimuli generation for Register Transfer Level (RTL) circuits. However, stimuli generation frameworks are still not capable of generating effective stimuli for all circuits. Two of the limiting factors are that 1) it is hard to ascertain whether a branch in the RTL code is reachable at all, and 2) some hard-to-reach branches require intelligent algorithms to reach them. Since unreachable branches cannot be reached by any test sequence, we propose a method to deduce the unreachability of a branch by examining the possible values that a signal can take in the RTL code, without explicit unrolling of the design. To the best of our knowledge, this method has identified more unreachable branches than any method published in this domain, while being computationally less expensive. Moreover, some branches require very specific values on input signals in specific cycles to be reached. Conventional symbolic execution can generate those values but is computationally expensive. We propose a cycle-by-cycle restrictive symbolic execution that analyzes only a selected subset of program statements to reduce the computational cost. Our proposed method gathers information from an initial execution trace, generated by any technique, to intelligently decide the specific cycles where applying the method will be helpful. This method can be combined with simulation-based test stimuli generation methods to reduce the cost of formal verification. With this method, we were able to reach several previously unreached branches in the ITC99 benchmark circuits. / Master of Science
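The value-set intuition can be shown with a deliberately simplified C++ sketch: collect every constant assigned to a signal and flag equality branches that test a value outside that set as unreachable. A real analysis must also handle arithmetic, bit-selects, and primary inputs; this toy assumes constant assignments only:

    #include <iostream>
    #include <map>
    #include <set>
    #include <string>
    #include <vector>

    struct Assign { std::string signal; int value; };      // e.g. state <= 2;
    struct Branch { std::string signal; int testValue; };  // e.g. if (state == 5)

    int main() {
      // Toy model of an RTL design whose state signal only ever takes 0, 1, 2.
      std::vector<Assign> assigns  = { {"state", 0}, {"state", 1}, {"state", 2} };
      std::vector<Branch> branches = { {"state", 1}, {"state", 5} };

      std::map<std::string, std::set<int>> domain;   // signal -> possible values
      for (const auto& a : assigns) domain[a.signal].insert(a.value);

      for (const auto& b : branches) {
        bool maybeReachable = domain[b.signal].count(b.testValue) > 0;
        std::cout << "if (" << b.signal << " == " << b.testValue << ") : "
                  << (maybeReachable ? "possibly reachable" : "unreachable") << "\n";
      }
    }

Note the asymmetry: a value outside the domain proves unreachability, while a value inside it only means the branch might be reachable, which is why the method pairs well with simulation-based stimuli generation.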
37

Reachability Analysis of RTL Circuits Using k-Induction Bounded Model Checking and Test Vector Compaction

Roy, Tonmoy 05 September 2017 (has links)
In the first half of this thesis, a novel approach to k-induction bounded model checking using signal domain constraints and property partitioning for proving unreachability of branches in Verilog RTL code is presented. The approach uses program slicing with respect to the variables of the property under test to generate small SMT formulas that describe the change of variable values between consecutive cycles. Variable substitution is then used on these variables to generate the formulas for subsequent cycles without traversing the abstract syntax tree of the entire design. To reduce the over-approximation in the induction step, the addition of signal domain constraints is proposed. Moreover, we present a technique for splitting up the property in question to obtain a better model of the system. The latter half of the thesis presents a technique for sequential vector compaction of test sets generated during simulation-based ATPG. Starting with a compaction framework that stores metadata about the test vectors during generation, this work presents two methods for solving the compaction problem. The first method generates the optimal solution by recasting the problem in a form suitable for an optimization solver. The second method uses a heuristics-based approach to the same problem, generating a comparable but sub-optimal solution while being orders of magnitude better in time and computational efficiency. / Master of Science
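As background, the shape of a k-induction check can be sketched in C++ over a toy explicit-state system (a 4-bit register); the thesis instead builds SMT formulas from sliced RTL, so everything below, including the transition relation, is an illustrative assumption:

    #include <functional>
    #include <iostream>
    #include <vector>

    using State = int;
    constexpr State NUM_STATES = 16;                      // a 4-bit register

    bool init(State s)           { return s == 0; }
    bool trans(State s, State t) { return t == (s == 11 ? 0 : s + 1); }
    bool prop(State s)           { return s != 15; }      // claim: 15 never reached

    // Base case: the property holds on every state reachable within k steps.
    bool baseCase(int k) {
      std::vector<bool> seen(NUM_STATES, false);
      std::vector<State> frontier;
      for (State s = 0; s < NUM_STATES; ++s)
        if (init(s)) { seen[s] = true; frontier.push_back(s); }
      for (int step = 0; step <= k; ++step) {
        std::vector<State> next;
        for (State s : frontier) {
          if (!prop(s)) return false;
          for (State t = 0; t < NUM_STATES; ++t)
            if (trans(s, t) && !seen[t]) { seen[t] = true; next.push_back(t); }
        }
        frontier = next;
      }
      return true;
    }

    // Induction step: any k consecutive property-satisfying states must be
    // followed by another property-satisfying state.
    bool inductionStep(int k) {
      std::vector<State> path(k + 1);
      std::function<bool(int)> extend = [&](int i) -> bool {
        if (i == k + 1) return true;
        for (State s = 0; s < NUM_STATES; ++s) {
          if (i > 0 && !trans(path[i - 1], s)) continue;
          if (i < k && !prop(s)) continue;        // prefix states must satisfy P
          path[i] = s;
          if (i == k && !prop(s)) return false;   // counterexample to induction
          if (!extend(i + 1)) return false;
        }
        return true;
      };
      return extend(0);
    }

    int main() {
      for (int k = 1; k <= 6; ++k)
        if (baseCase(k) && inductionStep(k)) {
          std::cout << "property proven by " << k << "-induction\n";
          return 0;
        }
      std::cout << "not proven up to k = 6\n";
    }

In this toy, plain induction (k = 1) fails because the unreachable states 12 through 14 satisfy the property yet lead to a violation; only at k = 4 does the proof close. The signal domain constraints proposed in the thesis serve precisely to exclude such spurious states, tightening the over-approximation so that a smaller k suffices.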
38

Implementation of Low-Bit-Rate Audio Codec, Codec2, in Verilog on Modern FPGAs

Sampath Kumar, Santhiya 30 April 2020 (has links)
No description available.
39

Formal Correctness and Completeness for a Set of Uninterpreted RTL Transformations

Teica, Elena 11 October 2001 (has links)
No description available.
40

Automated Correctness Condition Generation for Formal Verification of Synthesized RTL Designs

Mansouri, Nazanin 11 October 2001 (has links)
No description available.
