31

Design and Implementation of Single Issue DSP Processor Core

Ravinath, Vinodh January 2007 (has links)
DSP processors are microprocessors built specifically for digital signal processing, one of the core technologies in rapidly growing applications such as communications and audio processing. The estimated growth of DSP processors over the last six years is above 40%, and the variety of DSP-capable processors for various applications has increased with their rising popularity. The design flow and architecture of such processors, however, are not commonly available to students for learning. This report is a structured approach to the design and implementation of an embedded DSP processor core for voice, audio, and video codecs. It focuses on the design requirement specification, the instruction set and assembly manual release, and the micro-architecture design and implementation of the core; details of the core's verification are also included. The instruction set of this processor supports running the basic kernels of the BDTI benchmarks.
32

System-on-a-Chip (SoC) based Hardware Acceleration in Register Transfer Level (RTL) Design

Niu, Xinwei 08 November 2012 (has links)
Today, modern System-on-a-Chip (SoC) systems have grown rapidly in processing power while maintaining the size of the hardware circuit. The number of transistors on a chip continues to increase, but current SoC designs may not be able to exploit the potential performance, especially with energy consumption and chip area becoming two major concerns. Traditional SoC designs usually separate software and hardware, so improving system performance is a complicated task for both software and hardware designers. The aim of this research is to develop a hardware acceleration workflow for software applications, so that system performance can be improved under constraints on energy consumption and on-chip resource costs. The characteristics of a software application can be identified using profiling tools. Hardware acceleration can yield significant performance improvements for highly mathematical calculations or repeated functions, so the performance of an SoC system can be improved by applying hardware acceleration to the elements that incur performance overheads. The concepts mentioned in this study can be readily applied to a variety of sophisticated software applications. The contributions of SoC-based hardware acceleration in the hardware-software co-design platform include the following:

(1) Software profiling methods are applied to an H.264 Coder-Decoder (CODEC) core. The hotspot function of the target application is identified using critical attributes such as cycles per loop and loop rounds.

(2) A hardware acceleration method based on Field-Programmable Gate Arrays (FPGAs) is used to resolve system bottlenecks and improve system performance. The identified hotspot function is converted to a hardware accelerator and mapped onto the hardware platform. Two types of hardware acceleration methods, central bus design and co-processor design, are implemented for comparison in the proposed architecture.

(3) System specifications, such as performance, energy consumption, and resource costs, are measured and analyzed, and the trade-off between these three factors is compared and balanced. Different hardware accelerators are implemented and evaluated based on system requirements.

(4) A system verification platform is designed based on the Integrated Circuit (IC) workflow, with hardware optimization techniques applied for higher performance and lower resource costs.

Experimental results show that the proposed hardware acceleration workflow for software applications is an efficient technique. The system reaches a 2.8X performance improvement and saves 31.84% of energy consumption with the Bus-IP design, while the co-processor design achieves a 7.9X performance improvement and saves 75.85% of energy consumption.
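The workflow above starts by locating the hotspot through profiling. As a rough illustration only (the thesis uses dedicated profiling tools, and the function below is an invented stand-in rather than part of the H.264 CODEC), a minimal C++ timing harness for a candidate function might look like this:

    #include <chrono>
    #include <iostream>
    #include <vector>

    // Invented stand-in for a compute-heavy CODEC stage: repeated arithmetic
    // over a block, the kind of kernel worth offloading to an FPGA accelerator.
    long transformBlock(const std::vector<int>& block) {
      long acc = 0;
      for (size_t i = 0; i < block.size(); ++i)
        for (size_t j = 0; j < block.size(); ++j)
          acc += block[i] * block[j];
      return acc;
    }

    int main() {
      std::vector<int> block(256, 3);
      const int rounds = 1000;        // "loop rounds" in the abstract's terms
      auto t0 = std::chrono::steady_clock::now();
      long r = 0;
      for (int i = 0; i < rounds; ++i) r += transformBlock(block);
      auto t1 = std::chrono::steady_clock::now();
      auto us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
      std::cout << "transformBlock: " << us << " us over " << rounds
                << " rounds (checksum " << r << ")\n";
      // A function that dominates total runtime this way is a hotspot candidate
      // for conversion into a hardware accelerator.
    }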
33

Approximate Computing: From Circuits to Software

Younghoon Kim (10184063) 01 March 2021 (has links)
Many modern workloads such as multimedia, recognition, mining, search, vision, etc. possess the characteristic of intrinsic application resilience: The ability to produce acceptable-quality outputs despite their underlying computations being performed in an approximate manner. Approximate computing has emerged as a paradigm that exploits intrinsic application resilience to design systems that produce outputs of acceptable quality with significant performance/energy improvement. The research community has proposed a range of approximate computing techniques spanning across circuits, architecture, and software over the last decade. Nevertheless, approximate computing is yet to be incorporated into mainstream HW/SW design processes largely due to the deviation from the conventional design flow and the lack of runtime approximation controllability by the user.

The primary objective of this thesis is to provide approximate computing techniques across different layers of abstraction that possess the two following characteristics: (i) They can be applied with minimal change to the conventional design flow, and (ii) the approximation is controllable at runtime by the user with minimal overhead. To this end, this thesis proposes three novel approximate computing techniques: Clock overgating which targets HW design at the Register Transfer Level (RTL), value similarity extensions which enhance general-purpose processors with a set of microarchitectural and ISA extensions, and data subsetting which targets SW executing on commodity platforms.

The thesis first explores clock overgating, which extends the concept of clock gating: A conventional low-power technique that turns off the clock to a Flip-Flop (FF) when the value remains unchanged. In contrast to traditional clock gating, in clock overgating the clock signals to selected FFs in the circuit are gated even when the circuit functionality is sensitive to their state. This saves additional power in the clock tree, the gated FFs and in their downstream logic, while a quality loss occurs if the erroneous FF states propagate to the circuit outputs. This thesis develops a systematic methodology to identify an energy-efficient clock overgating configuration for any given circuit and quality constraint. Towards this end, three key strategies for efficiently pruning the large space of possible overgating configurations are proposed: Significance-based overgating, grouping FFs into overgating islands, and utilizing internal signals of the circuit as triggers for overgating. Across a suite of 6 machine learning accelerators, energy benefits of 1.36X on average are achieved at the cost of a very small (<0.5%) loss in classification accuracy.

The thesis also explores value similarity extensions, a set of lightweight micro-architectural and ISA extensions for general-purpose processors that provide performance improvements for computations on data structures with value similarity. The key idea is that programs often contain repeated instructions that are performed on very similar inputs (e.g., neighboring pixels within a homogeneous region of an image). In such cases, it may be possible to skip an instruction that operates on data similar to a previously executed instruction, and approximate the skipped instruction's result with the saved result of the previous one. The thesis provides three key strategies for realizing this approach: Identifying potentially skippable instructions from user annotations in SW, obtaining similarity information for future load values from the data cache line currently being accessed, and a mechanism for saving and reusing results of potentially skippable instructions. As a further optimization, the thesis proposes to replace multiple loop iterations that produce similar results with a specialized instruction sequence. The proposed extensions are modeled on the gem5 architectural simulator, achieving speedup of 1.81X on average across 6 machine-learning benchmarks running on a microcontroller-class in-order processor.

Finally, the thesis explores a data-centric approach to approximate computing called data subsetting that shifts the focus of approximation from computations to data. The key idea is to restrict the application's data accesses to a subset of its elements so that the overall memory footprint becomes smaller. Constraining the accesses to lie within a smaller memory footprint renders the memory accesses more cache-friendly, thereby improving performance. This thesis presents a C++ data structure template called SubsettableTensor, which embodies mechanisms to define an accessible subset of data and redirect accesses away from non-subset elements, for realizing data subsetting in SW. The proposed concept is evaluated on parallel SW implementations of 7 machine learning applications on a 48-core AMD Opteron server. Experimental results indicate that 1.33X-4.44X performance improvement can be achieved within a <0.5% loss in classification accuracy.

In summary, the proposed approximation techniques have shown significant efficiency improvements for various machine learning applications in circuits, architecture and SW, underscoring their promise as designer-friendly approaches to approximate computing.
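To make the data subsetting idea concrete, here is a minimal C++ sketch of a SubsettableTensor-like wrapper. The class name echoes the thesis, but the interface and the modulo redirection policy are illustrative assumptions, not the published implementation:

    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Only the first `subset` elements are physically stored; any logical index
    // is redirected into that subset, shrinking the working set so that
    // accesses become more cache-friendly.
    template <typename T>
    class SubsettableTensor {
      std::vector<T> data_;
      std::size_t subset_;
     public:
      SubsettableTensor(std::size_t logicalSize, std::size_t subset)
          : data_(subset), subset_(subset) { (void)logicalSize; }
      T& operator[](std::size_t i) { return data_[i % subset_]; }
    };

    int main() {
      // Logically one million weights, but only 65,536 distinct elements are
      // ever touched; out-of-subset accesses reuse a representative element.
      SubsettableTensor<float> weights(1u << 20, 1u << 16);
      weights[123456] = 0.5f;
      std::cout << weights[123456 % (1u << 16)] << "\n";  // same physical element
    }

Whether the redirected value is an acceptable approximation of the original element is exactly the output-quality question the thesis evaluates empirically.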
34

Analysis with qPCR for Assessment of Telomere Length

Johansson, Alva January 2022 (has links)
No description available.
35

Improving Bio-Inspired Frameworks

Varadarajan, Aravind Krishnan 05 October 2018 (has links)
In this thesis, we provide solutions to two different bio-inspired algorithms. The first is enhancing the performance of bio-inspired test generation for circuits described in RTL Verilog, specifically for branch coverage. We seek to improve the performance of an existing framework, BEACON, an Ant Colony Optimization (ACO) based test generation framework. Like other ACO frameworks, BEACON has good scope for performance improvement through parallel computing. We exploit the available parallelism using both multi-core Central Processing Units (CPUs) and Graphics Processing Units (GPUs). Using our new multithreaded approach, we can reduce test generation time by a factor of 25 compared to the original implementation for a wide variety of circuits. We also provide a 2-dimensional factoring method for BEACON that improves the available parallelism to yield additional speedup. The second bio-inspired algorithm we address is for Deep Neural Networks (DNNs). With the increasing prevalence of neural nets in artificial intelligence and in mission-critical applications such as self-driving cars, questions arise about their reliability and robustness. We have developed a test-generation-based technique and metric to evaluate the robustness of a neural net's outputs based on its sensitivity to its inputs. This is done by generating inputs which the neural net finds difficult to classify but which are relatively apparent to human perception. We measure the degree of difficulty of generating such inputs to calculate our metric. / MS / High-level Hardware Design Languages (HDLs) have allowed designers to implement complicated hardware designs with considerably less effort. Unfortunately, design verification for the same circuits has failed to scale gracefully in terms of time and effort. Not only has verification become more difficult for formal methods, owing to the exponential complexity of increasing path explosion, but concrete test generation frameworks also face new issues such as an increased volume of required simulations. The advent of parallel computing using General Purpose Graphics Processing Units (GPGPUs) has led to improved performance for various applications. We propose to leverage both the multi-core CPU and the GPGPU for RTL test generation. This is achieved by implementing a test generation framework that can utilize the SIMD-type parallelism available on GPGPUs and the task-level parallelism available on CPUs. The speedup achieved is extracted both from the test generation framework itself and from refactoring the hardware model for multithreaded test generation. For this purpose, we translate the RTL Verilog into a C++ and a CUDA compilable program. Experimental results show that considerable speedup can be achieved for test generation without loss of coverage. In recent years, machine learning and artificial intelligence have taken a substantial leap forward with the discovery of Deep Neural Networks (DNNs). Unfortunately, apart from accuracy and F-test numbers, very few metrics exist to qualify a DNN. This becomes a reliability issue, as DNNs are quite frequently used in safety-critical applications. It is difficult to interpret how the parameters of a trained DNN store the knowledge from the training inputs, and therefore also difficult to infer whether a DNN has learned parameters that might cause an output neuron to misfire wrongly (a bug). An exhaustive search of the input space of the DNN is not only infeasible but also misleading. Thus, in our work, we apply test generation techniques to generate new test inputs, based on the existing training and testing sets, to qualify the underlying robustness. The generation of these inputs is guided only by the prediction probability values at the final output layer. We observe that, depending on the amount of perturbation and the time needed to generate these inputs, we can differentiate between DNNs of varying quality.
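For readers unfamiliar with Ant Colony Optimization in this setting, the following self-contained C++ sketch conveys the general idea: pheromone values bias the bits of candidate test vectors, and vectors that cover new branches reinforce the pheromones. The toy design-under-test and all parameters are invented for illustration; this is not BEACON's implementation:

    #include <array>
    #include <iostream>
    #include <random>
    #include <vector>

    constexpr int BITS = 8;    // width of the toy DUT input
    constexpr int ANTS = 32;   // candidate vectors per iteration

    // Toy "DUT": reports which branches an input value exercises.
    std::vector<int> branchesHit(unsigned in) {
      std::vector<int> hit;
      if (in & 0x80) hit.push_back(0); else hit.push_back(1);
      if ((in & 0x0F) == 0x0A) hit.push_back(2);   // a hard-to-reach branch
      return hit;
    }

    int main() {
      std::array<double, BITS> pher;   // pheromone = probability that bit i is 1
      pher.fill(0.5);
      std::vector<bool> covered(3, false);
      std::mt19937 rng(1);
      std::uniform_real_distribution<double> u(0.0, 1.0);

      for (int iter = 0; iter < 100; ++iter) {
        for (int a = 0; a < ANTS; ++a) {
          unsigned in = 0;
          for (int b = 0; b < BITS; ++b)        // sample a vector from pheromones
            if (u(rng) < pher[b]) in |= 1u << b;
          bool newCoverage = false;
          for (int br : branchesHit(in))
            if (!covered[br]) { covered[br] = true; newCoverage = true; }
          if (newCoverage)                      // reinforce bits of useful vectors
            for (int b = 0; b < BITS; ++b)
              pher[b] = 0.8 * pher[b] + 0.2 * ((in >> b) & 1);
        }
        for (double& p : pher) p = 0.95 * p + 0.05 * 0.5;   // evaporation
      }
      int n = 0; for (bool c : covered) n += c;
      std::cout << "covered " << n << " of 3 branches\n";
    }

The inner loop over ants is embarrassingly parallel, which is exactly the structure the thesis maps onto multi-core CPUs and GPGPUs.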
36

Improving Branch Coverage in RTL Circuits with Signal Domain Analysis and Restrictive Symbolic Execution

Bagri, Sharad 18 March 2015 (has links)
Considerable research has been directed towards efficient test stimuli generation for Register Transfer Level (RTL) circuits. However, stimuli generation frameworks are still not capable of generating effective stimuli for all circuits. Two of the limiting factors are that 1) it is hard to ascertain whether a branch in the RTL code is reachable at all, and 2) some hard-to-reach branches require intelligent algorithms to reach them. Since unreachable branches cannot be reached by any test sequence, we propose a method to deduce the unreachability of a branch by examining the possible values that a signal can take in the RTL code, without explicit unrolling of the design. To the best of our knowledge, this method has identified more unreachable branches than any method published in this domain, while being computationally less expensive. Moreover, some branches require very specific values on input signals in specific cycles to be reached. Conventional symbolic execution can generate those values but is computationally expensive. We propose a cycle-by-cycle restrictive symbolic execution that analyzes only a selected subset of program statements to reduce the computational cost. Our proposed method gathers information from an initial execution trace, generated by any technique, to intelligently decide the specific cycles where applying the method will be helpful. This method can be combined with simulation-based test stimuli generation methods to reduce the cost of formal verification. With this method, we were able to reach several previously unreached branches in the ITC99 benchmark circuits. / Master of Science
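The value-set intuition can be shown with a deliberately simplified C++ sketch: collect every constant assigned to a signal and flag equality branches that test a value outside that set as unreachable. A real analysis must also handle arithmetic, bit-selects, and primary inputs; this toy assumes constant assignments only:

    #include <iostream>
    #include <map>
    #include <set>
    #include <string>
    #include <vector>

    struct Assign { std::string signal; int value; };      // e.g. state <= 2;
    struct Branch { std::string signal; int testValue; };  // e.g. if (state == 5)

    int main() {
      // Toy model of an RTL design whose state signal only ever takes 0, 1, 2.
      std::vector<Assign> assigns  = { {"state", 0}, {"state", 1}, {"state", 2} };
      std::vector<Branch> branches = { {"state", 1}, {"state", 5} };

      std::map<std::string, std::set<int>> domain;   // signal -> possible values
      for (const auto& a : assigns) domain[a.signal].insert(a.value);

      for (const auto& b : branches) {
        bool maybeReachable = domain[b.signal].count(b.testValue) > 0;
        std::cout << "if (" << b.signal << " == " << b.testValue << ") : "
                  << (maybeReachable ? "possibly reachable" : "unreachable") << "\n";
      }
    }

Note the asymmetry: a value outside the domain proves unreachability, while a value inside it only means the branch might be reachable, which is why the method pairs well with simulation-based stimuli generation.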
37

Reachability Analysis of RTL Circuits Using k-Induction Bounded Model Checking and Test Vector Compaction

Roy, Tonmoy 05 September 2017 (has links)
In the first half of this thesis, a novel approach to k-induction bounded model checking using signal domain constraints and property partitioning for proving unreachability of branches in Verilog RTL code is presented. The approach uses program slicing with respect to the variables of the property under test to generate small SMT formulas that describe the change of variable values between consecutive cycles. Variable substitution is then used on these variables to generate the formulas for subsequent cycles without traversing the abstract syntax tree of the entire design. To reduce the over-approximation in the induction step, the addition of signal domain constraints is proposed. Moreover, we present a technique for splitting up the property in question to obtain a better model of the system. The latter half of the thesis presents a technique for sequential vector compaction of test sets generated during simulation-based ATPG. Starting with a compaction framework that stores metadata about the test vectors during generation, this work presents two methods for solving the compaction problem. The first method generates the optimal solution by recasting the problem in a form suitable for an optimization solver. The second method uses a heuristics-based approach to the same problem, generating a comparable but sub-optimal solution while being orders of magnitude better in time and computational efficiency. / Master of Science
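As background, the shape of a k-induction check can be sketched in C++ over a toy explicit-state system (a 4-bit register); the thesis instead builds SMT formulas from sliced RTL, so everything below, including the transition relation, is an illustrative assumption:

    #include <functional>
    #include <iostream>
    #include <vector>

    using State = int;
    constexpr State NUM_STATES = 16;                      // a 4-bit register

    bool init(State s)           { return s == 0; }
    bool trans(State s, State t) { return t == (s == 11 ? 0 : s + 1); }
    bool prop(State s)           { return s != 15; }      // claim: 15 never reached

    // Base case: the property holds on every state reachable within k steps.
    bool baseCase(int k) {
      std::vector<bool> seen(NUM_STATES, false);
      std::vector<State> frontier;
      for (State s = 0; s < NUM_STATES; ++s)
        if (init(s)) { seen[s] = true; frontier.push_back(s); }
      for (int step = 0; step <= k; ++step) {
        std::vector<State> next;
        for (State s : frontier) {
          if (!prop(s)) return false;
          for (State t = 0; t < NUM_STATES; ++t)
            if (trans(s, t) && !seen[t]) { seen[t] = true; next.push_back(t); }
        }
        frontier = next;
      }
      return true;
    }

    // Induction step: any k consecutive property-satisfying states must be
    // followed by another property-satisfying state.
    bool inductionStep(int k) {
      std::vector<State> path(k + 1);
      std::function<bool(int)> extend = [&](int i) -> bool {
        if (i == k + 1) return true;
        for (State s = 0; s < NUM_STATES; ++s) {
          if (i > 0 && !trans(path[i - 1], s)) continue;
          if (i < k && !prop(s)) continue;        // prefix states must satisfy P
          path[i] = s;
          if (i == k && !prop(s)) return false;   // counterexample to induction
          if (!extend(i + 1)) return false;
        }
        return true;
      };
      return extend(0);
    }

    int main() {
      for (int k = 1; k <= 6; ++k)
        if (baseCase(k) && inductionStep(k)) {
          std::cout << "property proven by " << k << "-induction\n";
          return 0;
        }
      std::cout << "not proven up to k = 6\n";
    }

In this toy, plain induction (k = 1) fails because the unreachable states 12 through 14 satisfy the property yet lead to a violation; only at k = 4 does the proof close. The signal domain constraints proposed in the thesis serve precisely to exclude such spurious states, tightening the over-approximation so that a smaller k suffices.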
38

Implementation of Low-Bit-Rate Audio Codec, Codec2, in Verilog on Modern FPGAs

Sampath Kumar, Santhiya 30 April 2020 (has links)
No description available.
39

Formal Correctness and Completeness for a Set of Uninterpreted RTL Transformations

Teica, Elena 11 October 2001 (has links)
No description available.
40

Automated Correctness Condition Generation for Formal Verification of Synthesized RTL Designs

Mansouri, Nazanin 11 October 2001 (has links)
No description available.
