Global ETD Search

21	Design automation methodologies for extensible processor platform Cheung, Newton, Computer Science & Engineering, Faculty of Engineering, UNSW January 2005 (has links) This thesis addresses two ubiquitous trends in the embedded system world - the increasing importance of design turnaround time as a design metric, and the move towards closing the design productivity gap. Adopting the right choice of design approach has been recognised as an integral part of the design flow in order to meet desired characteristics such as increasing software content, satisfying the growing complexities of an application, reusing off-the-shelf components, and exploring design metrics tradeoff, which closes the design productivity gap. The importance of design turnaround time is motivated by the intensive competition between manufacturers, especially makers of mainstream electronic consumer products, who shrinks the product life cycle and requires faster time-to-market to maximise economic benefits. This thesis presents a suite of design automation methodologies to automatically design embedded systems for an application in the state-of-the-art design approach - the extensible processor platform. These design automation methodologies systematise the extensible processor platform???s design flow, with particular emphasis on solving four challenging design problems: i) code segment identification; ii) instruction generation; iii) architectural customisation selection; and iv) processor evaluation. Our suite of design automation methodologies includes: i) a semi-automatic design system - to design an extensible processor that maximises the application performance while satisfying the area constraint. By specifying a fitting function to identify suitable code segments within an application, a two-level hierarchy selection algorithm is used to first select a predefined processor and then select the right instruction, and a performance estimator is used to estimate an application's performance; ii) a tool to match instructions - to automatically match the pre-designed instructions with computationally intensive code segments, reducing verification time and effort; iii) an instructions estimation model - to estimate the area overhead, latency, power consumption of extensible instructions, exploring larger design space; and iv) an instructions generation tool - to generate new extensible instructions that maximises the speedup while minimising power dissipation. A number of techniques such as system decomposition, combinational equivalence checking and regression analysis etc., have been heavily relied upon in the creation of the final design system. This thesis shows results at every stage to demonstrate the efficacy of our design methodologies in the creation of extensible processors. The methodologies and results presented in this thesis demonstrate that automating the design process for an extensible processor platform results in significant performance increase - on average, an increase of 4.74x (up to 15.71x) compared to the original base processor. Our system achieves significant design turnaround time savings (2.5% of the full simulation time for the entire design space) with majority Pareto points obtained (91% on average), and can lead to fewer and faster design iterations. Our instruction matching tool is 7.3x faster on average compared to the best known approaches to the problem (partial simulations). Our estimation model has a mean absolute error as small as 3.4% (6.7% max.) for area overhead, 5.9% (9.4% max.) for latency, and 4.2% (7.2% max.) for power consumption, compared to estimation through the time consuming synthesis and simulation steps using commercial tools. Finally, the instruction generation tool reduces energy consumption by a further 5.8% on average (up to 17.7%) compared to extensible instructions generated by previous approaches. data processing design and construction integrated circuits design automation extensible processor
22	Design and simulation of a primitive RISC architecture using VHDL / Moustakas, Evangelos. January 1991 (has links) Thesis (M.S.)--Rochester Institute of Technology, 1991. / Spine title: Design of a RISC using VHDL. Typescript. Includes bibliographical references (leaf 71).
23	Course grained low power design flow using UPF / Varanasi, Archana. January 2009 (has links) Thesis (M.S.)--Rochester Institute of Technology, 2009. / Typescript. Includes bibliographical references (leaves 67-70).
24	Design and Implementation of Single Issue DSP Processor Core Ravinath, Vinodh January 2007 (has links) Micro processors built specifically for digital signal processing are DSP processors. DSP is one of the core technologies in rapidly growing applications like communications and audio processing. The estimated growth of DSP processors in the last 6 years is over 40%. The variety of DSP capable processors for various applications also increased with the rising popularity of DSP processors. The design flow and architecture of such processors are not commonly available to students for learning. This report is a structured approach to design and implementation of an embedded DSP processor core for voice, audio and video codec. The report focuses on the design requirement specification, senior instruction set and assembly manual release, micro architecture design and implementation of the core. Details about the core verification are also included in this report. The instruction set of this processor supports running basic kernels of BDTI benchmarking. DSP processor codec Instruction set Microarchitecture FSM RTL Computer Engineering Datorteknik
25	Hardware mechanisms and their implementations for secure embedded systems Qin, Jian January 2005 (has links) Security issues appearing in one or another form become a requirement for an increasing number of embedded systems. Those systems, which will be used to capture, store, manipulate, and access data with a sensitive nature, have posed several unique and urgent challenges. The challenges to those embedded system require new approaches to security covering all aspects of embedded system design from architecture, implementation to the methodology. However, security is always treated by embedded system designer as the addition of features, such as specific cryptographic algorithm or other security protocol. This paper is intended to draw both the SW and HW designer attention to treat the security issues as a new mainstream during the design of embedded system. We intend to show why hardware option issues have been taken into consideration and how those hardware mechanisms and key features of processor architecture could be implemented in the hardware level (through modification of processor architecture, for example) to deal with various potential attacks unique to embedded systems. Informationsteknik Security Hardware machanism Instruction set Embedded system Informationsteknik Computer and Information Sciences Data- och informationsvetenskap
26	Designing and implementing a new pulsar timer for the Hartebeesthoek Radio Astronomy Observatory Youthed, Andrew David January 2008 (has links) This thesis outlines the design and implementation of a single channel, dual polarization pulsar timing instrument for the Hartebeesthoek Radio Astronomy Observatory (HartRAO). The new timer is designed to be an improved, temporary replacement for the existing device which has been in operation for over 20 years. The existing device is no longer reliable and is di±cult to maintain. The new pulsar timer is designed to provide improved functional- ity, higher sampling speed, greater pulse resolution, more °exibility and easier maintenance over the existing device. The new device is also designed to keeping changes to the observation system to a minimum until a full de-dispersion timer can be implemented at theobservatory. The design makes use of an 8-bit Reduced Instruction Set Computer (RISC) micro-processor with external Random Access Memory (RAM). The instrument includes an IEEE-488 subsystem for interfacing the pulsar timer to the observation computer system. The microcontroller software is written in assembler code to ensure optimal loop execution speed and deterministic code execution for the system. The design path is discussed and problems encountered during the design process are highlighted. Final testing of the new instrument indicates an improvement in the sam- pling rate of 13.6 times and a significant reduction in 60Hz interference over the existing instrument. Astronomical observatories Radio astronomy Pulsars Astronomical instruments Reduced instruction set computers Random access memory
27	From high level architecture descriptions to fast instruction set simulators Wagstaff, Harry January 2015 (has links) As computer systems become increasingly complex and diverse, so too do the architectures they implement. This leads to an increase in complexity in the tools used to design new hardware and software. One particularly important tool in hardware and software design is the Instruction Set Simulator, which is used to prototype new architectures and hardware features, verify hardware, and test and debug software. Many Architecture Description Languages exist which facilitate the description of new architectural or hardware features, and generate a tools such as simulators. However, these typically suffer from poor performance, are difficult to test effectively, and may be limited in functionality. This thesis considers three objectives when developing Instruction Set Simulators: performance, correctness, and completeness, and presents techniques which contribute to each of these. Performance is obtained by combining Dynamic Binary Translation techniques with a novel analysis of high level architecture descriptions. This makes use of partial evaluation techniques in order to both improve the translation system, and to improve the quality of the translated code, leading a performance improvement of over 2.5x compared to a naïve implementation. This thesis also presents techniques which contribute to the correctness objective. Each possible behaviour of each described instruction is used to guide the generation of a test case. Constraint satisfaction techniques are used to determine the necessary instruction encoding and context for each behaviour to be produced. It is shown that this is a significant improvement over benchmark-driven testing, and this technique has led to the discovery of several bugs and inconsistencies in multiple state of the art instruction set simulators. Finally, several challenges in ‘Full System’ simulation are addressed, contributing to both the performance and completeness objectives. Full System simulation generally carries significant performance costs compared with other simulation strategies. Crucially, instructions which access memory require virtual to physical address translation and can now cause exceptions. Both of these processes must be correctly and efficiently handled by the simulator. This thesis presents novel techniques to address this issue which provide up to a 1.65x speedup over a state of the art solution. 004.2
28	SNIC-DSM: SmartNIC based DSM Infrastructure for Heterogeneous-ISA Machines Ramesh, Hemanth 14 August 2023 (has links) Heterogeneous computing is increasingly used in today's datacenters to meet the increasing computational demands of applications. Heterogeneous hardware typically includes CPUs, GPUs, ASICs, and FPGAs, among others. An important emerging trend is instructionset- architecture (ISA)-heterogeneity: high-end x86 servers with attached SmartNICs and SmartSSDs that incorporate general-purpose CPUs, typically of the RISC ISA family (e.g., ARM, RISC-V). To alleviate resource congestion on server computing nodes, application workloads can be scaled-out across server x86 CPUs and SmartNIC ARM CPUs using the distributed shared memory (DSM) abstraction. We present SNIC-DSM, a SmartNIC-based DSM infrastructure for heterogeneous ISA machines. SNIC-DSM implements a low-latency messaging layer, which enables inter-node communication across multi-ISA CPUs, and a DSM protocol processor that provides memory coherency among these nodes, both implemented in SmartNIC's FPGA logic. SNIC-DSM is reconfigurable and allows the implementation of different memory consistency protocols. Our experimental studies using compute-intensive benchmarks reveal that SNIC-DSM outperforms the state-of-the-art DSM - Popcorn Linux's software DSM - when server resource congestion is high. / Master of Science / The availability of heterogeneous computing architectures has led to the development of distributed shared memory systems, which allows compute-intensive applications to run in a distributed manner on different types of computing devices such as graphics processors, reconfigurable logic devices, and custom integrated circuits. Adopting such a heterogeneous computing strategy yields better performance and improves power consumption. Generally, these DSM systems use a software-based approach, which offers great flexibility but suffers from software overheads. Hardware-based approaches are used to overcome these limitations but they generally do not offer flexibility. This thesis presents, SNIC-DSM, which is a reconfigurable implementation of the DSM framework. SNIC-DSM provides a platform for the host and smart networking devices such as SmartNICs to communicate with each other and enables application execution in a distributed manner by providing memory coherency. Our experimental evaluation using High-Performance Computing benchmarks reveals that SNIC-DSM improves performance when compared with software-based DSM. Popcorn Linux Distributed Shared Memory SmartNIC Sequential Consistency Instruction-Set-Architecture Corundum
29	f-DSM: An FPGA-Accelerated Distributed Shared Memory for Heterogeneous Instruction-Set-Architecture Hardware VSathish, Naarayanan Rao 03 March 2022 (has links) Due to the diminishing relevance of Moore's Law, traditional multi-core systems are increasingly struggling to meet the computational demands of many emerging workloads. Heterogeneous computing, which involves exploiting higher degrees of parallelism (e.g., GPUs) and application-specific specialization (e.g., FPGAs), is increasingly used to meet this demand. An important architectural trend in this space involves using instruction-set-architecture (ISA) heterogeneity. An exemplar case is emerging I/O devices that include CPU cores with ISAs (e.g., ARM, RISC-V) that differ from that of host CPUs (e.g., x86) and have physically discrete memory. Shared-memory programming of such systems requires the Dis- tributed Shared Memory (DSM) abstraction. Software DSM incurs significant OS overhead for maintaining memory coherency. Despite outperforming software predecessors, hardware DSM and cache-coherent interfaces require custom chips and lack the flexibility to experiment with different DSM consistency protocols. This thesis presents fDSM, an FPGA-accelerated DSM framework for ISA-heterogeneous hardware. fDSM implements a high-speed messaging layer to enable inter-node communication across ISA-different CPU cores and a DSM protocol processor that maintains virtual memory coherency using a multiple-reader single- writer DSM algorithm. Experimental studies reveal that fDSM outperforms prior art, including Popcorn Linux's software DSM abstraction, which uses TCP-IP and state-of-the-art Infiniband RDMA messaging layers by 2.8X and 7%, respectively. fDSM also provides reconfigurability and thereby allows implementation and experimentation of different memory consistency models. / Master of Science / Moore's Law predicts that the number of transistors in a chip will double approximately every two years. Chip vendors are increasingly observing that this law is nearing its limit when transistor sizes are shrunk to 5nm and 3nm due to power consumption and heat dissipation issues. As a result, innovation in new computing architectures has increasingly focused on heterogeneity, i.e., the use of hardware performance accelerators like graphic processors and reconfigurable logic used in confluence with a computer's CPU (host). To improve the programmability of these architectures, which usually have physically separate memory, the shared-memory programming model is usually used to provide coherent virtual memory. The shared memory model, when applied to such distributed systems, called distributed shared memory (or DSM), has been previously developed in software as well as in hardware. The former usually suffer from high latency overheads, while the latter often requires custom chips and lack programmability for implementing new memory consistency protocols. This thesis presents fDSM, a reconfigurable distributed shared memory framework that provides coherent shared memory between a host and a smart I/O device such as a SmartNIC. fDSM is implemented in FPGAs, which are increasingly available in hosts and Smart I/O devices at the commodity scale. Our prototype implementation uses ISA-heterogeneous hosts to emulate such an environment. Our experimental evaluation using applications from High- Performance Computing benchmark suites reveal that fDSM yields performance benefits over a state-of-the-art software DSM. Distributed Shared Memory Sequential Consistency Popcorn Linux Field programmable gate arrays Instruction-Set-Architecture
30	Vector Instruction Set Extensions for Efficient and Reliable Computation of Keccak Rawat, Hemendra Kumar 27 August 2016 (has links) Recent processor architectures such as Intel Westmere (and later) and ARMv8 include instruction-level support for the Advanced Encryption Standard (AES), for the Secure Hashing Standard (SHA-1, SHA2) and for carry-less multiplication. These crypto-instructions are optimized for a single algorithm and provide significant performance improvements over software written using general-purpose instruction set. However, today's secure systems and protocols do not rely on just one, but a suite of many cryptographic applications that are expected to work in a correct and reliable manner. In this work, we propose a new instruction set for supporting efficient and reliable cryptography on modern processors. For efficiency, we propose flexible instruction set extensions for Keccak, a cryptographic kernel for hashing, authenticated encryption, key-stream generation and random-number generation. Keccak is the basis of the SHA-3 standard and the newly proposed Keyak and Ketje authenticated ciphers. For reliability, we propose a set of trusted instructions to verify the integrity of a cryptographic software library. These instructions are aimed at detecting tamper in the software or in the configurable hardware. We develop the instruction extensions for a 128-bit interface, commonly available in the vector processing unit of many modern processors. Simulation results on GEM5 architectural simulator show that the proposed instructions not only improves the performance of Keccak applications by 2 times (over NEON programming) and 6 times (over assembly programming), but also improves the reliability of applications at a performance overhead of just 6%. / Master of Science SIMD Instruction Set Extensions SHA-3 Hashing Authenticated Encryption Software Integrity

Search results