1

Efficient design-space exploration of custom instruction-set extensions

Zuluaga, Marcela. January 2010
Customization of processors with instruction set extensions (ISEs) is a technique that improves performance through parallelization with a reasonable area overhead, in exchange for additional design effort. This thesis presents a collection of novel techniques that reduce the design effort and cost of generating ISEs by advancing automation and reconfigurability. In addition, these techniques maximize the performance gained as a function of the additional committed resources.

Including ISEs in a processor design implies development at many levels. Most prior work on ISEs addresses the separate stages of the design: identification, selection, and implementation. However, the interactions between these stages also hold important design trade-offs. In particular, this thesis addresses the lack of interaction between the hardware implementation stage and the two previous stages. Interaction with the implementation stage has mostly been limited to accurately measuring the area and timing requirements of each ISE candidate implemented as a separate hardware module. However, the need to generate an independent hardware datapath for each ISE limits the flexibility of the design and the performance gains. Hence, resource sharing is essential in order to create a customized unit with multi-function capabilities.

Previously proposed resource-sharing techniques aggressively share resources amongst the ISEs, minimizing the area of the solution at any cost. However, it is shown that aggressive resource sharing leads to large ISE datapath latency. This thesis therefore presents an original heuristic that can be parameterized to control the degree of resource sharing amongst a given set of ISEs, thereby permitting exploration of the implementation trade-offs between instruction latency and area savings. In addition, this thesis introduces a predictive model that quickly exposes the optimal trade-offs of this design space. Compared to an exhaustive exploration, the predictive model is shown to reduce by two orders of magnitude the number of executions of the resource-sharing algorithm required to find the optimal trade-offs.

This thesis also presents the first technique to combine the design spaces of ISE selection and resource sharing in ISE datapath synthesis, in order to offer the designer solutions that achieve maximum speedup and maximum resource utilization within the available area. Optimal trade-offs in the design space are found by guiding the selection process to favour ISE combinations that are likely to share resources with low speedup losses. Experimental results show that this combined approach unveils new trade-offs between speedup and area that are not identified by previous selection techniques; speedups of up to 238% over previous selection techniques were obtained.

Finally, multi-cycle ISEs can be pipelined in order to increase their throughput. However, it is shown that traditional ISE identification techniques do not allow this optimization due to control-flow overhead. In order to obtain the benefits of overlapping loop executions, this thesis proposes to carefully insert loop control-flow statements into the ISEs, thus allowing the ISE to control the iterations of the loop.
The proposed ISEs broaden the scope of instruction-level parallelism and obtain higher speedups than traditional ISEs, primarily through pipelining, the exploitation of spatial parallelism, and the reduction of control-flow and branch overhead. A detailed case study of a real application shows that the proposed method achieves 91% higher speedups than the state of the art, with a hardware area overhead of less than 8%.
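The abstract describes the latency-versus-area exploration only at a high level. As a minimal illustration of the kind of Pareto filtering involved, the Python sketch below sweeps a hypothetical resource-sharing knob and keeps only the non-dominated datapath candidates; all names, numbers, and the sharing parameter are assumptions for illustration, not the thesis's actual heuristic or predictive model.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DatapathCandidate:
    """One merged ISE datapath produced at a given resource-sharing degree (hypothetical model)."""
    sharing_degree: float   # 0.0 = no sharing, 1.0 = aggressive sharing
    latency_ns: float       # critical-path latency of the merged datapath
    area_units: float       # estimated area, e.g. in equivalent gates

def dominates(a: DatapathCandidate, b: DatapathCandidate) -> bool:
    """a dominates b if it is no worse in both metrics and strictly better in at least one."""
    return (a.latency_ns <= b.latency_ns and a.area_units <= b.area_units
            and (a.latency_ns < b.latency_ns or a.area_units < b.area_units))

def pareto_frontier(candidates: List[DatapathCandidate]) -> List[DatapathCandidate]:
    """Keep only the latency/area trade-off points that no other candidate dominates."""
    return sorted(
        [c for c in candidates if not any(dominates(o, c) for o in candidates)],
        key=lambda c: c.area_units,
    )

# Sweep the (hypothetical) sharing knob and keep the optimal trade-off points.
sweep = [
    DatapathCandidate(0.0, 4.2, 1800.0),
    DatapathCandidate(0.4, 5.1, 1300.0),
    DatapathCandidate(0.7, 7.5, 1350.0),   # dominated: slower and larger than the 0.4 point
    DatapathCandidate(1.0, 8.4, 1100.0),
]
for point in pareto_frontier(sweep):
    print(point)
```

In a real flow the latency and area figures would come from hardware synthesis of each merged datapath (or from a predictive model of it) rather than from a fixed table.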
2

Vector Instruction Set Extensions for Efficient and Reliable Computation of Keccak

Rawat, Hemendra Kumar. 27 August 2016
Recent processor architectures such as Intel Westmere (and later) and ARMv8 include instruction-level support for the Advanced Encryption Standard (AES), for the Secure Hash Standards (SHA-1, SHA-2), and for carry-less multiplication. These crypto-instructions are optimized for a single algorithm and provide significant performance improvements over software written using general-purpose instructions. However, today's secure systems and protocols rely not on just one, but on a suite of cryptographic applications that are expected to work correctly and reliably.

In this work, we propose a new instruction set for supporting efficient and reliable cryptography on modern processors. For efficiency, we propose flexible instruction set extensions for Keccak, a cryptographic kernel for hashing, authenticated encryption, key-stream generation, and random-number generation. Keccak is the basis of the SHA-3 standard and of the newly proposed Keyak and Ketje authenticated ciphers. For reliability, we propose a set of trusted instructions to verify the integrity of a cryptographic software library; these instructions are aimed at detecting tampering in the software or in the configurable hardware. We develop the instruction extensions for a 128-bit interface, commonly available in the vector processing unit of many modern processors. Simulation results on the GEM5 architectural simulator show that the proposed instructions not only improve the performance of Keccak applications by 2x (over NEON programming) and 6x (over assembly programming), but also improve the reliability of applications at a performance overhead of just 6%. / Master of Science
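For context on why Keccak maps well onto wide vector datapaths: the permutation operates on a 5x5 state of 64-bit lanes using only XORs, rotations, and lane permutations. The plain-Python sketch below shows the standard theta step of Keccak-f[1600], the kind of lane-wise work that a 128-bit vector unit can process two lanes at a time; it illustrates the public permutation, not the thesis's proposed instructions.

```python
MASK64 = (1 << 64) - 1

def rotl64(x: int, n: int) -> int:
    """Rotate a 64-bit lane left by n bits."""
    n %= 64
    return ((x << n) | (x >> (64 - n))) & MASK64

def theta(state):
    """Theta step of Keccak-f[1600]; state is a 5x5 list of 64-bit lanes, indexed state[x][y]."""
    # Column parities.
    C = [state[x][0] ^ state[x][1] ^ state[x][2] ^ state[x][3] ^ state[x][4] for x in range(5)]
    # Mixing values combining neighbouring columns.
    D = [C[(x - 1) % 5] ^ rotl64(C[(x + 1) % 5], 1) for x in range(5)]
    # XOR each lane with the mixing value of its column.
    return [[state[x][y] ^ D[x] for y in range(5)] for x in range(5)]

# Quick sanity check: the all-zero state is a fixed point of theta.
zero = [[0] * 5 for _ in range(5)]
assert theta(zero) == zero
```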
3

An Application-Specific Instruction Set for Accelerating Set-Oriented Database Primitives

Arnold, Oliver; Haas, Sebastian; Fettweis, Gerhard; Schlegel, Benjamin; Kissinger, Thomas; Lehner, Wolfgang. 13 June 2022
The key task of database systems is to efficiently manage large amounts of data. A high query throughput and a low query latency are essential for the success of a database system. Lately, research has focused on exploiting hardware features like superscalar execution units, SIMD, or multiple cores to speed up processing. Apart from these software optimizations for given hardware, even tailor-made processing circuits running on FPGAs are built to run mostly stateless query plans with incredibly high throughput. A similar idea, already considered three decades ago, is to build tailor-made hardware such as a database processor. Despite their superior performance, such application-specific processors were not considered beneficial because general-purpose processors eventually caught up, so the high development costs did not pay off. In this paper, we show that the development of a database processor is much more feasible nowadays through the availability of customizable processors. We illustrate by example how to create an instruction set extension for set-oriented database primitives. The resulting application-specific processor not only provides high performance but also enables very energy-efficient processing. In various configurations, our processor requires more than 960x less energy than a high-end x86 processor while providing the same performance.
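As a concrete example of a set-oriented database primitive, the Python sketch below implements a merge-based intersection of two sorted key columns, one of the scan/join building blocks such an instruction set extension could accelerate. The function name and the choice of primitive are illustrative assumptions, not taken from the paper.

```python
from typing import List

def sorted_set_intersect(a: List[int], b: List[int]) -> List[int]:
    """Merge-based intersection of two sorted, duplicate-free key columns."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])   # key present in both columns
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

# Example: keys common to two index columns.
print(sorted_set_intersect([1, 3, 5, 8, 13], [2, 3, 5, 13, 21]))  # [3, 5, 13]
```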
