Global ETD Search

51	Design of The Rendezvous Mechanism In The Multi-Core AMBA System Chang, Mu-Chi 06 August 2008 (has links) In current chip multi-processors (CMPs), the on-chip network is a major factor affecting overall system performance. Different kinds of communication protocols vary from different communication architectures of current SOC designs. For example, the AMBA is master-slave architecture, which transacts and communicates the data of between the two CORE (Master) through the Memory (Slave). The architecture cost long time for load and store with memory. Hence, this paper design and implement a Rendezvous protocol on AMBA architecture, which is called Rendezvous of Advanced High performance Bus (RAHB), to let two processors can communicate with each other without memory reference overheads. The RAHB is compatible with the AHB architecture, and add Rendezvous communication protocol in the AMBA architecture to perform the direct transmission of data. Without referring the memory, the RAHB can improve the efficiency of communication in multi-core. For experimental evaluation, we evaluate the performance between RAHB and AHB, RAHB speedup (B/s) is average up to 50% for different data length and performance up 30% to 40% for executing test program. Inter-processor communication AMBA multi-core Rendezvous
52	Cryptoraptor : high throughput reconfigurable cryptographic processor for symmetric key encryption and cryptographic hash functions Sayilar, Gokhan 03 February 2015 (has links) In cryptographic processor design, the selection of functional primitives and connection structures between these primitives are extremely crucial to maximize throughput and flexibility. Hence, detailed analysis on the specifications and requirements of existing crypto-systems plays a crucial role in cryptographic processor design. This thesis provides the most comprehensive literature review that we are aware of on the widest range of existing cryptographic algorithms, their specifications, requirements, and hardware structures. In the light of this analysis, it also describes a high performance, low power, and highly flexible cryptographic processor, Cryptoraptor, that is designed to support both today's and tomorrow's encryption standards. To the best of our knowledge, the proposed cryptographic processor supports the widest range of cryptographic algorithms compared to other solutions in the literature and is the only crypto-specific processor targeting the future standards as well. Unlike previous work, we aim for maximum throughput for all known encryption standards, and to support future standards as well. Our 1GHz design achieves a peak throughput of 128Gbps for AES-128 which is competitive with ASIC designs and has 25X and 160X higher throughput per area than CPU and GPU solutions, respectively. / text Cryptography Cryptographic processor High throughput Reconfigurability ASIC
53	The effect of toe trimming on heavy turkey toms' productivity and welfare 2013 December 1900 (has links) Toe trimming within the turkey industry has been used for over four decades as a method for controlling carcass scratching, and by doing so, achieving better grades and lower condemnation rates. The industry has changed greatly since the 1970’s, when the majority of the research on the procedure was completed. The technology used for toe trimming has switched from a hot-blade to the use of microwave energy, which will effect healing and toe length trimmed. The birds are larger now, which will impact mobility both before and after trimming, and in a consumer-driven trend, the industry is re-examining its codes of practice to ensure the highest level of welfare possible. As there is little pertinent research regarding these changes, the toe trimming procedure was re-examined under modern conditions and with focus on both production and welfare effects to determine if the practice can still be recommended. Hybrid Converter toms were raised to 140 d of age, with half (153) being toe trimmed at the hatchery using a Microwave Claw Processor (MCP) and the other half (153) left with their toes intact. The birds had feed consumption, body weight, mortality, toe length, stance, behaviour, and gait scores monitored throughout the trial with carcass damage assessed at processing. Means were considered significantly different when P≤0.05. Toe trimming caused a reduction in both feed consumption and body weight in the later stages of the experiment. Final weights for non-toe trimmed and toe trimmed toms were 21.70 kgs and 21.15 kgs, respectively. Feed efficiency with and without being corrected for mortality was unaffected by the procedure. Overall mortality and mortality by age group were also unaffected; however it was found that toe trimmed toms experienced higher levels of rotated tibia at 3.27% versus 0.65% for untrimmed birds. Toe length measurements found that trimmed toes were, on average, 91.9% the length of an intact toe, and that variability in length increased with trimming. The procedure was not found to impact stance or gait score, although behaviour at all ages measured demonstrated reduced mobility with trimming. In particular, reduced activity in poults for 5 d post-treatment indicates that the MCP treatment caused pain or discomfort. The percentage of carcasses which exhibited scratching was 15.6% for the non-trimmed treatment and 13.3% for the trimmed, which were not significantly different. Also, no significant effect of trimming was found for any other carcass damage category. Based on the negative impacts of toe trimming on both bird production and welfare found in this research, MCP treatment should not be recommended to turkey producers when raising heavy toms. Toe clipping Microwave Claw Processor Carcass quality
54	Operating System Techniques for Reducing Processor State Pollution Soares, Livio 31 August 2012 (has links) Application performance on modern processors has become increasingly dictated by the use of on-chip structures, such as caches and look-aside buffers. The hierarchical (multi-leveled) design of processor structures, the ubiquity of multicore processor architectures, as well as the increasing relative cost of accessing memory have all contributed to this condition. Our thesis is that operating systems should provide services and mechanisms for applications to more efficiently utilize on-chip processor structures. As such, this dissertation demonstrates how the operating system can improve processor efficiency of applications through specific techniques. Two operating system services are investigated: (1) improving secondary and last-level cache utilization through a run-time cache filtering technique, and (2) improving the processor efficiency of system intensive applications through a new exception-less system call mechanism. With the first mechanism, we introduce the concept of a software pollute buffer and show that it can be used effectively at run-time, with assistance from commodity hardware performance counters, to reduce pollution of secondary on-chip caches. In the second mechanism, we are able to decouple application and operating system execution, showing the benefits of the reduced interference in various processor components such as the first level instruction and data caches, second level caches and branch predictor. We show that exception-less system calls are particularly effective on modern multicore processors. We explore two ways for applications to use exception-less system calls. The first way, which is completely transparent to the application, uses multi-threading to hide asynchronous communication between the operating system kernel and the application. In the second way, we propose that applications can directly use the exception-less system call interface by designing programs that follow an event-driven architecture. Operating Systems Processors Processor Caches Multicore 0984
55	Operating System Techniques for Reducing Processor State Pollution Soares, Livio 31 August 2012 (has links) Application performance on modern processors has become increasingly dictated by the use of on-chip structures, such as caches and look-aside buffers. The hierarchical (multi-leveled) design of processor structures, the ubiquity of multicore processor architectures, as well as the increasing relative cost of accessing memory have all contributed to this condition. Our thesis is that operating systems should provide services and mechanisms for applications to more efficiently utilize on-chip processor structures. As such, this dissertation demonstrates how the operating system can improve processor efficiency of applications through specific techniques. Two operating system services are investigated: (1) improving secondary and last-level cache utilization through a run-time cache filtering technique, and (2) improving the processor efficiency of system intensive applications through a new exception-less system call mechanism. With the first mechanism, we introduce the concept of a software pollute buffer and show that it can be used effectively at run-time, with assistance from commodity hardware performance counters, to reduce pollution of secondary on-chip caches. In the second mechanism, we are able to decouple application and operating system execution, showing the benefits of the reduced interference in various processor components such as the first level instruction and data caches, second level caches and branch predictor. We show that exception-less system calls are particularly effective on modern multicore processors. We explore two ways for applications to use exception-less system calls. The first way, which is completely transparent to the application, uses multi-threading to hide asynchronous communication between the operating system kernel and the application. In the second way, we propose that applications can directly use the exception-less system call interface by designing programs that follow an event-driven architecture. Operating Systems Processors Processor Caches Multicore 0984
56	Neuroninio procesoriaus prototipas FPGA technologijoje / Neural processor prototype in FPGA technology Sasnauskas, Justas 25 May 2005 (has links) In this paper it is described a method of creation of a neuroprocessor prototype, from high abstraction level to FPGA technology. Most common neuroprocessor architectures are overviewed, and canonical model of the neuroprocessor is created accordingly. Based on the canonical model the serial structure neuroprocessor mathematical model is formed and evaluated. The model is than described in SystemC programming language. In the experimental part, the correct functionality of the neuroprocessor is evaluated and the results of synthesis are analyzed. Informatics Engineering FPGA neural processor Neuroninis procesorius
57	Efficient design-space exploration of custom instruction-set extensions Zuluaga, Marcela January 2010 (has links) Customization of processors with instruction set extensions (ISEs) is a technique that improves performance through parallelization with a reasonable area overhead, in exchange for additional design effort. This thesis presents a collection of novel techniques that reduce the design effort and cost of generating ISEs by advancing automation and reconfigurability. In addition, these techniques maximize the perfomance gained as a function of the additional commited resources. Including ISEs into a processor design implies development at many levels. Most prior works on ISEs solve separate stages of the design: identification, selection, and implementation. However, the interations between these stages also hold important design trade-offs. In particular, this thesis addresses the lack of interaction between the hardware implementation stage and the two previous stages. Interaction with the implementation stage has been mostly limited to accurately measuring the area and timing requirements of the implementation of each ISE candidate as a separate hardware module. However, the need to independently generate a hardware datapath for each ISE limits the flexibility of the design and the performance gains. Hence, resource sharing is essential in order to create a customized unit with multi-function capabilities. Previously proposed resource-sharing techniques aggressively share resources amongst the ISEs, thus minimizing the area of the solution at any cost. However, it is shown that aggressively sharing resources leads to large ISE datapath latency. Thus, this thesis presents an original heuristic that can be parameterized in order to control the degree of resource sharing amongst a given set of ISEs, thereby permitting the exploration of the existing implementation trade-offs between instruction latency and area savings. In addition, this thesis introduces an innovative predictive model that is able to quickly expose the optimal trade-offs of this design space. Compared to an exhaustive exploration of the design space, the predictive model is shown to reduce by two orders of magnitude the number of executions of the resource-sharing algorithm that are required in order to find the optimal trade-offs. This thesis presents a technique that is the first one to combine the design spaces of ISE selection and resource sharing in ISE datapath synthesis, in order to offer the designer solutions that achieve maximum speedup and maximum resource utilization using the available area. Optimal trade-offs in the design space are found by guiding the selection process to favour ISE combinations that are likely to share resources with low speedup losses. Experimental results show that this combined approach unveils new trade-offs between speedup and area that are not identified by previous selection techniques; speedups of up to 238% over previous selection thecniques were obtained. Finally, multi-cycle ISEs can be pipelined in order to increase their throughput. However, it is shown that traditional ISE identification techniques do not allow this optimization due to control flow overhead. In order to obtain the benefits of overlapping loop executions, this thesis proposes to carefully insert loop control flow statements into the ISEs, thus allowing the ISE to control the iterations of the loop. The proposed ISEs broaden the scope of instruction-level parallelism and obtain higher speedups compared to traditional ISEs, primarily through pipelining, the exploitation of spatial parallelism, and reducing the overhead of control flow statements and branches. A detailed case study of a real application shows that the proposed method achieves 91% higher speedups than the state-of-the-art, with an area overhead of less than 8% in hardware implementation. 004.1
58	Reconfigurable Cache Memory Brewer, Jeffery Ramon 01 January 2009 (has links) AN ABSTRACT OF THE THESIS OF Jeffery R. Brewer, for the Master degree in Electrical Computer Engineer, presented on May 22, 2009 at Southern Illinois University Carbondale. TITLE: Reconfigurable Cache Memory MAJOR PROFESSOR: Dr. Nazeih Botros As chip designers continue to push the performance of microprocessors to higher levels the energy demand grows. The increase need for integrated chips that provide energy savings without degrading performance is paramount. The cache memory is typically over fifty percent of the size of today's microprocessor chip, and consumes a significant percentage of the total power. Therefore, by designing a reconfigurable cache that's able to dynamically adjust to a smaller cache size without encountering a significant degrade in performance, we are able to realize power conservation. Tournament caching is a reconfigurable method that tracks the current performance of the cache and compares it to possible smaller or larger cache size [1] . The results in this thesis shows that reconfigurable cache memory implemented with a configuration mechanism like Tournament caching would take advantage of associativity and cache size while providing energy conservation. i Cache computer energy memory processor Reconfigurable
59	Development of the NoGAP CL Hardware Description Language and its Compiler Blumenthal, Carl January 2007 (has links) The need for a more general hardware description language aimed specifically at processors, and vague notions and visions of how that language would be realized, lead to this thesis. The aim was to use the visions and initial ideas to evolve and formalize a language and begin implementing the tools to use it. The language, called NoGAP Common Language, is designed to give the programmer freedom to implement almost any processor design without being encumbered by many of the tedious tasks normally present in the creation process. While evolving the language it was chosen to borrow syntaxes from C++ and verilog to make the code and concepts easy to understand. The main advantages of NoGAP Common Language compared to RTL languages are; -the ability to define the data paths of instructions separate from each other and have them merged automatically along with assigned timings to form the pipeline. -having control paths automatically routed by activating named clauses of code coupled to control signals. -being able to specify a decoder, where the instructions and control structures are defined, that control signals are routed to. The implemented compiler was created with C++, Bison, and Flex and utilizes an AST structure, a symbol table, and a connection graph. The AST is traversed by several functions to generate the connection graph where the instructions of the processor can be merged into a pipeline. The compiler is in the early stages of development and much is left to do and solve. It has become clear though that the concepts of NoGAP Common Language can be implemented and are not just visions. / Behovet av ett mer generellt hårdvarubeskrivande språk specialiseret för processorer och visioner om ett sådant gav upphov till detta examensarbete. Målet var att utveckla visionerna, formalisera dem till ett fungerande språk och börja implementera dess verktyg. Språket, som kallas NoGAP Common Language, är designat för att ge programmeraren friheten att implementera nästan vilken processordesign som helst utan att bli nedtyngd av många av de enformiga uppgifter som annars måste utföras. Under utvecklingsprocessen valdes det att låna många syntax från C++ och verilog för att göra språket lätt att förstå och känna igen för många. De största fördelarna med att utveckla i NoGAP Common Language jämfört med vanliga RTL språk som verilog är; -att kunna specificera datavägar för instruktioner separat från varandra och få dem automatiskt förenade med hjälp av tidsangivelser till en pipeline. -att få kontrollvägar automatiskt dragna genom att aktivera namngivna klausuler med kod kopplade till kontrollsignaler. -att kunna specifiera en avkodare som kontrollvägarna kan kopplas till där kodning för instruktioner kan anges. Kompilatorn som implementerats med C++, Bison och Flex använder sig av en AST struktur, en symboltabell och en signalvägsgraf. AST strukturen traverseras av flera funktioner som bygger upp signalvägsgrafen där processorns instruktioner förenas till en pipeline. Utvecklingen av kompilatorn är ännu bara i de första stadierna och mycket är kvar att göra och lösa. Det har dock blivit klart att det är möjligt att implementera koncepten i NoGAP Common Language och att de inte bara är lösa visioner. NoGAP HDL processor compiler Computer Engineering Datorteknik
60	Color Aware Neural ISP Souza, Matheus 03 1900 (has links) Image signal processors (ISPs) are historically grown legacy software systems for reconstructing color images from noisy raw sensor measurements. They are usually composited of many heuristic blocks for denoising, demosaicking, and color restoration. Color reproduction in this context is of particular importance, since the raw colors are often severely distorted, and each smart phone manufacturer has developed their own characteristic heuristics for improving the color rendition, for example of skin tones and other visually important colors. In recent years there has been strong interest in replacing the historically grown ISP systems with deep learned pipelines. Much progress has been made in approximating legacy ISPs with such learned models. However, so far the focus of these efforts has been on reproducing the structural features of the images, with less attention paid to color rendition. Here we present Color Rendition ISP (CRISPnet), the first learned ISP model to specifically target color rendition accuracy relative to a complex, legacy smart phone ISP. We achieve this by utilizing both image metadata (like a legacy ISP would), as well as by learning simple global semantics based on image classification – similar to what a legacy ISP does to determine the scene type. We also contribute a new ISP image dataset consisting of both high dynamic range monitor data, as well as real-world data, both captured with an actual cell phone ISP pipeline under a variety of lighting conditions, exposure times, and gain settings. Image signal processor image restoration color rendition.

Search results