Global ETD Search

61	A database accelerator for energy-efficient query processing and optimization Lehner, Wolfgang, Haas, Sebastian, Arnold, Oliver, Scholze, Stefan, Höppner, Sebastian, Ellguth, Georg, Dixius, Andreas, Ungethüm, Annett, Mier, Eric, Nöthen, Benedikt, Matúš, Emil, Schiefer, Stefan, Cederstroem, Love, Pilz, Fabian, Mayr, Christian, Schüffny, Renè, Fettweis, Gerhard P. 12 January 2023 (has links) Data processing on a continuously growing amount of information and the increasing power restrictions have become an ubiquitous challenge in our world today. Besides parallel computing, a promising approach to improve the energy efficiency of current systems is to integrate specialized hardware. This paper presents a Tensilica RISC processor extended with an instruction set to accelerate basic database operators frequently used in modern database systems. The core was taped out in a 28 nm SLP CMOS technology and allows energy-efficient query processing as well as query optimization by applying selectivity estimation techniques. Our chip measurements show an 1000x energy improvement on selected database operators compared to state-of-the-art systems. info:eu-repo/classification/ddc/005 ddc:005
62	Implementace obecného assembleru / Implementation of General Assembler Husár, Adam January 2007 (has links) This thesis describes the design of the universal assembler that represents a part of the Lissom project. You will be provided with the description of the assembler architectures and their usual tasks. Special attention is paid to GNU assembler. Designed assembler consists of the fixed and the generated part. The generated part is created automatically from the description of instruction set, that is defined using architecture and instructions set description language ISAC. Using this approach, it is possible to change assembler target architecture automatically. The second part of thesis describes the Parserlib2 library implementation that is a part of the Lissom project and provides the information about the target instruction set for an assembler generator.
63	Dynamic instruction set extension of microprocessors with embedded FPGAs Bauer, Heiner 13 April 2017 (has links) (PDF) Increasingly complex applications and recent shifts in technology scaling have created a large demand for microprocessors which can perform tasks more quickly and more energy efficient. Conventional microarchitectures exploit multiple levels of parallelism to increase instruction throughput and use application specific instruction sets or hardware accelerators to increase energy efficiency. Reconfigurable microprocessors adopt the same principle of providing application specific hardware, however, with the significant advantage of post-fabrication flexibility. Not only does this offer similar gains in performance but also the flexibility to configure each device individually. This thesis explored the benefit of a tight coupled and fine-grained reconfigurable microprocessor. In contrast to previous research, a detailed design space exploration of logical architectures for island-style field programmable gate arrays (FPGAs) has been performed in the context of a commercial 22nm process technology. Other research projects either reused general purpose architectures or spent little effort to design and characterize custom fabrics, which are critical to system performance and the practicality of frequently proposed high-level software techniques. Here, detailed circuit implementations and a custom area model were used to estimate the performance of over 200 different logical FPGA architectures with single-driver routing. Results of this exploration revealed similar tradeoffs and trends described by previous studies. The number of lookup table (LUT) inputs and the structure of the global routing network were shown to have a major impact on the area delay product. However, results suggested a much larger region of efficient architectures than before. Finally, an architecture with 5-LUTs and 8 logic elements per cluster was selected. Modifications to the microprocessor, whichwas based on an industry proven instruction set architecture, and its software toolchain provided access to this embedded reconfigurable fabric via custom instructions. The baseline microprocessor was characterized with estimates from signoff data for a 28nm hardware implementation. A modified academic FPGA tool flow was used to transform Verilog implementations of custom instructions into a post-routing netlist with timing annotations. Simulation-based verification of the system was performed with a cycle-accurate processor model and diverse application benchmarks, ranging from signal processing, over encryption to computation of elementary functions. For these benchmarks, a significant increase in performance with speedups from 3 to 15 relative to the baseline microprocessor was achieved with the extended instruction set. Except for one case, application speedup clearly outweighed the area overhead for the extended system, even though the modeled fabric architecturewas primitive and contained no explicit arithmetic enhancements. Insights into fundamental tradeoffs of island-style FPGA architectures, the developed exploration flow, and a concrete cost model are relevant for the development of more advanced architectures. Hence, this work is a successful proof of concept and has laid the basis for further investigations into architectural extensions and physical implementations. Potential for further optimizationwas identified on multiple levels and numerous directions for future research were described. / Zunehmend komplexere Anwendungen und Besonderheiten moderner Halbleitertechnologien haben zu einer großen Nachfrage an leistungsfähigen und gleichzeitig sehr energieeffizienten Mikroprozessoren geführt. Konventionelle Architekturen versuchen den Befehlsdurchsatz durch Parallelisierung zu steigern und stellen anwendungsspezifische Befehlssätze oder Hardwarebeschleuniger zur Steigerung der Energieeffizienz bereit. Rekonfigurierbare Prozessoren ermöglichen ähnliche Performancesteigerungen und besitzen gleichzeitig den enormen Vorteil, dass die Spezialisierung auf eine bestimmte Anwendung nach der Herstellung erfolgen kann. In dieser Diplomarbeit wurde ein rekonfigurierbarer Mikroprozessor mit einem eng gekoppelten FPGA untersucht. Im Gegensatz zu früheren Forschungsansätzen wurde eine umfangreiche Entwurfsraumexploration der FPGA-Architektur im Zusammenhang mit einem kommerziellen 22nm Herstellungsprozess durchgeführt. Bisher verwendeten die meisten Forschungsprojekte entweder kommerzielle Architekturen, die nicht unbedingt auf diesen Anwendungsfall zugeschnitten sind, oder die vorgeschlagenen FGPA-Komponenten wurden nur unzureichend untersucht und charakterisiert. Jedoch ist gerade dieser Baustein ausschlaggebend für die Leistungsfähigkeit des gesamten Systems. Deshalb wurden im Rahmen dieser Arbeit über 200 verschiedene logische FPGA-Architekturen untersucht. Zur Modellierung wurden konkrete Schaltungstopologien und ein auf den Herstellungsprozess zugeschnittenes Modell zur Abschätzung der Layoutfläche verwendet. Generell wurden die gleichen Trends wie bei vorhergehenden und ähnlich umfangreichen Untersuchungen beobachtet. Auch hier wurden die Ergebnisse maßgeblich von der Größe der LUTs (engl. "Lookup Tables") und der Struktur des Routingnetzwerks bestimmt. Gleichzeitig wurde ein viel breiterer Bereich von Architekturen mit nahezu gleicher Effizienz identifiziert. Zur weiteren Evaluation wurde eine FPGA-Architektur mit 5-LUTs und 8 Logikelementen ausgewählt. Die Performance des ausgewählten Mikroprozessors, der auf einer erprobten Befehlssatzarchitektur aufbaut, wurde mit Ergebnissen eines 28nm Testchips abgeschätzt. Eine modifizierte Sammlung von akademischen Softwarewerkzeugen wurde verwendet, um Spezialbefehle auf die modellierte FPGA-Architektur abzubilden und eine Netzliste für die anschließende Simulation und Verifikation zu erzeugen. Für eine Reihe unterschiedlicher Anwendungs-Benchmarks wurde eine relative Leistungssteigerung zwischen 3 und 15 gegenüber dem ursprünglichen Prozessor ermittelt. Obwohl die vorgeschlagene FPGA-Architektur vergleichsweise primitiv ist und keinerlei arithmetische Erweiterungen besitzt, musste dabei, bis auf eine Ausnahme, kein überproportionaler Anstieg der Chipfläche in Kauf genommen werden. Die gewonnen Erkenntnisse zu den Abhängigkeiten zwischen den Architekturparametern, der entwickelte Ablauf für die Exploration und das konkrete Kostenmodell sind essenziell für weitere Verbesserungen der FPGA-Architektur. Die vorliegende Arbeit hat somit erfolgreich den Vorteil der untersuchten Systemarchitektur gezeigt und den Weg für mögliche Erweiterungen und Hardwareimplementierungen geebnet. Zusätzlich wurden eine Reihe von Optimierungen der Architektur und weitere potenziellen Forschungsansätzen aufgezeigt. embedded FPGA architecture design space exploration reconfigurable computing 22 nm CMOS technologie instruction set extension tight coupled fabric embedded FPGA Architektur Entwurfsraumexploration rekonfigurierbare System 22 nm Halbleitertechnologie Befehlssatzerweiterung enge Kopplung ddc:621.3 rvk:ZN 4940
64	Testování generovaných překladačů jazyka c pro procesory ve vestavěných systémech / Testing of generated C compilers for processors in embedded systems Dolíhal, Luděk Unknown Date (has links) Vestavěné systémy se staly nepostradatelnými pro náš každodenní život. Jsou to obvykle úzce zaměřená, vysoce optimalizovaná, jednoúčelová zařízení. Jádro vestavěných zařízení obvykle tvoří jeden nebo více aplikačně specifických instrukčních procesorů. Tato disertační práce se zaměřuje na problematiku testování nástrojú pro návrh aplikačně specifických procesorů a následně i samotných aplikačne specifických procesorů. Snahou bylo vytvořit systém, ve kterém bude možné otestovat jednotlivé nástroje, jako například překladač, assembler, disassembler, debugger. Nicméně vyvstává také potřeba provádět složitější testy, například integrační, které zaručí, že mezi jednotlivými nástroji nevzniká nekompatibilita. Autor vytvořil s podporou přůběžně integračního serveru prostředí, které napomáhá odhalování a odstraňování chyb při návrhu aplikačně specifických procesorů a které je navíc do značné míry automatizované.
65	Model procesoru RISC-V / RISC-V Processor Model Barták, Jiří January 2016 (has links) The number of application specific instruction set processors is rapidly increasing, because of increased demand for low power and small area designs. A lot of new instruction sets are born, but they are usually confidential. University of California in Berkeley took an opposite approach. The RISC-V instruction set is completely free. This master's thesis focuses on analysis of RISC-V instruction set and two programming languages used to model instruction sets and microarchitectures, CodAL and Chisel. Implementation of RISC-V base instruction set along with multiplication, division and 64-bit address space extensions and implementation of cycle accurate model of Rocket Core-like microarchitecture in CodAL are main goals of this master's thesis. The instruction set model is used to generate the C compiler and the cycle accurate model is used to generate RTL representation, all thanks to Codasip Studio. Generated compiler is compared against the one implemented manually and results are used for instruction set optimizations. RTL is synthesized to Artix 7 FPGA and compared to the Rocket Core synthesis.
66	Verifikace ASIP založena na formálních tvrzeních / Assertion-Based Verification of ASIP Šulek, Jakub January 2015 (has links) This thesis introduces the concept of assertion-based verifi cation of application-specifi c instruction set processors (ASIPs). The proposed design is implemented in SystemVerilog Assertions language as a part of veri fication environment created using Codasip Framework. The implemented concept is simulated in QuestaSim tool using model of Codix RISC processor. Main outcome of this thesis is the verifi cation concept usable not only on other processors, but as a part of system that automates the processor design as well.
67	Dynamic instruction set extension of microprocessors with embedded FPGAs Bauer, Heiner 31 March 2017 (has links) Increasingly complex applications and recent shifts in technology scaling have created a large demand for microprocessors which can perform tasks more quickly and more energy efficient. Conventional microarchitectures exploit multiple levels of parallelism to increase instruction throughput and use application specific instruction sets or hardware accelerators to increase energy efficiency. Reconfigurable microprocessors adopt the same principle of providing application specific hardware, however, with the significant advantage of post-fabrication flexibility. Not only does this offer similar gains in performance but also the flexibility to configure each device individually. This thesis explored the benefit of a tight coupled and fine-grained reconfigurable microprocessor. In contrast to previous research, a detailed design space exploration of logical architectures for island-style field programmable gate arrays (FPGAs) has been performed in the context of a commercial 22nm process technology. Other research projects either reused general purpose architectures or spent little effort to design and characterize custom fabrics, which are critical to system performance and the practicality of frequently proposed high-level software techniques. Here, detailed circuit implementations and a custom area model were used to estimate the performance of over 200 different logical FPGA architectures with single-driver routing. Results of this exploration revealed similar tradeoffs and trends described by previous studies. The number of lookup table (LUT) inputs and the structure of the global routing network were shown to have a major impact on the area delay product. However, results suggested a much larger region of efficient architectures than before. Finally, an architecture with 5-LUTs and 8 logic elements per cluster was selected. Modifications to the microprocessor, whichwas based on an industry proven instruction set architecture, and its software toolchain provided access to this embedded reconfigurable fabric via custom instructions. The baseline microprocessor was characterized with estimates from signoff data for a 28nm hardware implementation. A modified academic FPGA tool flow was used to transform Verilog implementations of custom instructions into a post-routing netlist with timing annotations. Simulation-based verification of the system was performed with a cycle-accurate processor model and diverse application benchmarks, ranging from signal processing, over encryption to computation of elementary functions. For these benchmarks, a significant increase in performance with speedups from 3 to 15 relative to the baseline microprocessor was achieved with the extended instruction set. Except for one case, application speedup clearly outweighed the area overhead for the extended system, even though the modeled fabric architecturewas primitive and contained no explicit arithmetic enhancements. Insights into fundamental tradeoffs of island-style FPGA architectures, the developed exploration flow, and a concrete cost model are relevant for the development of more advanced architectures. Hence, this work is a successful proof of concept and has laid the basis for further investigations into architectural extensions and physical implementations. Potential for further optimizationwas identified on multiple levels and numerous directions for future research were described. / Zunehmend komplexere Anwendungen und Besonderheiten moderner Halbleitertechnologien haben zu einer großen Nachfrage an leistungsfähigen und gleichzeitig sehr energieeffizienten Mikroprozessoren geführt. Konventionelle Architekturen versuchen den Befehlsdurchsatz durch Parallelisierung zu steigern und stellen anwendungsspezifische Befehlssätze oder Hardwarebeschleuniger zur Steigerung der Energieeffizienz bereit. Rekonfigurierbare Prozessoren ermöglichen ähnliche Performancesteigerungen und besitzen gleichzeitig den enormen Vorteil, dass die Spezialisierung auf eine bestimmte Anwendung nach der Herstellung erfolgen kann. In dieser Diplomarbeit wurde ein rekonfigurierbarer Mikroprozessor mit einem eng gekoppelten FPGA untersucht. Im Gegensatz zu früheren Forschungsansätzen wurde eine umfangreiche Entwurfsraumexploration der FPGA-Architektur im Zusammenhang mit einem kommerziellen 22nm Herstellungsprozess durchgeführt. Bisher verwendeten die meisten Forschungsprojekte entweder kommerzielle Architekturen, die nicht unbedingt auf diesen Anwendungsfall zugeschnitten sind, oder die vorgeschlagenen FGPA-Komponenten wurden nur unzureichend untersucht und charakterisiert. Jedoch ist gerade dieser Baustein ausschlaggebend für die Leistungsfähigkeit des gesamten Systems. Deshalb wurden im Rahmen dieser Arbeit über 200 verschiedene logische FPGA-Architekturen untersucht. Zur Modellierung wurden konkrete Schaltungstopologien und ein auf den Herstellungsprozess zugeschnittenes Modell zur Abschätzung der Layoutfläche verwendet. Generell wurden die gleichen Trends wie bei vorhergehenden und ähnlich umfangreichen Untersuchungen beobachtet. Auch hier wurden die Ergebnisse maßgeblich von der Größe der LUTs (engl. "Lookup Tables") und der Struktur des Routingnetzwerks bestimmt. Gleichzeitig wurde ein viel breiterer Bereich von Architekturen mit nahezu gleicher Effizienz identifiziert. Zur weiteren Evaluation wurde eine FPGA-Architektur mit 5-LUTs und 8 Logikelementen ausgewählt. Die Performance des ausgewählten Mikroprozessors, der auf einer erprobten Befehlssatzarchitektur aufbaut, wurde mit Ergebnissen eines 28nm Testchips abgeschätzt. Eine modifizierte Sammlung von akademischen Softwarewerkzeugen wurde verwendet, um Spezialbefehle auf die modellierte FPGA-Architektur abzubilden und eine Netzliste für die anschließende Simulation und Verifikation zu erzeugen. Für eine Reihe unterschiedlicher Anwendungs-Benchmarks wurde eine relative Leistungssteigerung zwischen 3 und 15 gegenüber dem ursprünglichen Prozessor ermittelt. Obwohl die vorgeschlagene FPGA-Architektur vergleichsweise primitiv ist und keinerlei arithmetische Erweiterungen besitzt, musste dabei, bis auf eine Ausnahme, kein überproportionaler Anstieg der Chipfläche in Kauf genommen werden. Die gewonnen Erkenntnisse zu den Abhängigkeiten zwischen den Architekturparametern, der entwickelte Ablauf für die Exploration und das konkrete Kostenmodell sind essenziell für weitere Verbesserungen der FPGA-Architektur. Die vorliegende Arbeit hat somit erfolgreich den Vorteil der untersuchten Systemarchitektur gezeigt und den Weg für mögliche Erweiterungen und Hardwareimplementierungen geebnet. Zusätzlich wurden eine Reihe von Optimierungen der Architektur und weitere potenziellen Forschungsansätzen aufgezeigt. info:eu-repo/classification/ddc/621.3 ddc:621.3
68	Hybrid Debugger Software on RISC-V MCU : A no cost debugging solution foreducational use / Hybriddebugger för RISC-V MCU : En kostnadsfri debuglösning för utbildningssyfte Remahl, Linus January 2022 (has links) This work details the implementation of a debugger for a small embedded RISC-V system. KTH uses an in-house designed microcontroller development board for computer and electronics design courses. The boards did not incorporate hardware debugging capabilities and no prior software implementation fulfilled the requirements for the specific target system. The debugger used a hybrid software and hardware approach for achievingbasic debugging features such as breakpoints, stepping and break signals. The hybrid approach repurposed the microcontrollers debug module to enable debugging with no external hardware. The debugger implementation met all of the requirements for being ableto be used in the intended educational setting, and had a limited footprint withregard to resource usage, but with room for further optimization. / Detta arbete beskriver implementationen av en debugger för ett mindre RISC-V system. KTH använder ett internt framtaget utvecklingskort med en mikrokontroller för kurser inom programmering för inbyggda system och elektronikdesign. Korten inkluderade inte stöd för hårdvarubaserad debugging och inga befintliga mjukvarulösningar mötte kraven för det specifika systemet. Debuggern använde en blandad hårdvaru- och mjukvarulösning för att uppnå debug-funktionalitet som brytpunkter, stegning och brytsignaler. Implementationen nyttjade den i mikrokontrollern inbyggda debugmodulen(debug module) för att tillgängliggöra debugging utan någon extern hårdvara. Implementationen mötte alla krav för att kunna användas i den tilltänkta studiemiljön, och hade en begränsad resursanvändning, men med rum för ytterligare optimeringar. RISC-V instruction set architecture debugging debug module trigger module remote serial protocol micro controller unit RISC-V instruktionsuppsättning felsökning debug module trigger module remote serial protocol mikrokontroller Embedded Systems Inbäddad systemteknik
69	An empirically derived system for high-speed rendering Rautenbach, Helperus Ritzema 25 September 2012 (has links) This thesis focuses on 3D computer graphics and the continuous maximisation of rendering quality and performance. Its main focus is the critical analysis of numerous real-time rendering algorithms and the construction of an empirically derived system for the high-speed rendering of shader-based special effects, lighting effects, shadows, reflection and refraction, post-processing effects and the processing of physics. This critical analysis allows us to assess the relationship between rendering quality and performance. It also allows for the isolation of key algorithmic weaknesses and possible bottleneck areas. Using this performance data, gathered during the analysis of various rendering algorithms, we are able to define a selection engine to control the real-time cycling of rendering algorithms and special effects groupings based on environmental conditions. Furthermore, as a proof of concept, to balance Central Processing Unit (CPU) and Graphic Processing Unit (GPU) load for and increased speed of execution, our selection system unifies the GPU and CPU as a single computational unit for physics processing and environmental mapping. This parallel computing system enables the CPU to process cube mapping computations while the GPU can be tasked with calculations traditionally handled solely by the CPU. All analysed and benchmarked algorithms were implemented as part of a modular rendering engine. This engine offers conventional first-person perspective input control, mesh loading and support for shader model 4.0 shaders (via Microsoft’s High Level Shader Language) for effects such as high dynamic range rendering (HDR), dynamic ambient lighting, volumetric fog, specular reflections, reflective and refractive water, realistic physics, particle effects, etc. The test engine also supports the dynamic placement, movement and elimination of light sources, meshes and spatial geometry. Critical analysis was performed via scripted camera movement and object and light source additions – done not only to ensure consistent testing, but also to ease future validation and replication of results. This provided us with a scalable interactive testing environment as well as a complete solution for the rendering of computationally intensive 3D environments. As a full-fledged game engine, our rendering engine is amenable to first- and third-person shooter games, role playing games and 3D immersive environments. Evaluation criteria (identified to access the relationship between rendering quality and performance), as mentioned, allows us to effectively cycle algorithms based on empirical results and to distribute specific processing (cube mapping and physics processing) between the CPU and GPU, a unification that ensures the following: nearby effects are always of high-quality (where computational resources are available), distant effects are, under certain conditions, rendered at a lower quality and the frames per second rendering performance is always maximised. The implication of our work is clear: unifying the CPU and GPU and dynamically cycling through the most appropriate algorithms based on ever-changing environmental conditions allow for maximised rendering quality and performance and shows that it is possible to render high-quality visual effects with realism, without overburdening scarce computational resources. Immersive rendering approaches used in conjunction with AI subsystems, game networking and logic, physics processing and other special effects (such as post-processing shader effects) are immensely processor intensive and can only be successfully implemented on high-end hardware. Only by cycling and distributing algorithms based on environmental conditions and through the exploitation of algorithmic strengths can high-quality real-time special effects and highly accurate calculations become as common as texture mapping. Furthermore, in a gaming context, players often spend an inordinate amount of time fine-tuning their graphics settings to achieve the perfect balance between rendering quality and frames-per-second performance. Using this system, however, ensures that performance vs. quality is always optimised, not only for the game as a whole but also for the current scene being rendered – some scenes might, for example, require more computational power than others, resulting in noticeable slowdowns, slowdowns not experienced thanks to our system’s dynamic cycling of rendering algorithms and its proof of concept unification of the CPU and GPU. / Thesis (PhD)--University of Pretoria, 2012. / Computer Science / unrestricted Shaders Fuzzy logic High dynamic range lighting Volumetric materials Instruction set utilisation Light maps Dynamic algorithm selection Shadow mapping Refraction Parralax mapping Normal maps Physics Reflection Distributed rendering Displacement mapping Depth-of-field Chromatic dispersion Stencil shadow volumes Algorithms Specular highlights Soft shadows Spatial subdivision Ambient occlusion Particles UCTD

Search results