91

Verhaltensbeschreibung in der High-Level Synthese

Schmidt, Marco, Möhrke, Ulrich, Herrmann, Paul 12 July 2019 (has links)
What is high-level synthesis, and how does one describe the behavior of a circuit in VHDL? These two questions are discussed here. First, the high-level synthesis program Caddy is briefly introduced and its internal processing steps are outlined. Then the various levels of circuit description are discussed together with their respective advantages and disadvantages. Finally, the limits of VHDL behavioral descriptions are examined and possible approaches for extending those limits are proposed. On the whole, this is a summary of current developments; the report is also intended to serve as a guide to VHDL behavioral description.
92

VH2FG - ein VHDL nach Flussgraph Konverter für Caddy

Schmidt, Marco, Möhrke, Ulrich, Herrmann, Paul 12 July 2019 (has links)
The hardware description language VHDL is used more and more for designing circuits. It offers many advantages over earlier description methods, such as building up a circuit with graphical CAD tools. Development time is an important factor here, but this in turn requires powerful synthesis programs. The high-level synthesis program Caddy cannot process VHDL source code directly. With the program vh2fg it is now possible to convert a VHDL description into a flow-graph format that Caddy can process. This report provides instructions for using the program: the VHDL input is specified and the flow-graph output is explained.
93

Compiler-Based Tools to Aid in Data Transfer Optimization and On-Chip Debug of Heterogeneous Compute Systems

Ashcraft, Matthew B. 07 July 2020 (has links)
First, we present techniques to efficiently schedule data transfers through compiler analyses. Compared to transferring data immediately before and after the kernel executes, our scheduling results in orders-of-magnitude improvements in execution time, number of data transfers, and number of bytes transferred. Second, we demonstrate techniques to provide on-chip debugging for heterogeneous systems by recording execution on the software side in addition to using debugging circuitry in the hardware, and we provide a temporal correlation between the hardware and software traces through synchronization. This allows us to follow debug data between the hardware and software trace buffers. Because synchronizing the trace buffers adds cost, we explore synchronization schemes that can reduce the impact of synchronization depending on the code structure. We demonstrate the quantitative impact of these techniques on execution time and hardware and software resources, which in most cases amounts to less than a 2x increase in execution time. Third, we demonstrate how source-code debugging techniques for on-chip debugging can be applied to OpenCL FPGA kernels in heterogeneous systems. We developed techniques and a tool flow that allow users to select variables to record, automatically insert recording instructions into the kernel source code, synthesize the changes directly into the hardware design using commercial HLS tools, retrieve the trace data through kernel arguments, and present it to the user. Overall, quantitative measurements showed our techniques result in modest increases to execution time and hardware resources.
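The temporal-correlation idea lends itself to a small illustration. The C++ sketch below shows one way matching synchronization tokens could tie a software trace buffer to a hardware one; all names are hypothetical, and this is not the thesis's implementation, which records into on-chip debug circuitry rather than a second in-memory buffer.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical trace records: each side logs debug events plus occasional
// synchronization markers carrying a shared, monotonically increasing token.
struct TraceEvent {
    bool     is_sync;   // true if this entry is a synchronization marker
    uint64_t token;     // shared sync token (valid when is_sync is true)
    uint64_t payload;   // debug data (a variable value, a program counter, ...)
};

class TraceBuffer {
public:
    void record(uint64_t payload) { buf_.push_back({false, 0, payload}); }
    void sync(uint64_t token)     { buf_.push_back({true, token, 0}); }
    const std::vector<TraceEvent>& events() const { return buf_; }
private:
    std::vector<TraceEvent> buf_;
};

uint64_t g_token = 0;  // shared token counter

// Write the same token into both traces; events recorded between the same
// pair of tokens on each side belong to the same interval, which lets a
// debugger follow data between the hardware and software trace buffers.
void synchronize(TraceBuffer& software_trace, TraceBuffer& hardware_trace) {
    ++g_token;
    software_trace.sync(g_token);  // in a real system the hardware marker
    hardware_trace.sync(g_token);  // goes through a memory-mapped register
}
```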
94

Dynamic Reconfigurable Real-Time Video Processing Pipelines on SRAM-based FPGAs

Wilson, Andrew Elbert 23 June 2020 (has links)
For applications such as live video processing, there is high demand for high-performance, low-latency solutions. The configurable logic in FPGAs allows custom hardware to be tailored to a specific video application. These FPGA designs require technical expertise and lengthy implementation runs with vendor tools for each unique solution. This thesis presents a dynamically configurable topology as an FPGA overlay that deploys custom hardware processing pipelines at run time by utilizing dynamic partial reconfiguration. Within the FPGA overlay, a configurable topology with a routable switch allows video streams to be copied and mixed to create complex data paths. This work demonstrates a dynamic video processing pipeline with 11 reconfigurable regions and 16 unique processing cores, allowing for billions of custom run-time configurations.
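As an illustration of what a run-time-routable topology might look like to software, here is a hedged C++ sketch of an overlay configuration; the core names, region count, and routing table are invented for the example and are not the overlay described in the thesis.

```cpp
#include <cstdint>
#include <vector>

// Illustrative overlay description: each reconfigurable region hosts one
// processing core, and a routable switch moves video streams between
// regions, so a stream can be copied (fan-out) or mixed (fan-in).
enum class Core : uint8_t { Scale, EdgeDetect, Overlay, Mix };

struct Route {
    int source;              // producing region, or -1 for the camera input
    std::vector<int> sinks;  // consuming regions; >1 sink copies the stream
};

struct PipelineConfig {
    std::vector<Core>  region_core;   // core loaded into each partial region
    std::vector<Route> switch_table;  // stream routing through the switch
};

// Example run-time configuration: camera -> scale -> (edge detect | overlay)
// -> mix, assembled without re-synthesizing any hardware. Swapping the core
// in a region would map to a partial-reconfiguration call in practice.
PipelineConfig make_demo_pipeline() {
    return PipelineConfig{
        {Core::Scale, Core::EdgeDetect, Core::Overlay, Core::Mix},
        {
            {-1, {0}},    // camera feeds the scaler in region 0
            {0, {1, 2}},  // the scaled stream is copied to regions 1 and 2
            {1, {3}},     // both branches are mixed in region 3
            {2, {3}},
        }};
}
```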
95

VARIATIONS ON ROTATION SCHEDULING

Richter, Michael Edwin 13 September 2007 (has links)
No description available.
96

Toward Automatically Composed FPGA-Optimized Robotic Systems Using High-Level Synthesis

Lin, Szu-Wei 14 April 2023 (has links) (PDF)
Robotic systems are known to be computationally intensive. To improve performance, developers tend to implement custom robotic algorithms in hardware. However, a full robotic system typically consists of many interconnected algorithmic components that can easily max out FPGA resources, requiring the designer to adjust each algorithm design for each new robotic system in order to meet specific system requirements and limited resources. Furthermore, manual development of digital circuitry using a hardware description language (HDL) such as Verilog or VHDL is error-prone and time consuming, often taking months or years to develop and verify. Recent developments in high-level synthesis (HLS) enable automatic generation of digital circuit designs from high-level languages such as C or C++. In this thesis, we propose to develop a database of HLS-generated Pareto-optimal hardware designs for various robotic algorithms, such that a fully automated process can optimally compose a complete robotic system given a set of system requirements. In the first part of this thesis, we take a first step towards this goal by developing a system for automatic selection of an Occupancy Grid Mapping (OGM) implementation given specific system requirements and resource thresholds. We first generate hundreds of possible hardware designs via Vitis HLS as we vary parameters to explore the design space. We then present results which evaluate and explore trade-offs of these designs with respect to accuracy, latency, resource utilization, and power. Using these results, we create a software tool which is able to automatically select an optimal OGM implementation. After implementing selected designs on a PYNQ-Z2 FPGA board, our results show that the runtime of the algorithm improves by 35x over a C++-based implementation. In the second part of this thesis, we extend these same techniques to the Particle Filter (PF) algorithm by implementing 7 different resampling methods and varying parameters on hardware, again via HLS. In this case, we are able to explore and analyze thousands of PF designs. Our evaluation results show that the runtime of the algorithm using the Local Selection Resampling method reaches the fastest performance on an FPGA and can be as much as 10x faster than in C++. Finally, we build another design selection tool that automatically generates an optimal PF implementation from this design space for a given query set of requirements.
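To make the selection step concrete, here is a minimal C++ sketch of how a tool might pick a design from such a database given requirements and resource thresholds. The Design fields and the single-objective tie-break are simplifying assumptions for illustration, not the thesis's actual tool.

```cpp
#include <vector>

// Hypothetical record for one HLS-generated design point; the actual
// database stores measured accuracy, latency, resources, and power.
struct Design {
    double accuracy;    // higher is better
    double latency_ms;  // lower is better
    int    luts, dsps;  // FPGA resource utilization
};

// Return the lowest-latency design that meets the accuracy requirement and
// fits the resource thresholds, or nullptr if none does. This is only the
// final selection step; the real tool works over pre-characterized
// Pareto-optimal designs.
const Design* select_design(const std::vector<Design>& db, double min_accuracy,
                            int max_luts, int max_dsps) {
    const Design* best = nullptr;
    for (const Design& d : db) {
        if (d.accuracy < min_accuracy || d.luts > max_luts || d.dsps > max_dsps)
            continue;  // violates a requirement or a resource threshold
        if (best == nullptr || d.latency_ms < best->latency_ms)
            best = &d;
    }
    return best;
}
```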
97

Resource-constraint And Scalable Data Distribution Management For High Level Architecture

Gupta, Pankaj 01 January 2007 (has links)
In this dissertation, we present an efficient algorithm, called the P-Pruning algorithm, for the data distribution management problem in High Level Architecture. High Level Architecture (HLA) presents a framework for modeling and simulation within the Department of Defense (DoD) and forms the basis of the IEEE 1516 standard. The goal of this architecture is to interoperate multiple simulations and facilitate the reuse of simulation components. Data Distribution Management (DDM) is one of the six components in HLA that is responsible for limiting and controlling the data exchanged in a simulation and reducing the processing requirements of federates. DDM is also an important problem in the parallel and distributed computing domain, especially in large-scale distributed modeling and simulation applications, where control over data exchange among the simulated entities is required. We present a performance-evaluation simulation study of the P-Pruning algorithm against three techniques: region-matching, fixed-grid, and dynamic-grid DDM algorithms. The P-Pruning algorithm is faster than the region-matching, fixed-grid, and dynamic-grid DDM algorithms, as it avoids the quadratic computation step involved in the other algorithms. The simulation results show that the P-Pruning DDM algorithm uses memory at run time more efficiently and requires fewer multicast groups than the three other algorithms. To increase the scalability of the P-Pruning algorithm, we develop a resource-efficient enhancement for it. We also present a performance evaluation study of this resource-efficient algorithm in a memory-constrained environment. The Memory-Constraint P-Pruning algorithm deploys I/O-efficient data structures for optimized memory access at run time. The simulation results show that the Memory-Constraint P-Pruning DDM algorithm is faster than the P-Pruning algorithm and utilizes memory at run time more efficiently. It is suitable for high-performance distributed simulation applications, as it improves the scalability of the P-Pruning algorithm by several orders of magnitude in the number of federates. We analyze the computational complexity of the P-Pruning algorithm using average-case analysis. We have also extended the P-Pruning algorithm to a three-dimensional routing space. In addition, we present the P-Pruning algorithm for dynamic conditions where the distribution of federates changes at run time. The dynamic P-Pruning algorithm investigates the changes among federates' regions and rebuilds all the affected multicast groups. We have also integrated the P-Pruning algorithm with FDK, an implementation of the HLA architecture. The integration involves the design and implementation of the communicator module for mapping federate interest regions. We provide a modular overview of the P-Pruning algorithm's components and describe the functional flow for creating multicast groups during simulation. We investigate the deficiencies in the DDM implementation under FDK and suggest an approach to overcome them using the P-Pruning algorithm. We have enhanced FDK from its existing HLA 1.3 specification by using the IEEE 1516 standard for the DDM implementation. We provide the system setup instructions and communication routines for running the integrated system on a network of machines. We also describe implementation details involved in integrating the P-Pruning algorithm with FDK and report on our experiences.
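The abstract does not spell out the P-Pruning steps themselves, but the quadratic cost it avoids is easy to make concrete. The C++ sketch below shows the brute-force region-matching baseline for a 2-D routing space; it is illustrative only, not the dissertation's algorithm.

```cpp
#include <utility>
#include <vector>

// In HLA DDM, federates declare update and subscription regions in a
// routing space; overlapping pairs must be wired into a multicast group.
struct Region { double lo[2], hi[2]; };  // a 2-D extent in routing space

bool overlaps(const Region& a, const Region& b) {
    for (int d = 0; d < 2; ++d)
        if (a.hi[d] < b.lo[d] || b.hi[d] < a.lo[d]) return false;
    return true;
}

// Brute-force region matching: every update region is tested against every
// subscription region, the O(n*m) step that grid-based methods approximate
// and that P-Pruning is designed to avoid.
std::vector<std::pair<size_t, size_t>>
match_regions(const std::vector<Region>& updates,
              const std::vector<Region>& subscriptions) {
    std::vector<std::pair<size_t, size_t>> pairs;
    for (size_t i = 0; i < updates.size(); ++i)
        for (size_t j = 0; j < subscriptions.size(); ++j)
            if (overlaps(updates[i], subscriptions[j]))
                pairs.emplace_back(i, j);  // these two must share a group
    return pairs;
}
```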
98

Semantic Video Retrieval Using High Level Context

Aytar, Yusuf 01 January 2008 (has links)
Video retrieval - searching and retrieving videos relevant to a user-defined query - is one of the most popular topics in both real-life applications and multimedia research. This thesis employs concepts from Natural Language Understanding in solving the video retrieval problem. Our main contribution is the utilization of semantic word similarity measures for video retrieval, through trained concept detectors and the visual co-occurrence relations between such concepts. We propose two methods for content-based retrieval of videos: (1) an unsupervised method for retrieving a new concept (a concept not known to the system, for which no annotation is available) using semantic word similarity and visual co-occurrence; (2) a method for retrieving videos based on their relevance to a user-defined text query using semantic word similarity and the visual content of the videos. For evaluation purposes, we mainly used the automatic search and high-level feature extraction test sets of the TRECVID'06 and TRECVID'07 benchmarks. These two data sets consist of 250 hours of multilingual news video captured from American, Arabic, German, and Chinese TV channels. Although our method for retrieving a new concept is unsupervised, it outperforms the trained concept detectors (which are supervised) on 7 out of 20 test concepts, and overall it performs very close to the trained detectors. On the other hand, our visual-content-based semantic retrieval method performs more than 100% better than the text-based retrieval method. This shows that using visual content alone we can achieve significantly better retrieval results.
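A hedged sketch of the word-similarity idea behind the first method: score a video for an unseen concept by weighting each trained detector's output with the semantic similarity between its concept and the query. The function shapes here are assumptions for illustration; the thesis method also folds in visual co-occurrence between concepts, which this sketch omits.

```cpp
#include <functional>
#include <string>
#include <vector>

// Assumed interfaces, supplied by the caller: a trained detector's score
// for a known concept on a given video, and a semantic word-similarity
// measure between two concept names.
using DetectorScore  = std::function<double(const std::string&)>;
using WordSimilarity =
    std::function<double(const std::string&, const std::string&)>;

// Score a video for an unseen concept by combining the known detectors'
// outputs, each weighted by how semantically close its concept is to the
// query. A sketch of the unsupervised idea only, not the thesis model.
double score_new_concept(const std::string& query,
                         const std::vector<std::string>& known_concepts,
                         const DetectorScore& detect,
                         const WordSimilarity& similarity) {
    double score = 0.0, norm = 0.0;
    for (const std::string& c : known_concepts) {
        double w = similarity(query, c);  // semantic closeness to the query
        score += w * detect(c);           // weight the detector's confidence
        norm  += w;
    }
    return norm > 0.0 ? score / norm : 0.0;
}
```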
99

Temporal Sparse Encoding and Decoding of Arrays in Systems Based on the High Level Architecture Standard

Severinsson, Viktor, Thörnblom, Johan January 2022 (has links)
In this thesis, a method for encoding and decoding arrays in systems based on the High Level Architecture standard is presented. High Level Architecture is a standard in the simulation industry which enables interoperability between different simulation systems. When simulations share specific data with other simulations, they always send all parts of the data. This can become quite inefficient when the data is of an array type and only one or a few of its elements' values have changed: the whole array is always transmitted, regardless of whether the other simulations in the system need all elements or just the ones that have been modified since the last transmission. In these cases there may therefore be more traffic on the network than needed. The proposed method, named Temporal Sparse Encoding, encodes only the modified elements, plus some additional bytes of overhead, which allows updates to contain just the changed elements. The method is based on the concept of sparse arrays and matrices, and is inspired by the Coordinate format, which uses extra arrays of indices referring to the specific elements of interest. In a small simulation system acting as a testing environment, it is shown how Temporal Sparse Encoding can save both time and, above all, bandwidth when sharing updates. Each test was carried out 10 times, and in each test case 1 000 updates were transmitted. In each test case the transmission time was measured and the compression ratio was calculated by dividing the number of bytes in the encoding containing all elements by the number of bytes in the encoding containing just the updated ones. The biggest compression ratio, 750.13, came from the case where 1 out of 1 000 elements was updated and transmitted. The smallest compression ratio, 1.00, came from the cases where all of the array's elements were updated and transmitted. Among the conclusions drawn were that Temporal Sparse Encoding can save up to 33% of the time compared to the standard encoding, and that a lot of the transmission time is spent on extracting elements once they have been decoded. These findings suggest that optimization efforts should focus at the language level, specifically on the management of data rather than its transmission, when there is not much traffic on the network.
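The Coordinate-format-inspired idea translates into a short sketch. The C++ below shows an assumed encoding of only the modified elements as (index, value) pairs, plus the compression-ratio calculation used in the evaluation; the field layout and byte accounting are illustrative, not the exact wire format of the thesis.

```cpp
#include <cstdint>
#include <vector>

// Coordinate-format-inspired sparse update: send (index, value) pairs for
// only the elements modified since the last transmission. The layout is
// illustrative, not the thesis's wire format.
struct SparseUpdate {
    std::vector<uint32_t> indices;  // positions of the modified elements
    std::vector<double>   values;   // their new values
};

SparseUpdate encode(const std::vector<double>& current,
                    const std::vector<double>& last_sent) {
    SparseUpdate u;
    for (size_t i = 0; i < current.size(); ++i)
        if (current[i] != last_sent[i]) {  // changed since the last send
            u.indices.push_back(static_cast<uint32_t>(i));
            u.values.push_back(current[i]);
        }
    return u;
}

void decode(const SparseUpdate& u, std::vector<double>& array) {
    for (size_t k = 0; k < u.indices.size(); ++k)
        array[u.indices[k]] = u.values[k];  // patch only the changed slots
}

// Compression ratio as defined in the evaluation: bytes of the full
// encoding divided by bytes of the sparse one. With this illustrative
// layout, 1 changed element out of 1 000 doubles gives 8000 / 12 ≈ 667;
// the thesis measured 750.13 with its own element type and framing.
double compression_ratio(size_t n_total, size_t n_changed) {
    double full   = static_cast<double>(n_total) * sizeof(double);
    double sparse = static_cast<double>(n_changed) *
                    (sizeof(uint32_t) + sizeof(double));
    return sparse > 0.0 ? full / sparse : 1.0;
}
```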
100

ALGORITHMS FOR COUPLING CIRCUIT AND PHYSICAL SYNTHESIS WITH HIGH-LEVEL DESIGN-SPACE EXPLORATION FOR 2D AND 3D SYSTEMS

MUKHERJEE, MADHUBANTI January 2004 (has links)
No description available.
