
Latch-based Performance Optimization for FPGAs

Teng, Xiao 16 August 2012 (has links)
We explore using pulsed latches for timing optimization -- a first in the academic FPGA community. Pulsed latches are transparent latches driven by a clock with a non-standard (i.e. not 50%) duty cycle. Since latches are already present on commercial FPGAs, using them for timing optimization avoids the power and area drawbacks of other techniques such as clock skew and retiming. We propose algorithms that automatically replace certain flip-flops with latches for performance gains. Under conservative short-path (minimum-delay) assumptions, our latch-based optimization, operating on already-routed designs, provides all the benefit of clock skew in most cases and increases performance by 9% on average, essentially for "free". We show that short paths greatly limit the benefit of pulsed latches, and that further performance improvements are possible by increasing the delay of certain short paths.
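The time-borrowing idea behind this abstract can be sketched numerically. The model below is a deliberate simplification (it ignores clock-to-Q, setup, and skew, and the two-stage pipeline and delay values are illustrative assumptions, not the thesis's algorithm): a long pipeline stage may borrow up to the pulse width from the next cycle, but hold safety caps the pulse width at roughly the minimum short-path delay.

```python
# Simplified sketch of pulsed-latch time borrowing. All numbers and the
# two-stage pipeline are illustrative assumptions.

def ff_period(stage_delays):
    """Clock period with flip-flops: limited by the slowest stage."""
    return max(stage_delays)

def pulsed_latch_period(stage_delays, short_path_delays, hold=0.1):
    """Clock period with pulsed latches.

    A long stage may borrow up to the pulse width from the next cycle,
    but hold safety caps the pulse width at (min short-path delay - hold).
    """
    pulse_width = min(short_path_delays) - hold
    if pulse_width <= 0:
        return max(stage_delays)          # no safe borrowing possible
    total = sum(stage_delays)
    balanced = total / len(stage_delays)  # ideal: stages share delay evenly
    # Each stage can exceed the period by at most the pulse width.
    return max(balanced, max(stage_delays) - pulse_width)

stages = [6.0, 4.0]   # long stage followed by short stage (ns)
shorts = [1.5, 2.0]   # minimum-delay paths feeding each latch (ns)
print(ff_period(stages))                    # 6.0
print(pulsed_latch_period(stages, shorts))  # 5.0 -- borrowing helps
```

Note how a tiny short path (the abstract's main obstacle) collapses the safe pulse width to zero and forfeits the whole benefit, which is why the thesis also considers padding short paths.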

Retiming Smoke Simulation Using Machine Learning

Giraud Carrier, Samuel Charles Gérard 24 March 2020 (has links)
Art-directability is a crucial aspect of creating aesthetically pleasing visual effects that help tell stories. A particularly common form of art direction is retiming a simulation. Unfortunately, retiming an existing simulation sequence while preserving its desired shapes is an ill-defined problem. Naively interpolating values between frames leads to visual artifacts such as choppy frames or jittering intensities. Because a proper interpolation method is difficult to formulate analytically, we use a machine learning approach to approximate it. Our model is based on the ODE-net structure and, trained on frames from the original sequence, produces the set of time samples (in our case, time steps) that achieves the desired new sequence speed. The flexibility in sequence duration provided by the time-sample input makes this a visually effective and intuitively directable way to retime a simulation.
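The naive baseline this abstract argues against can be sketched directly. The code below is not the thesis's ODE-net model; it is the simple linear-interpolation retiming whose artifacts motivate the learned approach, with scalar frame values standing in for full smoke fields (an assumption for illustration).

```python
# Naive retiming baseline: linearly interpolate existing frames at new
# time samples. Frame values here are scalars standing in for full
# smoke fields (an assumption).

def retime_linear(frames, speed):
    """Resample a frame sequence at a new playback speed.

    speed < 1 slows the sequence down (more output frames),
    speed > 1 speeds it up. Values between frames are linearly blended.
    """
    n_out = int(round((len(frames) - 1) / speed)) + 1
    out = []
    for i in range(n_out):
        t = min(i * speed, len(frames) - 1)   # source time sample
        lo = int(t)
        hi = min(lo + 1, len(frames) - 1)
        w = t - lo
        out.append((1 - w) * frames[lo] + w * frames[hi])
    return out

frames = [0.0, 1.0, 4.0, 9.0]      # e.g. per-frame smoke density
print(retime_linear(frames, 0.5))  # half speed: 7 output frames
```

On real smoke fields, blending frames this way averages mismatched shapes, producing exactly the choppy or jittering results the abstract describes; the learned time-sample mapping replaces the fixed `i * speed` schedule.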

Algorithmique du décalage d'instructions

Huard, Guillaume 06 December 2001 (has links) (PDF)
The steady evolution of processors toward architectures offering superscalar execution, instruction-level parallelism, prediction, and speculation, together with ever-deeper memory hierarchies, places increasing importance on the work of the compiler.
In this thesis, we study source-program transformations aimed at optimization in the compilation chain, and in particular a transformation called instruction shifting (décalage d'instructions).
This transformation underlies software pipelining, influences instruction-level parallelism and register usage, and also appears as a component of loop parallelization techniques based on affine scheduling.
We sought to better understand the possibilities offered by instruction shifting: which objectives it can achieve, but also which shifting problems remain hard.
To that end, we studied instruction shifting in several more or less closely related contexts and contributed to each of them.
In the context of software pipelining, we propose a polynomial-time algorithm for determining the shift most likely to produce maximum instruction-level parallelism, together with an experimental study of the absolute effectiveness of the technique using the software tool we built for this purpose: PASTAGA (Plate-forme d'Analyse Statistique et de Tests d'Algorithmes sur Graphes Aléatoires).
In the contexts of register usage (stage scheduling), loop parallelization, and locality, we provide answers to the associated instruction-shifting problems: complexity, exact solutions, and approximations.
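The core transformation can be illustrated on a tiny loop. The sketch below (a two-statement loop of my own construction, not an example from the thesis) shifts one statement across iterations: after shifting, the two statements inside the loop body no longer depend on each other within the same iteration, which is what exposes instruction-level parallelism.

```python
# Loop-shifting sketch: moving one statement across iterations exposes
# instruction-level parallelism. The two-statement loop is an assumption.

def original_loop(a, n):
    b = [0] * n
    c = [0] * n
    for i in range(n):
        b[i] = a[i] * 2       # S1
        c[i] = b[i] + 1       # S2 depends on S1 of the SAME iteration
    return c

def shifted_loop(a, n):
    b = [0] * n
    c = [0] * n
    b[0] = a[0] * 2           # prologue: first S1 peeled out
    for i in range(n - 1):
        # S2(i) and S1(i+1) are now independent -> can issue in parallel
        c[i] = b[i] + 1
        b[i + 1] = a[i + 1] * 2
    c[n - 1] = b[n - 1] + 1   # epilogue: last S2
    return c

a = [1, 2, 3, 4]
assert shifted_loop(a, 4) == original_loop(a, 4)
print(shifted_loop(a, 4))     # [3, 5, 7, 9]
```

Choosing how far to shift each statement, across an arbitrary dependence graph and under register or locality constraints, is the optimization problem the thesis studies.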

Adder Minimization and Retiming in Parallel FIR-Filters : Targeting Power Consumption in ASICs

Månsson, Jens January 2021 (has links)
Parallelized implementations of FIR filters are often used to meet throughput and power requirements. The most common methods for optimizing coefficient multiplication in FIR filters were developed for single-rate filters, so the added redundancy of parallel implementations cannot be exploited in the optimization. In this work, optimization methods that exploit the redundancy of parallel filter implementations are evaluated on a set of low-pass and interpolation filters. Results show that the proposed methods offer parallelization with a less-than-linear increase in hardware for several of the evaluated filters, with up to a 47% reduction in adder count compared to conventional methods. Furthermore, an optimization algorithm for retiming algorithmic delays is evaluated both with and without pipelining. Synthesis results show that the retiming algorithm can reduce power consumption by up to 48% without added latency for high-throughput applications.
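The structure being optimized can be sketched as follows. This is not the thesis's adder-minimization algorithm, only a behavioral model of a two-parallel FIR filter (two output samples computed per iteration) checked against a serial reference; the coefficients and inputs are illustrative assumptions.

```python
# Two-parallel FIR sketch: process two input samples per iteration, as
# in the parallel implementations the thesis optimizes. Coefficients
# and inputs are illustrative assumptions.

def fir_serial(h, x):
    """Reference FIR: one output sample per input sample."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        y.append(acc)
    return y

def fir_2parallel(h, x):
    """Compute y[2m] and y[2m+1] together each iteration.

    In hardware the two output computations are separate multiplier and
    adder trees; the redundancy between them (shared products and partial
    sums) is what the adder-minimization methods exploit.
    """
    y = [0.0] * len(x)
    for m in range(0, len(x), 2):
        for n in (m, m + 1):
            if n >= len(x):
                break
            acc = 0.0
            for k, hk in enumerate(h):
                if n - k >= 0:
                    acc += hk * x[n - k]
            y[n] = acc
    return y

h = [0.25, 0.5, 0.25]          # simple low-pass coefficients
x = [1.0, 2.0, 3.0, 4.0]
assert fir_2parallel(h, x) == fir_serial(h, x)
print(fir_2parallel(h, x))     # [0.25, 1.0, 2.0, 3.0]
```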

System-Level Cosynthesis of Transformative Applications for Heterogeneous Hardware-Software Architectures

Chatha, Karamvir Singh January 2001 (has links)
No description available.

Actionable Traffic Signal Performance Measures from Large-scale Vehicle Trajectory Analysis

Enrique Daniel Saldivar Carranza (10223855) 19 July 2023 (has links)
Road networks are significantly affected by traffic signal operations, which contribute 5% to 10% of all traffic delay in the United States. It is therefore important for agencies to systematically monitor signal performance to identify locations where operations do not function as desired and where mobility could be improved.

Currently, most signal performance evaluations are derived from infrastructure-based Automated Traffic Signal Performance Measures (ATSPMs). These performance measures rely on high-resolution detector and phase information that is collected at 10 Hz and reported via TCP/IP connections. Even though ATSPMs have proven to be a valid approach to estimating signal performance, the significant initial capital investment required for infrastructure deployment can be an obstacle for agencies attempting to scale these techniques. Further, fixed vehicle detection zones can limit the accuracy and extent of the calculated performance measures.

High-resolution connected vehicle (CV) trajectory data has recently become commercially available. With over 500 billion vehicle position records generated each month in the United States, this data set provides unique opportunities to derive accurate signal performance measures without the need for infrastructure upgrades. This dissertation provides a comprehensive suite of CV-based techniques for generating actionable and scalable traffic signal performance measures.

Turning movements of vehicles at intersections are automatically identified from attributes included in the commercial CV data set to facilitate movement-level analyses. A trajectory-based visualization from which relevant performance measures can be extracted is then presented, followed by methodologies for identifying signal retiming opportunities. An approach to evaluating closely coupled intersections, which is particularly challenging with detector-based techniques, is presented next. Finally, a data-driven methodology that enhances the scalability of trajectory-based performance estimation by automatically mapping relevant intersection geometry components is provided.

The trajectory data processing procedures provided in this dissertation can help agencies make data-driven decisions on resource allocation and signal system modifications. The presented techniques are transferable to any location where CV data is available, and the scope of analysis can easily be varied from the movement level to the intersection, corridor, region, and state level.
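One representative trajectory-derived measure can be sketched in a few lines. This is a generic illustration of computing arrivals-on-green from stop-bar crossing times, not the dissertation's specific methodology; the cycle length, green window, and arrival times are all illustrative assumptions.

```python
# Sketch of one trajectory-derived performance measure: the share of
# vehicles arriving at an intersection during green. Arrival times and
# signal timing below are illustrative assumptions.

def arrivals_on_green(arrival_times, cycle, green_start, green_end):
    """Percent of arrivals falling in the green window of each cycle.

    Assumes a fixed-time signal: the green window repeats at the same
    offset every cycle, so each arrival is reduced modulo the cycle.
    """
    on_green = 0
    for t in arrival_times:
        phase = t % cycle                 # position within the cycle
        if green_start <= phase < green_end:
            on_green += 1
    return 100.0 * on_green / len(arrival_times)

# 100 s cycle, green from t=20 s to t=60 s within the cycle
arrivals = [5.0, 25.0, 55.0, 130.0, 170.0, 219.0]
print(arrivals_on_green(arrivals, 100.0, 20.0, 60.0))  # 50.0
```

In the CV setting, the arrival times would come from trajectory waypoints crossing a mapped stop-bar location rather than from in-pavement detectors, which is precisely the infrastructure independence the dissertation emphasizes.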

Throughput Constrained and Area Optimized Dataflow Synthesis for FPGAs

Sun, Hua 21 February 2008 (has links) (PDF)
Although high-level synthesis has been researched for many years, synthesizing minimum-area hardware implementations under a throughput constraint for computationally intensive algorithms remains a challenge. In this thesis, three important techniques are studied carefully and applied in an integrated way to meet this challenging synthesis requirement. The first is pipeline scheduling, which generates a pipelined schedule that meets the throughput requirement. The second is module selection, which chooses the most appropriate circuit module for each operation. The third is resource sharing, which reuses a circuit module by sharing it among multiple operations. This work shows that combining module selection and resource sharing while performing pipeline scheduling can significantly reduce hardware area, either by using slower, more area-efficient circuit modules or by time-multiplexing faster, larger circuit modules, while still meeting the throughput constraint. The results show that the combined approach generates, on average, 43% smaller hardware than is possible when a single technique (resource sharing or module selection) is applied. There are four major contributions. First, given a fixed throughput constraint, this work explores all feasible combinations of clock frequency and data introduction interval that meet the constraint; this enlarged pipelining design space yields better hardware architectures than previous pipeline synthesis work. Second, the module selection algorithm considers different module architectures as well as different pipelining options for each architecture. This not only addresses the distinctive architecture of most FPGA circuit modules, it also performs retiming at the high-level synthesis level. Third, this work proposes a novel approach that integrates the three inter-related synthesis techniques of pipeline scheduling, module selection, and resource sharing. To the author's best knowledge, this is the first attempt to do so. The integrated approach identifies more efficient hardware implementations than when only one or two of the three techniques are applied. Fourth, this work proposes and implements several algorithms that explore the combined pipeline scheduling, module selection, and resource sharing design space and identify the most efficient hardware architecture under the synthesis constraint. These algorithms explore the combined design space in different ways, representing the trade-off between algorithm execution time and the size of the explored design space.
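The core arithmetic behind resource sharing under a throughput constraint can be sketched simply. This is a deliberately reduced model, not the thesis's algorithms: it omits the timing-feasibility check that real module selection must perform, and the module library and operation counts are illustrative assumptions. The key relation is that with a data introduction interval of II cycles, one module instance can be time-multiplexed across up to II operations of its type, so the instance count is ceil(ops / II).

```python
# Reduced sketch of resource sharing + module selection under a
# throughput constraint. Library and operation counts are assumptions;
# the timing-feasibility check of real module selection is omitted.
import math

def min_area(op_counts, library, ii):
    """Pick the cheapest module variant per operation type.

    With a data introduction interval of `ii` cycles, one module instance
    can serve up to `ii` operations of its type, so the instance count is
    ceil(ops / ii). Area = instances * per-module area.
    """
    total = 0
    for op, count in op_counts.items():
        instances = math.ceil(count / ii)
        best = min(library[op], key=lambda m: instances * m["area"])
        total += instances * best["area"]
    return total

library = {
    "mul": [{"name": "fast_mul", "area": 40}, {"name": "slow_mul", "area": 25}],
    "add": [{"name": "add", "area": 5}],
}
ops = {"mul": 6, "add": 8}
print(min_area(ops, library, ii=1))  # 190: no sharing, one module per op
print(min_area(ops, library, ii=3))  # 65: three ops share each module
```

The thesis's contribution is, roughly, searching this space jointly: a larger II permits more sharing and slower, smaller module variants, but frequency must rise to keep throughput constant, which is where the pipelining options per module come in.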

Reducing Power in FPGA Designs Through Glitch Reduction

Rollins, Nathaniel Hatley 27 February 2007 (has links) (PDF)
While FPGAs provide flexibility for performing high-performance DSP functions, they consume a significant amount of power. Often, a large portion of the dynamic power is wasted on unproductive signal glitches, so reducing glitching reduces dynamic energy consumption. In this study, retiming is used to reduce the energy wasted in signal glitches; retiming can reduce energy by up to 92%. Evaluating energy consumption is an important part of energy reduction. In this work, an activity-rate-based power estimation tool is introduced that provides FPGA-architecture-independent energy estimates at the gate level; it can estimate power consumption to within 13% on average. This activity-rate-based tool and retiming are combined in a single algorithm to reduce the energy consumption of FPGA designs at the gate level. An energy evaluation metric called energy-area-delay is used to weigh the energy reduction and clock-rate improvements gained from retiming against the area and latency costs. For a set of benchmark designs, the algorithm that combines retiming with the activity-rate-based power estimator reduces power on average by 40% and improves clock rate by 54%, at an average cost of 1.1x area and 1.5x latency.
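Why retiming suppresses glitches can be sketched with a toy activity model. This is not the thesis's estimation tool, only an illustration of the worst-case behavior it models: a gate whose inputs settle at different times can toggle several times per clock cycle, while a register placed on that node (by retiming) lets only the final settled value through. The XOR-tree framing and arrival times are illustrative assumptions.

```python
# Toy model of glitch activity. A gate whose inputs settle at different
# times can toggle once per distinct arrival; a register on its output
# (inserted by retiming) passes at most one transition per cycle.
# The XOR-tree framing and arrival times are illustrative assumptions.

def glitch_transitions(arrival_times):
    """Worst-case transitions at an XOR-tree output.

    Each input that settles at a distinct time can cause the output to
    toggle once more, so unbalanced arrivals multiply activity.
    """
    return len(set(arrival_times))      # one toggle per distinct arrival

def registered_transitions(arrival_times):
    """With a register on the output (retimed), only the final settled
    value propagates: at most one transition per clock cycle."""
    return 1 if arrival_times else 0

unbalanced = [1.0, 3.0, 7.0, 9.0]          # input path delays (ns)
print(glitch_transitions(unbalanced))      # 4 toggles -> wasted energy
print(registered_transitions(unbalanced))  # 1 toggle after retiming
```

An activity-rate-based estimator, roughly speaking, propagates such per-node transition counts through the gate-level netlist and weights them by node capacitance to predict dynamic power without a full timing simulation.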
