41 |
Implementation and Evaluation of a Data Pipeline for Industrial IoT Using Apache NiFi
Vilhelmsson, Lina; Sjöberg, Pontus. January 2020
In the last few years, the popularity of Industrial IoT has grown considerably, and it is expected to have an impact of over 14 trillion USD on the global economy by 2030. One application of Industrial IoT is using data pipelining tools to move raw data from industry machines to data storage, where the data can be processed by analytical instruments to help optimize industrial operations. This thesis analyzes and evaluates a data pipeline setup for Industrial IoT built with the tool Apache NiFi. A data flow was designed in NiFi that connected an SQL database, a file system, and a Kafka topic to a distributed file system. To evaluate the NiFi data pipeline setup, three tests were conducted to see how the system performed under different workloads. The first test determined which size FlowFiles should be merged into to get the lowest latency. The second test examined whether data from the different data sources should be kept separate or merged. The third test compared the NiFi setup with an alternative setup that had a Kafka topic as an intermediary between NiFi and the endpoint. The first test showed that the lowest latency was achieved when merging FlowFiles into 10 kB files. In the second test, merging FlowFiles from all three sources gave a lower latency than keeping them separate for larger merging sizes. Finally, the third test showed no significant difference between the two setups.
|
42 |
Pipelining Of Double Precision Floating Point Division And Square Root Operations On Field-programmable Gate Arrays
Thakkar, Anuja. 01 January 2006
Many space applications, such as vision-based systems, synthetic aperture radar, and radar altimetry, rely increasingly on high data rate DSP algorithms. These algorithms use double precision floating point arithmetic operations. While most DSP applications can be executed on DSP processors, the numerical requirements of these new space applications far surpass the numerical capabilities of many current DSP processors. Since the tradition in DSP processing has been to use fixed point number representation, only recently have DSP processors begun to incorporate floating point arithmetic units, and most of these units handle only single precision floating point addition/subtraction, multiplication, and occasionally division. While DSP processors are slowly evolving to meet the numerical requirements of newer space applications, FPGA densities have rapidly increased to parallel and surpass even the gate densities of many DSP processors and commodity CPUs. This makes them attractive platforms for implementing compute-intensive DSP computations. Despite this clear advantage on the side of FPGAs, few attempts have been made to examine how wide precision floating point arithmetic, particularly division and square root operations, can perform on FPGAs to support these compute-intensive DSP applications. In this context, this thesis presents sequential and pipelined designs of IEEE-754 compliant double precision floating point division and square root operations based on low radix digit recurrence algorithms. FPGA implementations of these algorithms have the advantage of being easily testable. In particular, the pipelined designs are synthesized based on careful partial and full unrolling of the iterations in the digit recurrence algorithms. Overall, the implementations of the sequential and pipelined designs are common-denominator implementations which do not use any performance-enhancing embedded components such as multipliers and block memory.
As these implementations exploit exclusively the fine-grain reconfigurable resources of Virtex FPGAs, they are easily portable to other FPGAs with similar reconfigurable fabrics without any major modifications. The pipelined designs of these two operations are evaluated in terms of area, throughput, and dynamic power consumption as a function of pipeline depth. Pipelining experiments reveal that the area overhead tends to remain constant regardless of the degree of pipelining applied to the design, while the throughput increases with pipeline depth. In addition, these experiments reveal that pipelining reduces power considerably in shallow pipelines. Pipelining these designs further does not necessarily lead to significant power reduction. By partitioning these designs into deeper pipelines, they can reach throughputs close to the 100 MFLOPS mark while consuming a modest 1% to 8% of the reconfigurable fabric within a Virtex-II XC2VX000 (e.g., XC2V1000 or XC2V6000) FPGA.
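To make the idea of a digit recurrence concrete, the following is a minimal Python sketch of radix-2 restoring division on unsigned integers. It is an illustration only: the abstract says "low radix digit recurrence" without naming the exact variant, so the restoring scheme, the function name `restoring_divide`, and the integer operands are assumptions, and IEEE-754 concerns such as exponent handling and rounding are left out. Each loop iteration produces one quotient bit; unrolling the iterations and placing registers between groups of them is the kind of partial or full unrolling the abstract refers to.

```python
def restoring_divide(dividend, divisor, bits):
    """Radix-2 restoring digit recurrence (illustrative sketch).

    Produces one quotient bit per iteration, most significant bit first.
    Assumes 0 <= dividend < 2**bits and divisor > 0.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if not 0 <= dividend < (1 << bits):
        raise ValueError("dividend must fit in the given number of bits")

    remainder = 0
    quotient = 0
    for i in range(bits - 1, -1, -1):
        # Shift in the next dividend bit.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        # Trial subtraction: keep the result only if it is non-negative.
        trial = remainder - divisor
        if trial >= 0:
            remainder = trial
            quotient = (quotient << 1) | 1
        else:
            # "Restore": discard the trial result, quotient bit is 0.
            quotient <<= 1
    return quotient, remainder
```

Because every iteration depends only on the remainder from the previous one, a hardware version can insert a pipeline register after any iteration, trading latency in cycles for a shorter critical path.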
|
43 |
Des réseaux de processus cyclo-statiques à la génération de code pour le pipeline multi-dimensionnel / From Cyclo-Static Process Networks to Code Generation for Multidimensional Software Pipelining
Fellahi, Mohammed. 22 April 2011
Applications based on streams, that is, ordered sequences of data values, are important targets of program optimization because of their high computational requirements and the diversity of their application domains: communication, embedded systems, multimedia, etc. One of the most important and difficult problems in the design and implementation of special-purpose stream languages is how to schedule these applications at a fine grain to exploit the available machine resources. In this thesis we propose a framework for fine-grain scheduling of streaming applications and of nested loops in general. First, we try to pipeline the steady-state phases (inner loops) by finding the repeated kernel pattern and executing actor occurrences in parallel as much as possible. Then we merge the kernel prolog and epilog of the pipelined phases to move them out of the outer loop, which avoids increasing the code size. Merging the kernel prolog and epilog means that we shift actor occurrences, or instructions, from one phase iteration to another and from one outer loop iteration to another: a multidimensional shifting. Experiments show that our framework can improve performance and extract parallelism without increasing the code size, both in streaming applications and in nested loops in general.
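The prolog, kernel, and epilog structure described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the thesis's framework: the two-stage loop body (`stage_a`, `stage_b`) and all function names are hypothetical, and Python executes the stages sequentially, so the sketch only shows the reordering of work, not real parallelism. `software_pipelined` overlaps stage B of one iteration with stage A of the next; `merged_pipelined` additionally lets the epilog of one inner loop overlap the prolog of the next, so the pipeline fills and drains only once across the outer loop.

```python
def stage_a(x):
    # Hypothetical first half of the loop body.
    return x * 2


def stage_b(y):
    # Hypothetical second half of the loop body.
    return y + 1


def software_pipelined(xs):
    """One inner loop with explicit prolog, kernel, and epilog."""
    if not xs:
        return []
    out = []
    in_flight = stage_a(xs[0])          # prolog: start iteration 0
    for i in range(1, len(xs)):         # kernel: B(i-1) alongside A(i)
        finished = stage_b(in_flight)
        in_flight = stage_a(xs[i])
        out.append(finished)
    out.append(stage_b(in_flight))      # epilog: drain the last iteration
    return out


def merged_pipelined(rows):
    """Nested loops: the epilog of one row merges with the next row's prolog."""
    out = [[] for _ in rows]
    in_flight = None                    # (row index, stage A result)
    for r, row in enumerate(rows):
        for x in row:
            if in_flight is not None:
                prev_row, value = in_flight
                out[prev_row].append(stage_b(value))
            in_flight = (r, stage_a(x))
    if in_flight is not None:           # single epilog after the outer loop
        prev_row, value = in_flight
        out[prev_row].append(stage_b(value))
    return out
```

In the merged version the kernel body appears once and there is a single prolog and epilog for the whole loop nest, which is the code-size argument the abstract makes.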
|
44 |
Low Complexity and Low Power Bit-Serial Multipliers / Bitseriella multiplikatorer med låg komplexitet och låg effektförbrukning
Johansson, Kenny. January 2003
Bit-serial multiplication with a fixed coefficient is commonly used in integrated circuits, such as digital filters and FFTs. These multiplications can be implemented using basic components such as adders, subtractors and D flip-flops. Multiplication with the same coefficient can be implemented in many ways, using different structures. Other studies in this area have focused on how to minimize the number of adders/subtractors, and have often assumed that the cost of D flip-flops is negligible. That simplification has proved to be far too coarse and, furthermore, not at all necessary. In digital devices low power consumption is always desirable. How to attain this in bit-serial multipliers is a complex problem.
The aim of this thesis was to find a strategy for implementing bit-serial multipliers at as low a cost as possible. An important step was achieved by deriving formulas that can be used to calculate the carry switch probability in the adders/subtractors. It has also been established that it is possible to design a power model that can be applied to all possible structures of bit-serial multipliers.
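As an illustration of the building blocks named above, the following Python sketch simulates a fixed-coefficient bit-serial multiplier for the coefficient 5, built from one bit-serial adder and two D flip-flops acting as a delay (5x = x + 4x). The coefficient, the function names, and the way carry toggles are counted are assumptions chosen for the example; they are not the formulas or the power model derived in the thesis.

```python
def to_bits(x, n):
    """Unsigned integer to a list of n bits, least significant bit first."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """List of bits (least significant first) back to an unsigned integer."""
    return sum(b << i for i, b in enumerate(bits))


def delay(bits, k):
    """k D flip-flops in series: the stream arrives k cycles later (x * 2**k)."""
    return [0] * k + bits[:len(bits) - k]


def serial_add(a_bits, b_bits):
    """Bit-serial adder: one full adder plus a carry flip-flop.

    Returns the sum bits and how many times the stored carry changed value,
    a crude stand-in for carry switching activity.
    """
    carry = 0
    out = []
    toggles = 0
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)
        new_carry = (a & b) | (carry & (a ^ b))
        if new_carry != carry:
            toggles += 1
        carry = new_carry
    return out, toggles


def times_five(x, n):
    """Multiply by the fixed coefficient 5; assumes 0 <= x and 5 * x < 2**n."""
    bits = to_bits(x, n)
    out, toggles = serial_add(bits, delay(bits, 2))
    return from_bits(out), toggles
```

Other coefficients can be built the same way from different combinations of delays, adders, and subtractors, which is why the number of flip-flops and the switching activity both differ between structures that compute the same product.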
|
45 |
Power Estimation of High Speed Bit-Parallel Adders / Effektestimering av snabba bitparallella adderare
Åslund, Anders. January 2004
Fast addition is essential in many DSP algorithms. Various structures have been introduced to speed up the time critical carry propagation. For high throughput applications, however, it may be necessary to introduce pipelining. In this report the power consumption of four different adder structures, with varying word length and different number of pipeline cuts, is compared.
Out of the four adder structures compared, the Kogge-Stone parallel prefix adder proves to be the best choice most of the time. The Brent-Kung parallel prefix adder is also a good choice, but the maximal throughput does not reach as high as the maximal throughput of the Kogge-Stone parallel prefix adder.
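For readers unfamiliar with parallel prefix adders, here is a minimal Python sketch of the Kogge-Stone scheme. It models only the logic, not power or timing, and the function name `kogge_stone_add` and its interface are assumptions made for the example. Each pass of the `while` loop corresponds to one prefix level; in hardware, the boundaries between levels are natural candidates for the pipeline cuts the abstract mentions.

```python
def kogge_stone_add(a, b, width):
    """Add two unsigned width-bit integers with a Kogge-Stone prefix network.

    Returns (sum modulo 2**width, carry out).
    """
    # Bitwise propagate and generate signals.
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]

    # Group signals; after the level with distance d, position i covers
    # the bits i, i-1, ..., down to max(0, i - 2*d + 1).
    group_g, group_p = g[:], p[:]
    distance = 1
    while distance < width:
        next_g, next_p = group_g[:], group_p[:]
        for i in range(distance, width):
            next_g[i] = group_g[i] | (group_p[i] & group_g[i - distance])
            next_p[i] = group_p[i] & group_p[i - distance]
        group_g, group_p = next_g, next_p
        distance *= 2

    # The carry into bit i is the group generate of bits 0..i-1.
    total = 0
    for i in range(width):
        carry_in = group_g[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total, group_g[width - 1]
```

The network needs about log2(width) levels, each touching nearly every bit position, which gives short delay at the price of many prefix cells and wires. A Brent-Kung network uses fewer cells but more levels.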
|
46 |
A MOSCAP pipeline pseudo passive DAC
Behera, Prachee Shree. 21 September 2005
Graduation date: 2006 / The design of a 10-bit pipelined charge redistribution DAC employing MOSCAPs biased in their accumulation mode is presented in this thesis. A switched capacitor filter and output buffer have also been designed for the system. The effect of MOSCAP nonlinearity on the performance of the pipelined charge redistribution DAC has been analyzed. MOS capacitors and their models available for simulation have been discussed. In addition, the effect of more general capacitor nonlinearities on the performance of the DAC has been presented.
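As a rough illustration of how a pipelined charge redistribution DAC forms its output, the sketch below models an idealized version in Python. It assumes that each stage shares charge between two equal, perfectly linear capacitors, so that a stage averages the incoming voltage with either the reference or ground, and that bits are processed least significant first. Those assumptions, the function name `pipelined_dac`, and the normalized reference are not taken from the thesis, and the MOSCAP nonlinearity the thesis analyzes is deliberately not modeled.

```python
def pipelined_dac(code, bits=10, vref=1.0):
    """Ideal pipelined charge redistribution DAC (illustrative sketch).

    Each stage averages the previous stage's voltage with bit * vref,
    which models charge sharing between two equal capacitors.
    Bits are processed least significant first.
    """
    if not 0 <= code < (1 << bits):
        raise ValueError("code does not fit in the given number of bits")
    voltage = 0.0
    for i in range(bits):
        bit = (code >> i) & 1
        voltage = (voltage + bit * vref) / 2.0
    return voltage
```

Under these ideal assumptions the output equals vref * code / 2**bits. A voltage-dependent capacitance would make the averaging in each stage uneven, which is one way capacitor nonlinearity can show up as conversion error.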
|
49 |
Modélisation compacte et conception de circuit à base d'injection de spin / Compact modeling and circuit design based on spin injection
An, Qi. 05 October 2017
CMOS technology has tremendously affected the development of the semiconductor industry. However, as the technology node is scaled down, CMOS technology faces significant challenges from leakage power and short channel effects. To cope with this problem, researchers have turned their attention to spintronics in recent years, considering its potential for smaller device sizes and lower power operation. The magnetic tunnel junction (MTJ) is one of the most important spintronic devices; it can store binary data based on the tunnel magnetoresistance (TMR) effect. Besides non-volatile memory, the MTJ can also be combined with or replace CMOS circuits to implement a hybrid circuit, with the potential to achieve low power consumption and high speed performance. However, frequent spin-charge conversion in a hybrid circuit may cause large power consumption, which diminishes the advantage of hybrid circuits. Therefore, the all-spin logic (ASL) concept, which uses a pure spin current to transport information, has been proposed to reduce the number of charge-spin conversions and thus the power consumption. The design of circuits based on ASL devices leads to numerous challenges related to the heterogeneity they introduce and the large design space to explore. Hence, this thesis focuses on filling the gap between application requirements at the system level and device fabrication at the device level.
At the device level, we developed a compact model integrating spin-transfer torque (STT), TMR, the spin injection/accumulation effects, the channel breakdown current and the spin diffusion delay. Validated by comparison with experimental results, this model allows designers to explore fabrication-related device parameters, such as channel lengths and MTJ sizes, and helps them avoid damaging the device. Moreover, the model is written in Verilog-A on Cadence and divided into several blocks (injector, detector, channel and contact), which allows independent design and cross-layer optimization of ASL-based circuits and eases the design of hierarchical, complex circuits. Furthermore, the spin injection/accumulation expressions for the ASL device used are derived, making it possible to discuss the experimental phenomena observed on ASL devices.
At the circuit level, we developed a circuit/system design methodology that takes into account the channel distribution, the gate interconnection and the different injection current ratios caused by spin diffusion. Given the circuit/system specifications and constraints, the Boolean functions of a circuit are synthesized with the developed synthesis method, and the fabrication-level parameters (channel lengths and MTJ sizes) are specified. Based on this methodology, basic combinational circuits that form a circuit library are designed and evaluated using the developed compact model.
At the system level, a DCT circuit, a convolution circuit and an Intel i7 system are evaluated by exploring the interconnection issues: the distribution of interconnections between gates and the number of inserted buffers. With theoretical parameters, the results show that an ASL-based circuit/system can outperform a CMOS-based circuit/system. Moreover, a pipelining scheme for ASL-based circuits is discussed, with MTJs as latches inserted between stages. The reconfigurability provided by the injection current polarities/values and the control terminal states of ASL-based circuits is also discussed, through an exploration of reconfigurable basic logic circuits.
|
50 |
Návrh a implementace prostředků pro zvýšení výkonu procesoru / Design and Implementation of Mechanisms for Enhancing Performance of CPU
Zlatohlávková, Lucie. January 2007
This master's thesis focuses on processor architecture. The basis of the project is the design of a simple processor, which is extended with modern architectural features such as pipelining, cache memory and branch prediction. The processor was implemented in the VHDL hardware description language and simulated in the ModelSim simulation tool.
|