31 |
Microarchitectural techniques to reduce energy consumption in the memory hierarchy
Ghosh, Mrinmoy 03 April 2009
This thesis states that dynamic profiling of the memory reference stream can improve energy
and performance in the memory hierarchy. The research presented in this thesis provides
multiple instances of using lightweight hardware structures to profile the memory
reference stream. The objective of this research is to develop microarchitectural techniques
to reduce energy consumption at different levels of the memory hierarchy. Several simple
and implementable techniques were developed as a part of this research. One of the
techniques identifies and eliminates redundant refresh operations in DRAM and reduces
DRAM refresh power. Another reduces leakage energy in L2 and higher level caches for
multiprocessor systems. The emphasis of this research has been to develop several techniques
for obtaining energy savings in caches using a simple hardware structure called the
counting Bloom filter (CBF). CBFs have been used to predict L2 cache misses and obtain
energy savings by not accessing the L2 cache on a predicted miss. A simple extension of
this technique allows CBFs to do way-estimation of set associative caches to reduce energy
in cache lookups. Another technique uses CBFs to track addresses in a Virtual Cache and
reduce false synonym lookups. Finally, this thesis presents a technique to reduce dynamic
power consumption in level one caches using significance compression. The significant
energy and performance improvements demonstrated by the techniques presented in this
thesis suggest that this work will be of great value for designing memory hierarchies of
future computing platforms.
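
The CBF-based miss prediction described above can be illustrated with a minimal software sketch. It is not the thesis's hardware design: the class name, the counter count, the number of hash functions and the use of SHA-256 are all assumptions made for illustration (hardware would use much cheaper hashes). The idea shown is only that a zero counter proves a line is absent, so the L2 lookup can be skipped.

```python
import hashlib


class CountingBloomFilter:
    """Hypothetical model of a CBF that shadows the contents of an L2 cache."""

    def __init__(self, num_counters=1024, num_hashes=3):
        # Sizes are illustrative assumptions, not values from the thesis.
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _indexes(self, line_addr):
        # Derive one counter index per hash function from the line address.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{line_addr}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_counters

    def insert(self, line_addr):
        # Called when a line is filled into the cache.
        for idx in self._indexes(line_addr):
            self.counters[idx] += 1

    def remove(self, line_addr):
        # Called when a line is evicted; counters make deletion possible.
        for idx in self._indexes(line_addr):
            self.counters[idx] -= 1

    def may_contain(self, line_addr):
        # False means a guaranteed miss; True means "possibly present".
        return all(self.counters[idx] > 0 for idx in self._indexes(line_addr))


def should_access_l2(cbf, line_addr):
    """Skip the L2 lookup (and its energy) when the CBF predicts a miss."""
    return cbf.may_contain(line_addr)
```

A predicted miss is always correct in this model, because every resident line has incremented all of its counters; a predicted hit may still turn out to be a miss when counters are shared with other lines.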
|
32 |
StreamWorks: An Energy-efficient Embedded Co-processor for Stream Computing
January 2014
abstract: Stream processing has emerged as an important model of computation especially in the context of multimedia and communication sub-systems of embedded System-on-Chip (SoC) architectures. The dataflow nature of streaming applications allows them to be most naturally expressed as a set of kernels iteratively operating on continuous streams of data. The kernels are computationally intensive and are mainly characterized by real-time constraints that demand high throughput and data bandwidth with limited global data reuse. Conventional architectures fail to meet these demands due to their poorly matched execution models and the overheads associated with instruction and data movements.
This work presents StreamWorks, a multi-core embedded architecture for energy-efficient stream computing. The basic processing element in the StreamWorks architecture is the StreamEngine (SE), which is responsible for iteratively executing a stream kernel. The SE introduces an instruction locking mechanism that exploits the iterative nature of the kernels and enables fine-grain instruction reuse. Each instruction in an SE is locked to a Reservation Station (RS) and revitalizes itself after execution, thus never retiring from the RS. The entire kernel is hosted in RS Banks (RSBs) close to functional units for energy-efficient instruction delivery. The dataflow semantics of stream kernels are captured by a context-aware dataflow execution mode that efficiently exploits the instruction-level parallelism (ILP) and data-level parallelism (DLP) within stream kernels.
Multiple SEs are grouped together to form a StreamCluster (SC), within which they communicate via a local interconnect. A novel software FIFO virtualization technique with split-join functionality is proposed for efficient and scalable stream communication across SEs. The proposed communication mechanism exploits the task-level parallelism (TLP) of the stream application. The performance and scalability of the communication mechanism are evaluated against the existing data movement schemes for scratchpad-based multi-core architectures. Further, overlay schemes and architectural support are proposed that allow hosting any number of kernels on the StreamWorks architecture. The proposed overlay schemes for code management support kernel (context) switching for the most common use cases and can be adapted for any multi-core architecture that uses software-managed local memories.
The performance and energy efficiency of the StreamWorks architecture are evaluated on stream kernel and application benchmarks by implementing the architecture in a TSMC 45 nm process and comparing it with a low-power RISC core and a contemporary accelerator. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2014
|
33 |
Design and Implementation of Single Issue DSP Processor Core
Ravinath, Vinodh January 2007
DSP processors are microprocessors built specifically for digital signal processing. DSP is one of the core technologies in rapidly growing applications like communications and audio processing. The estimated growth of DSP processors in the last 6 years is over 40%. The variety of DSP-capable processors for various applications has also increased with the rising popularity of DSP processors. The design flow and architecture of such processors are not commonly available to students for learning. This report presents a structured approach to the design and implementation of an embedded DSP processor core for voice, audio and video codecs. The report focuses on the design requirement specification, senior instruction set and assembly manual release, microarchitecture design and implementation of the core. Details about the core verification are also included in this report. The instruction set of this processor supports running basic kernels of BDTI benchmarking.
|
34 |
Automatic Generation Of Compiled Cycle Level Microarchitecture Simulators For Superspeculative Processors
Chandran, Priya 06 1900
No description available.
|
35 |
The Performance Cost of Security
Bowen, Lucy R 01 June 2019
Historically, performance has been the most important feature when optimizing computer hardware. Modern processors are so highly optimized that every cycle of computation time matters. However, this practice of optimizing for performance at all costs has been called into question by new microarchitectural attacks such as Meltdown and Spectre. Microarchitectural attacks exploit the effects of microarchitectural components or optimizations in order to leak data to an attacker. These attacks have caused processor manufacturers to introduce performance-impacting mitigations in both software and silicon.
To investigate the performance impact of the various mitigations, a test suite of forty-seven different tests was created. This suite was run on a series of virtual machines that tested both Ubuntu 16 and Ubuntu 18. These tests investigated the performance change across version updates and the performance impact of CPU core count vs. default microarchitectural mitigations. The testing showed that the performance impact of the microarchitectural mitigations is non-trivial, as the percent difference in performance can be as high as 200%.
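
The abstract does not say how "percent difference" is computed. As a hedged illustration only, the sketch below uses one common definition, the symmetric percent difference (absolute difference divided by the mean of the two values); the function name and the sample values are assumptions, not data from the thesis. Under this definition the result stays below 200% for any two positive measurements, so a value near 200% means one measurement is far larger than the other.

```python
def percent_difference(a, b):
    """Symmetric percent difference between two positive measurements.

    Assumed definition: |a - b| divided by the mean of a and b, times 100.
    """
    return abs(a - b) / ((a + b) / 2) * 100


# Hypothetical benchmark timings in seconds (not taken from the thesis).
baseline = 10.0
mitigated = 30.0
print(percent_difference(baseline, mitigated))  # 100.0
```

With these made-up timings the mitigated run takes three times as long, yet the symmetric percent difference is 100%, which shows how much the figure depends on the definition chosen.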
|
36 |
Investigating the Cause and Effect of an AMD Zen Energy Management Anomaly
von Elm, Christian, Ilsche, Thomas, Schöne, Robert, Bielert, Mario, Schmidl, Markus 23 April 2021
This paper discusses an architectural anomaly observed on server processors of the AMD Zen microarchitecture: At a specific operating point, increasing the number of active cores reduces system power consumption while increasing performance more than proportionally to the additional cores. The occurrence of the anomaly is rooted in the hardware control loop for energy management and is software-independent. Experiments show a connection to the AMD turbo frequency feature Max Core Boost Frequency (MCBF). In less efficient configurations, this feature could be employed from a processor’s perspective, even though it is not necessarily used on any core. Voltage measurements indicate that the availability of MCBF leads to a higher voltage from mainboard voltage regulators, subsequently raising power consumption unnecessarily. We describe the impact of this anomaly on the performance and energy efficiency of several micro-benchmarks. The reduced power consumption when additional cores are enabled can lead to higher core frequencies and increased per-core performance. The presented findings can be used to avoid inefficient core configurations and reduce the overall energy-to-solution.
|
37 |
Micro-architectured materials for thermal management: Porous graphite/graphene boiling enhancement structures
Ghaderidosst, Melody January 2022
The convergence of the digital and physical worlds encourages advances in high-speed telecommunication and fifth-generation (5G) technology. Two-phase heat transfer systems are common engineering solutions. However, due to the large frequency spectra in 5G, the systematic heat generation increases, requiring more efficient thermal management. The surface characteristics of solid materials in these systems are vital, making micro-architectured materials a novel pathway to improve heat transfer. The coefficient of thermal expansion (CTE) and thermal conductivity of the Schoen-Gyroid, a triply periodic minimal surface structure, are studied along with a classical cylindrical porous structure. Graphite and graphene have excellent thermal and mechanical properties and are thus the base materials considered in this project. A comprehensive manufacturability study was conducted to gain knowledge regarding different graphite/graphene options, and it was concluded that commercially available isotropic graphite was the best-suited material for the purpose of this project. A decoupled thermo-mechanical analysis of the CTE and thermal conductivity of these structures as a function of volume fraction was conducted using computational homogenization with finite element analysis. A linearly elastic constitutive material model in COMSOL Multiphysics was used. As expected, the homogenized effective material is governed by a linear constitutive model. Moreover, the results displayed a linear dependency on the porosity for both the CTE and thermal conductivity. The mechanical FEM model was validated using an analytical model derived by Gibson and Ashby, and the thermal conductivity FEM model was validated using experimental data.
|
38 |
Évaluation de la microarchitecture trabéculaire et des propriétés mécaniques osseuses in vivo chez l’humain par scanner périphérique a haute résolution : application clinique à l’ostéoporose / In vivo assessment of trabecular microarchitecture and bone biomechanical properties by high resolution peripheral quantitative tomography : application to osteoporosis
Vilayphiou, Nicolas 16 December 2010
Bone microarchitecture is one of the determinants of bone quality that can now be evaluated in vivo at the distal radius and tibia with an isotropic resolution of 82μm with a new high-resolution peripheral scanner (XtremeCT, SCANCO Medical AG). Moreover, the use of finite element analysis on the 3D bone volume acquired allows the assessment of bone biomechanical properties such as bone strength. Our studies show that this technique is promising to assess bone density, microarchitecture and strength at peripheral skeletal sites. Indeed those measures were associated with osteoporotic fractures of all kinds in women.
We also demonstrated that those same measures were associated with osteoporotic fractures of all kinds, including vertebral fractures, in men, who are less prone to be affected by osteoporosis. Finite element analysis allows in vivo measurement of bone strength, which might provide additional information about bone fragility and fracture risk that are not assessed by measures of density or microarchitecture.
|
39 |
Consommation chronique d'alcool, exercice physique et tissu osseux : modifications densitométriques, architecturales, biomécaniques et métaboliques chez le rat / Chronic alcohol consumption, physical exercise and bone tissue : densitometric, microarchitectural, biomechanic and metabolic changes in the rat
Maurel, Delphine 24 November 2011
Heavy chronic alcohol consumption has deleterious effects on bone tissue. It is one of the major causes of secondary osteoporosis in men.
In this work, we conducted several experiments in the rat to assess the effects of chronic alcohol consumption on bone, combined or not with aerobic training. We showed that light to moderate chronic alcohol consumption over a short period led to an increase in bone mineral density (BMD) and trabecular thickness, with no additive effect of physical exercise on bone tissue. When the alcohol doses were increased, we observed deleterious effects on BMD, microarchitecture and bone resistance, with a dose effect across increasing alcohol concentrations (25%, 30% and 35% v/v): the more concentrated the alcohol, the greater the decrease in bone parameters. The BMD decrease was associated with a change in body composition and with a decrease in serum leptin. However, the number of lipid droplets in the bone marrow increased dramatically. We demonstrated a large increase in osteocyte apoptosis with alcohol (35% v/v) in this alcohol-induced osteoporosis model, which was correlated with BMD and bone marrow adiposity. We also showed lipid incorporation in bone micro vessels and in osteocytes, which was correlated with osteocyte apoptosis. Lastly, we showed that when regular exercise was combined with heavy chronic alcohol consumption, the bone parameters were normal (trabecular and cortical thickness, femur length) and the decrease in BMD was smaller than in sedentary alcohol-fed rats. These effects were associated with a regulation of osteocyte apoptosis.
|
40 |
Scalable Low Power Issue Queue And Store Queue Design For Superscalar Processors
Vivekanandham, Rajesh 12 1900
A large instruction window is a key requirement to exploit greater Instruction Level Parallelism (ILP) in out-of-order superscalar processors. Along with the instruction window size, the sizes of various other structures, including the issue queue, store queue and register file, need to increase as well. However, the cycle time and energy consumption of conventional large monolithic Content Addressable Memories (CAMs), the underlying structure of most conventional issue queue and store queue designs, worsen rapidly with an increase in size. This results in a three-way trade-off involving ILP, clock frequency and energy consumption. In this thesis, we propose efficient designs for the issue queue and the store queue that improve the circuit latency and energy consumption while minimizing the loss in IPC.
We propose the Scalable Low power Issue Queue (SLIQ) design, which segments the issue queue structure to reduce the latency. This is complemented with a fast wakeup index to a consumer in the issue queue for every instruction. As this consumer instruction can be woken up directly, without any delay, this mitigates the IPC loss faced by the pipelined issue queue. Also, as the scheme incorporates a pipelined broadcast, the indices are not required for correctness and can simply be gang invalidated on branch mispredictions. The IPC loss of an 8 segment SLIQ is within 2.3% for the entire SPEC CPU2000 benchmark suite while achieving a 39.3% reduction in issue latency. Further, in the SLIQ design, unnecessary broadcasts to the higher segments are avoided most of the time because, in a large majority of cases, an instruction has a single consumer. This consumer is woken up either by direct indexing or by broadcast in the first segment of the SLIQ. This enables the 8 segment SLIQ to reduce the energy consumption and the energy-delay product by 48.3% and 67.4% respectively on average. SLIQ also allows architects to segment the issue queue carefully so that the latency of the issue logic is just within the per pipeline stage latency goals of the design.
We also propose the Scalable Low power Store Queue (SLSQ) to address similar problems associated with the store queue data forwarding logic. We extend the state-of-the-art Store Vector based Disambiguator to also predict the index of the store that will forward to a given load. SLSQ adds marginally to the hardware budget, but predicts the store queue index of the store that will forward with an accuracy of 99.5% on average. SLSQ thus eliminates unnecessary address broadcasts and compares, and reduces the energy consumption of the store-to-load forwarding logic by 78.4% and 91.6% for the SPEC Int and FP suites respectively. Another variant of SLSQ eliminates the need for a CAM in the forwarding logic and achieves a 49.9% reduction in store-to-load data forwarding latency while incurring a minimal IPC loss of less than 0.1% on average for the entire SPEC CPU2000 benchmark suite.
|