1 |
Revisiting Wide Superscalar Microarchitecture / Révision de larges unités superscalaires
Mondelli, Andrea 12 September 2017 (has links)
For several decades, the clock frequency of general-purpose processors grew thanks to faster transistors and microarchitectures with deeper pipelines. However, about 10 years ago, technology hit leakage power and temperature walls, and the clock frequency of high-end processors has not increased since. Instead of raising the clock frequency, processor makers have integrated more cores on a single chip, enlarged the cache hierarchy and improved energy efficiency. Putting more cores on a single chip has increased total chip throughput and benefits applications with thread-level parallelism. However, most applications have low thread-level parallelism, so having more cores is not sufficient: it is also important to accelerate individual threads. Moreover, reducing energy consumption has become a major objective when designing a high-performance microarchitecture. Some microarchitecture features have been introduced in superscalar cores mainly to reduce energy. One example is the loop buffer, which is now implemented in several superscalar microarchitectures. The purpose of a loop buffer is to save energy in the core's front-end (instruction cache, branch predictor, decoder, etc.) when executing a loop with a body small enough to fit in the loop buffer. If the clock frequency remains constant, the only possibility left for higher single-thread performance in future processors is to exploit more instruction-level parallelism (ILP). Certain microarchitecture improvements (e.g., a better branch predictor) simultaneously improve performance and energy efficiency. In general, however, exploiting more ILP has a cost in silicon area, energy consumption, design effort, etc.
Therefore, the microarchitecture is modified slowly and incrementally, taking advantage of technology scaling. Indeed, processor makers have made continuous efforts to exploit more ILP, with better branch predictors, better data prefetchers, larger instruction windows, more physical registers, and so forth. In this thesis, we try to depict what superscalar cores may look like in 10 years and explore the possibility of exploiting loop behaviors to reduce energy consumption beyond the front-end. Some propositions have been published for loop accelerators or for unconventional superscalar core back-ends. I argue that the instruction window and the issue width can be augmented by combining clustering and register write specialization. A major difference from past research on clustered microarchitectures is that I assume wide-issue clusters, whereas past research mostly focused on narrow-issue clusters. Going from narrow-issue to wide-issue clusters is not just a quantitative change: it has a qualitative impact on the clustering problem, in particular on the steering policy. In the second part of this thesis, we propose two independent and orthogonal energy optimizations that exploit loops. The first optimization detects redundant micro-ops that produce the same result on every iteration and removes these micro-ops completely. The second optimization focuses on the energy consumed by load micro-ops, detecting situations where a load does not need to access the store queue or does not need to access the level-1 data cache.
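To make the first loop optimization concrete, the idea of a micro-op that "produces the same result on every iteration" can be sketched in software. This is only an illustrative model, not the hardware mechanism of the thesis: the trace format, the micro-op names and the function `find_redundant_uops` are all assumptions introduced for this example.

```python
from collections import defaultdict


def find_redundant_uops(trace, min_iterations=2):
    """Return the ids of micro-ops whose result never changed across iterations.

    `trace` is a hypothetical recording of a loop: a list of iterations, each a
    list of (uop_id, result) pairs. A micro-op is reported as redundant when it
    was observed in at least `min_iterations` iterations and always produced
    the same value.
    """
    seen_results = defaultdict(set)   # uop_id -> distinct results observed
    seen_count = defaultdict(int)     # uop_id -> number of observations
    for iteration in trace:
        for uop_id, result in iteration:
            seen_results[uop_id].add(result)
            seen_count[uop_id] += 1
    return sorted(
        uop_id
        for uop_id, values in seen_results.items()
        if len(values) == 1 and seen_count[uop_id] >= min_iterations
    )


# Hypothetical three-iteration loop: only the index computation varies.
trace = [
    [("load_base", 0x1000), ("add_index", 0), ("mul_scale", 8)],
    [("load_base", 0x1000), ("add_index", 1), ("mul_scale", 8)],
    [("load_base", 0x1000), ("add_index", 2), ("mul_scale", 8)],
]
redundant = find_redundant_uops(trace)
print(redundant)
```

In this toy trace, `load_base` and `mul_scale` are the candidates for removal, while `add_index` must still execute on every iteration. A real design would also have to guarantee that the inputs of a removed micro-op cannot change, which this sketch does not model.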
|
2 |
Microarchitecture-Aware Physical Planning for Deep Submicron Technology
Ekpanyapong, Mongkol 17 March 2006 (has links)
The main objective of this thesis is to develop a new design paradigm that combines microarchitecture design and circuit design with physical design for deep submicron technology. In deep submicron technology, wire delay will be the bottleneck for high-performance microarchitecture design. Given the location of each module, inter-module latency can be calculated and hence the performance of the system can be estimated.
In this thesis, we present a novel microarchitectural floorplanning approach that can help computer architects tackle the wire delay problem early in the microarchitecture design stage. We show that by employing microarchitectural floorplanning, a performance gain of up to 40% can be achieved. We also extend the framework to include three-dimensional integrated circuits (3D-IC), a new integration technique that was also introduced to address the wire delay issue in deep submicron technology. By combining microarchitectural floorplanning with 3D-IC, we show that the impact of wire delay can be reduced substantially. We also show that not only the module location but also the module size can affect performance. An adaptive search engine is introduced to identify the right module size. Using our adaptive search engine, we show that the system can identify good module sizes that help improve performance, with a shorter run-time than a limited-runtime brute-force search.
Our microarchitecture-aware physical planning assumes that the target clock period can be achieved by inserting more flip-flops into the system. Inserting flip-flops along wires can make the system meet its timing constraints without violating the correctness of the circuit on that path, because the function of a wire is only to transfer a signal from one location to another. However, inserting flip-flops along paths that consist of gates cannot guarantee the correctness of those paths. A circuit optimization technique that allows flip-flop insertion along circuit paths is called retiming. In this dissertation, we show that retiming can be used to achieve the target clock period in microprocessor design. With the same target clock period, power reduction techniques can be combined with retiming to help reduce power consumption. We show that up to 34% power reduction can be achieved without timing violations. Furthermore, to tackle the problem of process variation in deep submicron technology, we also propose a modified retiming that can tolerate errors from statistical timing computation. We show that our statistical retiming algorithm gives results close to those of Monte Carlo simulation.
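The relation between module location, inter-module latency and flip-flop insertion on wires can be illustrated with a small calculation. This is a minimal sketch under stated assumptions, not the model used in the thesis: it assumes Manhattan routing, a wire delay that grows linearly with length (as for an ideally repeated wire), and it ignores flip-flop setup and clock-to-output overheads. The function names and the numbers in the example are hypothetical.

```python
import math


def wire_delay_ns(pos_a, pos_b, delay_per_mm_ns):
    """Delay of a wire between two module positions given in mm.

    Assumes Manhattan routing and a delay proportional to wire length.
    """
    length_mm = abs(pos_a[0] - pos_b[0]) + abs(pos_a[1] - pos_b[1])
    return length_mm * delay_per_mm_ns


def latency_cycles(delay_ns, clock_period_ns):
    """Number of clock cycles needed to cross the wire (at least one)."""
    if clock_period_ns <= 0:
        raise ValueError("clock period must be positive")
    return max(1, math.ceil(delay_ns / clock_period_ns))


def flip_flops_on_wire(delay_ns, clock_period_ns):
    """Flip-flops to insert so that every wire segment fits in one period."""
    return latency_cycles(delay_ns, clock_period_ns) - 1


# Hypothetical floorplan: two modules 7 mm apart (Manhattan distance),
# 0.25 ns of wire delay per mm, and a 0.5 ns target clock period.
delay = wire_delay_ns((0, 0), (3, 4), delay_per_mm_ns=0.25)
cycles = latency_cycles(delay, clock_period_ns=0.5)
flops = flip_flops_on_wire(delay, clock_period_ns=0.5)
print(delay, cycles, flops)
```

Moving the two modules closer together in the floorplan lowers `cycles`, which is the quantity a microarchitectural floorplanner would feed back into a performance estimate.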
|
3 |
Effects of surface microstructure and nanostructure on osteoblast-like mg63 cell number, differentiation and local factor production
Zhao, Ge 09 January 2004 (has links)
Surface roughness affects bone formation around orthopaedic implants in vivo and osteoblast functions in vitro. Osteoblast-like MG63 cells cultured on rough surfaces exhibited decreased cell number, increased differentiation and increased local factor production when compared to cells grown on smooth surfaces. In these experiments, roughness was characterized as average peak-to-valley height (Ra), which is not equal throughout the surface. Other features of roughness, including peak and valley area distributions and curvature of the valleys, will affect cell functions. In this study, novel titanium surfaces were prepared by photolithography to produce well-designed microstructure and nanostructure. Smooth disks were made by producing craters of 10 micrometer, 30 micrometer and 100 micrometer diameters on titanium disks with constant curvatures. Craters were placed sparsely (10/1, 30/1, 100/1) or compactly (10/6, 30/6, 100/6). Smooth disks were also acid etched to give an overall roughness of Ra 0.7 micrometer or anodized to produce a volcano-like nanostructure of Ra 0.4 micrometer. The results revealed the distinct contributions of microcrater size, crater spacing and nanostructures to the surface effect on cell number, differentiation (alkaline phosphatase; osteocalcin) and local factor levels (TGF-beta1; PGE2). Cell attachment depends on crater spacing; cell growth and aggregation depend on crater dimension; and cell morphology depends on the presence of nanostructural features. Cell differentiation and local factor production are modulated by acid-etched roughness in concert with microstructure, and the active TGF-beta1 level depends on nanoscale roughness.
|
4 |
Fully Distributed Register Files for Heterogeneous Clustered Microarchitectures
Bunchua, Santithorn 09 July 2004 (has links)
Conventional processor design utilizes a central register file and a bypass network to deliver operands to and from functional units, which cannot scale to a large number of functional units. As more functional units are integrated into a processor, the number of ports on a register file grows linearly while area, delay, and energy consumption grow even more rapidly. Physical properties of a bypass network scale in a similar manner.
In this dissertation, a fully distributed register file organization is presented to overcome this limitation by relying on small register files with fewer ports and localized operand bypasses. Unlike other clustered microarchitectures, each cluster features a small single-issue functional unit coupled with a small local register file. Several clusters are used, and each of them can be different. All register files are connected through a register transfer network that supports multicast communications. Techniques to support distributed register file operations are presented for both dynamically and statically scheduled processors. These include the eager and multicast register transfer mechanisms in the dynamic approach and the global data routing with multicasting algorithm in the static approach. Although this organization requires additional cycles to execute a program, this cost is compensated by significant savings obtained through smaller area, faster operand access time, and lower energy consumption. With a faster operating frequency and a more efficient hardware implementation, overall performance can be improved.
Additionally, the fully distributed register file organization is applied to an ILP-SIMD processing element, which is the major building block of a massively parallel media processor array. The results show reduction in die area, which can be utilized to implement additional processing elements. Consequently, performance is improved through a higher degree of data parallelism through a larger processor array.
In summary, the fully distributed register file architecture permits future processors to scale to a large number of functional units. This is especially desirable in high-throughput processors such as wide-issue processors and multithreaded processors. Moreover, localized communication is highly desirable in the transition to future deep submicron technologies, since long wires are a critical issue in processes with extremely small feature sizes.
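The scaling argument in this abstract (ports grow linearly with the number of functional units, while area grows faster) can be illustrated with a rough back-of-the-envelope model. The sketch below is an assumption-laden illustration, not a result from the dissertation: it assumes two read ports and one write port per functional unit, takes the area of a register cell to grow with the square of its port count, keeps the total number of registers equal in both organizations, and ignores the extra ports and wiring of the register transfer network.

```python
def ports_for_units(num_units, reads_per_unit=2, writes_per_unit=1):
    """Ports needed on one register file that serves `num_units` units."""
    return num_units * (reads_per_unit + writes_per_unit)


def relative_cell_area(ports):
    """Relative area of one register cell, assumed quadratic in port count."""
    return ports ** 2


def relative_area(num_files, units_per_file, registers_total):
    """Relative total area of `num_files` equal register files.

    The `registers_total` registers are assumed to be split evenly between
    the files, so the number of cells is the same in every organization.
    """
    cell = relative_cell_area(ports_for_units(units_per_file))
    return registers_total * cell


UNITS = 8
REGISTERS = 64  # hypothetical total number of registers

central = relative_area(num_files=1, units_per_file=UNITS, registers_total=REGISTERS)
distributed = relative_area(num_files=UNITS, units_per_file=1, registers_total=REGISTERS)
print(ports_for_units(UNITS), central, distributed, central / distributed)
```

Under these assumptions, the gap between the two organizations widens quadratically with the number of functional units, which is the trend the abstract describes; the absolute numbers have no meaning beyond that.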
|
5 |
Using machine-learning to efficiently explore the architecture/compiler co-design space
Dubach, Christophe January 2009 (has links)
Designing new microprocessors is a time-consuming task. Architects rely on slow simulators to evaluate performance, and a significant proportion of the design space has to be explored before an implementation is chosen. This process becomes more time-consuming when compiler optimisations are also considered. Once the architecture is selected, a new compiler must be developed and tuned. What is needed are techniques that can speed up this whole process and develop a new optimising compiler automatically. This thesis proposes the use of machine-learning techniques to address architecture/compiler co-design. First, two performance models are developed and used to efficiently search the design space of a microarchitecture. These models accurately predict performance metrics such as cycles or energy, or a tradeoff of the two. The first model uses just 32 simulations to model the entire design space of new applications, an order of magnitude fewer than state-of-the-art techniques. The second model addresses offline training costs and predicts the average behaviour of a complete benchmark suite. Compared to the state-of-the-art, it needs five times fewer training simulations when applied to the SPEC CPU 2000 and MiBench benchmark suites. Next, the impact of compiler optimisations on the design process is considered. This has the potential to change the shape of the design space and improve performance significantly. A new model is proposed that predicts the performance obtainable by an optimising compiler for any design point, without having to build the compiler. Compared to the state-of-the-art, this model achieves a significantly lower error rate. Finally, a new machine-learning optimising compiler is presented that predicts the best compiler optimisation setting for any new program on any new microarchitecture. It achieves an average speedup of 1.14x over the default best gcc optimisation level. This represents 61% of the maximum speedup available, using just one profile run of the application.
|
6 |
Energy-Efficient Pre-Execution Techniques in Two-Step Physical Register Deallocation
ANDO, Hideki, IWAMOTO, Kengo, HYODO, Kazunaga 01 November 2009 (has links)
No description available.
|
7 |
Gross Morphology, Microarchitecture, Strength and Evolution of the Hominoid Vertebral Body
Cotter, Meghan Marie January 2011 (has links)
No description available.
|
8 |
High-Performance Crossbar Designs for Network-on-Chips (NoCs)
Zhang, Yixuan 23 September 2010 (has links)
No description available.
|
9 |
Comprendre l'arthrose : analyse histomorphométrique de l'unité os-cartilage / Understanding Arthritis : histomorphometric analysis of the bone-cartilage unit
Cherief, Masnsen 15 December 2017 (has links)
The importance of subchondral bone in the pathogenesis and management of osteoarthritis retains the interest of clinicians and the scientific community. Indeed, there are strong links between the subchondral bone and the cartilage: the integrity of the latter depends on the subchondral bone for mechanical and nutritional support. Here, we investigated the relationship between bone and cartilage structures and vascular supply in human osteoarthritis.
We collected 37 osteoarthritic tibial plateaus taken after total knee arthroplasty. From these plateaus, several cores were extracted and scanned by microtomography. The resulting projections were reconstructed, then manually segmented to separate the subchondral bone from the trabecular bone, and a microarchitectural analysis was carried out on both bone structures. The samples were decalcified, cut into 4 μm sections, stained with HES and classified into 6 groups according to the OARSI scale. The surface of the subchondral bone and the thickness and surface of the articular cartilage were measured. The number of vessels in the subchondral region was counted by two different operators, and VEGF immunofluorescent staining was performed. Finally, cartilage, subchondral bone and trabecular bone were used to measure ribonucleic and protein markers related to vascularization, innervation and inflammation.
The microstructure of the bone evolved as osteoarthritis worsened. The subchondral bone thickened and became more porous. Bone volume fraction, trabecular thickness, spacing and number of trabeculae were positively correlated with the OARSI score. A significant decrease in the number of blood vessels was observed in the last stage of osteoarthritis. Finally, ribonucleic and protein markers related to vascularization, innervation and inflammation were modulated during the development of the pathology. Taken together, our data show dynamic interaction and support structures between subchondral bone and cartilage. Understanding the signaling pathways, the biochemical unity of cartilage in the joints and the intercellular communication between cartilage and subchondral bone can lead to the development of more effective strategies for treating patients with osteoarthritis.
|
10 |
Effets sur le tissu osseux (microarchitecture, densitométrie, biomécanique et remodelage) et sur le métabolisme lipidique de l'acide zolédronique et de l'exercice physique chez la rate ovariectomisée / Specific and combined effects of zoledronic acid and physical exercice in ovariectomized rats
Lespessailles, Eric 20 January 2009 (has links)
The aim of this work was to investigate, in mature ovariectomized rats, the effects of zoledronic acid and physical exercise on bone tissue and on lipid metabolism. In the first study, the individual and combined effects of zoledronic acid (a single injection of 20 µg/kg) and physical exercise (treadmill running for twelve weeks) were examined on whole-body and femur bone mineral density, trabecular microarchitecture, bone strength parameters and bone turnover. Overall, the results showed that zoledronic acid prevented the trabecular microarchitectural changes and the increase in resorption induced by ovariectomy. Treadmill running exercise partially maintained bone strength and exerted its action through an increase in bone formation. However, we did not find any additive or synergistic effect of the two interventions combined on the rat skeletal status. The second study aimed to assess the specific and combined effects of zoledronic acid and treadmill running exercise on the lipid profile in this model of ovariectomized mature rats. Although both zoledronic acid and treadmill running exercise modified total cholesterol and HDL cholesterol in a direction that improves atherosclerosis risk, their combined effects were not synergistic; furthermore, the combination produced a paradoxical inverse effect, possibly explained by a pro-inflammatory effect of the two interventions combined.
|