1 |
Architectural Support for High-Performance, Power-Efficient and Secure Multiprocessor Systems
An, Baik Song, August 2012
High-performance systems have been widely adopted in many fields, and the demand for better performance is constantly increasing. The need for powerful yet flexible systems is also growing, driven by varying application requirements from diverse domains. Power efficiency has likewise become one of the major issues in high-performance computing: the power density of core components is rising significantly, and power supply accounts for the dominant share of total management cost. Providing dependability is another main concern in large-scale systems, since more hardware resources can be abused by attackers. Designing high-performance, power-efficient and secure systems is therefore crucial to provide users with adequate performance as well as reliability.
Traditional design methodologies for large-scale computing systems struggle to meet this demand under restricted resource budgets. Interconnecting a large number of uniprocessor chips to build parallel processing systems is not an efficient solution in terms of performance and power. A chip multiprocessor (CMP), which integrates multiple processing cores and caches on a single chip, is regarded as a good alternative to previous design trends.
In this dissertation, we address various design issues of high-performance CMP-based multiprocessor systems to achieve both performance and power efficiency while maintaining security. First, we propose fast and secure off-chip interconnects that minimize network overheads and provide an efficient security mechanism. Second, we propose architectural support for fast and efficient memory protection in CMP systems, making the best use of the characteristics of CMP environments and multi-threaded workloads. Third, we propose a new router design for network-on-chip (NoC) based on a new memory technology: hybrid input buffers that use both SRAM and STT-MRAM for better performance as well as power efficiency.
Simulation results show that the proposed schemes improve off-chip network performance by reducing message size by 54% on average. The schemes also reduce the overhead of bounds-checking operations, improving overall performance by 11% on average. Adopting hybrid buffers in NoC routers increases network throughput by up to 21%.
|
2 |
Dynamic scheduling in multicore processors
Rosas Ham, Demian, January 2012
The advent of multi-core processors, particularly with projections that the number of cores will continue to increase, has focused attention on parallel programming. It is widely recognized that current programming techniques, including those used for scientific parallel programming, will not allow the easy formulation of general-purpose applications. One area receiving interest is the use of programming styles that do not have side effects. Previous work on parallel functional programming demonstrated the potential of such styles to permit easy exploitation of parallelism. This thesis investigates a dynamic load-balancing system for shared-memory chip multiprocessors. The system is based on a parallel computing model called SLAM (Spreading Load with Active Messages), which makes use of functional language evaluation techniques. A novel hardware/software mechanism for exploiting fine-grain parallelism is presented. The mechanism comprises a runtime system that performs dynamic scheduling and synchronization automatically when executing parallel applications, and applications use it through an API. The proposed system is evaluated using cycle-level models and multithreaded applications running in a full-system simulation environment.
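The abstract does not describe how SLAM spreads work, so the following sketch does not reproduce it. As a generic and hypothetical point of comparison, it simulates work stealing, a widely used form of dynamic scheduling for fine-grain tasks. Each worker runs tasks from its own deque and, when idle, takes the oldest task from another worker's deque. The simulation is single-threaded and deterministic. The task shape (recursive Fibonacci splitting down to a threshold), the round-robin victim order, and all function names are illustrative assumptions.

```python
from collections import deque


def fib_leaf(k):
    """Compute a small Fibonacci number directly (used for leaf tasks)."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def simulate_work_stealing(n, num_workers=4, threshold=2):
    """Hypothetical illustration of work stealing, not SLAM's mechanism.

    A task k computes fib(k). Tasks with k > threshold split into two child
    tasks (k - 1, k - 2) pushed onto the executing worker's own deque; smaller
    tasks are computed directly. Workers run in lock-step rounds.
    Returns (sum of leaf results, number of tasks executed by each worker).
    """
    if threshold < 1:
        raise ValueError("threshold must be >= 1 so child tasks stay non-negative")
    deques = [deque() for _ in range(num_workers)]
    deques[0].append(n)  # all work starts on worker 0
    executed = [0] * num_workers
    total = 0
    while any(deques):
        for w in range(num_workers):
            if deques[w]:
                k = deques[w].pop()  # owner takes its newest task (LIFO end)
            else:
                # Idle: scan the other workers round-robin for a non-empty deque.
                victims = ((w + i) % num_workers for i in range(1, num_workers))
                victim = next((v for v in victims if deques[v]), None)
                if victim is None:
                    continue  # nothing to steal in this round
                k = deques[victim].popleft()  # thief takes the oldest task
            executed[w] += 1
            if k > threshold:
                deques[w].append(k - 1)  # spawn children on the local deque
                deques[w].append(k - 2)
            else:
                total += fib_leaf(k)
    return total, executed


total, executed = simulate_work_stealing(10)
print(total)  # 55, i.e. fib(10)
```

Stealing from the oldest end tends to hand thieves the largest remaining subtrees, so work spreads out after only a few steals. Owners popping their newest task keep recently spawned work local.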
|
3 |
Energy and Reliability in Future NOC Interconnected CMPs
Kim, Hyungjun, 16 December 2013
In this dissertation, I explore energy and reliability in future NoC (network-on-chip) interconnected CMPs (chip multiprocessors), as both have become first-order constraints in CMP design.
In the first part, we target the root cause of network energy consumption with techniques that reduce link- and router-level switching activity. We focus specifically on memory subsystem traffic, as it makes up the bulk of NoC load in a CMP. Our scheme reduces network activity by transmitting only those flits that contain words predicted to be useful by a novel spatial locality predictor. We further lower NoC energy consumption through microarchitectural mechanisms that inhibit datapath switching activity caused by unused words within individual flits. Using simulation-based performance studies and detailed energy models based on synthesized router designs and different link wire types, we show three results. (a) The prediction mechanism achieves very high accuracy, with an average false-unused prediction rate of just 2.5%. (b) The combined NoC energy savings enabled by the predictor and the microarchitectural support are 36% on average and up to 57% in the best case. (c) The technique incurs no system performance penalty.
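The abstract does not specify how the spatial locality predictor is organized. The sketch below assumes one plausible organization for such predictors: a small table indexed by the program counter of the missing load, recording which words of a cache line were touched during the line's last residency. It then shows how a predicted word mask determines how many data flits are sent. The line size (8 words), flit payload (2 words), table size, and all names are hypothetical.

```python
WORDS_PER_LINE = 8   # assumption: 64-byte cache line of 8-byte words
WORDS_PER_FLIT = 2   # assumption: each data flit carries 16 bytes (2 words)
FULL_MASK = (1 << WORDS_PER_LINE) - 1  # bit i set = word i predicted useful


class SpatialLocalityPredictor:
    """Hypothetical PC-indexed table of word-usage bitmasks."""

    def __init__(self, entries=256):
        self.entries = entries
        self.table = {}

    def _index(self, pc):
        return pc % self.entries  # simple modulo hash; aliasing is possible

    def predict(self, pc):
        # With no history, conservatively predict that every word is useful.
        return self.table.get(self._index(pc), FULL_MASK)

    def train(self, pc, used_mask):
        # Called when a line is evicted, with the words actually touched.
        self.table[self._index(pc)] = used_mask & FULL_MASK


def flits_to_send(mask):
    """Count data flits holding at least one word marked in mask."""
    flit_bits = (1 << WORDS_PER_FLIT) - 1
    count = 0
    for f in range(WORDS_PER_LINE // WORDS_PER_FLIT):
        if mask & (flit_bits << (f * WORDS_PER_FLIT)):
            count += 1
    return count


def false_unused_words(predicted, actual):
    """Words the core used that the predictor had marked as unused."""
    return bin(actual & ~predicted & FULL_MASK).count("1")


p = SpatialLocalityPredictor()
pc = 0x400
print(flits_to_send(p.predict(pc)))  # 4: no history, whole line sent
p.train(pc, 0b00000011)              # only words 0 and 1 were touched
print(flits_to_send(p.predict(pc)))  # 1: a single flit covers words 0-1
```

A false-unused prediction, counted by `false_unused_words`, would force an extra request for the missing word. This is why such a predictor defaults to the full line when it has no history.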
In the second part, we present a method for dynamic voltage/frequency scaling of networks-on-chip and last-level caches in CMP designs, where these shared resources form a single voltage/frequency domain. We develop a new technique for monitoring and control and validate it by running PARSEC benchmarks in full-system simulation. These techniques reduce energy-delay product by 46% compared with a state-of-the-art prior work.
In the third part, we develop critical-path models for wear induced by HCI (hot-carrier injection) and NBTI (negative-bias temperature instability), assuming stress under realistic workload conditions, and apply them to the interconnect microarchitecture. A key finding from this modeling is that, contrary to prevailing wisdom, wearout in the CMP on-chip interconnect is correlated with low load in the NoC routers, rather than high load. We then develop a novel wearout-decelerating scheme in which routers under low load have their wearout-sensitive components exercised without significantly affecting the router's cycle time, pipeline depth, area, or power consumption. We show that the proposed design yields a 13.8× to 65× increase in CMP lifetime.
|
4 |
Scheduling Tasks on Heterogeneous Chip Multiprocessors with Reconfigurable Hardware
Teller, Justin Stevenson, 31 July 2008
No description available.
|
5 |
Performance prediction for dynamic voltage and frequency scaling
Miftakhutdinov, Rustam Raisovich, 28 October 2014
This dissertation proves the feasibility of accurately predicting, at runtime, processor performance under frequency scaling. The performance predictors developed in this dissertation allow processors capable of dynamic voltage and frequency scaling (DVFS) to improve their performance or energy efficiency by dynamically adapting chip or core voltages and frequencies to workload characteristics. The dissertation considers three processor configurations: a uniprocessor capable of chip-level DVFS, a private-cache chip multiprocessor capable of per-core DVFS, and a shared-cache chip multiprocessor capable of per-core DVFS. Depending on the processor configuration, the presented performance predictors help the processor realize 72–85% of the average performance or energy-efficiency gains achieved by an oracle.
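The abstract does not describe how the predictors work. The sketch below uses a simpler, commonly used linear model to illustrate what predicting performance under frequency scaling involves. This model is an assumption, not the dissertation's predictor. Execution time splits into a memory component that does not scale with core frequency and a compute component that scales inversely with it. A hypothetical policy then picks the lowest frequency whose predicted slowdown stays within a bound. The function names, the 10% bound, the time units (ms), and the frequency list (GHz) are all illustrative.

```python
def predict_time(t_measured, t_memory, f_measured, f_target):
    """Predict execution time at f_target from one measurement at f_measured.

    Assumed linear model: t = t_memory + t_compute, where t_memory does not
    scale with core frequency and t_compute scales with 1 / frequency.
    """
    t_compute = t_measured - t_memory
    return t_memory + t_compute * f_measured / f_target


def pick_frequency(t_measured, t_memory, f_measured, candidates, max_slowdown=0.10):
    """Return the lowest candidate frequency whose predicted time stays within
    (1 + max_slowdown) of the predicted time at the highest frequency."""
    t_fastest = predict_time(t_measured, t_memory, f_measured, max(candidates))
    for f in sorted(candidates):
        t_f = predict_time(t_measured, t_memory, f_measured, f)
        if t_f <= (1 + max_slowdown) * t_fastest:
            return f
    return max(candidates)


freqs = [1.0, 1.5, 2.0, 2.5, 3.0]  # GHz, hypothetical DVFS states
# 10 ms interval measured at 2.0 GHz, 8 ms of it memory time (memory-bound):
print(pick_frequency(10.0, 8.0, 2.0, freqs))  # 2.0
# Same interval with only 1 ms of memory time (compute-bound):
print(pick_frequency(10.0, 1.0, 2.0, freqs))  # 3.0
```

The contrast shows why such prediction matters. A memory-bound phase loses little performance at a lower frequency, while a compute-bound phase does not.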
|
6 |
Design and development of a distributed shared memory system for a chip multiprocessor (CMP)
Αδαμίδης, Ανδρέας, 09 February 2009
The subject of this master's thesis is the design and development of a distributed shared memory system as part of the SiScape multiprocessor architecture. Because of the particular characteristics of this architecture, its memory system had to be designed and developed from scratch to meet the architecture's requirements. This applies in particular to the second-level cache that makes the memory system's operation possible. The design of the second-level cache was described in the VHDL hardware description language.
|
7 |
A chip multiprocessor for a large-scale neural simulator
Painkras, Eustace, January 2013
A thesis submitted to The University of Manchester for the degree of Doctor of Philosophy, 17 December 2012.
The modelling and simulation of large-scale spiking neural networks in biological real time places very high demands on computational processing capabilities and communications infrastructure. These demands are difficult to satisfy even with powerful general-purpose high-performance computers. Taking advantage of the remarkable progress in semiconductor technologies, it is now possible to design and build an application-driven platform to support large-scale spiking neural network simulations.
This research investigates the design and implementation of a power-efficient chip multiprocessor (CMP) which constitutes the basic building block of a spiking neural network modelling and simulation platform. The neural modelling requirements of many processing elements, high-fanout communications and local memory are addressed in the design and implementation of the low-level modules in the design hierarchy as well as in the CMP. By focusing on a power-efficient design, the energy consumption and related cost of SpiNNaker, the massively-parallel computation engine, are kept low compared with other state-of-the-art hardware neural simulators.
The SpiNNaker CMP is composed of many simple power-efficient processors with small local memories, asynchronous networks-on-chip and numerous bespoke modules specifically designed to serve the demands of neural computation with a globally asynchronous, locally synchronous (GALS) architecture.
The SpiNNaker CMP, realised as part of this research, fulfills the demands of neural simulation in a power-efficient and scalable manner, with added fault-tolerance features. The CMPs have, to date, been incorporated into three versions of SpiNNaker system PCBs with up to 48 chips onboard.
All chips on the PCBs perform successfully, both in functional testing and in their targeted role of neural simulation.
|
8 |
Evaluating the Scalability of SDF Single-chip Multiprocessor Architecture Using Automatically Parallelizing Code
Zhang, Yuhua, 12 1900
Advances in integrated circuit technology continue to provide more and more transistors on a chip. Computer architects face the challenge of finding the best way to translate these resources into high performance. The challenge in designing the next-generation CPU (central processing unit) lies not in using up the silicon area, but in finding smart ways to make use of the wealth of transistors now available. In addition, the next-generation architecture should offer high throughput, scalability, modularity, and low energy consumption. It should not be suitable for only one class of applications or users, nor emphasize only a faster clock rate. A program exhibits different types of parallelism: instruction-level parallelism (ILP), thread-level parallelism (TLP), and data-level parallelism (DLP). Likewise, architectures can be designed to exploit one or more of these types of parallelism. It is generally not possible to design architectures that take advantage of all three types without very complex hardware structures and complex compiler optimizations. We present the state-of-the-art SDF (Scheduled Dataflow) architecture, which exploits as much TLP as the application supplies. We implement an SDF single-chip multiprocessor built from simpler processors and execute automatically parallelized applications on it. SDF has many desirable features, such as high throughput, scalability, and low power consumption, which meet the requirements of next-generation CPU design. The experimental results compare SDF with superscalar, VLIW (very long instruction word), and SMT (simultaneous multithreading) architectures. For applications with very little parallelism, SDF performs comparably to these architectures, and for applications with large amounts of parallelism it outperforms them.
|
9 |
Location Cache Design and Performance Analysis for Chip Multiprocessors
NEMETH, JASON, 19 September 2008
No description available.
|
10 |
Adaptive Shared Cache Migration Policy
Bien-aise, Hemsley, 20 July 2010
No description available.
|