Spelling suggestions: "subject:"multithreaded"" "subject:"threading""
41 |
Exploiting Speculative and Asymmetric Execution on Multicore ArchitecturesWamhoff, Jons-Tobias 27 March 2015 (has links) (PDF)
The design of microprocessors is undergoing radical changes that affect the performance and reliability of hardware and will have a high impact on software development. Future systems will depend on a deep collaboration between software and hardware to cope with the current and predicted system design challenges. Instead of higher frequencies, the number of processor cores per chip is growing. Eventually, processors will be composed of cores that run at different speeds or support specialized features to accelerate critical portions of an application. Performance improvements of software will only result from increasing parallelism and introducing asymmetric processing. At the same time, substantial enhancements in the energy efficiency of hardware are required to make use of the increasing transistor density. Unfortunately, the downscaling of transistor size and power will degrade the reliability of the hardware, which must be compensated by software.
In this thesis, we present new algorithms and tools that exploit speculative and asymmetric execution to address the performance and reliability challenges of multicore architectures. Our solutions facilitate both the assimilation of software to the changing hardware properties as well as the adjustment of hardware to the software it executes. We use speculation based on transactional memory to improve the synchronization of multi-threaded applications. We show that shared memory synchronization must not only be scalable to large numbers of cores but also robust such that it can guarantee progress in the presence of hardware faults. Therefore, we streamline transactional memory for a better throughput and add fault tolerance mechanisms with a reduced overhead by speculating optimistically on an error-free execution. If hardware faults are present, they can manifest either in a single event upset or crashes and misbehavior of threads. We address the former by applying transactions to checkpoint and replicate the state such that threads can correct and continue their execution.
The latter is tackled by extending the synchronization such that it can tolerate crashes and misbehavior of other threads. We improve the efficiency of transactional memory by enabling a lightweight thread that always wins conflicts and significantly reduces the overheads. Further performance gains are possible by exploiting the asymmetric properties of applications. We introduce an asymmetric instrumentation of transactional code paths to enable applications to adapt to the underlying hardware. With explicit frequency control of individual cores, we show how applications can expose their possibly asymmetric computing demand and dynamically adjust the hardware to make a more efficient usage of the available resources.
|
42 |
Exploiting Speculative and Asymmetric Execution on Multicore ArchitecturesWamhoff, Jons-Tobias 21 November 2014 (has links)
The design of microprocessors is undergoing radical changes that affect the performance and reliability of hardware and will have a high impact on software development. Future systems will depend on a deep collaboration between software and hardware to cope with the current and predicted system design challenges. Instead of higher frequencies, the number of processor cores per chip is growing. Eventually, processors will be composed of cores that run at different speeds or support specialized features to accelerate critical portions of an application. Performance improvements of software will only result from increasing parallelism and introducing asymmetric processing. At the same time, substantial enhancements in the energy efficiency of hardware are required to make use of the increasing transistor density. Unfortunately, the downscaling of transistor size and power will degrade the reliability of the hardware, which must be compensated by software.
In this thesis, we present new algorithms and tools that exploit speculative and asymmetric execution to address the performance and reliability challenges of multicore architectures. Our solutions facilitate both the assimilation of software to the changing hardware properties as well as the adjustment of hardware to the software it executes. We use speculation based on transactional memory to improve the synchronization of multi-threaded applications. We show that shared memory synchronization must not only be scalable to large numbers of cores but also robust such that it can guarantee progress in the presence of hardware faults. Therefore, we streamline transactional memory for a better throughput and add fault tolerance mechanisms with a reduced overhead by speculating optimistically on an error-free execution. If hardware faults are present, they can manifest either in a single event upset or crashes and misbehavior of threads. We address the former by applying transactions to checkpoint and replicate the state such that threads can correct and continue their execution.
The latter is tackled by extending the synchronization such that it can tolerate crashes and misbehavior of other threads. We improve the efficiency of transactional memory by enabling a lightweight thread that always wins conflicts and significantly reduces the overheads. Further performance gains are possible by exploiting the asymmetric properties of applications. We introduce an asymmetric instrumentation of transactional code paths to enable applications to adapt to the underlying hardware. With explicit frequency control of individual cores, we show how applications can expose their possibly asymmetric computing demand and dynamically adjust the hardware to make a more efficient usage of the available resources.
|
43 |
Mapping Concurrent Applications to Multiprocessor Systems with Multithreaded Processors and Network on Chip-Based InterconnectionsPop, Ruxandra January 2011 (has links)
Network on Chip (NoC) architectures provide scalable platforms for designing Systems on Chip (SoC) with large number of cores. Developing products and applications using an NoC architecture offers many challenges and opportunities. A tool which can map an application or a set of applications to a given NoC architecture will be essential. In this thesis we first survey current techniques and we present our proposals for mapping and scheduling of concurrent applications to NoCs with multithreaded processors as computational resources. NoC platforms are basically a special class of Multiprocessor Embedded Systems (MPES). Conventional MPES architectures are mostly bus-based and, thus, are exposed to potential difficulties regarding scalability and reusability. There has been a lot of research on MPES development including work on mapping and scheduling of applications. Many of these results can also be applied to NoC platforms. Mapping and scheduling are known to be computationally hard problems. A large range of exact and approximate optimization algorithms have been proposed for solving these problems. The methods include Branch-and–Bound (BB), constructive and transformative heuristics such as List Scheduling (LS), Genetic Algorithms (GA) and various types of Mathematical Programming algorithms. Concurrent applications are able to capture a typical embedded system which is multifunctional. Concurrent applications can be executed on an NoC which provides a large computational power with multiple on-chip computational resources. Improving the time performances of concurrent applications which are running on Network on Chip (NoC) architectures is mainly correlated with the ability of mapping and scheduling methodologies to exploit the Thread Level Parallelism (TLP) of concurrent applications through the available NoC parallelism. Matching the architectural parallelism to the application concurrency for obtaining good performance-cost tradeoffs is another aspect of the problem. Multithreading is a technique for hiding long latencies of memory accesses, through the overlapped execution of several threads. Recently, Multi-Threaded Processors (MTPs) have been designed providing the architectural infrastructure to concurrently execute multiple threads at hardware level which, usually, results in a very low context switching overhead. Simultaneous Multi-Threaded Processors (SMTPs) are superscalar processor architectures which adaptively exploit the coarse grain and the fine grain parallelism of applications, by simultaneously executing instructions from several thread contexts. In this thesis we make a case for using SMTPs and MTPs as NoC resources and show that such a multiprocessor architecture provides better time performances than an NoC with solely General-purpose Processors (GP). We have developed a methodology for task mapping and scheduling to an NoC with mixed SMTP, MTP and GP resources, which aims to maximize the time performance of concurrent applications and to satisfy their soft deadlines. The developed methodology was evaluated on many configurations of NoC-based platforms with SMTP, MTP and GP resources. The experimental results demonstrate that the use of SMTPs and MTPs in NoC platforms can significantly speed-up applications.
|
44 |
Anatomy of a GUI (Graphical User Interface) Application for Rexx ProgrammersFlatscher, Rony G. 03 1900 (has links) (PDF)
Creating for the first time GUI (graphical user interface) applications is an endeavor that can be most challenging. This article introduces the general concepts of GUIs and the need to interact with GUI elements only on the so called "GUI thread". The
concepts pertain to GUI applications written for Windows, Linux and MacOS alike.
Using Java libraries for creating Rexx GUI applications makes these Rexx GUI
applications totally platform independent. Taking advantage of BSF4ooRexx even
the powerful JavaFX GUI libraries can be exploited by pure Rexx, allowing Rexx
programmers to create the most demanding and complex GUI applications in an
unparalleled easiness in an astonishing short period of time.
The introduced GUI concepts will be demonstrated with short nutshell examples
exploiting the JavaFX GUI libraries, empowering the Rexx programmers with the ability to create stable and error free GUI applications in Rexx.
|
45 |
Evaluations of the parallel extensions in .NET 4.0Islam, Md. Rashedul, Islam, Md. Rofiqul, Mazumder, Tahidul Arafhin January 2011 (has links)
Parallel programming or making parallel application is a great challenging part of computing research. The main goal of parallel programming research is to improve performance of computer applications. A well-structured parallel application can achieve better performance in terms of execution speed over sequential execution on existing and upcoming parallel computer architecture. This thesis named "Evaluations of the parallel extensions in .NET 4.0" describes the experimental evaluation of different parallel application performance with thread-safe data structure and parallel constructions in .NET Framework 4.0. Described different performance issues of this thesis help to make efficient parallel application for better performance. Before describing the experimental evaluation, this thesis describes some methodologies relevant to parallel programming like Parallel computer architecture, Memory architectures, Parallel programming models, decomposition, threading etc. It describes the different APIs in .NET Framework 4.0 and the way of coding for making an efficient parallel application in different situations. It also presents some implementations of different parallel constructs or APIs like Static Multithreading, Using ThreadPool, Task, Parallel.For, Parallel.ForEach, PLINQ etc. The evaluation of parallel application has been done by experimental result evaluation and performance measurements. In most of the cases, the result evaluation shows better performance of parallelism like less execution time and increase CPU uses over traditional sequential execution. In addition parallel loop doesn’t show better performance in case of improper partitioning, oversubscription, improper workloads etc. The discussion about proper partitioning, oversubscription and proper work load balancing will help to make more efficient parallel application. / Program: Magisterutbildning i informatik
|
46 |
Quantifying the impacts of disabling speculation and relaxing the scheduling loop in multithreaded processorsLoew, Jason. January 2006 (has links)
Thesis (M.S.)--State University of New York at Binghamton, Thomas J. Watson School of Engineering and Applied Sciences, Department of Computer Science, 2006. / Includes bibliographical references.
|
47 |
Network Processor specific Multithreading tradeoffsBoivie, Victor January 2005 (has links)
<p>Multithreading is a processor technique that can effectively hide long latencies that can occur due to memory accesses, coprocessor operations and similar. While this looks promising, there is an additional hardware cost that will vary with for example the number of contexts to switch to and what technique is used for it and this might limit the possible gain of multithreading.</p><p>Network processors are, traditionally, multiprocessor systems that share a lot of common resources, such as memories and coprocessors, so the potential gain of multithreading could be high for these applications. On the other hand, the increased hardware required will be relatively high since the rest of the processor is fairly small. Instead of having a multithreaded processor, higher performance gains could be achieved by using more processors instead.</p><p>As a solution, a simulator was built where a system can effectively be modelled and where the simulation results can give hints of the optimal solution for a system in the early design phase of a network processor system. A theoretical background to multithreading, network processors and more is also provided in the thesis.</p>
|
48 |
Heterogeneity-awareness in multithreaded multicore processorsAcosta Ojeda, Carmelo Alexis 07 July 2009 (has links)
During the last decades, Computer Architecture has experienced a great series of revolutionary changes. The increasing transistor count on a single chip has led to some of the main milestones in the field, from the release of the first Superscalar (1965) to the state-of-the-art Multithreaded Multicore Architectures, like the Intel Core i7 (2009).Moore's Law has continued for almost half of a century and is not expected to stop for at least another decade, and perhaps much longer. Moore observed a trend in the process technology advances. So, the number of transistors that can be placed inexpensively on an integrated circuit has increased exponentially, doubling approximately every two years. Nevertheless, having more available transistors can not be always directly translated into having more performance.The complexity of state-of-the-art software has reached heights unthinkable in prior ages, both in terms of the amount of computation and the complexity involved. If we deeply analyze this complexity in software we would realize that software is comprised of smaller execution processes that, although maintaining certain spatial/temporal locality, imply an inherently heterogeneous behavior. That is, during execution time the hardware executes very different portions of software, with huge differences in terms of behavior and hardware requirements. This heterogeneity in the behaviour of the software is not specific of the latest videogame, but it is inherent to software programming itself, since the very beginning of Algorithmics.In this PhD dissertation we deeply analyze the inherent heterogeneity present in software behavior. We identify the main issues and sources of this heterogeneity, that hamper most of the state-of-the-art processor designs from obtaining their maximum potential. Hence, the heterogeneity in software turns most of the current processors, commonly called general-purpose processors, into overdesigned. That is, they have much more hardware resources than really needed to execute the software running on them. This fact would not represent a main problem if we were not concerned on the additional power consumption involved in software computation.The final goal of this PhD dissertation consists in assigning each portion of software exactly the amount of hardware resources really needed to fully exploit its maximal potential; without consuming more energy than the strictly needed. That is, obtaining complexity-effective executions using the inherent heterogeneity in software behavior as steering indicator. Thus, we start deeply analyzing the heterogenous behaviour of the software run on top of general-purpose processors and then matching it on top of a heterogeneously distributed hardware, which explicitly exploit heterogeneous hardware requirements. Only by being heterogeneity-aware in software, and appropriately matching this software heterogeneity on top of hardware heterogeneity, may we effectively obtain better processor designs.The PhD dissertation is comprised of four main contributions that cover both multithreaded single-core (hdSMT) and multicore (TCA Algorithm, hTCA Framework and MFLUSH) scenarios, deeply explained in their corresponding chapters in the PhD dissertation memory. Overall, these contributions cover a significant range of the Heterogeneity-Aware Processors' design space. Within this design space, we have focused on the state-of-the-art trend in processor design: Multithreaded Multicore (CMP+SMT) Processors.We make special emphasis on the MPsim simulation tool, specifically designed and developed for this PhD dissertation. This tool has already gone beyond this PhD dissertation, becoming a reference tool by an important group of researchers spread over the Computer Architecture Department (DAC) at the Polytechnic University of Catalonia (UPC), the Barcelona Supercomputing Center (BSC) and the University of Las Palmas de Gran Canaria (ULPGC).
|
49 |
Efficient shared cache management in multicore processorsXie, Yuejian 20 May 2011 (has links)
In modern multicore processors, various resources (such as memory bandwidth and caches) are designed to be shared by concurrently running threads. Though it is good to be able to run multiple programs on a single chip at the same time, sometimes the contention of these shared resources can create problems for system performance. Naive hard-partitioning between threads can result in low resource utilization. This research shows that simple and effective approaches to dynamically manage the shared cache can be achieved. The contributions of this work are the following: (1) a technique for dynamic on-line classification of application memory access behaviors to predict the usefulness of cache partitioning, and a simple shared-cache management approach based on the classification; (2) a cache pseudo-partitioning technique that manipulates insertion and promotion policies; (3) a scalable algorithm to quickly decide per-core cache allocations; (4) pseudo-LRU cache partition approximation; (5) a dynamic shared cache compression technique that considers different thread behaviors.
|
50 |
Design and implementation of a multithreaded softcore processor with tightly coupled hardware real-time operating systemWijesinghe, Terance Prabhasara. January 1900 (has links)
Thesis (M.S.)--West Virginia University, 2008. / Title from document title page. Document formatted into pages; contains ix, 107 p. : ill. (some col.). Includes abstract. Includes bibliographical references (p. 101-107).
|
Page generated in 0.0818 seconds