11 |
The Effects of Microprocessor Architecture on Speedup in Distributed Memory Supercomputers
Beane, Glen L. January 2004
No description available.
|
12 |
Express lanes modification to the data vortex photonic all-optical path interconnection network
Bozek, Matthew Peter 19 May 2008
Today's supercomputers require interconnection networks with high bandwidth and low latency to exploit parallelism. The data vortex is an all-optical path interconnection network that has been defined and shown to achieve high levels of message acceptance and low levels of message latency. In this thesis research, three enhancements to the data vortex are defined and tested for performance. They are compared to an unmodified data vortex using average latency and offered traffic acceptance rate as metrics. The minimal angle counts at which express lane enhancements can be established are determined. An express lane enhancement allows exploitation of locality, yielding an 8% to 12% reduction in average latency and a 4% to 6% increase in message acceptance. Semi-express lanes cannot effectively exploit locality but still yield a 20% increase in message acceptance and a 4% decrease in average latency. Express outputs can exploit locality for a 28% to 32% increase in message acceptance and a 12% to 15% decrease in average latency.
|
13 |
Low-power high-performance register file design for chip multiprocessors
Khasawneh, Shadi Turki. January 2006
Thesis (M.S.)--State University of New York at Binghamton, Department of Computer Science, Thomas J. Watson School of Engineering and Applied Science, 2006. / Includes bibliographical references.
|
14 |
Hardware implementation of a synchronization state buffer in VHDL
Barton, Jonathan L. January 2008
Thesis (M.E.E.)--University of Delaware, 2008. / Principal faculty advisor: Guang R. Gao, Dept. of Electrical and Computer Engineering. Includes bibliographical references.
|
15 |
Design and implementation of tool-chain framework to support OpenMP single source compilation on cell platform
Jiang, Yi. January 2008
Thesis (M.S.)--University of Delaware, 2007. / Principal faculty advisor: Guang R. Gao, Dept. of Electrical and Computer Engineering. Includes bibliographical references.
|
16 |
Machine Learning for AI-Augmented Design Space Exploration of Computer Systems
Kwon, Jihye January 2022
Advanced and emerging computer systems, ranging from supercomputers to embedded systems, feature high performance, energy efficiency, acceleration, and specialization. Design of such systems involves ever-increasing circuit complexity and architectural diversity. Commercial high-end processors, realized as very-large-scale integration circuits, have integrated an exponentially increasing number of transistors on a chip over many decades. Along with the evolution of semiconductor manufacturing technology, another driving force behind the progress of processors has been the development of computer-aided design (CAD) software tools. Logic synthesis and physical design (LSPD) tool-chains allow designers to describe the computer system at the register-transfer level of abstraction and automatically convert the description into an integrated circuit layout. The slowdown of technology scaling, on the other hand, has motivated the emergence of dark silicon and heterogeneous architectures with application-specific hardware accelerators. Design of various accelerators is facilitated by high-level synthesis (HLS) tools that translate a behavioral description of a computer system into a structural register-transfer level one. CAD approaches have evolved towards raising the level of design abstraction and providing more options to optimize the architecture.
For each system synthesized via advanced CAD tools, designers explore the design space in search of optimal configurations of the tool options and architectural choices, also called knobs. These knobs affect the execution of CAD algorithms and eventually impact the multi-dimensional quality-of-result (QoR) of the final implementation. During design-space exploration (DSE), designers leverage their experience and expertise to determine the relationship between knobs and QoR. To further reduce the number of time- and resource-consuming CAD runs during DSE, a large number of heuristic and model-based approaches have been proposed. More recently, the rise of machine learning (ML) and artificial intelligence (AI) has raised the possibility of AI-augmented DSE, which exploits ML techniques to predict the knobs-QoR relationship. Yet existing heuristic and ML-based approaches still require a sufficient number of CAD runs for each system because they do not accumulate and exploit experiential knowledge across systems as designers would.
To expand the potential of AI-augmented DSE and push the frontier forward, multiple challenges arise due to the characteristics of CAD flows. 1) Whereas many ML applications utilize data obtained from huge collections of users' input and public databases for a single problem, the QoR-prediction problem for each system suffers from the limited availability of data obtained from expensive CAD runs. In particular, an industrial LSPD tool-chain specifies hundreds of separate knobs, resulting in an extreme curse of dimensionality. 2) Different systems exhibit different knobs-QoR relationships. Hence, learning from previously explored systems needs to be preceded by identifying distinct systems and relating them to one another. Often, it is difficult to obtain an efficient representation of a system. 3) Designers often apply different sets of knob configurations to different systems, which makes it harder to learn from previous DSE results. Especially in HLS, the heterogeneity of various systems leads to broad knob heterogeneity across them. To address these challenges and boost the ML performance, I propose to flexibly connect the elements of the many QoR-prediction problems with one another. My thesis is that the exploration of the design space of a computer system can be effectively augmented by artificial intelligence via learning from the experience with the design and optimization of other systems.
For LSPD of industrial high-performance processors, I propose a novel collaborative recommender system approach that learns hidden features from the interactions (CAD runs) of many users (systems) and items (knob configurations). To cope with the curse of dimensionality, the item features are decomposed into features of item attributes (knobs). The combined model predicts QoR for each user-item pair. For HLS of application-specific accelerators, I present a series of neural network models in the order of their evolution towards the proposed mixed-sharing transfer learning model. Transfer learning aims at leveraging knowledge gained from previous problems; however, due to the system and knob heterogeneities, the model needs to distinguish which piece of that knowledge should be transferred. The proposed ML approaches aim not only to use experiential knowledge as designers do but also to ultimately assist designers by providing alternative insights and suggesting optimization possibilities for new systems. As an effort in this direction, I develop an AI-augmented DSE tool that exploits the aforementioned models and generates recommended knob configurations for new target systems. Through this research, I investigate the potential of next-level AI-augmented DSE with the goal of promoting secure collaborative engineering in the CAD community without the need to share confidential information and intellectual property.
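The recommender formulation described above can be illustrated with a small sketch. It is not the model from the thesis: it assumes a plain bilinear factorization trained by stochastic gradient descent, with each item (knob configuration) represented as the sum of learned feature vectors of its knob settings. All names (`fit`, `predict`, `item_vector`) and hyperparameters are hypothetical.

```python
import numpy as np

def item_vector(knob_tables, config):
    # Assumed decomposition: an item's features are the sum of the
    # feature vectors of its individual knob settings.
    return sum(table[value] for table, value in zip(knob_tables, config))

def fit(runs, n_systems, knob_sizes, dim=4, lr=0.05, epochs=300, seed=0):
    """runs: iterable of (system_id, knob_config_tuple, qor) from past CAD runs."""
    rng = np.random.default_rng(seed)
    system_vecs = rng.normal(scale=0.1, size=(n_systems, dim))
    knob_tables = [rng.normal(scale=0.1, size=(size, dim)) for size in knob_sizes]
    for _ in range(epochs):
        for system, config, qor in runs:
            item = item_vector(knob_tables, config)
            err = system_vecs[system] @ item - qor
            # Gradient w.r.t. the item vector, taken before the system update.
            grad_item = err * system_vecs[system]
            system_vecs[system] -= lr * err * item
            for table, value in zip(knob_tables, config):
                table[value] -= lr * grad_item
    return system_vecs, knob_tables

def predict(model, system, config):
    # Predicted QoR for a (system, knob configuration) pair.
    system_vecs, knob_tables = model
    return float(system_vecs[system] @ item_vector(knob_tables, config))
```

Because every knob setting has its own vector, the number of learned parameters grows with the sum of the knob value counts rather than with the number of distinct configurations, which is one way to read the curse-of-dimensionality remark above.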
|
17 |
High-Concurrency Visualization on Supercomputers
Nouanesengsy, Boonthanome 30 August 2012
No description available.
|
18 |
Superscalar Processor Models Using Statistical Learning
Joseph, P J 04 1900
Processor architectures are becoming increasingly complex, and architects therefore have to evaluate a large design space consisting of several parameters, each with a number of potential settings. To help guide design decisions, we develop simple and accurate models of the superscalar processor design space using a detailed and validated superscalar processor simulator.
First, we obtain precise estimates of all significant micro-architectural parameters and their interactions by building linear regression models from simulation-based experiments. We obtain good approximate models at low simulation cost using an iterative process in which Akaike's Information Criterion is used to extract a good linear model from a small set of simulations, and limited further simulation is guided by the model using D-optimal experimental designs. The iterative process is repeated until the desired error bounds are achieved. We use this procedure for model construction and show that it provides a cost-effective scheme for experimenting with all relevant parameters.
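The model-extraction step can be sketched as follows. This is an illustration only: it assumes greedy forward selection over candidate regression terms, scored by the Gaussian-error form of Akaike's Information Criterion, AIC = n ln(RSS/n) + 2k. The abstract does not state the search strategy, and the D-optimal design step is not shown. The function names are hypothetical.

```python
import numpy as np

def aic(X, y):
    # AIC for a least-squares fit with Gaussian errors, up to an additive constant.
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    return n * np.log(max(rss, 1e-12) / n) + 2 * k

def forward_select(terms, y):
    """terms: dict mapping a term name (parameter or interaction) to its column."""
    selected = []
    X = np.ones((len(y), 1))  # start from the intercept-only model
    best = aic(X, y)
    while True:
        scores = {name: aic(np.column_stack([X, col]), y)
                  for name, col in terms.items() if name not in selected}
        if not scores:
            break
        name = min(scores, key=scores.get)
        if scores[name] >= best:
            break  # no remaining term lowers the AIC
        best = scores[name]
        selected.append(name)
        X = np.column_stack([X, terms[name]])
    return selected
```

The 2k penalty is what keeps the model small: a term is admitted only if it reduces the residual error by more than the cost of one extra coefficient.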
We also obtain accurate predictors of the processor's performance response across the entire design space by constructing radial basis function networks from sampled simulation experiments. We construct these models by simulating at a limited number of design points selected by Latin hypercube sampling and then deriving the radial basis function networks from the results. We show that these predictors provide accurate approximations to the simulator's performance response, and hence provide a cheap alternative to simulation while searching for optimal processor design points.
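A minimal sketch of this sampling-plus-surrogate idea is given below. It rests on several assumptions not stated in the abstract: design parameters are scaled to the unit hypercube, one Gaussian basis function is centered at every sampled point, and the output weights come from a lightly regularized linear solve. The names `latin_hypercube` and `fit_rbf`, and the `width` and `ridge` values, are hypothetical.

```python
import numpy as np

def latin_hypercube(n, d, rng):
    # One point per stratum in every dimension: permute the n strata
    # independently per dimension, then jitter inside each stratum.
    strata = np.column_stack([rng.permutation(n) for _ in range(d)])
    return (strata + rng.random((n, d))) / n

def gaussian_kernel(A, B, width):
    # Pairwise Gaussian basis values between rows of A and rows of B.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / width ** 2)

def fit_rbf(X, y, width=0.2, ridge=1e-8):
    # Solve for output weights; the small ridge term guards against
    # an ill-conditioned kernel matrix when sample points lie close together.
    Phi = gaussian_kernel(X, X, width)
    weights = np.linalg.solve(Phi + ridge * np.eye(len(X)), y)
    return lambda Q: gaussian_kernel(np.atleast_2d(Q), X, width) @ weights
```

In use, `X` would hold the sampled design points and `y` the simulated performance at those points; the returned function then stands in for the simulator when scanning other design points.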
|
19 |
A parallel external memory system
Nikseresht, Mohammad Reza, January 1900
Thesis (M.C.S.) - Carleton University, 2007. / Includes bibliographical references (p. 77-84). Also available in electronic format on the Internet.
|
20 |
Towards a high performance parallel library to compute fluid flexible structures interactions
Nagar, Prateek 08 April 2015
Indiana University-Purdue University Indianapolis (IUPUI) / The LBM-IB method is a useful and popular simulation technique that is widely adopted to solve fluid-structure interaction problems in computational fluid dynamics. These problems are known for using computing resources intensively while solving the mathematical equations involved in the simulations. Problems involving such interactions are omnipresent; it is therefore imperative that a fast and accurate algorithm exist for solving these equations, so that a real-life model of such complex analytical problems can be reproduced in a shorter time. Being inherently parallel, LBM-IB is an ideal candidate for parallel software. This research focuses on developing a parallel software library, LBM-IB, based on the algorithm proposed by [1]; it is the first of its kind to use the high-performance computing abilities of the supercomputers available today. An initial sequential version of LBM-IB is developed and used as a benchmark for the correctness and performance evaluation of the shared-memory parallel versions. Two shared-memory parallel versions of LBM-IB have been developed, using OpenMP and the Pthread library respectively. The OpenMP version scales well, reaching as much as 83% speedup on multicore machines with up to 8 cores. Based on profiling and instrumentation of this version, a Pthread-based data-centric version was developed to improve data locality and increase the degree of parallelism; it outperforms the OpenMP version by 53% on manycore machines. A distributed version using the MPI interfaces on top of the cube-based Pthread version has also been designed for extreme-scale distributed-memory manycore systems.
|