61

Microarchitectural Level Power Analysis And Optimization In Single Chip Parallel Computers

Ramachandran, Priyadarshini 29 July 2004 (has links)
As device technologies migrate into Deep Submicron (DSM) feature sizes, high-performance, power-efficient computer architectures that keep pace with improving technologies need to be explored. Technology scaling increases the effects of wire latencies, inductive effects, noise, and crosstalk in on-chip communication, limiting the performance of DSM designs. Power-efficient performance gains from Instruction Level Parallelism (ILP) are reaching a limit. Single Chip Parallel Computers are promising solutions to the DSM design challenges and the performance limitations of ILP. These systems are explicitly modular architectures that efficiently support Thread Level Parallelism (TLP) while avoiding global signals and shared resources. Microarchitectural level power analysis is required for evaluating the feasibility of newly conceived architectures in terms of power dissipation and energy efficiency. Accounting for power in the early stages of design shortens the time-to-market due to reduced design iteration times, and power optimizations at the architectural level can yield large power savings. This thesis proposes a microarchitectural level power estimation and analysis infrastructure for Single Chip Parallel Computers. The power estimation tool and the analysis methodology are developed based on the Single Chip Message-Passing Parallel (SCMP) Computer and can be extended to other Single Chip Parallel Computers. The thesis focuses on the development of power estimation models, construction of the power analysis tool, study of the power advantages of the architecture, and identification of subsystems requiring power optimization. / Master of Science
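As a hedged illustration of what such a microarchitectural power estimation model can look like, the sketch below sums per-event energies over activity counts reported by a cycle-level simulator and adds static power. The subsystem names, per-event energies, and frequency are illustrative assumptions, not figures from the SCMP tool.

```python
# Hedged sketch: an event-driven, microarchitectural power model in the
# spirit described above. All names and values here are assumptions.

PER_EVENT_ENERGY_NJ = {            # assumed energy cost per activity event
    "alu_op": 0.8,
    "reg_file_access": 0.4,
    "cache_access": 1.6,
    "network_flit": 2.2,
}

STATIC_POWER_MW = {"core": 35.0, "network": 12.0}  # assumed leakage terms

def estimate_power_mw(event_counts, cycles, freq_mhz):
    """Total power = static power + dynamic energy / elapsed time."""
    dynamic_nj = sum(PER_EVENT_ENERGY_NJ[e] * n for e, n in event_counts.items())
    elapsed_us = cycles / freq_mhz            # cycles / (cycles per microsecond)
    dynamic_mw = dynamic_nj / elapsed_us      # nJ per us is numerically mW
    return dynamic_mw + sum(STATIC_POWER_MW.values())

# Example: activity counts gathered from one simulated run
counts = {"alu_op": 120_000, "reg_file_access": 200_000,
          "cache_access": 30_000, "network_flit": 8_000}
print(f"{estimate_power_mw(counts, cycles=150_000, freq_mhz=400):.1f} mW")
```

A tool of this kind can report per-subsystem breakdowns simply by summing the dynamic terms separately, which is what makes it useful for identifying subsystems requiring optimization.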
62

Analog "Neuronal" Networks in Early Vision

Koch, Christof, Marroquin, Jose, Yuille, Alan 01 June 1985 (has links)
Many problems in early vision can be formulated in terms of minimizing an energy or cost function. Examples are shape-from-shading, edge detection, motion analysis, structure from motion, and surface interpolation (Poggio, Torre and Koch, 1985). It has been shown that all quadratic variational problems, an important subset of early vision tasks, can be "solved" by linear, analog electrical or chemical networks (Poggio and Koch, 1985). In a variety of situations, however, the cost function is non-quadratic, for instance in the presence of discontinuities. The use of non-quadratic cost functions raises the question of designing efficient algorithms for computing the optimal solution. Recently, Hopfield and Tank (1985) have shown that networks of nonlinear analog "neurons" can be effective in computing the solution of optimization problems. In this paper, we show how these networks can be generalized to solve the non-convex energy functionals of early vision. We illustrate this approach by implementing a specific network solving the problem of reconstructing a smooth surface, while preserving its discontinuities, from sparsely sampled data (Geman and Geman, 1984; Marroquin 1984; Terzopoulos 1984). These results suggest a novel computational strategy for solving such problems for both biological and artificial vision systems.
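As an illustration of the kind of non-quadratic functional at issue here (a standard weak-membrane form in the spirit of Geman and Geman, 1984, not an equation quoted from this paper), one-dimensional surface reconstruction with explicit discontinuities can be posed as minimizing

$$E(f, l) = \sum_i (f_i - d_i)^2 + \lambda \sum_i (f_{i+1} - f_i)^2 (1 - l_i) + \alpha \sum_i l_i,$$

where $f$ is the reconstructed surface, $d$ the sparse data, and $l_i \in \{0, 1\}$ a binary "line process" that switches off the smoothness penalty at a discontinuity, at cost $\alpha$. The coupling between the continuous variables $f$ and the binary variables $l$ is what makes the functional non-convex, and hence beyond the reach of purely linear networks.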
63

An Attractor Memory Model of Neocortex

Johansson, Christopher January 2006 (has links)
This thesis presents an abstract model of the mammalian neocortex. The model was constructed by taking a top-down view of the cortex, where it is assumed that, to a first approximation, cortex works as a system with attractor dynamics. The model deals with the processing of static inputs from the perspectives of biological mapping, algorithm, and physical implementation, but it does not consider the temporal aspects of these inputs. The purpose of the model is twofold: firstly, it is an abstract model of the cortex and as such can be used to evaluate hypotheses about cortical function and structure; secondly, it forms the basis of a general information processing system that may be implemented in computers. The characteristics of this model are studied both analytically and by simulation experiments, and we also discuss its parallel implementation on cluster computers as well as in digital hardware.

The basic design of the model is based on a thorough literature study of the mammalian cortex's anatomy and physiology. We review both the layered and columnar structure of cortex and the long- and short-range connectivity between neurons. Characteristics of cortex that define its computational complexity, such as the time-scales of the cellular processes that transport ions in and out of neurons and give rise to electric signals, are also investigated. In particular, we study the size of cortex in terms of neuron and synapse numbers in five mammals: mouse, rat, cat, macaque, and human.

The cortical model is implemented with a connectionist type of network in which the functional units correspond to cortical minicolumns, and these are in turn grouped into hypercolumn modules. The learning rules used in the model are local in space and time, which makes them biologically plausible and also allows for an efficient parallel implementation. We study the implemented model both as a single- and a multi-layered network. Instances of the model with sizes up to that of a rat-cortex equivalent are implemented and run on cluster computers at 23% of real time. We demonstrate on tasks involving image data that the cortical model can be used for meaningful computations such as noise reduction, pattern completion, prototype extraction, hierarchical clustering, classification, and content-addressable memory, and we show that even the largest cortex-equivalent instances of the model can perform these types of computations. Important characteristics of the model are that it is insensitive to limited errors in the computational hardware and to noise in the input data. Furthermore, it can learn from examples and is self-organizing to some extent. The proposed model contributes to the quest to understand the cortex, and it is also a first step towards a brain-inspired computing system that could be implemented in the molecular-scale computers of tomorrow.

The main contributions of this thesis are: (i) a review of the size, modularization, and computational structure of the mammalian neocortex; (ii) an abstract generic connectionist network model of the mammalian cortex; (iii) a framework for a brain-inspired self-organizing information processing system; (iv) theoretical work on the properties of the model when used as an autoassociative memory; (v) theoretical insights on the anatomy and physiology of the cortex; (vi) efficient implementation techniques and simulations of cortical-sized instances; (vii) a fixed-point arithmetic implementation of the model that can be used in digital hardware.
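A minimal sketch of the class of network described here, assuming a plain Hebbian rule rather than the thesis's actual learning rule: units are grouped into hypercolumns, exactly one unit per hypercolumn is active at a time, and recall iterates winner-take-all updates until the network settles into a stored attractor.

```python
import numpy as np

# Hedged sketch (my own illustration, not the thesis's code): a modular
# attractor network. Each hypercolumn hosts U competing minicolumn units;
# winner-take-all within each hypercolumn keeps activity sparse.

H, U = 8, 4                      # 8 hypercolumns, 4 minicolumn units each
N = H * U
rng = np.random.default_rng(0)

def random_pattern():
    """One active unit per hypercolumn (a sparse, modular code)."""
    p = np.zeros(N)
    for h in range(H):
        p[h * U + rng.integers(U)] = 1.0
    return p

patterns = [random_pattern() for _ in range(5)]

# Local Hebbian learning: w_ij depends only on the co-activity of i and j.
W = sum(np.outer(p, p) for p in patterns) / len(patterns)
np.fill_diagonal(W, 0.0)

def recall(cue, steps=10):
    s = cue.copy()
    for _ in range(steps):
        drive = W @ s
        new = np.zeros(N)
        for h in range(H):                    # winner-take-all per hypercolumn
            lo = h * U
            new[lo + int(np.argmax(drive[lo:lo + U]))] = 1.0
        s = new
    return s

# Degrade a stored pattern in 3 hypercolumns, then let the attractor
# dynamics complete it (this typically succeeds at low memory loads).
noisy = patterns[0].copy()
for h in range(3):
    lo = h * U
    noisy[lo:lo + U] = 0.0
    noisy[lo + rng.integers(U)] = 1.0
print("recovered:", np.array_equal(recall(noisy), patterns[0]))
```

Because both the weight update and the unit update touch only locally available quantities, networks of this shape partition naturally across cluster nodes, which is the property the thesis exploits for its parallel implementation.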
64

Programming models for speculative and optimistic parallelism based on algorithmic properties

Cledat, Romain 24 August 2011 (has links)
Today's hardware is becoming more and more parallel. While embarrassingly parallel codes, such as those common in high-performance computing, can readily take advantage of this increased number of cores, most other types of code cannot easily scale using traditional data and/or task parallelism, and cores are therefore left idling, resulting in lost opportunities to improve performance. The opportunistic computing paradigm, on which this thesis rests, is the idea that computations should dynamically adapt to and exploit the opportunities that arise due to idling resources to enhance their performance or quality. In this thesis, I propose to utilize algorithmic properties to develop programming models that leverage this idea, thereby increasing and improving the parallelism that can be exploited. I exploit three distinct algorithmic properties: i) algorithmic diversity, ii) the semantic content of data-structures, and iii) the variable nature of results in certain applications. This thesis presents three main contributions: i) the N-way model, which leverages algorithmic diversity to speed up hitherto sequential code, ii) an extension to the N-way model which opportunistically improves the quality of computations, and iii) a framework allowing the programmer to specify the semantics of data-structures to improve the performance of optimistic parallelism.
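A minimal sketch of the N-way idea as this abstract describes it (the variant names and racing strategy below are my own illustration, not the thesis's API): several algorithmically diverse implementations of the same computation are launched on otherwise idle cores, and the first finisher's answer wins.

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

# Hedged sketch: race N algorithmically diverse variants of one computation
# on idle cores; the fastest result is kept, the rest are discarded.

def quicksort_variant(data):  return sorted(data)   # stand-in implementations;
def mergesort_variant(data):  return sorted(data)   # in practice these would be
def radixsort_variant(data):  return sorted(data)   # genuinely different algorithms

def n_way(variants, data):
    with ThreadPoolExecutor(max_workers=len(variants)) as pool:
        futures = {pool.submit(v, list(data)) for v in variants}
        done, pending = wait(futures, return_when=FIRST_COMPLETED)
        for f in pending:
            f.cancel()               # best effort: drop the losing variants
        return next(iter(done)).result()

result = n_way([quicksort_variant, mergesort_variant, radixsort_variant],
               [5, 3, 8, 1])
print(result)
```

The payoff comes when the variants have different best cases: whichever algorithm happens to suit the current input wins the race, so the composite is never slower than the luckiest variant, at the cost of the otherwise idle cores.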
65

In pursuit of NP-hard combinatorial optimization problems

Ono, Satoshi. January 2009 (has links)
Thesis (Ph. D.)--State University of New York at Binghamton, Thomas J. Watson School of Engineering and Applied Science, Department of Computer Science, 2009. / Includes bibliographical references.
66

Adaptive finite element simulation of flow and transport applications on parallel computers

Kirk, Benjamin Shelton, January 1900 (has links)
Thesis (Ph. D.)--University of Texas at Austin, 2007. / Vita. Includes bibliographical references.
67

Lokale Realisierung von Vektoroperationen auf Parallelrechnern (Local Realization of Vector Operations on Parallel Computers)

Groh, U. 30 October 1998 (has links)
For the basic algebraic vector operations, several variants of a local implementation on distributed-memory parallel computers are presented and discussed systematically. In particular, necessary and sufficient conditions are given for the local realizability of matrix-vector multiplication.
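A small sketch of the local-realizability question under an assumed block partitioning (my own illustration, not the paper's formalism): y = A x can be computed without communication precisely when each processor's rows of A reference only locally stored entries of x.

```python
import numpy as np

# Hedged sketch: with a row-block distribution of A and a matching block
# distribution of x, process p can compute its slice of y = A @ x locally
# only if its rows of A touch no remote entries of x (e.g. A block diagonal).

P, n = 4, 8                        # 4 "processes", vectors of length 8
blocks = np.array_split(np.arange(n), P)

def locally_realizable(A):
    """True if every process's row block uses only its own x block."""
    for p, rows in enumerate(blocks):
        mine = set(blocks[p])
        for r in rows:
            if any(A[r, c] != 0 and c not in mine for c in range(n)):
                return False
    return True

A_diag = np.kron(np.eye(P), np.ones((2, 2)))   # block diagonal: fully local
A_full = np.ones((n, n))                       # dense: needs communication
print(locally_realizable(A_diag), locally_realizable(A_full))  # True False
```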
68

PPerfGrid: A Grid Services-Based Tool for the Exchange of Heterogeneous Parallel Performance Data

Hoffman, John Jared 01 January 2004 (has links)
This thesis details the approach taken in developing PPerfGrid. Section 2 discusses other research related to this project. Section 3 provides general background on the technologies utilized in PPerfGrid, focusing on the components that make up the Grid services architecture. Section 4 provides a description of the architecture of PPerfGrid. Section 5 details the implementation of PPerfGrid. Section 6 presents tests designed to measure the overhead and scalability of the PPerfGrid application. Section 7 suggests future work, and Section 8 concludes the thesis.
69

A Platform for reliable computing on clusters using group communications.

Rough, Justin January 2001 (has links)
Shared clusters represent an excellent platform for the execution of parallel applications given their low price/performance ratio and the presence of cluster infrastructure in many organisations. The focus of recent research efforts is on parallelism management, transport and efficient access to resources, and making clusters easy to use. In this thesis, we examine reliable parallel computing on clusters. The aim of this research is to demonstrate the feasibility of developing an operating system facility providing transparent fault tolerance using existing, enhanced, and newly built operating system services for supporting parallel applications. In particular, we use existing process duplication and process migration services, and synthesise a group communications facility for use in a transparent checkpointing facility. This research is carried out using the methods of experimental computer science.

To provide a foundation for the synthesis of the group communications and checkpointing facilities, we survey and review related work in both fields. For group communications, we examine the V Distributed System, the x-kernel and Psync, the ISIS Toolkit, and Horus, and identify a need for services that consider the placement of processes on computers in the cluster. For checkpointing, we examine Manetho, KeyKOS, libckpt, and Diskless Checkpointing, and observe the use of remote computer memories for storing checkpoints and the use of copy-on-write mechanisms to reduce the time to create a checkpoint of a process.

We propose a group communications facility providing two sets of services: user-oriented services and system-oriented services. User-oriented services provide transparency and target application programmers. System-oriented services supplement the user-oriented services to support other operating system services and do not provide transparency. Additional flexibility is achieved by providing delivery and ordering semantics independently.

An operating system facility providing transparent checkpointing is synthesised using coordinated checkpointing. To ensure that a consistent set of checkpoints is generated by the facility, instead of blindly blocking the processes of a parallel application, only non-deterministic events are blocked; this allows the processes of the parallel application to continue execution during the checkpoint operation. Checkpoints are created by adapting process duplication mechanisms, and checkpoint data is transferred to remote computer memories and disk for storage using the mechanisms of process migration. The services of the group communications facility are used to coordinate the checkpoint operation and to transport checkpoint data to remote computer memories and disk.

Both the group communications facility and the checkpointing facility have been implemented in the GENESIS cluster operating system and provide proof of concept. GENESIS uses a microkernel and client-server based operating system architecture, and is demonstrated to provide an appropriate environment for the development of these facilities.

We design a number of experiments to test the performance of both facilities and to provide proof of performance, and we present our approach to testing, the challenges raised in testing the facilities, and how we overcame them. For group communications, we examine the performance of a number of delivery semantics: good speed-ups are observed, and system-oriented group communication services are shown to provide significant performance advantages over user-oriented semantics in the presence of packet loss. For checkpointing, we examine the scalability of the facility given different levels of resource usage and a variable number of computers: low overheads are observed for checkpointing a parallel application. This research makes clear that a microkernel, client-server based cluster operating system provides an ideal environment for the development of a high-performance group communications facility and a transparent checkpointing facility, together forming a platform for reliable parallel computing on clusters.
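A toy sketch of the coordinated-checkpointing idea described above, in a deliberately simplified setting (Python threads and queues standing in for GENESIS processes and its group communications facility): a coordinator broadcasts a checkpoint marker, each process snapshots its state locally and acknowledges, and the collected snapshots form a consistent cut.

```python
import copy
import queue
import threading

# Hedged sketch (my own simplification, not GENESIS code): coordinated
# checkpointing driven by a group broadcast of a CHECKPOINT marker.

class Worker(threading.Thread):
    def __init__(self, wid, acks):
        super().__init__(daemon=True)
        self.inbox, self.acks = queue.Queue(), acks
        self.state = {"id": wid, "steps": 0}
        self.snapshot = None

    def run(self):
        while True:
            msg = self.inbox.get()
            if msg == "CHECKPOINT":
                self.snapshot = copy.deepcopy(self.state)  # local snapshot
                self.acks.put(self.state["id"])            # acknowledge
            elif msg == "STOP":
                return
            else:                                          # ordinary work
                self.state["steps"] += 1

acks = queue.Queue()
workers = [Worker(i, acks) for i in range(3)]
for w in workers: w.start()
for w in workers: w.inbox.put("tick")          # some normal computation
for w in workers: w.inbox.put("CHECKPOINT")    # "broadcast" the marker
done = sorted(acks.get() for _ in workers)     # wait for all acknowledgements
print("consistent checkpoint from workers:", done)
for w in workers: w.inbox.put("STOP")
```

The thesis's refinement is visible even in this toy: rather than freezing every process for the whole operation, only non-deterministic events (here, nothing at all) need to be held back while the snapshots are taken.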
70

A model of dynamic compilation for heterogeneous compute platforms

Kerr, Andrew 10 December 2012 (has links)
Trends in computer engineering place renewed emphasis on increasing parallelism and heterogeneity. The rise of parallelism adds an additional dimension to the challenge of portability, as different processors support different notions of parallelism, whether vector parallelism executing in a few threads on multicore CPUs or large-scale thread hierarchies on GPUs. Thus, software faces obstacles to portability and efficient execution beyond differences in instruction sets; rather, the underlying execution models of radically different architectures may not be compatible. Dynamic compilation applied to data-parallel heterogeneous architectures presents an abstraction layer decoupling program representations from optimized binaries, thus enabling portability without encumbering performance. This dissertation proposes several techniques that extend dynamic compilation to data-parallel execution models. These contributions include:

- characterization of data-parallel workloads
- machine-independent application metrics
- a framework for performance modeling and prediction
- execution model translation for vector processors
- region-based compilation and scheduling

We evaluate these claims via the development of a novel dynamic compilation framework, GPU Ocelot, with which we execute real-world workloads from GPU computing. This enables GPU computing workloads to run efficiently on multicore CPUs, GPUs, and a functional simulator. We show that data-parallel workloads exhibit performance scaling, take advantage of vector instruction set extensions, and effectively exploit data locality via scheduling that attempts to maximize control locality.
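A hedged sketch of the decoupling this abstract describes (my own illustration, not GPU Ocelot's API): a kernel is kept in a device-neutral representation and lowered to a target-specific callable on first launch, with the result cached per (kernel, target) pair.

```python
# Hedged sketch: a dynamic-compilation dispatch layer. The "IR" is a plain
# dict and the "targets" are stand-ins; the point is the separation between
# the portable program representation and the per-target compiled form.

_cache = {}

def lower(kernel_ir, target):
    """Stand-in 'compiler': specialize the neutral IR for one target."""
    if target == "cpu":
        return lambda xs: [kernel_ir["op"](x) for x in xs]   # scalar loop
    if target == "simd":                                     # pretend-vector path
        return lambda xs: list(map(kernel_ir["op"], xs))
    raise ValueError(f"unknown target {target!r}")

def launch(kernel_ir, target, data):
    key = (kernel_ir["name"], target)
    if key not in _cache:            # compile once per (kernel, target) pair
        _cache[key] = lower(kernel_ir, target)
    return _cache[key](data)

scale2 = {"name": "scale2", "op": lambda x: 2 * x}
print(launch(scale2, "cpu", [1, 2, 3]), launch(scale2, "simd", [1, 2, 3]))
```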
