About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Necessary and sufficient conditions on partial orders for modeling concurrent computations

Chauhan, Himanshu 03 October 2014 (has links)
Concurrent computations have been modeled using partial orders in both event-based and state-based domains. We give necessary and sufficient conditions on partial orders for them to be valid state-based or event-based models of concurrent computations. In particular, we define the notions of width-extensibility and interleaving-consistency of partial orders, and show that a partial order can be a valid state-based model of a concurrent computation iff it is width-extensible. Distributed computations that involve asynchronous message passing are a subset of concurrent computations; for these, a partial order can be a valid state-based model iff it is both width-extensible and interleaving-consistent. We show a duality between the event-based and state-based models of concurrent computations, and give algorithms to convert partial orders from the event-based domain to the state-based domain and vice versa.
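As an illustration of the event-to-state direction (a sketch of the standard construction, not the thesis' own algorithm), the reachable global states of an event-based model correspond to the down-sets (order ideals) of the event poset; the toy code below enumerates them for a small two-process computation:

```python
from itertools import combinations

def ideals(events, pred):
    """Enumerate all down-sets (order ideals) of a finite poset.

    events: iterable of event labels.
    pred:   dict mapping each event to the set of events that must
            causally precede it.
    Each ideal corresponds to one reachable global state of the
    computation modeled by the poset.
    """
    events = list(events)
    found = []
    for r in range(len(events) + 1):
        for subset in combinations(events, r):
            s = set(subset)
            # A set of events is an ideal iff it is downward closed.
            if all(pred[e] <= s for e in s):
                found.append(s)
    return found

# Two processes: a -> b on one, c -> d on the other, plus a message a -> d.
pred = {"a": set(), "b": {"a"}, "c": set(), "d": {"a", "c"}}
for state in sorted(ideals(pred.keys(), pred), key=len):
    print(sorted(state))   # 8 reachable global states, from {} to {a,b,c,d}
```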
2

A Computational Approach For Investigating Unsteady Turbine Heat Transfer Due To Shock Wave Impact

Reid, Terry Vincent 05 February 1999 (has links)
The effects of shock wave impact on unsteady turbine heat transfer are investigated. A numerical approach is developed to simulate the flow physics present in a previously performed unsteady wind tunnel experiment. The wind tunnel experiment included unheated and heated flows over a cascade of highly loaded turbine blades. After the flow over the blades was established, a single shock with a pressure ratio of 1.1 was introduced into the wind tunnel test section. A single blade was equipped with pressure transducers and heat flux microsensors, and as the shock wave struck the blade, time-resolved pressure, temperature, and heat transfer data were recorded.
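For context, a shock's pressure ratio can be related to its Mach number through the standard normal-shock relation; the sketch below (assuming air with gamma = 1.4, a value not stated in the abstract) inverts it for the pressure ratio of 1.1 used in the experiment:

```python
import math

def shock_mach(pressure_ratio, gamma=1.4):
    """Invert the normal-shock relation p2/p1 = 1 + 2g/(g+1) * (M^2 - 1)
    to recover the shock Mach number from a measured pressure ratio."""
    return math.sqrt(1 + (gamma + 1) / (2 * gamma) * (pressure_ratio - 1))

# Pressure ratio of 1.1, as in the wind tunnel experiment above: a weak shock.
print(f"Shock Mach number: {shock_mach(1.1):.3f}")  # ~1.042
```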
3

Nonlocality and communication complexity

Dam, Wim van January 1999 (has links)
No description available.
4

Code optimizations for narrow bitwidth architectures

Bhagat, Indu 23 February 2012 (has links)
This thesis takes a hardware/software (HW/SW) collaborative approach to tackle the problem of computational inefficiency in a holistic manner. The hardware is redesigned by restraining the datapath to a mere 16-bit width (integer datapath only) to provide an extremely simple, low-cost, low-complexity execution core that executes the most common case efficiently. This redesign, referred to as the Narrow Bitwidth Architecture, is unique in that although the datapath is squeezed to 16 bits, it continues to offer the same large memory addressability as contemporary wider-datapath architectures. Its interface to the outside (software) world is termed the Narrow ISA. The software is responsible for efficiently mapping the current stack of 64-bit applications onto the 16-bit hardware. However, this HW/SW approach introduces a non-negligible penalty in both dynamic code size and performance, even with a reasonably smart code translator that maps 64-bit applications onto the 16-bit processor. The goal of this thesis is to design a software layer that harnesses the power of compiler optimizations to mitigate this performance penalty of the Narrow ISA. More specifically, this thesis focuses on compiler optimizations targeting the problem of how to compile a 64-bit program to a 16-bit datapath machine from the perspective of Minimum Required Computations (MRC). Given a program, the notion of MRC aims to infer how much computation is really required to generate the same (correct) output as the original program. Approaching perfect MRC is an intrinsically ambitious goal, as it requires oracle predictions of program behavior. Towards this end, the thesis proposes three heuristic-based optimizations that closely infer the MRC. The perspective of MRC unfolds into a definition of productiveness: if a computation does not alter the contents of its storage location, it is non-productive and hence need not be performed. In this research, the definition of productiveness has been applied at different granularities of the data flow as well as the control flow of programs. Three profile-based code optimization techniques are proposed: 1. Global Productiveness Propagation (GPP), which applies the concept of productiveness at the granularity of a function. 2. Local Productiveness Pruning (LPP), which applies the same concept at the much finer granularity of a single instruction. 3. Minimal Branch Computation (MBC), a profile-based code-reordering optimization that applies the principles of MRC to conditional branches. The primary aim of all these techniques is to reduce the dynamic code footprint of the Narrow ISA. The first two optimizations (GPP and LPP) speculatively prune the non-productive (useless) computations using profiles. Further, these two techniques traverse the optimization regions backwards to embed checks into the non-speculative slices, making them self-sufficient to detect mis-speculation dynamically. The MBC optimization is a use case of a broader lazy computation model: the backslices containing narrow computations are reordered so that the minimal computations needed to generate the same (correct) output are performed in the most frequent case, and the remaining computations only when necessary.
With the proposed optimizations, it can be concluded that there do exist ways to smartly compile a 64-bit application to a 16-bit ISA such that the overheads are considerably reduced.
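To make the productiveness idea concrete, the sketch below (an illustration only, not the thesis' GPP/LPP implementation) profiles a store trace and flags "silent" stores that rewrite a location with the value it already holds; such computations are candidates for speculative pruning, guarded by runtime mis-speculation checks:

```python
def find_nonproductive_stores(trace):
    """Flag stores that leave memory unchanged ('non-productive' in the
    MRC sense): the written value equals the value already stored.

    trace: list of (address, value) store events from a profiling run.
    Returns indices of stores that a GPP/LPP-style pass could
    speculatively prune (with a runtime check against mis-speculation).
    """
    memory = {}
    nonproductive = []
    for i, (addr, value) in enumerate(trace):
        if memory.get(addr) == value:
            nonproductive.append(i)   # rewrites the same value: no effect
        memory[addr] = value
    return nonproductive

trace = [(0x10, 5), (0x10, 5), (0x20, 7), (0x10, 6), (0x20, 7)]
print(find_nonproductive_stores(trace))  # [1, 4]
```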
5

Efficient multiband algorithms for blind source separation

Badran, Salah Al-Din Ibrahim January 2016 (has links)
The problem of blind separation refers to recovering the original signals, called source signals, from the mixed signals, called observation signals, in a reverberant environment. The mixture is a function of a sequence of original speech signals mixed in a reverberant room. The objective is to separate the mixed signals to obtain the original signals without degradation and without prior information about the features of the sources. The strategy used to achieve this objective is to use multiple bands that work at a lower rate, have less computational cost and converge more quickly than the conventional scheme. Our motivation is the competitive convergence speed reported for unequal-passbands schemes. The objective of this research is to improve unequal-passbands schemes by increasing the speed of convergence and reducing the computational cost. The first proposed work is a novel maximally decimated unequal-passbands scheme. This scheme uses multiple bands that allow it to work at a reduced sampling rate and low computational cost. An adaptation approach is derived whose adaptation step improves the convergence speed. The performance of the proposed scheme was measured in several ways. First, the mean square errors of the various bands are measured and compared to a maximally decimated equal-passbands scheme, currently the best performing method; the results show that the proposed scheme has a faster convergence rate. Second, when the scheme is tested on white and coloured inputs with a low number of bands, it does not yield good results, but when the number of bands is increased, the speed of convergence improves. Third, the scheme is tested under quick changes, where its performance is similar to that of the equal-passbands scheme. Fourth, the scheme is tested in a stationary state, where the experimental results confirm the theoretical work. For more challenging scenarios, an unequal-passbands scheme with over-sampled decimation is proposed; the greater the number of bands, the more efficient the separation. The results are again compared to the currently best performing method. Finally, an experimental comparison is made between the proposed multiband scheme and the conventional scheme; the convergence speed and the signal-to-interference ratio of the proposed scheme are higher than those of the conventional scheme, while its computational cost is lower.
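The core multiband idea, running independent adaptive filters on decimated subband signals so that each adapts at a lower rate and cost, can be sketched as below; the trivial even/odd two-band split and plain LMS update are simplifying assumptions for illustration and omit the unequal-passband filter-bank design that is the actual contribution:

```python
import numpy as np

def lms(x, d, taps=4, mu=0.05):
    """Plain LMS: adapt w so that w . x_n tracks the desired signal d."""
    w = np.zeros(taps)
    for n in range(taps - 1, len(x)):
        xn = x[n - taps + 1:n + 1][::-1]      # current and past samples
        w += mu * (d[n] - w @ xn) * xn        # stochastic-gradient step
    return w

rng = np.random.default_rng(0)
x = rng.standard_normal(8000)
h = np.array([0.6, -0.3, 0.1])                # unknown mixing channel
d = np.convolve(x, h)[:len(x)]

# Toy two-band split: each subband filter adapts on half the samples,
# i.e. at half the rate and half the cost of a fullband filter. (A real
# scheme uses analysis/synthesis filter banks; each band here sees only
# every other sample, so it can learn only the even-lag channel taps.)
for band, sl in enumerate((slice(0, None, 2), slice(1, None, 2))):
    print(f"band {band}: w = {np.round(lms(x[sl], d[sl]), 2)}")
```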
6

Tiling and Asynchronous Communication Optimizations for Stencil Computations

Malas, Tareq Majed Yasin 07 December 2015 (has links)
The importance of stencil-based algorithms in computational science has focused attention on optimized parallel implementations for multilevel cache-based processors. Temporal blocking schemes leverage the large bandwidth and low latency of caches to accelerate stencil updates and approach theoretical peak performance. A key ingredient is the reduction of data traffic across slow data paths, especially the main memory interface. Most of the established work concentrates on updating separate cache blocks per thread, which works on all types of shared memory systems, regardless of whether there is a shared cache among the cores. This approach is memory-bandwidth limited in several situations, where the cache space for each thread can be too small to provide sufficient in-cache data reuse. We introduce a generalized multi-dimensional intra-tile parallelization scheme for shared-cache multicore processors that results in a significant reduction of cache size requirements and shows a large saving in memory bandwidth usage compared to existing approaches. It also provides data access patterns that allow efficient hardware prefetching. Our parameterized thread groups concept provides a controllable trade-off between concurrency and memory usage, shifting the pressure between the memory interface and the Central Processing Unit (CPU). We also introduce an efficient diamond tiling structure for both shared-memory cache blocking and distributed-memory relaxed-synchronization communication, demonstrated using one-dimensional domain decomposition. We describe the approach and the implementation details of our open-source testbed (called Girih), present performance results on contemporary Intel processors, and apply advanced performance modeling techniques to reconcile the observed performance with hardware capabilities. Furthermore, we conduct a comparison with the state-of-the-art stencil frameworks PLUTO and Pochoir in shared memory, using corner-case stencil operators. We study the impact of the diamond tile size on computational intensity, cache block size, and energy consumption. The impact of computational intensity on power dissipation in the CPU and in the DRAM is investigated, showing that DRAM power, which is strongly influenced by computational intensity, is a decisive factor for energy consumption on the Intel Ivy Bridge processor. Moreover, we show that the highest performance does not necessarily lead to the lowest energy, even when the clock speed is fixed. We apply our approach to an electromagnetic simulation application for solar cell development, demonstrating several-fold speedup compared to an efficient spatially blocked variant. Finally, we discuss the integration of our approach with other techniques for future High Performance Computing (HPC) systems, which are expected to be even more memory-bandwidth-starved, with a deeper memory hierarchy.
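The payoff of temporal blocking can be seen in a toy 1D Jacobi stencil: each tile, padded with a halo, is advanced several time steps while it stays in cache, instead of streaming the whole array through memory once per sweep. A schematic sketch (overlapped tiles with redundant halo computation, not the diamond tiling proposed in the thesis):

```python
import numpy as np

def jacobi_naive(u, steps):
    """Reference: one full pass over the array per time step."""
    u = u.copy()
    for _ in range(steps):
        u[1:-1] = 0.25 * u[:-2] + 0.5 * u[1:-1] + 0.25 * u[2:]
    return u

def jacobi_blocked(u, steps, tile=128):
    """Overlapped temporal blocking: each tile (plus a `steps`-wide halo)
    is loaded once and advanced `steps` time steps while it stays in
    cache; halo cells are recomputed redundantly so tiles stay
    independent. Endpoints act as fixed (Dirichlet) boundaries."""
    n = len(u)
    out = u.copy()
    for start in range(1, n - 1, tile):
        stop = min(start + tile, n - 1)
        lo, hi = max(start - steps, 0), min(stop + steps, n)
        block = u[lo:hi].copy()
        for _ in range(steps):
            block[1:-1] = 0.25 * block[:-2] + 0.5 * block[1:-1] + 0.25 * block[2:]
        out[start:stop] = block[start - lo:stop - lo]
    return out

u = np.random.default_rng(1).standard_normal(1024)
assert np.allclose(jacobi_naive(u, 8), jacobi_blocked(u, 8))
```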
7

Error Algebras

Lei, Wei 11 1900 (has links)
In computations over many-sorted algebras, one typically encounters error cases, caused by attempting to evaluate an operation outside its domain (e.g., division by the integer 0; taking the square root of a negative integer; popping an empty stack). We present a method for systematically dealing with such error cases, namely the construction of an "error algebra" based on the original algebra. As an application of this method, we show that it provides a good semantics for (possibly improper) function tables.
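A minimal sketch of the construction (an illustration under simplifying assumptions, using a single-sorted algebra of integers rather than a many-sorted one): adjoin an error element to the carrier set and extend each operation strictly, so that any error input yields an error output and out-of-domain inputs map to errors:

```python
class Err:
    """The adjoined error element; carries a note about its origin."""
    def __init__(self, why):
        self.why = why
    def __repr__(self):
        return f"<error: {self.why}>"

def lift(op, *domain_checks):
    """Extend a partial operation to the error algebra: propagate any
    error argument strictly, and map out-of-domain inputs to an error."""
    def wrapped(*args):
        for a in args:
            if isinstance(a, Err):
                return a                      # strict error propagation
        for check, msg in domain_checks:
            if not check(*args):
                return Err(msg)               # e.g. division by zero
        return op(*args)
    return wrapped

div = lift(lambda x, y: x // y, ((lambda x, y: y != 0), "division by zero"))
add = lift(lambda x, y: x + y)

print(add(div(7, 2), 1))   # 4
print(add(div(7, 0), 1))   # <error: division by zero>
```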
8

The Standard Map Machine

LaMacchia, Brian, Nieh, Jason 01 September 1989 (has links)
We have designed the Standard Map Machine (SMM) as an answer to the intensive computational requirements involved in the study of chaotic behavior in nonlinear systems. The high-speed, high-precision performance of this computer is due to its simple architecture, specialized to the numerical computations required of nonlinear systems. In this report, we discuss the design and implementation of this special-purpose machine.
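The computation in question is the Chirikov standard map, whose iteration is simple enough to motivate a dedicated machine; below is a plain software sketch of the orbit computation the SMM accelerates (the initial conditions and parameter K are illustrative):

```python
import math

def standard_map(theta, p, K, steps):
    """Iterate the Chirikov standard map:
       p'     = p + K*sin(theta)   (mod 2*pi)
       theta' = theta + p'         (mod 2*pi)
    For K above roughly 1 the phase space becomes largely chaotic, which
    is why long, high-precision orbits demand serious compute."""
    two_pi = 2 * math.pi
    for _ in range(steps):
        p = (p + K * math.sin(theta)) % two_pi
        theta = (theta + p) % two_pi
    return theta, p

print(standard_map(theta=0.5, p=0.2, K=1.2, steps=1_000_000))
```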
9

Shock compression response of aluminum-based intermetallic-forming reactive systems

Specht, Paul Elliott 06 February 2013 (has links)
Heterogeneities at the meso-scale strongly influence the shock compression response of composite materials. These heterogeneities arise from both structural variations and differing physical/mechanical properties between constituents. In mixtures of reactive materials, such as Ni and Al, the meso-scale heterogeneities greatly affect component mixing and activation, which, in turn, can induce a chemical reaction. Cold-rolled multilayered composites of Ni and Al provide a unique system for studying the effects of material heterogeneities on a propagating shock wave, owing to their full density, periodic layering, and intimate particle contacts. Computational analysis of the shock compression response of fully dense Ni/Al multilayered composites is performed on real, heterogeneous microstructures, obtained from optical microscopy, using the Eulerian hydrocode CTH. Changes in the orientation, density, structure, and strength of the material interfaces, as well as the strength of the constituents, are used to understand the influence of microstructure on the multilayered composite response at high strain rates. The results show a marked difference in the dissipation and dispersion of the shock wave as the underlying microstructure varies. These variations can be attributed to the development of two-dimensional effects and the nature of the wave reflections and interactions. Validation of the computational results is obtained through time-resolved measurements (VISAR, PDV, and PVDF stress gauges) performed during uniaxial-strain plate-on-plate impact experiments. The experimental results confirm that the computational method accurately represents the multilayered composites, justifying the conclusions and trends extracted from the simulations. The reaction response of cold-rolled multilayered composites is also investigated using uniaxial-stress rod-on-anvil impact experiments, characterized through post-mortem microscopy and x-ray diffraction. This understanding of the shock compression response of the multilayered systems is contrasted with other composites of Ni and Al, including shock-consolidated and pressed (porous) powder compacts. A comprehensive design space is then developed to assist in the understanding and design of Ni/Al composites under high-pressure shock compression. Research funded by ONR/MURI grant No. N00014-07-1-0740.
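As background (not results from the thesis), planar impact experiments of this kind are commonly interpreted through the Rankine-Hugoniot jump conditions with a linear shock-velocity relation Us = c0 + s*up; the sketch below uses commonly quoted aluminum parameters, assumed here purely for illustration:

```python
def shock_pressure(up, rho0=2.70, c0=5.35, s=1.34):
    """Rankine-Hugoniot momentum jump P = rho0 * Us * up with the
    linear Hugoniot Us = c0 + s*up.

    Units: rho0 in g/cm^3, velocities in km/s -> P in GPa.
    Defaults are commonly quoted values for aluminum (illustrative)."""
    us = c0 + s * up
    return rho0 * us * up

for up in (0.5, 1.0):
    print(f"up = {up} km/s -> P = {shock_pressure(up):.1f} GPa")
```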
10

Energy decay in vortices

Lönn, Björn January 2011 (has links)
The long-time energy decay of vortices for several different initial flow scenarios is investigated both theoretically and numerically. The theoretical analysis is based on the energy method. Numerical calculations are performed by solving the compressible Navier-Stokes equations using a high-order stable finite difference method. The simulations verify the theoretical conclusion that vortices decay at a slow rate compared with other types of flows. Several Reynolds numbers and grid sizes, in both two and three dimensions, are considered.
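A model problem shows why the decay is slow: under pure viscous diffusion a single velocity mode of wavenumber k decays as exp(-nu*k^2*t), so its energy falls as exp(-2*nu*k^2*t), and at high Reynolds number (small nu) the time scale is very long. A back-of-the-envelope sketch (for this model mode, not the thesis' general flow cases):

```python
import math

def decay_time(nu, k, fraction=0.5):
    """Time for a viscously decaying mode u ~ exp(-nu*k^2*t) to lose
    the given fraction of its kinetic energy E ~ exp(-2*nu*k^2*t)."""
    return -math.log(1 - fraction) / (2 * nu * k**2)

# Large vortex (k = 1) with nu = 1e-4: half the energy survives for ages.
print(f"energy half-life: {decay_time(nu=1e-4, k=1.0):.0f} time units")  # ~3466
```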
