411 |
High-performance direct solution of finite element problems on multi-core processors. Guney, Murat Efe, 04 May 2010.
A direct solution procedure is proposed and developed which exploits the parallelism available in current symmetric multiprocessing (SMP) multi-core processors. Several algorithms are proposed and developed to improve the performance of the direct solution of FE problems. A high-performance sparse direct solver is developed which allows experimentation with the newly developed and existing algorithms. The performance of the algorithms is investigated using a large set of FE problems, and operation count estimates are developed to further assess the various algorithms. An out-of-core version of the solver is developed to reduce the memory requirements of the solution. I/O is performed asynchronously, without blocking the thread that makes the I/O request, which allows the factorization and triangular solution computations to overlap with I/O (sketched below). The performance of the developed solver is demonstrated on a large number of test problems. A problem with nearly 10 million degrees of freedom is solved on a low-cost desktop computer using the out-of-core version of the direct solver. Furthermore, the developed solver usually outperforms a commonly used shared-memory solver.
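The compute/I/O overlap can be sketched with a bounded prefetch queue. This is an illustrative double-buffering pattern, not Guney's solver; the panel loader and factorization kernel are hypothetical stand-ins:

```python
import threading
import queue
import numpy as np

def factorize_out_of_core(panel_files, load_panel, factorize_panel):
    """Pipeline panel loads against factorization via a background reader."""
    ready = queue.Queue(maxsize=2)       # double buffering: <= 2 panels in flight

    def reader():
        for f in panel_files:
            ready.put(load_panel(f))     # blocking disk read, off the compute thread
        ready.put(None)                  # sentinel: no more panels

    threading.Thread(target=reader, daemon=True).start()
    factors = []
    while (panel := ready.get()) is not None:
        factors.append(factorize_panel(panel))   # compute overlaps the next read
    return factors

# Hypothetical stand-ins: random SPD panels and a Cholesky factorization.
def load(_path):
    a = np.random.rand(4, 4)
    return a @ a.T + np.eye(4)           # symmetric positive definite

factors = factorize_out_of_core(["p0", "p1", "p2"], load, np.linalg.cholesky)
```

Bounding the queue at two panels means the read of panel k+1 proceeds while panel k is factorized, so I/O time hides behind computation whenever factorization dominates.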
|
412 |
Direct numerical simulation and analysis of saturated deformable porous media. Khan, Irfan, 07 July 2010.
Existing numerical techniques for modeling saturated deformable porous media are based on homogenization and thus are incapable of micro-mechanical investigations, such as studying the effect of micro-structure on the deformational characteristics of the media. In this research work, a numerical scheme is developed, based on a parallelized hybrid lattice-Boltzmann finite-element method, that is capable of performing micro-mechanical investigations through direct numerical simulation (a minimal sketch of the lattice-Boltzmann fluid step follows this abstract).

The method has been used to simulate compression of model saturated porous media made of spheres and cylinders in regular arrangements. These simulations show that, in the limit of small Reynolds number, capillary number, and strain, the deformational behaviour of a real porous medium can be recovered from a model porous medium when the porosity, permeability, and bulk compressive modulus are matched between the two media.

This finding motivated research into using model porous geometries to represent more complex real porous geometries in order to investigate deformation in the latter. An attempt has been made to apply this technique to the complex geometry of "felt" (a fibrous mat used in the paper industry). These investigations lead to a new understanding of the effect of fiber diameter on the bulk properties of a fibrous medium and, subsequently, on its deformational behaviour. Further, the method has been used to investigate constitutive relationships in deformable porous media, particularly the relationship between permeability and porosity during deformation. The results show the need for geometry-specific investigations.
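The fluid half of such a hybrid scheme is a lattice-Boltzmann collide-and-stream update. The following minimal D2Q9 BGK step is a generic sketch, not the thesis's parallel implementation; the finite-element solid solver and the fluid-solid coupling are omitted, and all parameters are illustrative:

```python
import numpy as np

# D2Q9 lattice: discrete velocities and their quadrature weights.
c = np.array([(0,0),(1,0),(0,1),(-1,0),(0,-1),(1,1),(-1,1),(-1,-1),(1,-1)])
w = np.array([4/9] + [1/9]*4 + [1/36]*4)
tau = 0.8                                   # BGK relaxation time (sets viscosity)

def lbm_step(f):
    rho = f.sum(axis=0)                     # density per lattice node
    u = np.einsum('qi,qxy->ixy', c, f) / rho           # macroscopic velocity
    cu = np.einsum('qi,ixy->qxy', c, u)                # c_q . u per direction
    feq = w[:, None, None] * rho * (1 + 3*cu + 4.5*cu**2
                                    - 1.5*(u**2).sum(axis=0))
    f = f - (f - feq) / tau                 # collision: relax toward equilibrium
    for q in range(9):                      # streaming: shift along each velocity
        f[q] = np.roll(f[q], shift=tuple(c[q]), axis=(0, 1))
    return f

f = np.tile(w[:, None, None], (1, 32, 32))  # start from uniform equilibrium
f = lbm_step(f)
```

In the hybrid method, the populations adjacent to the solid phase would additionally exchange momentum with the deforming finite-element mesh at each step.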
|
413 |
Data services: bringing I/O processing to petascale. Abbasi, Mohammad Hasan, 08 July 2011.
The increasing size of high performance computing systems and the associated increase in the volume of generated data have resulted in an I/O bottleneck for these applications. This bottleneck is further exacerbated by the imbalance between the growth of processing capability and that of storage capability, due mainly to the power and cost requirements of scaling the storage. This thesis introduces data services, a new abstraction which provides significant benefits for data-intensive applications. Data services combine low-overhead data movement with flexible placement of data manipulation operations to address the I/O challenges of leadership-class scientific applications. The impact of asynchronous data movement on application runtime is minimized by utilizing novel server-side data movement schedulers to avoid contention-related jitter in application communication. Additionally, the JITStager component is presented: utilizing dynamic code generation and flexible code placement, the JITStager allows data services to be executed as a pipeline extending from the application to storage. This thesis shows that data services can add new functionality to an application without a significant negative impact on performance.
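The abstraction can be pictured as an asynchronous staging queue with a pluggable data-manipulation operator. This toy class is invented for illustration and is not the actual JITStager API:

```python
import threading, queue, pickle

class DataService:
    """Toy staging service: low-overhead handoff, transform, then store."""
    def __init__(self, sink_path, operator=lambda x: x):
        self.q = queue.Queue()
        self.operator = operator                  # pluggable manipulation op
        self.sink = open(sink_path, "ab")
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def submit(self, buf):                        # returns almost immediately,
        self.q.put(buf)                           # keeping the app thread free

    def _drain(self):
        while (buf := self.q.get()) is not None:
            pickle.dump(self.operator(buf), self.sink)   # op rides the I/O path

    def close(self):
        self.q.put(None)                          # sentinel: flush and stop
        self.worker.join()
        self.sink.close()

svc = DataService("/tmp/stage.bin", operator=lambda xs: [x * 2 for x in xs])
svc.submit([1, 2, 3])                             # app continues computing
svc.close()
```

A real scheduler would additionally delay the drain while the application is in a communication phase, which is the contention-avoidance point made above.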
|
414 |
Coordinated system level resource management for heterogeneous many-core platforms. Gupta, Vishakha, 24 August 2011.
A challenge posed by future computer architectures is the efficient exploitation of their many, and sometimes heterogeneous, computational cores. This challenge is exacerbated by the multiple facilities for data movement and sharing across the cores resident on such platforms. To answer the question of how systems software should treat heterogeneous resources, this dissertation describes an approach that (1) creates a common manageable pool of all the resources present in the platform, and then (2) provides virtual machines (VMs) with multiple 'personalities', flexibly mapped to and efficiently run on the heterogeneous underlying hardware. A VM's personality is its execution context on the different types of processing resources usable by the VM. We provide mechanisms for making such platforms manageable and evaluate coordinated scheduling policies for mapping different VM personalities onto heterogeneous hardware.

Toward that end, this dissertation contributes technologies that include (1) restructured hypervisor and system functions that create high-performance environments enabling flexible execution and data sharing, (2) scheduling and other resource management infrastructure supporting diverse application needs and heterogeneous platform characteristics, and (3) hypervisor-level policies permitting efficient and coordinated resource usage and sharing. Experimental evaluations on multiple heterogeneous platforms, such as one comprising x86-based cores with attached NVIDIA accelerators and others with asymmetric elements on chip, demonstrate the utility of the approach and its ability to efficiently host diverse applications and resource management methods.
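A simple way to picture personality mapping is greedy placement of each VM onto the resource type where its speedup is highest, subject to capacity. This is a toy policy for illustration only, not the dissertation's scheduler, and the VM names and speedup figures are invented:

```python
def map_personalities(vms, capacity):
    """vms: {name: {resource: speedup}}; capacity: {resource: free slots}."""
    placement = {}
    # Place the most accelerator-hungry VMs first, so scarce devices go to
    # the workloads that benefit most from them.
    for name, prefs in sorted(vms.items(), key=lambda kv: -max(kv[1].values())):
        for res, _ in sorted(prefs.items(), key=lambda kv: -kv[1]):
            if capacity.get(res, 0) > 0:          # best still-available choice
                capacity[res] -= 1
                placement[name] = res
                break
    return placement

vms = {"vmA": {"gpu": 9.0, "x86": 1.0},
       "vmB": {"gpu": 1.5, "x86": 1.0},
       "vmC": {"gpu": 8.0, "x86": 1.0}}
print(map_personalities(vms, {"gpu": 1, "x86": 4}))
# {'vmA': 'gpu', 'vmC': 'x86', 'vmB': 'x86'}: vmA wins the lone GPU.
```

A coordinated policy in the sense above would also co-schedule a VM's CPU and accelerator components so that both personalities make progress together.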
|
415 |
High-performance computer system architectures for embedded computing. Lee, Dongwon, 26 August 2011.
The main objective of this thesis is to propose new methods for designing high-performance embedded computer system architectures. To achieve this goal, three major components of multi-processor embedded systems are examined in turn: multi-core processing elements (PEs), DRAM main memory systems, and on/off-chip interconnection networks.

The first section presents architectural enhancements to graphics processing units (GPUs), one class of multi- or many-core PEs, for improving the performance of embedded applications. An embedded application is first mapped onto GPUs to explore the design space, and architectural enhancements to existing GPUs are then proposed to improve the throughput of the embedded application.

The second section proposes high-performance buffer mapping methods for DSP multi-processor systems that exploit useful features of DRAM main memory systems. The memory wall problem becomes increasingly severe in multiprocessor environments because of communication and synchronization overheads. To alleviate it, this section exploits the bank concurrency and page-mode access of DRAM main memory systems to increase the performance of multiprocessor DSP systems (sketched after this abstract).

The final section presents a network-centric Turbo decoder and network-centric FFT processors. In the era of multi-processor systems, the interconnection network is another performance bottleneck. To handle heavy communication traffic, this section applies a crossbar switch, one of the indirect networks, to a parallel Turbo decoder, and a mesh topology to parallel FFT processors. In designing the mesh FFT processors, a very different approach is taken to improve performance: optical fiber is used as a new interconnection medium.
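The bank-concurrency idea can be sketched as a placement pass that spreads inter-processor communication buffers across DRAM banks, so that concurrent producer and consumer streams hit different banks and each keeps its own row open for page-mode hits. This toy pass assumes 8 independent banks and is an illustration, not the thesis's mapping method:

```python
NUM_BANKS = 8    # assumed bank count; real mappings depend on the DRAM device

def assign_banks(channels):
    """channels: list of (producer, consumer) buffers; returns channel -> bank."""
    mapping = {}
    for i, ch in enumerate(channels):
        mapping[ch] = i % NUM_BANKS   # distinct banks for concurrent streams
    return mapping

# A hypothetical DSP dataflow with three communication buffers:
dataflow = [("src", "fft"), ("fft", "demod"), ("demod", "sink")]
print(assign_banks(dataflow))
# {('src', 'fft'): 0, ('fft', 'demod'): 1, ('demod', 'sink'): 2}
```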
|
416 |
Parallel algorithms for direct blood flow simulations. Rahimian, Abtin, 21 February 2012.
Fluid mechanics of blood can be well approximated by a mixture model of a Newtonian fluid and deformable particles representing the red blood cells. Experimental and theoretical evidence suggests that the deformation and rheology of red blood cells is similar to that of phospholipid vesicles. Vesicles and red blood cells are both area-preserving closed membranes that resist bending. Beyond red blood cells, vesicles can be used to investigate the behavior of cell membranes, intracellular organelles, and viral particles. Given the importance of vesicle flows, in this thesis we focus on efficient numerical methods for such problems: we present computationally scalable algorithms for the simulation of dilute suspensions of deformable vesicles in two and three dimensions. Our method is based on the boundary integral formulation of Stokes flow. We present new schemes for simulating the three-dimensional hydrodynamic interactions of large numbers of vesicles with viscosity contrast. The algorithms incorporate a stable time-stepping scheme, high-order spatiotemporal discretizations, spectral preconditioners, and a reparametrization scheme capable of resolving extreme mesh distortions in dynamic simulations; the associated linear systems are solved in optimal time using the spectral preconditioners.

The highlights of our numerical scheme are that (i) the physics of vesicles is faithfully represented by using nonlinear solid mechanics to capture the deformations of each cell, (ii) the long-range, N-body hydrodynamic interactions between vesicles are accurately resolved using the fast multipole method (FMM), and (iii) our time-stepping scheme is unconditionally stable for the flow of single and multiple vesicles with viscosity contrast, with a computational cost per simulation unit time comparable to or less than that of an explicit scheme. We report scaling of our algorithms to simulations with millions of vesicles on thousands of computational cores.
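In the boundary integral formulation, the velocity induced by the vesicle membranes reduces to sums of Stokes kernels over boundary points. The O(N^2) direct evaluation below is the sum that the FMM accelerates to near-linear cost; it is a generic sketch with unit viscosity and a random placeholder force density, not the thesis's discretization:

```python
import numpy as np

def stokeslet_velocity(x, f):
    """x: (N,3) points, f: (N,3) point forces; returns induced velocities (N,3).
    Free-space Stokeslet: u(x_i) = (1/(8*pi*mu)) * sum_j (I/r + r r^T / r^3) f_j."""
    u = np.zeros_like(x)
    for i in range(len(x)):
        r = x[i] - x                           # (N,3) separation vectors
        rn = np.linalg.norm(r, axis=1)
        rn[i] = np.inf                         # exclude the singular self term
        rdotf = np.einsum('nj,nj->n', r, f)
        u[i] = ((f / rn[:, None]) + r * (rdotf / rn**3)[:, None]).sum(axis=0)
    return u / (8 * np.pi)                     # mu = 1 assumed

pts = np.random.rand(100, 3)
vel = stokeslet_velocity(pts, np.random.rand(100, 3))
```

Replacing this double loop with an FMM evaluation is what brings the per-step cost down from quadratic to near-linear and makes million-vesicle runs feasible.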
|
417 |
Advanced Memory Data Structures for Scalable Event Trace Analysis. Knüpfer, Andreas, 17 April 2009.
The thesis presents a contribution to the analysis and visualization of computational performance based on event traces, with a particular focus on parallel programs and High Performance Computing (HPC). Event traces contain detailed information about specified incidents (events) during the run-time of programs and allow minute investigation of dynamic program behavior, various performance metrics, and possible causes of performance flaws. Because of long-running and highly parallel programs and very fine detail resolutions, event traces can accumulate huge amounts of data, which become a challenge for interactive as well as automatic analysis and visualization tools.

The thesis proposes a method of exploiting redundancy in event traces in order to reduce the memory requirements and the computational complexity of event trace analysis. The sources of redundancy are repeated segments of the original program, arising from iterative or recursive algorithms or from SPMD-style parallel programs, which produce equal or similar repeated event sequences. The data reduction technique is based on the novel Complete Call Graph (CCG) data structure, which allows domain-specific data compression for event traces by a combination of lossless and lossy methods. All deviations due to lossy compression are controlled by constant bounds. Compression of the CCG is incorporated into the construction process, so that at no point do substantial uncompressed parts have to be stored. Experiments with real-world example traces reveal the potential for very high data compression: the results range from factors of 3 to 15 for small-scale compression with minimal deviation of the data, to factors greater than 100 for large-scale compression with moderate deviation.

Based on the CCG data structure, new algorithms are presented for the most common evaluation and analysis methods for event traces, requiring no explicit decompression. By avoiding repeated evaluation of formerly redundant event sequences, the computational effort of the new algorithms is reduced to the same extent as the memory consumption. The thesis includes a comprehensive discussion of the state of the art and related work, a detailed presentation of the design of the CCG data structure, an elaborate description of algorithms for construction, compression, and analysis of CCGs, and an extensive experimental validation of all components.
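The sharing mechanism at the heart of the CCG can be sketched as hash-consing of call subtrees with quantized durations, which yields lossless structural sharing plus lossy timing compression with a constant deviation bound. The node layout below is an illustrative simplification, not the actual data structure:

```python
EPS = 10                                # max timing deviation (timer ticks)

_pool = {}                              # canonical nodes, keyed by content

def cons(func, duration, children):
    """Return a shared node for equal subtrees; durations match within EPS."""
    key = (func, duration // EPS, tuple(id(c) for c in children))
    if key not in _pool:                # first occurrence: allocate the node
        _pool[key] = (func, duration, children)
    return _pool[key]                   # repeats: share the existing subtree

# Two iterations of the same loop body compress to a single shared node:
a = cons("compute", 1003, ())
b = cons("compute", 1005, ())           # within EPS of the first occurrence
assert a is b and len(_pool) == 1
```

Because every repeated sequence maps to an already-pooled node, analysis passes can walk the shared graph once instead of re-evaluating each repetition, which is how the run-time savings track the memory savings.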
|
418 |
Modeling of an adaptive parallel system with malleable applications in a distributed computing environment. Ghafoor, Sheikh Khaled, January 2007.
Thesis (Ph.D.), Mississippi State University, Department of Computer Science and Engineering. Title from title screen. Includes bibliographical references.
|
419 |
Intelligent text recognition system on a heterogeneous multi-core processor cluster: a performance profile and architecture exploration. Ritholtz, Lee, January 2009.
Thesis (M.S.), State University of New York at Binghamton, Thomas J. Watson School of Engineering and Applied Science, Department of Electrical and Computer Engineering, 2009. Includes bibliographical references.
|
420 |
Simulations of pulsatile flow through bileaflet mechanical heart valves using a suspension flow model to assess blood damage. Yun, Brian Min, 08 June 2015.
Defective or diseased native valves have been replaced with bileaflet mechanical heart valves (BMHVs) for many years. However, severe complications still occur, so the blood damage that arises in BMHV flows must be well understood. The aim of this research is to study numerically the platelet damage that occurs in BMHV flows. The numerical suspension flow method combines lattice-Boltzmann fluid modeling with the external boundary force method. The method is validated as a general suspension flow solver and then against experimental BMHV flow data. Blood damage is evaluated first for a physiologic adult case of BMHV flow and then for BMHVs with pediatric sizing and flow conditions. The simulations reveal intricate, small-scale BMHV flow features and the presence of turbulence in BMHV flow. The results suggest a shift from previous evaluations of instantaneous flow toward identifying long-term flow recirculation regions when assessing thromboembolic potential; sharp geometries that may induce such recirculation regions should be avoided in device design. Simulations for predictive assessment of pediatric-sized valves show increased platelet damage for potential pediatric valves. However, the damage values do not exceed platelet activation thresholds, and the most highly damaged platelets are found far from the valve. Thus, the increased damage associated with resized valves should not hinder pediatric valve development. The method can also be used as a generic tool for future evaluation of novel prosthetic devices or cardiovascular flow problems.
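Platelet damage along a trajectory is commonly accumulated as a power law of shear stress and exposure time (for instance, a Giersiepen-form index summing tau^a * dt^b along the path). The sketch below uses that generic form; the exponents are widely cited literature values and the stress samples are illustrative, not this study's calibrated model:

```python
def blood_damage_index(tau_history, dt, a=2.416, b=0.785):
    """Accumulate power-law damage along one platelet path.
    tau_history: shear stress samples (Pa) taken every dt seconds."""
    return sum(tau**a * dt**b for tau in tau_history)

# A platelet passing through a high-shear leaflet gap, sampled at 1.25 ms:
stresses = [12.0, 85.0, 140.0, 60.0]
print(blood_damage_index(stresses, dt=1.25e-3))
```

Tracking such an index per particle is what lets long-term recirculation regions, where exposure time grows, dominate the thromboembolic assessment rather than instantaneous flow snapshots.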
|