Global ETD Search

311	Automatic Parallel Memory Address Generation for Parallel DSP Computing Dai, Jiehua January 2008 The concept of Parallel Vector (scratch pad) Memories (PVM) was introduced as one solution for parallel computing in DSP, providing efficient parallel memory addressing with minimal latency. The parallel address generator for PVM proposed in this thesis makes parallel programming more efficient. However, because there is no cache to hide the complexity of memory access, the programming cost is high. To minimize this cost, automatic parallel memory address generation is needed to hide the complexity of memory access.
This thesis investigates methods for implementing conflict-free vector addressing algorithms on a parallel hardware structure. In particular, it matches vector addressing requirements extracted from the behaviour model against prepared parallel memory addressing templates, in order to supply data in parallel from main memory to the on-chip vector memory.
Based on the template and on how the main memory and the on-chip parallel vector memory are used, models for data pre-allocation and permutation in the scratch pad memories of an ASIP can be determined and configured. Exposing the parallel memory accesses in the source code yields a memory access flow graph (MFG). The MFG is then combined with hardware information to match templates in the template library. When a template matches, the corresponding permutation equation is obtained, and a permutation table containing the target addresses for data pre-allocation and permutation is created. In this way, memory addresses for parallel memory accesses can be generated automatically.
A tool for this purpose, Permutator, was implemented in C++ together with XML. A memory access coding template is selected, which specifies the permutation formulas; a PVM address table can then be generated for data pre-allocation, enabling efficient parallel memory access.
The results show that Permutator hides the complexity of memory access and thereby reduces the programming cost. It works well when each algorithm, together with its related hardware information, corresponds to a template case, in which case extra memory cost is eliminated. DSP Parallel Computing Parallel Vector (scratch pad) Memories Memory access Permutation Coding Template XML Computer engineering Datorteknik
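As a rough illustration of what a conflict-free permutation table can look like, the sketch below uses a classic skewed storage scheme; it is not the thesis's Permutator or its template library. An N×N matrix is spread over B = N memory banks by placing element (i, j) in bank (i + j) mod B at local address i, so any full row or full column can be fetched in one parallel access. The scheme and the function names are assumptions chosen for illustration.

```python
def skewed_permutation_table(n_banks):
    """Map each element (i, j) of an n_banks x n_banks matrix to (bank, address).

    Hypothetical skewed layout: bank = (i + j) mod B, address = i.
    """
    table = {}
    for i in range(n_banks):
        for j in range(n_banks):
            table[(i, j)] = ((i + j) % n_banks, i)
    return table


def is_conflict_free(table, elements):
    """An access is conflict-free if every requested element lives in a different bank."""
    banks = [table[e][0] for e in elements]
    return len(banks) == len(set(banks))


if __name__ == "__main__":
    B = 8
    table = skewed_permutation_table(B)
    print(is_conflict_free(table, [(3, j) for j in range(B)]))  # row 3: True
    print(is_conflict_free(table, [(i, 5) for i in range(B)]))  # column 5: True
    print(is_conflict_free(table, [(k, k) for k in range(B)]))  # diagonal: False, 2k mod 8 repeats
```

The main diagonal collides here because 2k mod B repeats for even B; other access patterns need other permutation formulas, which is the kind of per-pattern choice that template matching is meant to automate.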
312	Behandlung gekrümmter Oberflächen in einem 3D-FEM-Programm für Parallelrechner / Treatment of curved surfaces in a 3D FEM program for parallel computers Pester, M. 30 October 1998 The paper presents a method for generating the curved surfaces of 3D finite element meshes by mesh refinement, starting from a very coarse grid. This is useful for parallel implementations in which the finest meshes should be computed rather than read from large files. The paper deals with simple geometries such as spheres, cylinders, and cones, but the method may be extended to more complicated geometries. (With 45 figures.) finite element meshes curved surfaces mesh refinement parallel computing MSC 65Y05 MSC 65N30 ddc:510 ddc:004
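The idea of generating a curved boundary by refinement, instead of reading a fine mesh from file, can be sketched for the sphere case: split each boundary triangle at its edge midpoints and push every new midpoint radially back onto the exact surface. This is a minimal sketch assuming triangular boundary faces and a sphere centered at the origin; the paper's own data structures, element types, and treatment of cylinders and cones are not reproduced.

```python
import math


def project_to_sphere(p, radius):
    """Push point p radially onto the sphere of the given radius (centered at the origin)."""
    norm = math.sqrt(sum(c * c for c in p))
    return tuple(radius * c / norm for c in p)


def refine_on_sphere(triangles, radius):
    """Split every triangle into four and snap the new edge midpoints to the sphere."""
    def mid(p, q):
        return project_to_sphere(tuple((a + b) / 2 for a, b in zip(p, q)), radius)

    refined = []
    for a, b, c in triangles:
        ab, bc, ca = mid(a, b), mid(b, c), mid(c, a)
        refined += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return refined


if __name__ == "__main__":
    r = 2.0
    # Coarse start: one face of an octahedron inscribed in the sphere.
    mesh = [((r, 0.0, 0.0), (0.0, r, 0.0), (0.0, 0.0, r))]
    for _ in range(3):
        mesh = refine_on_sphere(mesh, r)
    print(len(mesh))  # 64 triangles, every vertex on the sphere
```

Only the coarse grid and the geometry description need to be stored; each process can regenerate its own fine boundary locally.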
313	Domain decomposition methods in geomechanics Florez Guzman, Horacio Antonio 11 October 2012 Hydrocarbon production or the injection of fluids into the reservoir can change the rock stresses and in-situ geomechanics, potentially leading to compaction and subsidence with harmful effects on wells, cap rock, faults, and the surrounding environment. Accurate simulations are essential to tackle these changes and their impact. The Mortar Finite Element Method (MFEM) has been shown to be a powerful technique for formulating a weak continuity condition at the interfaces of subdomains in which different meshes (i.e., non-conforming or hybrid) and/or different variational approximations are used. This is particularly suitable when coupling different physics on different domains, such as elasticity and poroelasticity, in the context of coupled flow and geomechanics. In this dissertation, popular Domain Decomposition Methods (DDM) are implemented in order to carry out large simulations that take full advantage of current parallel computer architectures. Different solution schemes can be defined depending on how information is exchanged across subdomain interfaces. Three schemes, Dirichlet-Neumann (DN), Neumann-Neumann (NN), and MFEM, are tested, and the advantages and disadvantages of each are identified. As a first contribution, the MFEM is extended to deal with curved interfaces represented by Non-Uniform Rational B-Spline (NURBS) curves and surfaces. The goal is a more robust geometrical representation for mortar spaces, which allows gluing non-conforming interfaces on realistic geometries. The resulting mortar saddle-point problem is decoupled by means of the DN- and NN-DDM. Additionally, a reservoir geometry reconstruction procedure based on NURBS surfaces is presented.
This technique, the second contribution, builds a robust piecewise continuous geometrical representation that MFEM can exploit to tackle realistic problems. Tensor product meshes are usually propagated from the reservoir into its surroundings in a conforming way, which makes non-matching interfaces highly attractive in this case. In reservoir compaction and subsidence estimation, it is common to deal with serial legacy codes for flow; indeed, major reservoir simulators such as compositional codes lack parallelism. Another issue is that the flow and mechanics domains generally differ. To overcome these limitations, a serial-parallel approach is proposed that couples serial flow codes with our parallel mechanics code through iterative coupling. Concrete results for loose coupling are presented as a third contribution. As a final contribution, the DN-DDM is applied to couple elasticity and plasticity, which appears very promising for speeding up computations involving poroplasticity. Several examples coupling elasticity, poroelasticity, and plasticity, ranging from near-wellbore applications to field-level subsidence computations, show that the proposed methodology can handle problems of practical interest. To facilitate the implementation of complex workflows, an advanced Python wrapper interface providing programming capabilities has been implemented. The proposed serial-parallel approach appears suitable for geomechanical problems involving different meshes for flow and mechanics, as well as for coupling parallel mechanics codes with legacy flow simulators. Domain decomposition Parallel computing Dirichlet-Neumann Neumann-Neumann Elasticity Plasticity Geomechanics Finite elements Mortar finite elements NURBS
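To make the Dirichlet-Neumann (DN) exchange concrete, the sketch below applies it to a toy problem chosen for illustration, not taken from the dissertation: −u'' = 1 on (0, 1) with u(0) = u(1) = 0, split at x = a. Subdomain 1 receives the current interface value λ as Dirichlet data, subdomain 2 receives the resulting flux as Neumann data, and λ is updated with a relaxation parameter θ. Both subdomain problems are solved in closed form so that only the DN iteration itself is visible; the function name and defaults are assumptions.

```python
def dirichlet_neumann_1d(a, theta=0.5, lam=0.0, iters=50):
    """DN iteration for -u'' = 1 on (0, 1), u(0) = u(1) = 0, interface at x = a.

    Returns the converged interface value lambda = u(a).
    """
    for _ in range(iters):
        # Subdomain 1 = (0, a), Dirichlet data u(a) = lam:  u1 = -x^2/2 + c1*x
        c1 = (lam + a * a / 2) / a
        flux = -a + c1                      # u1'(a), passed to subdomain 2
        # Subdomain 2 = (a, 1), Neumann data u2'(a) = flux, u2(1) = 0:
        # u2 = -x^2/2 + d*x + e
        d = flux + a
        e = 0.5 - d
        u2_at_a = -a * a / 2 + d * a + e
        # Relaxed update of the interface value.
        lam = theta * u2_at_a + (1 - theta) * lam
    return lam


if __name__ == "__main__":
    # Exact solution is u(x) = x(1 - x)/2, so u(0.6) = 0.12.
    print(dirichlet_neumann_1d(0.6))
```

In this toy problem the update is linear with contraction factor θ(a − 1)/a + (1 − θ). With θ = 1/2 it converges in a single step for a = 1/2 and converges only for a > 1/4, which shows why the relaxation parameter has to be chosen with the subdomain geometry in mind.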
314	Υπολογιστικές εφαρμογές σε περιβάλλον παράλληλης επεξεργασίας / Computational applications in a parallel processing environment Κομηνός, Χαράλαμπος Γαβριήλ 10 March 2014 This diploma thesis was carried out during the 2012/2013 academic year at the Computer Systems Laboratory (CSL) of the University of Patras. Its aim is to solve a set of examination timetabling problems (ETP, Carter dataset) with an informed genetic algorithm. The thesis presents the basic operating models of genetic algorithms and of the ETP, together with basic concepts of parallel systems. Finally, a serial implementation in ANSI-C is presented and compared with a parallel implementation in MPI-C, and the results of the comparison are reported. 005.275 MPI-C Genetic algorithms Examination timetabling Parallel computing Carter dataset
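The fitness evaluation at the heart of a genetic algorithm for the Carter benchmarks can be sketched as follows. It uses the proximity cost commonly reported for these datasets: two exams that share s students and are d periods apart (1 ≤ d ≤ 5) cost s·2^(5−d), summed and divided by the total number of students, while exams in the same period are hard conflicts. The data layout and function name are assumptions; the thesis's informed operators and its MPI-C parallelization are not shown.

```python
def carter_fitness(period, shared, n_students):
    """Return (hard_conflicts, proximity_cost) for one timetable.

    period:     dict exam -> assigned period index
    shared:     dict (exam_a, exam_b) -> number of students taking both exams
    n_students: total number of students in the dataset
    """
    hard = 0
    penalty = 0
    for (a, b), s in shared.items():
        d = abs(period[a] - period[b])
        if d == 0:
            hard += s                    # clash: timetable is infeasible
        elif d <= 5:
            penalty += s * 2 ** (5 - d)  # weights 16, 8, 4, 2, 1 for d = 1..5
    return hard, penalty / n_students


if __name__ == "__main__":
    period = {"A": 0, "B": 1, "C": 3}
    shared = {("A", "B"): 2, ("B", "C"): 1, ("A", "C"): 1}
    print(carter_fitness(period, shared, n_students=4))  # (0, 11.0)
```

A GA would typically reject or heavily penalize individuals with hard conflicts and minimize the proximity cost of the rest; splitting the population evaluation across MPI processes is one natural decomposition, though this is an assumption rather than a description of the thesis.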
315	High performance algorithms to improve the runtime computation of spacecraft trajectories Arora, Nitin 20 September 2013 Challenging science requirements and complex space missions are driving the need for fast and robust space trajectory design and simulation tools. The main aim of this thesis is to develop new and improved high performance algorithms and solution techniques for commonly encountered problems in astrodynamics. Five major problems are considered, and the state-of-the-art algorithms for each are systematically improved. Theoretical and methodological improvements are combined with modern computational techniques, resulting in more robust algorithms and faster runtime performance. The five selected problems are 1) the multiple-revolution Lambert problem, 2) high-fidelity geopotential (gravity field) computation, 3) ephemeris computation, 4) fast and accurate sensitivity computation, and 5) high-fidelity multiple spacecraft simulation. The work presented has applications in a variety of fields, such as preliminary mission design, high-fidelity trajectory simulation, orbit estimation, and numerical optimization. Other fields, from space and environmental science to chemical and electrical engineering, also stand to benefit. Spacecraft trajectory simulation Fast gravity model Parallel computing GPU computing Lambert's problem Trajectory optimization Ephemeris computation Space trajectories Algorithms Astrodynamics
316	Susidūrimų paieškos, naudojant lygiagrečius skaičiavimus, metodų tyrimas / Collision detection methods using parallel computing Šiukščius, Martynas 26 August 2013 Collision detection is the problem of determining whether two or more objects in a 3D virtual space intersect. It is a well-studied and active research field with applications in computer games and animation, nonlinear finite element analysis, particle hydrodynamics, multibody dynamics, robotics, haptics, and various physics simulations. Many collision detection algorithms exist; among the most popular are spatial subdivision, hierarchical structures such as octrees, and sort and sweep. This work briefly surveys collision detection methods and then analyzes the performance of the selected algorithms on the CPU (central processing unit) and on the GPU (graphics processing unit), examining how they can be accelerated with CUDA (Compute Unified Device Architecture), an architecture developed by Nvidia that uses the graphics processor for general-purpose computation. To reach these goals, baseline versions of the spatial subdivision, octree, and sort and sweep algorithms were implemented and adapted for parallel computation, and their execution time, GPU memory consumption, and the factors affecting their running time were analyzed. The work concludes with a discussion of the advantages of parallel programming for this problem. The experiments showed that moving the computations to the GPU made the studied algorithms 200 times faster than running them on the CPU. Informatics Collision detection Graphics processor Computation acceleration Parallel computing GPU-based parallel computing Spatial subdivision Parallelization
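Sort and sweep, one of the broad-phase algorithms compared in this work, can be sketched as follows: sort the axis-aligned bounding boxes (AABBs) by their minimum x coordinate, sweep along x while keeping a list of boxes whose x-interval is still open, and test only those candidates on the remaining axes. This is a sequential CPU sketch with an assumed box representation; it does not reproduce the thesis's CUDA implementation.

```python
def sort_and_sweep(boxes):
    """Return index pairs (i, j), i < j, whose AABBs overlap (touching counts).

    boxes: list of ((xmin, ymin, zmin), (xmax, ymax, zmax)).
    """
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0][0])
    active = []  # boxes whose x-interval may still overlap later boxes
    pairs = []
    for i in order:
        lo_i, hi_i = boxes[i]
        # Drop boxes that end before this one starts on the sweep axis.
        active = [j for j in active if boxes[j][1][0] >= lo_i[0]]
        for j in active:
            lo_j, hi_j = boxes[j]
            # x-overlap is guaranteed here; check y and z.
            if all(lo_i[k] <= hi_j[k] and lo_j[k] <= hi_i[k] for k in (1, 2)):
                pairs.append((min(i, j), max(i, j)))
        active.append(i)
    return sorted(pairs)


if __name__ == "__main__":
    boxes = [
        ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)),
        ((0.5, 0.5, 0.5), (1.5, 1.5, 1.5)),
        ((3.0, 0.0, 0.0), (4.0, 1.0, 1.0)),
        ((0.5, 5.0, 0.0), (1.0, 6.0, 1.0)),  # overlaps in x only
    ]
    print(sort_and_sweep(boxes))  # [(0, 1)]
```

The sort costs O(n log n), and the sweep stays cheap when few boxes overlap along the sweep axis. A GPU version would need to parallelize both the sort and the candidate tests; this sketch stays sequential.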
317	Lygiagretieji skaičiavimai naudojant vaizdo plokštes / Parallel computing using graphics cards Juodaitis, Robertas 01 August 2013 This work compares two kinds of computing devices for general-purpose parallel computing: a graphics card programmed with CUDA and central processing units programmed with the MPI library. Both are evaluated on classic parallel algorithms: approximating the value of π and matrix multiplication. Particular attention is paid to choosing parallelization strategies that make full use of both the MPI cluster and the graphics card. Relative speedup was established as an objective criterion for comparing the devices, expressing the computing capacity the graphics card achieves relative to the central processor. Analysis of the experimental results yields practical advice for programmers: processes should exchange data as rarely as possible, because communication lowers the efficiency of parallel algorithms. Programming with CUDA was also found to be more complex, requiring close adaptation to the parameters of the graphics card. As a result, a fully utilized graphics card running CUDA is faster not only than a computer with a 4-core CPU but also than a small cluster. Informatics Engineering Parallel computing Video card Cuda Evaluation criteria MPI library Compare
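A common formulation of the π benchmark is midpoint-rule integration of π = ∫₀¹ 4/(1 + x²) dx. The sketch below shows the round-robin decomposition used in the classic MPI example: each rank sums every size-th interval, and the partial sums are added, which in MPI would be a single MPI_Reduce with MPI_SUM. Here the ranks are simulated in a plain Python loop, and the exact decomposition used in the thesis is an assumption.

```python
import math


def partial_pi(rank, size, n):
    """Midpoint-rule contribution of one rank to pi = integral_0^1 4/(1+x^2) dx."""
    h = 1.0 / n
    s = 0.0
    for i in range(rank, n, size):  # round-robin: rank, rank + size, ...
        x = h * (i + 0.5)
        s += 4.0 / (1.0 + x * x)
    return s * h


def parallel_pi(size, n):
    # Stand-in for MPI_Reduce(..., MPI_SUM, ...): add up the per-rank results.
    return sum(partial_pi(rank, size, n) for rank in range(size))


if __name__ == "__main__":
    approx = parallel_pi(size=4, n=10_000)
    print(abs(approx - math.pi) < 1e-8)  # True
```

The only communication is one number per rank in the final reduction, which is consistent with the advice above to keep data exchange between processes rare.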
318	COMPUTER SIMULATION OF A HOLLOW-FIBER BIOREACTOR: HEPARAN REGULATED GROWTH FACTORS-RECEPTORS BINDING AND DISSOCIATION ANALYSIS Zhang, Changjiang 01 January 2011 This thesis demonstrates the use of numerical simulation in predicting the behavior of proteins in a flow environment. A novel convection-diffusion-reaction computational model is first introduced to simulate the binding of fibroblast growth factor (FGF-2) to its receptor (FGFR) on cell surfaces, regulated by heparan sulfate proteoglycan (HSPG), under flow in a bioreactor. The model has three parts: (1) the flow of the medium, described by the incompressible Navier-Stokes equations; (2) the mass transport of FGF-2, described by convection-diffusion equations; and (3) cell surface binding, described by chemical kinetics. It consists of a set of coupled nonlinear partial differential equations (PDEs) for flow and mass transport and a set of coupled nonlinear ordinary differential equations (ODEs) for binding kinetics. To handle pulsatile flow, several assumptions are made, including neglecting entrance effects, and an approximate analytical solution for the axial velocity within the fibers is obtained. The time-dependent mass transport PDEs are discretized with the finite volume method and solved with a second-order implicit Euler method. The binding kinetics ODEs are stiff and are solved with the ODE solver CVODE, using backward differentiation formulas (BDF) with Newton iteration. To obtain reasonable accuracy for the biochemical reactions on cell surfaces, a uniform mesh is used. This basic model can simulate any growth factor-receptor binding on the cell surfaces lining the fiber walls of a bioreactor, simply by replacing the binding kinetics ODEs. Circulation is an important delivery method for natural and synthetic molecules, but microenvironment interactions, which are regulated by endothelial cells and critical to the molecule's fate, are difficult to interpret using traditional approaches.
Growth factor capture under flow is analyzed and predicted using the computational model described above together with a three-dimensional experimental approach that includes pertinent circulation characteristics such as pulsatile flow, competing binding interactions, and limited bioavailability. The goal is to understand the features that control this process. The experimental module consists of a bioreactor with synthetic endothelial-lined hollow fibers under flow. The physical design of the system is incorporated into the model parameters. FGF-2 is used for both the experiments and the simulations. The computational model is based on the flow and reactions within a single hollow fiber and is scaled linearly by the total number of fibers for comparison with experimental results. The model predicts, and experiments confirm, that removing heparan sulfate (HS) from the system results in a dramatic loss of binding by heparin-binding proteins, but not by proteins that do not bind heparin. The model further predicts a significant loss of bound protein at flow rates only slightly higher than average capillary flow rates, which was corroborated experimentally, suggesting that the probability of capture in a single pass at high flow rates is extremely low. Several other key parameters are investigated, and the coupling between receptors and proteoglycans is shown to have a critical impact on successful capture. The combined system offers opportunities to examine circulation capture in a straightforward, quantitative manner that should prove advantageous for biological or drug delivery investigations. For more complicated binding systems, in which several growth factors or proteins with competing binding interactions move through the hollow fibers of a bioreactor, coupled with biochemical reactions on the cell surfaces of the fiber walls, a more complex model is derived from the basic model.
As in the basic model, the fluid flow is modeled by the incompressible Navier-Stokes equations; the biochemical reactions in the fluid and on the cell surfaces are modeled by two distinct sets of coupled nonlinear ordinary differential equations; and the mass transport of the different growth factors and complexes is modeled by separate sets of coupled nonlinear partial differential equations. To solve this computationally intensive system, parallel algorithms are devised in which all numerical computations are performed in parallel, including the discretization of the mass transport equations and the linear system solver, Stone's Implicit Procedure (SIP). In the parallel SIP solver, a pipelining technique is used for the LU factorization and an overlapped Jacobi iteration technique for the forward and backward substitutions. To solve the binding ODEs in the fluid and on the cell surfaces, a parallel scheme combined with a sequential CVODE solver is used. The simulation results demonstrate the computational efficiency of the algorithms; further experiments are needed to verify the predictions. Numerical simulation laminar convection diffusion flow mass transport parallel computing Biochemistry Computer Engineering Systems Biology
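To illustrate why stiff binding kinetics are integrated implicitly, the sketch below takes backward Euler steps for a single hypothetical reversible reaction L + R ⇌ C, with conservation L = L0 − C and R = R0 − C, and solves each nonlinear step equation by Newton iteration. This is a deliberate simplification: CVODE uses variable-order, variable-step BDF methods, and the thesis's models couple many species with transport. All names and parameter values are assumptions.

```python
import math


def backward_euler_binding(c, L0, R0, kon, koff, dt, steps, tol=1e-14, max_newton=50):
    """Integrate dC/dt = kon*(L0 - C)*(R0 - C) - koff*C with backward Euler steps."""
    for _ in range(steps):
        c_old = c
        x = c_old  # Newton initial guess: previous value
        for _ in range(max_newton):
            rate = kon * (L0 - x) * (R0 - x) - koff * x
            g = x - c_old - dt * rate  # residual of the implicit step
            dg = 1.0 + dt * (kon * (L0 - x) + kon * (R0 - x) + koff)  # dg/dx
            delta = g / dg
            x -= delta
            if abs(delta) < tol:
                break
        c = x
    return c


if __name__ == "__main__":
    # Hypothetical parameters: kon = koff = 1, L0 = 2, R0 = 1, large step dt = 10.
    c = backward_euler_binding(0.0, L0=2.0, R0=1.0, kon=1.0, koff=1.0, dt=10.0, steps=50)
    # Equilibrium solves C^2 - 4C + 2 = 0, i.e. C = 2 - sqrt(2).
    print(abs(c - (2.0 - math.sqrt(2.0))) < 1e-10)  # True
```

Even with a time step much larger than the reaction time scale, the implicit step remains stable and settles on the equilibrium kon·L·R = koff·C; an explicit Euler step with dt = 10 would jump from C = 0 to C = 20, far outside the physical range.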
319	Application of L1 Minimization Technique to Image Super-Resolution and Surface Reconstruction Talavatifard, Habiballah 03 October 2013 A nonlinear finite element technique for surface reconstruction and image enhancement, based on minimizing the L1 norm of the total variation of the gradient, is introduced. Since minimization in the L1 norm is computationally expensive, we seek to improve the performance of this algorithm on two fronts: first, local L1 minimization, which allows a parallel implementation; second, applying the Augmented Lagrangian method to solve the minimization problem. We show that local solution of the minimization problem is feasible and that the Augmented Lagrangian method can successfully be used to solve the L1 minimization problem. These results are expected to be useful for improving algorithms that compute digital elevation maps for natural and urban terrain, fit surfaces to point-cloud data, and perform image super-resolution. L1 minimization Parallel Computing Image Super-resolution Domain Decomposition Interior Point method Augmented Lagrangian Method Surface Reconstruction
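The augmented Lagrangian idea can be illustrated on a much simpler relative of this problem, one-dimensional total-variation denoising: minimize ½‖u − f‖² + λ Σ|u_{i+1} − u_i|. The sketch below uses ADMM, an alternating variant of the augmented Lagrangian method. It introduces z = Du for the differences, solves a tridiagonal system for u, applies soft-thresholding (the proximal map of the L1 norm) to obtain z, and updates the scaled multiplier w. This discrete 1D model and all parameter names are assumptions; it is not the thesis's nonlinear finite element formulation.

```python
def soft_threshold(v, t):
    """Proximal map of t*|.|: shrink v toward zero by t."""
    return max(v - t, 0.0) - max(-v - t, 0.0)


def solve_tridiagonal(sub, diag, sup, rhs):
    """Thomas algorithm; sub[0] and sup[-1] are ignored."""
    n = len(rhs)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = sup[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - sub[i] * cp[i - 1]
        cp[i] = sup[i] / m if i < n - 1 else 0.0
        dp[i] = (rhs[i] - sub[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def tv_denoise_admm(f, lam, rho=1.0, iters=2000):
    """ADMM for min 0.5*||u - f||^2 + lam * sum |u[i+1] - u[i]| (len(f) >= 2)."""
    n = len(f)
    z = [0.0] * (n - 1)  # auxiliary variable standing in for Du
    w = [0.0] * (n - 1)  # scaled Lagrange multiplier
    # I + rho * D^T D is tridiagonal and fixed across iterations.
    diag = [1.0 + rho * (1 if i in (0, n - 1) else 2) for i in range(n)]
    off = [-rho] * n
    u = list(f)
    for _ in range(iters):
        v = [z[i] - w[i] for i in range(n - 1)]
        # rhs = f + rho * D^T v, with (D^T v)_j = v_{j-1} - v_j
        rhs = [f[j] + rho * ((v[j - 1] if j > 0 else 0.0) - (v[j] if j < n - 1 else 0.0))
               for j in range(n)]
        u = solve_tridiagonal(off, diag, off, rhs)
        du = [u[i + 1] - u[i] for i in range(n - 1)]
        z = [soft_threshold(du[i] + w[i], lam / rho) for i in range(n - 1)]
        w = [w[i] + du[i] - z[i] for i in range(n - 1)]
    return u


if __name__ == "__main__":
    print([round(x, 4) for x in tv_denoise_admm([0, 0, 0, 1, 1, 1], lam=0.3)])
```

For f = [0, 0, 0, 1, 1, 1] and λ = 0.3, the iterates approach [0.1, 0.1, 0.1, 0.9, 0.9, 0.9]: each plateau moves toward the other by λ divided by its length, the characteristic effect of total-variation regularization on a step.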
320	A Parallel Graph Partitioner for STAPL Castet, Nicolas 03 October 2013 Multi-core architectures are present in a wide range of computing devices, from cell phones to supercomputers. Parallel applications running on these devices solve bigger problems in less time. Writing such applications is difficult for programmers, who must deal with low-level parallel mechanisms such as data distribution, inter-processor communication, and task placement. The goal of the Standard Template Adaptive Parallel Library (STAPL) is to provide a generic high-level framework for developing parallel applications. One of the first steps of a parallel application is to partition and distribute the data throughout the system. The graph is an important data structure that parallel applications use to store large amounts of data and to model many types of relations. A mesh, which is a special type of graph, is often used to model a spatial domain in scientific applications. Graph and mesh partitioning has many applications, such as VLSI circuit design, parallel task scheduling, and data distribution. Data distribution significantly impacts the performance of a parallel application. In this thesis, we introduce the STAPL Parallel Graph Partitioner Framework. This framework provides a generic infrastructure for partitioning arbitrary graphs and meshes and for building customized partitioners. It includes a state-of-the-art parallel k-way multilevel scheme for partitioning arbitrary graphs, a parallel mesh partitioner with parameterized partition shape, and a customized partitioner used for discrete ordinates particle transport computations. The framework is also part of a generic library, STAPL, so that partitioning the data and developing the whole parallel application can be done in the same environment. We show the user-friendly interface of the framework and its scalability for partitioning different mesh and graph benchmarks on a Cray XE6 system.
We also highlight the performance of our customized unstructured mesh partitioner for a discrete ordinates particle transport code. The developed columnar decompositions significantly reduce the execution time of simultaneous sweeps on unstructured meshes. Parallel graph partitioning Parallel computing STAPL Multilevel scheme Parallel mesh partitioning Unstructured mesh
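Multilevel partitioners of the kind mentioned above typically coarsen the graph by repeatedly contracting a matching. Heavy-edge matching, popularized by METIS, greedily pairs each unmatched vertex with the unmatched neighbor joined by the heaviest edge. The sketch below shows that single coarsening step and the edge-cut metric a partitioner tries to minimize. It is a sequential illustration with an assumed adjacency-dict format, not STAPL's parallel implementation or API.

```python
def heavy_edge_matching(adj):
    """Greedy heavy-edge matching.

    adj: dict vertex -> dict neighbor -> edge weight (symmetric).
    Returns dict vertex -> partner; unmatched vertices map to themselves.
    """
    match = {}
    for v in sorted(adj):  # METIS visits vertices in random order; sorted() keeps this deterministic
        if v in match:
            continue
        best, best_w = v, None
        for u, w in adj[v].items():
            if u != v and u not in match and (best_w is None or w > best_w):
                best, best_w = u, w
        match[v] = best
        match[best] = v
    return match


def edge_cut(adj, part):
    """Total weight of edges whose endpoints lie in different parts."""
    return sum(w for v in adj for u, w in adj[v].items()
               if v < u and part[v] != part[u])


if __name__ == "__main__":
    adj = {
        "a": {"b": 1, "c": 5},
        "b": {"a": 1, "d": 2},
        "c": {"a": 5, "d": 1},
        "d": {"b": 2, "c": 1},
    }
    print(heavy_edge_matching(adj))                           # a-c and b-d are matched
    print(edge_cut(adj, {"a": 0, "c": 0, "b": 1, "d": 1}))    # 2
```

Contracting each matched pair, here {a, c} and {b, d}, hides the heavy a–c edge inside one coarse vertex, so no partition of the coarse graph can cut it; the coarse partition is later projected back and refined.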
