Global ETD Search

41	MPI WITHIN A GPU Young, Bobby Dalton 01 January 2009 (has links) GPUs offer high-performance floating-point computation at commodity prices, but their usage is hindered by programming models which expose the user to irregularities in the current shared-memory environments and require learning new interfaces and semantics. This thesis will demonstrate that the message-passing paradigm can be conceptually cleaner than the current data-parallel models for programming GPUs because it can hide the quirks of current GPU shared-memory environments, as well as GPU-specific features, behind a well-established and well-understood interface. This will be shown by demonstrating a proof-of-concept MPI implementation which provides cleaner, simpler code with a reasonable performance cost. This thesis will also demonstrate that, although there is a virtualization constraint imposed by MPI, this constraint is harmless as long as the virtualization was already chosen to be optimal in terms of a strong execution model and nearly-optimal execution time. This will be demonstrated by examining execution times with varying virtualization using a computationally-expensive micro-kernel. message-passing virtualization data-parallel virtualization MPI GPU Electrical and Computer Engineering
42	Portierbare numerische Simulation auf parallelen Architekturen / Portable numerical simulation on parallel architectures Rehm, W. 30 October 1998 (has links) (PDF) The workshop "Portierbare numerische Simulationen auf parallelen Architekturen" ("Portable numerical simulations on parallel architectures") was organized by the Faculty of Informatics/Professorship Computer Architecture on 18 April 1996 and held in the framework of the Sonderforschungsbereich (Joint Research Initiative) "Numerische Simulationen auf massiv parallelen Rechnern" (SFB 393) ("Numerical simulations on massively parallel computers") (http://www.tu-chemnitz.de/~pester/sfb/sfb393.html). The SFB 393 is funded by the German National Science Foundation (DFG). The purpose of the workshop was to bring together scientists using parallel computing for integrated discussions on portability issues, requirements, and future developments in implementing parallel software that is both efficient and portable on clusters of symmetric multiprocessor systems. I hope that the present paper gives the reader some helpful hints for further discussions in this field. Parallel Computing FEM Shared Memory Implementations Message Passing Efficiency MSC 68M99 MSC 68Q22 ddc:004 Benchmarking
43	Optimization methods for side-chain positioning and macromolecular docking Moghadasi, Mohammad 08 April 2016 (has links) This dissertation proposes new optimization algorithms targeting protein-protein docking which is an important class of problems in computational structural biology. The ultimate goal of docking methods is to predict the 3-dimensional structure of a stable protein-protein complex. We study two specific problems encountered in predictive docking of proteins. The first problem is Side-Chain Positioning (SCP), a central component of homology modeling and computational protein docking methods. We formulate SCP as a Maximum Weighted Independent Set (MWIS) problem on an appropriately constructed graph. Our formulation also considers the significant special structure of proteins that SCP exhibits for docking. We develop an approximate algorithm that solves a relaxation of MWIS and employ randomized estimation heuristics to obtain high-quality feasible solutions to the problem. The algorithm is fully distributed and can be implemented on multi-processor architectures. Our computational results on a benchmark set of protein complexes show that the accuracy of our approximate MWIS-based algorithm predictions is comparable with the results achieved by a state-of-the-art method that finds an exact solution to SCP. The second problem we target in this work is protein docking refinement. We propose two different methods to solve the refinement problem. The first approach is based on a Monte Carlo-Minimization (MCM) search to optimize rigid-body and side-chain conformations for binding. In particular, we study the impact of optimally positioning the side-chains in the interface region between two proteins in the process of binding. We report computational results showing that incorporating side-chain flexibility in docking provides substantial improvement in the quality of docked predictions compared to the rigid-body approaches. 
Further, we demonstrate that the inclusion of unbound side-chain conformers in the side-chain search introduces significant improvement in the performance of the docking refinement protocols. In the second approach, we propose a novel stochastic optimization algorithm based on Subspace Semi-Definite programming-based Underestimation (SSDU), which aims to solve protein docking and protein structure prediction. SSDU is based on underestimating the binding energy function in a permissive subspace of the space of rigid-body motions. We apply Principal Component Analysis (PCA) to determine the permissive subspace and reduce the dimensionality of the conformational search space. We consider the general class of convex polynomial underestimators, and formulate the problem of finding such underestimators as a Semi-Definite Programming (SDP) problem. Using these underestimators, we perform a biased sampling in the vicinity of the conformational regions where the energy function is at its global minimum. Moreover, we develop an exploration procedure based on density-based clustering to detect the near-native regions even when there are many local minima residing far from each other. We also incorporate a Model Selection procedure into SSDU to pick a predictive conformation. Testing our algorithm over a benchmark of protein complexes indicates that SSDU substantially improves the quality of docking refinement compared with existing methods. Bioinformatics Message-passing Monte Carlo-minimization Optimization Protein docking Semi-definite programming Side-chain positioning
44	A Unified Robust Minimax Framework for Regularized Learning Problems Zhou, Hongbo 01 May 2014 (has links) Regularization techniques have become a principled tool for model-based statistics and artificial intelligence research. However, in most situations, these regularization terms are not well interpreted, especially in how they relate to the loss function and data matrix in a given statistical model. In this work, we propose a robust minimax formulation to interpret the relationship between data and regularization terms for a large class of loss functions. We show that various regularization terms essentially correspond to different distortions of the original data matrix. This supplies a unified framework for understanding various existing regularization terms, designing novel regularization terms based on perturbation analysis techniques, and inspiring novel generic algorithms. To show how to apply minimax-related concepts to real-world learning tasks, we develop a new fault-tolerant classification framework to combat class noise for general multi-class classification problems; further, by studying the relationship between the majorizable function class and the minimax framework, we develop an accurate, efficient, and scalable algorithm for solving a large family of learning formulations. In addition, this work has been further extended to tackle several important matrix-decomposition-related learning tasks, and we have validated our work on various real-world applications including structure-from-motion (with missing data) and latent structure dictionary learning tasks. This work, composed of a unified formulation, a scalable algorithm, and promising applications in many real-world learning problems, contributes to the understanding of the hidden robustness in many learning models. 
As we show, many classical statistical machine learning models can be unified using this formulation, and accurate, efficient, and scalable algorithms become available from our research. Continuous Event Recognition Machine Learning Message Passing Algorithm Minimax Framework Regularization Robust
45	Segmentation vidéo et suivi d'objets multiples / Video segmentation and multiple object tracking Kumar, Ratnesh 15 December 2014 (has links) Dans cette thèse nous proposons de nouveaux algorithmes d'analyse vidéo. La première contribution de cette thèse concerne le domaine de la segmentation de vidéos avec pour objectif d'obtenir une segmentation dense et spatio-temporellement cohérente. Nous proposons de combiner les aspects spatiaux et temporels d'une vidéo en une seule notion, celle de Fibre. Une fibre est un ensemble de trajectoires qui sont spatialement connectées par un maillage. Les fibres sont construites en évaluant simultanément les aspects spatiaux et temporels. Par rapport a l’état de l'art une segmentation de vidéo a base de fibres présente comme avantages d’accéder naturellement au voisinage grâce au maillage et aux correspondances temporelles pour la plupart des pixels de la vidéo. De plus, cette segmentation à base de fibres a une complexité quasi linéaire par rapport au nombre de pixels. La deuxième contribution de cette thèse concerne le suivi d'objets multiples. Nous proposons une approche de suivi qui utilise des caractéristiques des points suivis, la cinématique des objets suivis et l'apparence globale des détections. L'unification de toutes ces caractéristiques est effectuée avec un champ conditionnel aléatoire. Ensuite ce modèle est optimisé en combinant les techniques de passage de message et une variante de processus ICM (Iterated Conditional Modes) pour inférer les trajectoires d'objet. Une troisième contribution mineure consiste dans le développement d'un descripteur pour la mise en correspondance d'apparences de personne. Toutes les approches proposées obtiennent des résultats compétitifs ou meilleurs (qualitativement et quantitativement) que l’état de l'art sur des base de données. / In this thesis we propose novel algorithms for video analysis. 
The first contribution of this thesis is in the domain of video segmentation, where the objective is to obtain a dense and coherent spatio-temporal segmentation. We propose joining the spatial and temporal aspects of a video into a single notion, the fiber. A fiber is a set of trajectories which are spatially connected by a mesh. Fibers are built by jointly assessing spatial and temporal aspects of the video. Compared to the state of the art, a fiber-based video segmentation offers natural access to the spatio-temporal neighborhood through the mesh, and temporal correspondences for most pixels in the video. Furthermore, this fiber-based segmentation is of quasi-linear complexity w.r.t. the number of pixels. The second contribution is in the realm of multiple object tracking. We propose a tracking approach which utilizes cues from point tracks, kinematics of moving objects and global appearance of detections. Unification of all these cues is performed on a Conditional Random Field. Subsequently this model is optimized by a combination of message passing and an Iterated Conditional Modes (ICM) variant to infer object trajectories. A third, minor, contribution relates to the development of a suitable feature descriptor for appearance matching of persons. All of our proposed approaches achieve results that are competitive with or better than the state of the art (both qualitatively and quantitatively) on open source datasets. Fibres Problème de partitionnement Passage de messages Fibers Iterated conditional modes Partitioning problem Message passing Point tracks
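The abstract above names Iterated Conditional Modes (ICM) as one part of the optimization. As a rough illustration only, the sketch below runs plain ICM on a one-dimensional chain with a Potts smoothness term. The function name `icm_chain`, the chain structure, and the cost values are assumptions made for this example; the thesis uses an ICM variant combined with message passing on a Conditional Random Field, which is not reproduced here.

```python
def icm_chain(unary, smooth, max_sweeps=20):
    """Plain ICM on a chain: greedily relabel one node at a time.

    unary[i][l] is the (hypothetical) cost of giving node i label l;
    smooth is the Potts penalty paid when neighbouring labels differ.
    """
    n = len(unary)
    k = len(unary[0])
    # Start from the labels that minimise the unary cost alone.
    labels = [min(range(k), key=lambda l: unary[i][l]) for i in range(n)]
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            def local_energy(l):
                # Energy terms that involve node i, with neighbours held fixed.
                e = unary[i][l]
                if i > 0:
                    e += smooth * (l != labels[i - 1])
                if i < n - 1:
                    e += smooth * (l != labels[i + 1])
                return e
            best = min(range(k), key=local_energy)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break  # reached a local minimum of the energy
    return labels


if __name__ == "__main__":
    costs = [[0.0, 1.0], [0.0, 1.0], [0.6, 0.4], [0.0, 1.0]]
    print(icm_chain(costs, smooth=0.5))
```

ICM only guarantees a local minimum, which is one common reason for pairing it with another inference method, as the abstract describes.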
46	Calcul haute performance pour la simulation d'interactions fluide-structure / High-performance computing for the simulation of fluid-structure interactions Partimbene, Vincent 25 April 2018 (has links) (PDF) This thesis addresses the solution of fluid-structure interaction problems by an algorithm that couples two solvers: one for the fluid and one for the structure. To ensure consistency between the fluid and structure meshes, each domain is also discretized by finite volumes. Because of the difficulties of decomposing the domain into subdomains, we consider for each environment a parallel multi-splitting (or multi-decomposition) algorithm, which gives a unified presentation of subdomain methods with or without overlap. This method combines several contracting fixed-point mappings, and we show that, under appropriate assumptions, each fixed-point mapping is contracting in finite-dimensional spaces normed by Hilbertian and non-Hilbertian norms. In addition, we show that this analysis holds for the synchronous and, more generally, asynchronous parallel solution of the large linear systems arising from the discretization of fluid-structure interaction problems, and that it can be extended to the case where the displacement of the structure is subject to constraints. Furthermore, we can also analyze the convergence of these asynchronous parallel multi-splitting methods by partial-ordering techniques, related to the discrete maximum principle, both in the linear setting and in the setting obtained when the displacements of the structure are subject to constraints. We carry out parallel simulations of various fluid-structure test cases on different clusters, considering blocking and non-blocking communications. 
In the latter case we had to resolve an implementation difficulty, in that an unrecoverable error occurred at run time; this difficulty was overcome by introducing a method that ensures the completion of all non-blocking communications before the mesh is updated. The performance of the parallel simulations is presented and analyzed. Finally, we apply the methodology presented above to various industrial fluid-structure interaction contexts on unstructured meshes, which constitutes an additional difficulty. Interaction fluide-structure Multi-splitting Volumes finis Algorithmes parallèles Message Passing Interface Fluid-structure interaction Finite volumes Parallel algorithms
47	Paralelização de um programa para cálculo de propriedades físicas de impurezas magnéticas em metais. / Parallelization of a program that calculates physical properties of magnetic impurities in metals. Eloiza Helena Sonoda 10 August 2001 (has links) Este trabalho se dedica à paralelização de um programa para cálculos de propriedades físicas de ligas magnéticas diluídas. O método do grupo de renormalização aplicado ao modelo de Anderson de duas impurezas se mostrou particularmente adequado ao processamento paralelo visto que grande parte dos cálculos pode ser executada simultaneamente, assim como variações nos conjuntos de dados requeridas pelo método. Para tal reescrevemos o programa seqüencial usado anteriormente pelo Grupo de Física Teórica do IFSC e implementamos três versões paralelas. Essas versões diferem entre si em relação à abordagem dada à paralelização. O uso de clusters de computadores se revelou uma opção conveniente pois verificamos que o limitante no desempenho é o tempo tomado pelos cálculos e não pela comunicação. Os resultados mostram uma grande redução no tempo total de execução, porém deficiências no speedup e escalabilidade devido a problemas de balanceamento de carga. Analisamos esses problemas e sugerimos alternativas para solucioná-los. / This dissertation discusses the parallelization of a program that calculates physical properties of dilute magnetic alloys. The renormalization group method applied to the two-impurity Anderson model proved particularly suitable for parallel processing because a large share of the calculations, as well as the variations of the data sets required by the method, can be performed simultaneously. To this end we rewrote the sequential program previously used by the Theoretical Physics Group of the IFSC and implemented three parallel versions. These versions differ from each other in their approach to parallelization. 
Computer clusters proved to be a convenient option because performance is limited by calculation time rather than communication time. The results show a large reduction in total execution time, but speedup and scalability fall short because of load-balancing problems. We analyze these problems and suggest alternatives for solving them. grupo de renormalização modelo de Anderson passagem de mensagens processamento paralelo Anderson model message passing parallel processing renormalization group
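The abstract above attributes the limited speedup to load imbalance rather than communication. The following sketch shows, under stated assumptions, how uneven task sizes alone cap the speedup. The function `speedup_and_efficiency`, the greedy assignment rule, and the task times are hypothetical and are not taken from the dissertation; communication cost is assumed to be zero, in line with the abstract's remark that calculation time dominates.

```python
def speedup_and_efficiency(task_times, workers):
    """Assign tasks greedily (largest first) and return (speedup, efficiency).

    Speedup is serial time divided by the busiest worker's time;
    efficiency is speedup divided by the number of workers.
    """
    if workers < 1 or not task_times:
        raise ValueError("need at least one worker and one task")
    loads = [0.0] * workers
    for t in sorted(task_times, reverse=True):
        # Give the next-largest task to the currently least-loaded worker.
        loads[loads.index(min(loads))] += t
    serial = sum(task_times)
    parallel = max(loads)  # the slowest worker determines the run time
    speedup = serial / parallel
    return speedup, speedup / workers


if __name__ == "__main__":
    print(speedup_and_efficiency([3, 3, 3, 3], workers=4))     # balanced tasks
    print(speedup_and_efficiency([8, 1, 1, 1, 1], workers=4))  # one dominant task
```

With four equal tasks the four workers finish together; with one dominant task the other three workers sit idle for most of the run, so adding processors no longer helps.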
48	Bibliotheken zur Entwicklung paralleler Algorithmen / Libraries for the development of parallel algorithms Haase, G., Hommel, T., Meyer, A., Pester, M. 30 October 1998 (has links) (PDF) The purpose of this paper is to supply a summary of library subroutines and functions for parallel MIMD computers. The subroutines have been developed at the University of Chemnitz over the last five years. In detail, they are concerned with vector operations, inter-processor communication and simple graphic output to workstations. One of the most valuable features is the machine independence of the communication subroutines proposed in this paper for a hypercube topology of the parallel processors (except for a kernel of only two primitive system-dependent operations). They were implemented and tested for different hardware and operating systems including transputer, nCube, KSR, PVM. The vector subroutines are optimized by the use of C language and unrolled loops (BLAS1-like). The paper includes hints for using the libraries with both Fortran and C programs. parallel algorithms message passing numerical software computer graphics MSC 65Y05 MSC 65Y25 ddc:510
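The abstract above describes communication routines built for a hypercube topology. As a generic illustration of why that topology is convenient, the sketch below simulates a global sum by dimension exchange: in step k every node combines its partial result with the neighbour whose rank differs in bit k, so 2^d nodes finish in d steps. The function name `hypercube_allreduce` is hypothetical, and this is a single-process simulation, not the library's actual routines.

```python
def hypercube_allreduce(values):
    """Simulate a global sum over a hypercube; values[r] belongs to rank r.

    Returns the list of results held by each rank after the exchange.
    """
    p = len(values)
    d = p.bit_length() - 1
    if p == 0 or (1 << d) != p:
        raise ValueError("number of nodes must be a power of two")
    partial = list(values)
    for k in range(d):
        # Every rank pairs with its neighbour across dimension k (rank XOR 2^k)
        # and both keep the sum of the two partial results.
        partial = [partial[r] + partial[r ^ (1 << k)] for r in range(p)]
    return partial


if __name__ == "__main__":
    print(hypercube_allreduce([1, 2, 3, 4, 5, 6, 7, 8]))
```

Each step needs only a pairwise exchange between two neighbouring nodes, which fits the abstract's point that a small kernel of primitive operations can support the higher-level routines.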
49	Optimizing MPI Collective Communication by Orthogonal Structures Kühnemann, Matthias, Rauber, Thomas, Rünger, Gudula 28 June 2007 (has links) (PDF) Many parallel applications from scientific computing use MPI collective communication operations to collect or distribute data. Since the execution times of these communication operations increase with the number of participating processors, scalability problems might occur. In this article, we show for different MPI implementations how the execution time of collective communication operations can be significantly improved by a restructuring based on orthogonal processor structures with two or more levels. As platforms, we consider a dual Xeon cluster, a Beowulf cluster and a Cray T3E with different MPI implementations. We show that the execution time of operations like MPI_Bcast or MPI_Allgather can be reduced by 40% and 70% on the dual Xeon cluster and the Beowulf cluster. A significant improvement can also be obtained on a Cray T3E by a careful selection of the processor groups. We demonstrate that the optimized communication operations can be used to reduce the execution time of data parallel implementations of complex application programs without any other change of the computation and communication structure. Furthermore, we investigate how the execution time of orthogonal realizations can be modeled using runtime functions. In particular, we consider the modeling of two-phase realizations of communication operations. We present runtime functions for the modeling and verify that these runtime functions can predict the execution time both for communication operations in isolation and in the context of application programs. communication operations message passing optimization parallel programming scientific computing ddc:000 MPI <Schnittstelle> Parallelverarbeitung
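The two-phase realization mentioned in the abstract above can be pictured on a processor grid: the root first broadcasts within one group (for example, its column), and then all orthogonal groups (the rows) broadcast concurrently from the member that already holds the data. The sketch below simulates only this data flow in one process. The function `two_phase_broadcast`, the row-major rank layout, and the choice of columns before rows are assumptions for illustration, not the authors' implementation. In an MPI program the two groups would typically be built with MPI_Comm_split and each phase would be an MPI_Bcast on the corresponding communicator.

```python
def two_phase_broadcast(rows, cols, root, payload):
    """Simulate a broadcast over a rows x cols grid of ranks in two phases.

    Ranks are numbered row by row (rank = row * cols + col).
    Returns a dict mapping every rank to the payload it received.
    """
    if not 0 <= root < rows * cols:
        raise ValueError("root rank out of range")
    received = {root: payload}
    _, root_col = divmod(root, cols)
    # Phase 1: broadcast inside the root's column group, one rank per row.
    for row in range(rows):
        received[row * cols + root_col] = received[root]
    # Phase 2: each row group broadcasts from its member in the root's column.
    for row in range(rows):
        leader = row * cols + root_col
        for col in range(cols):
            received[row * cols + col] = received[leader]
    return received


if __name__ == "__main__":
    result = two_phase_broadcast(rows=3, cols=4, root=5, payload="data")
    print(sorted(result))
```

The simulation says nothing about run time; the savings reported in the abstract depend on the MPI implementation and the platform.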
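50	Studies on Discrete-Valued Vector Reconstruction from Underdetermined Linear Measurements / 劣決定線形観測に基づく離散値ベクトル再構成に関する研究 Hayakawa, Ryo 23 March 2020 (has links) Kyoto University / 0048 / New-system course doctorate / Doctor of Informatics / Kō No. 22587 / Informatics Doctorate No. 724 / 新制||情||124 (University Library) / Department of Systems Science, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor 下平英寿, Professor 田中利幸, Professor 山下信雄, Professor 林和則 (Osaka City University) / Pursuant to Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM Discrete-valued vector reconstruction Mathematical optimization Message passing algorithm Asymptotic analysis 007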
