241

Fast Optimization Methods for Model Predictive Control via Parallelization and Sparsity Exploitation / 並列化とスパース性の活用によるモデル予測制御の高速最適化手法

DENG, HAOYANG 23 September 2020 (has links)
Kyoto University / 0048 / New-system doctoral program / Doctor of Informatics / 甲第22808号 / 情博第738号 / 新制||情||126(附属図書館) / Graduate School of Informatics, Department of Systems Science, Kyoto University / (Chief examiner) Professor 大塚 敏之, Professor 加納 学, Professor 太田 快人 / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
242

Extended Hydrodynamics Using the Discontinuous-Galerkin Hancock Method

Kaufmann, Willem 15 September 2021 (has links)
Moment methods derived from the kinetic theory of gases can be used for the prediction of continuum and non-equilibrium flows and offer numerical advantages over other methods, such as the Navier-Stokes model. Models developed in this fashion are described by first-order hyperbolic partial differential equations (PDEs) with stiff local relaxation source terms. The application of discontinuous-Galerkin (DG) methods for the solution of such models has many benefits. Of particular interest is the third-order accurate, coupled space-time discontinuous-Galerkin Hancock (DGH) method. This scheme is both accurate and highly efficient on large-scale distributed-memory computers. The current study outlines a general implementation of the DGH method used for the parallel solution of moment methods in one, two, and three dimensions on modern distributed clusters. An algorithm for adaptive mesh refinement (AMR) was developed alongside the implementation of the scheme, and is used to achieve even higher accuracy and efficiency. Many different first-order hyperbolic and hyperbolic-relaxation PDEs are solved to demonstrate the robustness of the scheme. First, a linear convection-relaxation equation is solved to verify the order of accuracy of the scheme in three dimensions. Next, some classical compressible Euler problems are solved in one, two, and three dimensions to demonstrate the scheme's ability to capture discontinuities and strong shocks, as well as the efficacy of the implemented AMR. A special case, Ringleb's flow, is also solved in two dimensions to verify the order of accuracy of the scheme for non-linear PDEs on curved meshes. Following this, the shallow water equations are solved in two dimensions. Afterwards, the ten-moment (Gaussian) closure is applied to two-dimensional Stokes flow past a cylinder, showing the abilities of both the closure and scheme to accurately compute classical viscous solutions. Finally, the one-dimensional fourteen-moment closure is solved.
243

Simulace fyzikálních jevů s využitím celulárních automatů / Simulation of Physical Phenomena Using Cellular Automata

Martinek, Dominik January 2010 (has links)
This master's thesis deals with the modelling and simulation of physical phenomena by cellular automata. The basic methods for modelling physical phenomena with cellular automata are enumerated and described. An important part of this thesis is a set of demonstration models, each focused on one selected area of physical phenomena. All models are described by transition rules, and the procedure for deriving these rules is also presented; these rules were used in the implemented models. Another part of this thesis presents the design of a simulation application for these models. The application was implemented in accordance with this design and has been used to perform simulation experiments with the exemplary models. Results of the simulation experiments are discussed in the conclusion of this thesis. One exemplary model was also adapted for parallel processing; its performance on a computer with different numbers of processors was measured and is likewise discussed in the conclusion.
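The abstract does not reproduce the thesis's actual transition rules. As a rough, self-contained sketch of the kind of synchronous update rule such models rely on, the following toy cellular automaton approximates heat diffusion on a periodic lattice; the grid size, the relaxation coefficient `alpha`, and all names are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def step(grid, alpha=0.2):
    # One synchronous CA update: each cell relaxes toward the mean of its
    # four von Neumann neighbours, a discrete analogue of heat diffusion.
    up    = np.roll(grid,  1, axis=0)
    down  = np.roll(grid, -1, axis=0)
    left  = np.roll(grid,  1, axis=1)
    right = np.roll(grid, -1, axis=1)
    neighbour_mean = (up + down + left + right) / 4.0
    return grid + alpha * (neighbour_mean - grid)

# Toy experiment: a single hot cell spreading over a 64x64 periodic lattice.
grid = np.zeros((64, 64))
grid[32, 32] = 100.0
for _ in range(200):
    grid = step(grid)
```

Because each cell's update depends only on its local neighbourhood, a rule of this shape parallelizes naturally, which is the property the thesis exploits when adapting one model for parallel processing.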
244

Paralelizace Goertzelova algoritmu / Parallel implementation of Goertzel algorithm

Skulínek, Zdeněk January 2017 (has links)
Technical limitations make it impossible to keep steadily increasing processors' clock frequencies; processing power now grows mainly through an increasing number of cores, which creates a need for new approaches to programming such parallel systems. This thesis shows how to exploit parallelism in digital signal processing. As an example, it presents an implementation of the Goertzel algorithm that uses the processing power of a graphics chip.
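The thesis's GPU code is not included in this record. As a minimal single-bin reference sketch (plain Python rather than GPU code, with illustrative names), the Goertzel recurrence that such an implementation parallelizes looks like this:

```python
import math

def goertzel_power(samples, k):
    # Magnitude-squared of DFT bin k of `samples`, computed with the
    # Goertzel second-order recurrence instead of a full FFT.
    n = len(samples)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2

# Detect a 50 Hz tone in a 1 kHz-sampled signal (bin k = 50 for n = 1000 samples).
signal = [math.sin(2 * math.pi * 50 * t / 1000.0) for t in range(1000)]
print(goertzel_power(signal, 50))
```

Each frequency bin (and each block of samples) is independent of the others, which is what makes the algorithm a natural fit for a GPU: one thread or thread block per bin or block.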
245

Programming methodologies for ADAS applications in parallel heterogeneous architectures / Méthodologies de programmation d'applications ADAS sur des architectures parallèles et hétérogènes

Dekkiche, Djamila 10 November 2017 (has links)
La vision par ordinateur est primordiale pour la compréhension et l'analyse d'une scène routière afin de construire des systèmes d'aide à la conduite (ADAS) plus intelligents. Cependant, l'implémentation de ces systèmes dans un réel environnement automobile est loin d'être simple. En effet, ces applications nécessitent une haute performance de calcul en plus d'une précision algorithmique. Pour répondre à ces exigences, de nouvelles architectures hétérogènes sont apparues. Elles sont composées de plusieurs unités de traitement avec différentes technologies de calcul parallèle : GPU, accélérateurs dédiés, etc. Pour mieux exploiter les performances de ces architectures, différents langages sont nécessaires en fonction du modèle d'exécution parallèle. Dans cette thèse, nous étudions diverses méthodologies de programmation parallèle. Nous utilisons une étude de cas complexe basée sur la stéréo-vision. Nous présentons les caractéristiques et les limites de chaque approche. Nous évaluons ensuite les outils employés principalement en termes de performances de calcul et de difficulté de programmation. Le retour de ce travail de recherche est crucial pour le développement de futurs algorithmes de traitement d'images en adéquation avec les architectures parallèles avec un meilleur compromis entre les performances de calcul, la précision algorithmique et la difficulté de programmation. / Computer Vision (CV) is crucial for understanding and analyzing the driving scene to build more intelligent Advanced Driver Assistance Systems (ADAS). However, implementing CV-based ADAS in a real automotive environment is not straightforward. Indeed, CV algorithms combine the challenges of high computing performance and algorithm accuracy. To respond to these requirements, new heterogeneous circuits are developed. They consist of several processing units with different parallel computing technologies such as GPUs, dedicated accelerators, etc. To better exploit the performance of such architectures, different languages are required depending on the underlying parallel execution model. In this work, we investigate various parallel programming methodologies based on a complex case study of stereo vision. We introduce the relevant features and limitations of each approach. We then evaluate the employed programming tools mainly in terms of computation performance and programming productivity. The feedback from this research is crucial for the development of future CV algorithms suited to parallel architectures, with the best compromise between computing performance, algorithm accuracy, and programming effort.
246

Evaluierung und Erweiterung von MapReduce-Algorithmen zur Berechnung der transitiven Hülle ungerichteter Graphen für Entity-Resolution-Workflows / Evaluation and Extension of MapReduce Algorithms for Computing the Transitive Closure of Undirected Graphs for Entity Resolution Workflows

Ziad, Sehili 16 April 2018 (has links)
In entity resolution (deduplication), match techniques are used to determine whether different records represent the same real-world object, since globally unique identifiers are missing. The inherent quadratic complexity leads to very long runtimes for large data volumes, which makes parallelization of this process necessary. Thanks to its scalability and its applicability in cloud infrastructures, MapReduce is a good solution for improving the runtime. Furthermore, under certain conditions, the quality of the match result can be improved by computing the transitive closure.
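The MapReduce variants evaluated in the thesis are not reproduced in this record. As a rough single-machine sketch of the underlying idea, propagating the smallest label through matched pairs until a fixed point (each round corresponding to one map/reduce pass), one might write the following; all names and the integer record identifiers are illustrative.

```python
from collections import defaultdict

def connected_components(match_pairs):
    # Build an undirected adjacency list from pairwise match decisions.
    adj = defaultdict(set)
    for a, b in match_pairs:
        adj[a].add(b)
        adj[b].add(a)
    label = {v: v for v in adj}           # initially, every record is its own label
    changed = True
    while changed:                         # one pass ~ one map/reduce round
        changed = False
        for v in adj:
            best = min([label[v]] + [label[u] for u in adj[v]])
            if best < label[v]:
                label[v] = best
                changed = True
    return label                           # equal labels = transitively matched records

# {1,2,3} become one entity via transitivity, {4,5} another.
print(connected_components([(1, 2), (2, 3), (4, 5)]))
```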
247

Comparative Evaluation of Spark and Stratosphere

Ni, Ze January 2013 (has links)
Nowadays, although MapReduce is applied to parallel processing of big data, it has some limitations: for instance, a lack of generic yet efficient and richly functional primitive parallel methods, the inability to pass multiple input parameters to parallel methods, and inefficiency in handling iterative algorithms. Spark and Stratosphere were developed to address, at least in part, these shortcomings of MapReduce. The goal of this thesis is to evaluate Spark and Stratosphere both from the point of view of their theoretical programming models and of their practical execution on specified application algorithms. In the comparison of programming models, we mainly explore and compare the features of Spark and Stratosphere that overcome the limitations of MapReduce. After the comparison of the theoretical programming models, we further evaluate their practical performance by running three different classes of applications and assessing the usage of computing resources and execution time. It is concluded that Spark has promising features for iterative algorithms in theory, but it may not achieve the expected performance improvement for iterative applications if the amount of memory used for cached operations is close to the actual memory available in the cluster environment. In that case, the reason for the poor performance is that a larger amount of memory is devoted to caching and, in turn, only a small amount of memory remains available for the computing operations of the actual algorithms. Stratosphere shows favorable characteristics as a general parallel computing framework, but it has no support for iterative algorithms and spends more computing resources than Spark for the same amount of work. On another note, applications based on Stratosphere can gain benefits by manually setting compiler hints when developing the code, whereas Spark has no corresponding functionality.
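The thesis's benchmark code is not part of this record. As a minimal sketch of the caching behaviour the conclusion refers to (assuming the PySpark API; the dataset, its size, and all names are illustrative), an iterative job that reuses a cached dataset might look like this:

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "cache-demo")

# A dataset that every iteration of an iterative algorithm re-reads.
points = sc.parallelize(range(1_000_000)).map(lambda i: float(i % 1000))
points.cache()            # keep the RDD in memory across iterations

estimate = 0.0
for _ in range(10):
    # Each pass reads the cached RDD instead of recomputing it from scratch;
    # if the cache approaches the cluster's available memory, little is left
    # for the computation itself, which is the effect the thesis observes.
    estimate = points.mean()

sc.stop()
print(estimate)
```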
248

Towards Scalable Performance Analysis of MPI Parallel Applications

Aguilar, Xavier January 2015 (has links)
A considerable fraction of scientific discovery nowadays relies on computer simulations. High Performance Computing (HPC) provides scientists with the means to simulate processes ranging from climate modeling to protein folding. However, achieving good application performance and making optimal use of HPC resources is a heroic task due to the complexity of parallel software. Therefore, performance tools and runtime systems that help users execute applications in the most optimal way are of utmost importance in the landscape of HPC. In this thesis, we explore different techniques to tackle the challenges of collecting, storing, and using fine-grained performance data. First, we investigate the automatic use of real-time performance data in order to run applications in an optimal way. To that end, we present a prototype of an adaptive task-based runtime system that uses real-time performance data for task scheduling. This runtime system has a performance monitoring component that provides real-time access to the performance behavior of an application while it runs. The implementation of this monitoring component is presented and evaluated within this thesis. Secondly, we explore lossless compression approaches for MPI monitoring. One of the main problems that performance tools face is the huge amount of fine-grained data that can be generated from an instrumented application. Collecting fine-grained data from a program is the best method to uncover the root causes of performance bottlenecks; however, it is unfeasible with extremely parallel applications or applications with long execution times. On the other hand, collecting coarse-grained data is scalable but sometimes not enough to discern the root cause of a performance problem. Thus, we propose a new method for performance monitoring of MPI programs using event flow graphs. Event flow graphs provide very low overhead in terms of execution time and storage size, and can be used to reconstruct fine-grained trace files of application events ordered in time.
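The thesis's actual event-flow-graph format is not reproduced here. As a toy illustration of the compression idea it describes, collapsing an ordered event stream into transition counts between event types, consider the following; the trace contents and all names are invented for the example.

```python
from collections import Counter

def event_flow_graph(events):
    # Edges are (previous event, next event) pairs with occurrence counts; a
    # repetitive MPI trace collapses into a handful of edges instead of one
    # record per event, while the original order can be reconstructed if
    # sequence information is kept alongside the counts.
    return Counter(zip(events, events[1:]))

trace = ["MPI_Init", "MPI_Send", "MPI_Recv", "MPI_Send", "MPI_Recv", "MPI_Finalize"]
print(event_flow_graph(trace))
# e.g. Counter({('MPI_Send', 'MPI_Recv'): 2, ('MPI_Init', 'MPI_Send'): 1, ...})
```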
249

Optimization of Heterogeneous Parallel Computing Systems using Machine Learning

Adurti, Devi Abhiseshu, Battu, Mohit January 2021 (has links)
Background: Heterogeneous parallel computing systems combine different resources, CPUs and GPUs, to achieve high performance together with reduced latency and energy consumption. Programming applications that target the various processing units requires employing different tools and programming models/languages. Furthermore, selecting the most optimal implementation, which may target different processing units (i.e. CPU or GPU) or implement different algorithms, is not trivial for a given context. In this thesis, we investigate the use of machine learning to address the problem of selecting among implementation variants for an application running on a heterogeneous system. Objectives: This study focuses on providing an approach for optimizing heterogeneous parallel computing systems at runtime by building the most efficient machine learning model to predict the optimal implementation variant of an application. Methods: Six machine learning models, KNN, XGBoost, DTC, Random Forest Classifier, LightGBM, and SVM, are trained and tested using stratified k-fold cross-validation on a dataset generated from a matrix multiplication application, for square matrix input dimensions ranging from 16x16 to 10992x10992. Results: The findings for each machine learning algorithm are presented through accuracy, a confusion matrix, and a classification report covering precision, recall, and F1 score, and a comparison between the models in terms of accuracy, training time, and prediction time is provided to determine the best model. Conclusions: The XGBoost, DTC, and SVM algorithms achieved 100% accuracy. In comparison to the other machine learning models, the DTC is found to be the most suitable for predicting the optimal implementation variant of the heterogeneous system application, due to the low time it requires for training and prediction. Hence the DTC is the most suitable algorithm for the optimization of heterogeneous parallel computing.
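The thesis's dataset and scripts are not part of this record. As a minimal sketch of the evaluation loop it describes (assuming scikit-learn, with a synthetic stand-in dataset and an invented labeling rule instead of measured timings), the stratified k-fold comparison could look like this:

```python
import time
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the matrix-multiplication dataset: the feature is the
# matrix dimension, the label is the (hypothetical) fastest implementation
# variant, e.g. 0 = CPU variant, 1 = GPU variant.
rng = np.random.default_rng(0)
X = rng.integers(16, 10992, size=(500, 1)).astype(float)
y = (X[:, 0] > 1024).astype(int)      # invented rule, not measured data

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, model in [("DTC", DecisionTreeClassifier()),
                    ("KNN", KNeighborsClassifier()),
                    ("SVM", SVC())]:
    start = time.perf_counter()
    accuracy = cross_val_score(model, X, y, cv=cv).mean()
    elapsed = time.perf_counter() - start
    print(f"{name}: mean accuracy {accuracy:.3f}, wall time {elapsed:.2f}s")
```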
250

Reducing Inter-Process Communication Overhead in Parallel Sparse Matrix-Matrix Multiplication

Ahmed, Salman, Houser, Jennifer, Hoque, Mohammad A., Raju, Rezaul, Pfeiffer, Phil 01 July 2017 (has links)
Parallel sparse matrix-matrix multiplication algorithms (PSpGEMM) spend most of their running time on inter-process communication. In the case of distributed matrix-matrix multiplication, much of this time is spent interchanging the partial results that are needed to calculate the final product matrix. This overhead can be reduced with a one-dimensional distributed algorithm for parallel sparse matrix-matrix multiplication that uses a novel accumulation pattern whose complexity is logarithmic in the number of processors (i.e., O(log p), where p is the number of processors). This algorithm's MPI communication overhead and execution time were evaluated on an HPC cluster, using randomly generated sparse matrices with dimensions up to one million by one million. The results showed a reduction of inter-process communication overhead for matrices with larger dimensions compared to another one-dimensional parallel algorithm that has O(p) complexity for accumulating the results.
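The paper's exact accumulation pattern is not given in this abstract. As a rough sequential simulation of why a pairwise-merge pattern needs only about log2(p) communication rounds instead of p, consider the following; the function names and the dict-based representation of partial results are illustrative, not the paper's.

```python
def tree_accumulate(partials, combine):
    # Merge p partial results in rounds: in each round, neighbouring pairs are
    # combined, halving the number of active contributors, so roughly
    # ceil(log2(p)) rounds suffice instead of p sequential accumulations.
    active = list(partials)
    rounds = 0
    while len(active) > 1:
        merged = [combine(active[i], active[i + 1])
                  for i in range(0, len(active) - 1, 2)]
        if len(active) % 2 == 1:          # odd one out passes through unchanged
            merged.append(active[-1])
        active = merged
        rounds += 1
    return active[0], rounds

# Toy partial products from 8 "processes": sparse blocks as (row, col) -> value dicts
# with disjoint keys, so merging dicts stands in for accumulating entries.
parts = [{(0, i): float(i)} for i in range(8)]
total, rounds = tree_accumulate(parts, lambda a, b: {**a, **b})
print(rounds, "rounds for 8 partial results")   # prints 3, i.e. log2(8)
```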
