Global ETD Search

81	Integrace procedurálního kódu do proudových paralelních systémů / Procedural code integration in streaming environments Brabec, Michal January 2018 (has links) Title: Procedural code integration in streaming environments Author: Mgr. Michal Brabec Department: Department of Software Engineering Supervisor: David Bednárek, Ph.D. Abstract: Streaming environments and similar parallel platforms are widely used in image, signal, or general data processing as means of achieving high perfor- mance. Unfortunately, they are often associated with domain specific program- ming languages, and thus hardly accessible for non-experts. In this work, we present a framework for transformation of a procedural code to a streaming ap- plication. We selected a restricted version of the C# language as the interface for our system, because it is widely taught and many programmers are familiar with it. This approach will allow creating streaming applications or their parts using a widely known imperative language instead of the intricate languages specific to streaming. The transformation process is based on the Hybrid Flow Graph - a novel inter- mediate code which employs the streaming paradigm and can be further convert- ed into streaming applications. The intermediate code shares the features and limitations of the streaming environments, while representing the applications without platform specific technical details, which allows us to use well known graph algorithms to work with the...
82	Artificial intelligence models for large scale buildings energy consumption analysis / Modèles d'intelligence artificielle pour analyse énergétique des bâtiments de la consommation Zhao, Haixiang 28 September 2011 (has links) La performance énergétique dans les bâtiments est influencée par de nombreux facteurs, tels que les conditions météorologiques ambiantes, la structure du bâtiment et les caractéristiques, l'occupation et leurs comportements, l'opération de sous-composants de niveau comme le chauffage, de ventilation et de climatisation (CVC). Cette propriété rend complexe la prévision, l'analyse, ou faute de détection / diagnostic de la consommation énergétique du bâtiment est très difficile d'effectuer rapidement et avec précision. Cette thèse se concentre principalement sur la mise à jour des modèles d'intelligence artificielle avec des applications pour résoudre ces problèmes. Tout d'abord, nous passons en revue les modèles récemment développés pour résoudre ces problèmes, y compris des méthodes d'ingénierie détaillée et simplifiée, les méthodes statistiques et les méthodes d'intelligence artificielle. Puis nous simulons des profils de consommation d'énergie pour les bâtiments simples et multiples, et basé sur ces ensembles de données, des modèles de soutien vecteur de la machine sont formés et testés pour faire la prédiction. Les résultats des expériences montrent vaste précision de la prédiction haute et la robustesse de ces modèles. Deuxièmement, déterministe récursif Perceptron (RDP) modèle de réseau neuronal est utilisé pour détecter et diagnostiquer défectueuse consommation d'énergie du bâtiment. La consommation anormale est simulé par l'introduction manuelle d'une dégradation des performances des appareils électriques. Dans l'expérience, le modèle montre la capacité de détection RDP très élevé. Une nouvelle approche est proposée pour diagnostiquer des défauts. Il est basé sur l'évaluation des modèles RDP, dont chacun est capable de détecter une panne de matériel. Troisièmement, nous examinons comment la sélection des sous-ensembles caractéristiques de l'influence la performance du modèle. Les caractéristiques optimales sont choisis en fonction de la faisabilité de l'obtention eux et sur les scores qu'ils fournissent dans l'évaluation de deux méthodes de filtrage. Les résultats expérimentaux confirmer la validité de l'ensemble sélectionné et montrent que la proposé la méthode de sélection fonction peut garantir l'exactitude du modèle et réduit le temps de calcul. Un défi de la consommation énergétique du bâtiment est d'accélérer la prédiction de formation du modèle lorsque les données sont très importantes. Cette thèse propose une mise en œuvre efficace parallèle de Support Vector Machines basée sur la méthode de décomposition pour résoudre de tels problèmes. La parallélisation est réalisée sur le travail le plus fastidieux de formation, c'est à dire de mettre à jour le vecteur gradient de f. Les problèmes intérieurs sont traitées par solveur d'optimisation séquentielle minimale. Le parallélisme sous-jacente est réalisée par la version de mémoire partagée de Map-Reduce paradigme, qui rend le système particulièrement adapté pour être appliqué à des systèmes multi-core et multi-processeurs. Les résultats expérimentaux montrent que notre implémentation offre une augmentation de la vitesse élevée par rapport à libsvm, et il est supérieur à l'état de l'art Pisvm application MPI à la fois la rapidité et l'exigence de stockage. / The energy performance in buildings is influenced by many factors, such as ambient weather conditions, building structure and characteristics, occupancy and their behaviors, the operation of sub-level components like Heating, Ventilation and Air-Conditioning (HVAC) system. This complex property makes the prediction, analysis, or fault detection/diagnosis of building energy consumption very difficult to accurately and quickly perform. This thesis mainly focuses on up-to-date artificial intelligence models with the applications to solve these problems. First, we review recently developed models for solving these problems, including detailed and simplified engineering methods, statistical methods and artificial intelligence methods. Then we simulate energy consumption profiles for single and multiple buildings, and based on these datasets, support vector machine models are trained and tested to do the prediction. The results from extensive experiments demonstrate high prediction accuracy and robustness of these models. Second, Recursive Deterministic Perceptron (RDP) neural network model is used to detect and diagnose faulty building energy consumption. The abnormal consumption is simulated by manually introducing performance degradation to electric devices. In the experiment, RDP model shows very high detection ability. A new approach is proposed to diagnose faults. It is based on the evaluation of RDP models, each of which is able to detect an equipment fault.Third, we investigate how the selection of subsets of features influences the model performance. The optimal features are selected based on the feasibility of obtaining them and on the scores they provide under the evaluation of two filter methods. Experimental results confirm the validity of the selected subset and show that the proposed feature selection method can guarantee the model accuracy and reduces the computational time.One challenge of predicting building energy consumption is to accelerate model training when the dataset is very large. This thesis proposes an efficient parallel implementation of support vector machines based on decomposition method for solving such problems. The parallelization is performed on the most time-consuming work of training, i.e., to update the gradient vector f. The inner problems are dealt by sequential minimal optimization solver. The underlying parallelism is conducted by the shared memory version of Map-Reduce paradigm, making the system particularly suitable to be applied to multi-core and multiprocessor systems. Experimental results show that our implementation offers a high speed increase compared to Libsvm, and it is superior to the state-of-the-art MPI implementation Pisvm in both speed and storage requirement. Efficacité énergétique Bâtiment Calcul parallèle Energy efficiency Building Parallel computing
83	Accelerating the knapsack problem on GPUs Suri, Bharath January 2011 (has links) The knapsack problem manifests itself in many domains like cryptography, financial domain and bio-informatics. Knapsack problems are often inside optimization loops in system-level design and analysis of embedded systems as well. Given a set of items, each associated with a profit and a weight, the knapsack problem deals with how to choose a subset of items such that the profit is maximized and the total weight of the chosen items is less than the capacity of the knapsack. There exists several variants and extensions of this knapsack problem. In this thesis, we focus on the multiple-choice knapsack problem, where the items are grouped into disjoint classes. However, the multiple-choice knapsack problem is known to be NP-hard. While many different heuristics and approximation schemes have been proposed to solve the problem in polynomial-time, such techniques do not return the optimal solution. A dynamic programming algorithm to solve the problem optimally is known, but has a pseudo-polynomial running time. This leads to high running times of tools in various application domains where knapsack problems must be solved. Many system-level design tools in the embedded systems domain, in particular, would suffer from high running when such a knapsack problem must be solved inside a larger optimization loop. To mitigate the high running times of such algorithms, in this thesis, we propose a GPU-based technique to solve the multiple-choice knapsack problem. We study different approaches to map the dynamic programming algorithm on the GPU and compare their performance in terms of the running times. We employ GPU specific methods to further improve the running times like exploiting the GPU on-chip shared memory. Apart from results on synthetic test-cases, we also demonstrate the applicability of our technique in practice by considering a case-study from system-level design. Towards this, we consider the problem of instruction-set selection for customizable processors. gpgpu knapsack parallel computing Computer Sciences Datavetenskap (datalogi)
84	High-performance particle simulation using CUDA Kalms, Mikael January 2015 (has links) Over the past 15 years, modern PC graphics cards (GPUs) have changed from being pure graphics accelerators into parallel computing platforms.Several new parallel programming languages have emerged, including NVIDIA's parallel programming language for GPUs (CUDA). This report explores two related problems in parallel: How well-suited is CUDA for implementing algorithms that utilize non-trivial data structures?And, how does one develop a complex algorithm that uses a CUDA system efficiently? A guide for how to implement complex algorithms in CUDA is presented. Simulation of a dense 2D particle system is chosen as the problem domain foralgorithm optimization. Two algorithmic optimization strategies are presented which reduce the computational workload when simulating theparticle system. The strategies can either be used independently, or combined for slightly improved results. Finally, the resultingimplementations are benchmarked against a simpler implementation on a normal PC processor (CPU) as well as a simpler GPU-algorithm. A simple GPU solution is shown to run at least 10 times faster than a simple CPU solution. An improved GPU solution can thenyield another 10 times speed-up, while sacrificing some accuracy. CUDA parallel computing particle simulation GPU Computer Engineering Datorteknik
85	gcn.MOPS: accelerating cn.MOPS with GPU Alkhamis, Mohammad 16 June 2017 (has links) cn.MOPS is a model-based algorithm used to quantitatively detect copy-number variations in next-generation, DNA-sequencing data. The algorithm is implemented as an R package and can speed up processing with multi-CPU parallelism. However, the maximum achievable speedup is limited by the overhead of multi-CPU parallelism, which increases with the number of CPU cores used. In this thesis, an alternative mechanism of process acceleration is proposed. Using one CPU core and a GPU device, the proposed solution, gcn.MOPS, achieved a speedup factor of 159× and decreased memory usage by more than half. This speedup was substantially higher than the maximum achievable speedup in cn.MOPS, which was ∼20×. / Graduate / 0984 / 0544 / 0715 / alkhamis@uvic.ca GPU GPGPU cn.MOPS gcn.MOPS CUDA C++ parallel computing CNV
86	Parallel methods for classical and disordered Spin models Navarro Guerrero, Cristóbal Alejandro January 2015 (has links) Doctor en Ciencias, Mención Computación / En las últimas décadas han crecido la cantidad de trabajos que buscan encontrar metodos eficientes que describan el comportamiento macroscópico de los sistemas de spin, a partir de una definición microscópica. Los resultados que se obtienen de estos sistemas no solo sirven a la comunidad fı́sica, sino también a otras áreas como dinámica molecular, redes sociales o problemas de optimización, entre otros. El hecho de que los sistemas de spin puedan explicar fenómenos de otras áreas ha generado un interés global en el tema. El problema es, sin embargo, que el costo computacional de los métodos involucrados llega a ser muy alto para fines prácticos. Por esto, es de gran interés estudiar como la computación paralela, combinada con nuevas estrategias algorı́tmicas, puede generar una mejora en velocidad y eficiencia sobre los metodos actuales. En esta tesis se presentan dos contribuciones; (1) un algoritmo exacto multi-core distribuido de tipo transfer matrix y (2) un método Monte Carlo multi-GPU para la sim- ulación del modelo 3D Random Field Ising Model (RFIM). La primera contribución toma ventaja de las relaciones jerárquicas encontradas en el espacio de configuraciones del problema para agruparlas en árboles de familias que se solucionan en paralelo. La segunda contribución extiende el método Exchange Monte Carlo como un algoritmo paralelo multi-GPU que in- cluye una fase de adaptación de temperaturas para mejorar la calidad de la simulación en las zonas de temperatura mas complejas de manera dinámica. Los resultados muestran que el nuevo algoritmo de transfer matrix reduce el espacio de configuraciones desde O(4^m ) a O(3^m ) y logra un fixed-size speedup casi lineal con aproxi- madamente 90% de eficiencia al solucionar los problemas de mayor tamaño. Para el método multi-GPU Monte Carlo, se proponen dos niveles de paralelismo; local, que escala con GPUs mas rápidas y global, que escala con múltiples GPUs. El método logra una aceleración de entre uno y dos ordenes de magnitud respecto a una implementación de referencia en CPU, y su paralelismo escala con aproximadamente 99% de eficiencia. La estrategia adaptativa de distribución de temperaturas incrementa la taza de intercambio en las zonas que estaban mas comprometidas sin aumentar la taza en el resto de las zonas, generando una simulación mas rápida aun y de mejor calidad a que si se usara una distribución uniforme de temperaturas. Las contribuciones logradas han permitido obtener nuevos resultados para el área de la fı́sica, como el calculo de la matriz transferencia para el kagome lattice en m = 9 y la simulación del modelo 3D Random Field Ising Model en L = {32, 64}. Algoritmos computacionales Ciencia de la computación Parallel computing Sistemas de Spin GPU
87	Finding Community Structures In Social Activity Data Peng, Chengbin 19 May 2015 (has links) Social activity data sets are increasing in number and volume. Finding community structure in such data is valuable in many applications. For example, understand- ing the community structure of social networks may reduce the spread of epidemics or boost advertising revenue; discovering partitions in tra c networks can help to optimize routing and to reduce congestion; finding a group of users with common interests can allow a system to recommend useful items. Among many aspects, qual- ity of inference and e ciency in finding community structures in such data sets are of paramount concern. In this thesis, we propose several approaches to improve com- munity detection in these aspects. The first approach utilizes the concept of K-cores to reduce the size of the problem. The K-core of a graph is the largest subgraph within which each node has at least K connections. We propose a framework that accelerates community detection. It first applies a traditional algorithm that is relatively slow to the K-core, and then uses a fast heuristic to infer community labels for the remaining nodes. The second approach is to scale the algorithm to multi-processor systems. We de- vise a scalable community detection algorithm for large networks based on stochastic block models. It is an alternating iterative algorithm using a maximum likelihood ap- proach. Compared with traditional inference algorithms for stochastic block models, our algorithm can scale to large networks and run on multi-processor systems. The time complexity is linear in the number of edges of the input network. The third approach is to improve the quality. We propose a framework for non- negative matrix factorization that allows the imposition of linear or approximately linear constraints on each factor. An example of the applications is to find community structures in bipartite networks, which is useful in recommender systems. Our algorithms are compared with the results in recent papers and their quality and e ciency are verified by experiments. Community Detection Scalability Parallel Computing Constrained NMF Social Networks
88	On the Extensions of the Predictor-Corrector Proximal Multiplier (PCPM) Algorithm and Their Applications Run Chen (9739499) 15 December 2020 (has links) <div>Many real-world application problems can be modeled mathematically as constrained convex optimization problems. The scale of such problems can be extremely large, posing significant challenges to traditional centralized algorithms and calling for efficient and scalable distributed algorithms. However, most of the existing works on distributed optimization have been focusing on block-separable problems with simple, linear constraints, such as the consensus-type constraints. The focus of this dissertation is to propose distributed algorithms to solve (possibly non-separable) large-scale optimization problems with more complicated constraints with parallel updating (aka in Jacobi fashion), instead of sequential updating in the form of Gauss-Seidel iterations. In achieving so, this dissertation extends the predictor corrector proximal multiplier method (PCPM) to address three issues when solving a large-scale constrained convex optimization problem: (i) non-linear coupling constraints; (ii) asynchronous iterative scheme; (iii) non-separable objective function and coupling constraints. </div><div><br></div><div>The idea of the PCPM algorithm is to introduce a predictor variable for the Lagrangian multiplier to avoid the augmented term, hence removing the coupling of block variables while still achieving convergence without restrictive assumptions. Building upon this algorithmic idea, we propose three distributed algorithms: (i) N-block PCPM algorithm for solving N-block convex optimization problems with both linear and nonlinear coupling constraints; (ii) asynchronous N-block PCPM algorithm for solving linearly constrained N-block convex optimization problems; (iii) a distributed algorithm, named PC<sup>2</sup>PM, for solving large-scale convex quadratically constrained quadratic programs (QCQPs). The global convergence is established for each of the three algorithms. Extensive numerical experiments, using various data sets, are conducted on either a single-node computer or a multi-node computer cluster through message passing interface (MPI). Numerical results demonstrate the efficiency and scalability of the proposed algorithms.</div><div><br></div><div>A real application of the N-block PCPM algorithm to solve electricity capacity expansion models is also studied in this dissertation. A hybrid scenario-node-realization decomposition method, with extended nonanticipativity constraints, is proposed for solving the resulting large-scale optimization problem from a multi-scale, multi-stage stochastic program under various uncertainties with different temporal scales. A technique of orthogonal projection is exploited to simplify the iteration steps, which leads to a simplified N-block PCPM algorithm amenable to massive parallelization during iterations. Such an algorithm exhibits much more scalable numerical performance when compared with the widely used progressive hedging approach (PHA) for solving stochastic programming problems.</div> Operations Research Optimisation Large-scale optimization Distributed algorithm Parallel Computing
89	Accelerator-based look-up table for coarse-grained molecular dynamics computations Gangopadhyay, Ananya 13 May 2019 (has links) Molecular Dynamics (MD) is a simulation technique widely used by computational chemists and biologists to simulate and observe the physical properties of a system of particles or molecules. The method provides invaluable three-dimensional structural and transport property data for macromolecules that can be used in applications such as the study of protein folding and drug design. The most time-consuming and inefficient routines in MD packages, particularly for large systems, are the ones involving the computation of intermolecular energy and forces for each molecule. Many fully atomistic systems such as CHARMM and NAMD have been refined over the years to improve their efficiency. But, simulating complex long-time events such as protein folding remains out reach for atomistic simulations. The consensus view amongst computational chemists and biologists is that the development of a coarse-grained (CG) MD package will make the long timescales required for protein folding simulations possible. The shortcoming of this method remains an inability to produce accurate dynamics and results that are comparable with atomistic simulations. It is the objective of this dissertation to develop a coarse-grained method that is computationally faster than atomistic simulations, while being dynamically accurate enough to produce structural and transport property data comparable to results from the latter. Firstly, the accuracy of the Gay-Berne potential in modelling liquid benzene in comparison to fully atomistic simulations was investigated. Following this, the speed of a course-grained condensed phase benzene simulation employing a Gay-Berne potential was compared with that of a fully atomistic simulation. While coarse-graining algorithmically reduces the total number of particles in consideration, the execution time and efficiency scales poorly for large systems. Both fully-atomistic and coarse-grained developers have accelerated packages using high-performance parallel computing platforms such as multi-core CPU clusters, Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs). GPUs have especially gained popularity in recent years due to their massively parallel architecture on a single chip, making them a cheaper alternative to a CPU cluster. Their relatively shorter development time also gives them an advantage over FPGAs. NAMD is perhaps the most popular MD package that employs efficient use of a single GPU or a multi-GPU cluster to conduct simulations. The Scientific Computing Research Unit’s in-house generalised CG code, the Free Energy Force Induced (FEFI) coarse-grained MD package, was accelerated using a GPU to investigate the achievable speed-up in comparison to the CPU algorithm. To achieve this, a parallel version of the sequential force routine, i.e. the computation of the energy, force and torque per molecule, was developed and implemented on a GPU. The GPU-accelerated FEFI package was then used to simulate benzene, which is almost exclusively governed by van der Waal’s forces (i.e. dispersion effects), using the parameters for the Gay-Berne potential from a study by Golubkov and Ren in their work “Generalized coarse-grained model based on point multipole and Gay-Berne potentials”. The coarse-grained condensed phase structural properties, such as the radial and orientational distribution functions, proved to be inaccurate. Further, the transport properties such as diffusion were significantly more unsatisfactory compared to a CHARMM simulation. From this, a conclusion was reached that the Gay-Berne potential was not able to model the subtle effects of dispersion as observed in liquid benzene. In place of the analytic Gay-Berne potential, a more accurate approach would be to use a multidimensional free energy-based potential. Using the Free Energy from Adaptive Reaction Coordinate Forces (FEARCF) method, a four-dimensional Free Energy Volume (FEV) for two interacting benzene molecules was computed for liquid benzene. The focal point of this dissertation was to use this FEV as the coarse-grained interaction potential in FEFI to conduct CG simulations of condensed phase liquid benzene. The FEV can act as a numerical potential or Look-Up Table (LUT) from which the interaction energy and four partial derivatives required to compute the forces and torques can be obtained via numerical methods at each step of the CG MD simulation. A significant component of this dissertation was the development and implementation of four-dimensional LUT routines to use the FEV for accurate condensed phase coarse-grained simulations. To compute the energy and partial derivatives between the grid points of the surface, an interpolation algorithm was required. A four-dimensional cubic B-spline interpolation was developed because of the method’s superior accuracy and resistance to oscillations compared with other polynomial interpolation methods. When The algorithm’s introduction into the FEFI CG MD package for CPUs exhausted the single-core CPU architecture with its large number of interpolations for each MD step. It was therefore impractical for the high throughput interpolation required for MD simulations. The 4D cubic B-spline algorithm and the LUT routine were then developed and implemented on a GPU. Following evaluation, the LUT was integrated into the FEFI MD simulation package. A FEFI CG simulation of liquid benzene was run using the 4D FEV for a benzene molecular pair as the numerical potential. The structural and transport properties outperformed the analytical Gay-Berne CG potential, more closely approximating the atomistic predicted properties. The work done in this dissertation demonstrates the feasibility of a coarse-grained simulation using a free energy volume as a numerical potential to accurately simulate dispersion effects, a key feature needed for protein folding.
90	Simulation des réseaux à grande échelle sur les architectures de calculs hétérogènes / Large-scale network simulation over heterogeneous computing architecture Ben Romdhanne, Bilel 16 December 2013 (has links) La simulation est une étape primordiale dans l'évolution des systèmes en réseaux. L’évolutivité et l’efficacité des outils de simulation est une clef principale de l’objectivité des résultats obtenue, étant donné la complexité croissante des nouveaux des réseaux sans-fils. La simulation a évènement discret est parfaitement adéquate au passage à l'échelle, cependant les architectures logiciel existantes ne profitent pas des avancées récente du matériel informatique comme les processeurs parallèle et les coprocesseurs graphique. Dans ce contexte, l'objectif de cette thèse est de proposer des mécanismes d'optimisation qui permettent de surpasser les limitations des approches actuelles en combinant l’utilisation des ressources de calcules hétérogène. Pour répondre à la problématique de l’efficacité, nous proposons de changer la représentation d'événement, d'une représentation bijective (évènement-descripteur) à une représentation injective (groupe d'évènements-descripteur). Cette approche permet de réduire la complexité de l'ordonnancement d'une part et de maximiser la capacité d'exécuter massivement des évènements en parallèle d'autre part. Dans ce sens, nous proposons une approche d'ordonnancement d'évènements hybride qui se base sur un enrichissement du descripteur pour maximiser le degré de parallélisme en combinons la capacité de calcule du CPU et du GPU dans une même simulation. Les résultats comparatives montre un gain en terme de temps de simulation de l’ordre de 100x en comparaison avec une exécution équivalente sur CPU uniquement. Pour répondre à la problématique d’évolutivité du système, nous proposons une nouvelle architecture distribuée basée sur trois acteurs. / The simulation is a primary step on the evaluation process of modern networked systems. The scalability and efficiency of such a tool in view of increasing complexity of the emerging networks is a key to derive valuable results. The discrete event simulation is recognized as the most scalable model that copes with both parallel and distributed architecture. Nevertheless, the recent hardware provides new heterogeneous computing resources that can be exploited in parallel.The main scope of this thesis is to provide a new mechanisms and optimizations that enable efficient and scalable parallel simulation using heterogeneous computing node architecture including multicore CPU and GPU. To address the efficiency, we propose to describe the events that only differs in their data as a single entry to reduce the event management cost. At the run time, the proposed hybrid scheduler will dispatch and inject the events on the most appropriate computing target based on the event descriptor and the current load obtained through a feedback mechanisms such that the hardware usage rate is maximized. Results have shown a significant gain of 100 times compared to traditional CPU based approaches. In order to increase the scalability of the system, we propose a new simulation model, denoted as general purpose coordinator-master-worker, to address jointly the challenge of distributed and parallel simulation at different levels. The performance of a distributed simulation that relies on the GP-CMW architecture tends toward the maximal theoretical efficiency in a homogeneous deployment. The scalability of such a simulation model is validated on the largest European GPU-based supercomputer Calcul parallèle CPU/GPU Parallel computing CPU/GPU

Search results