171

The Thermal-Constrained Real-Time Systems Design on Multi-Core Platforms -- An Analytical Approach

SHA, SHI 21 March 2018 (has links)
Over the past decades, shrinking transistor sizes, benefiting from the advancement of IC technology, have allowed more and more transistors to be integrated into an IC chip to achieve ever higher computing performance. However, the semiconductor industry is now reaching a saturation point of Moore's Law, largely due to soaring power consumption and heat dissipation, among other factors. High chip temperature not only significantly increases packaging/cooling cost and degrades system performance and reliability, but also increases energy consumption and can even damage the chip permanently. Although designing 2D and even 3D multi-core processors helps to lower the power/thermal barrier of single-core architectures by exploiting thread/process-level parallelism, the higher power density and longer heat-removal path have made the thermal problem substantially more challenging, surpassing the heat-dissipation capability of traditional cooling mechanisms such as cooling fans, heat sinks, and heat spreaders in the design of new generations of computing systems. As a result, dynamic thermal management (DTM), i.e., controlling the thermal behavior by dynamically varying computing performance and workload allocation on an IC chip, has been well recognized as an effective strategy to deal with these thermal challenges. Different from many existing DTM heuristics that are based on simple intuitions, we seek to address the thermal problems through a rigorous analytical approach, to achieve the high predictability required in real-time system design. In this regard, we have made a number of important contributions. First, we develop a series of lemmas and theorems that are general enough to uncover the fundamental principles and characteristics of the thermal model, peak-temperature identification and peak-temperature reduction, which are key to thermal-constrained real-time computer system design. Second, we develop a design-time frequency and voltage oscillating approach on multi-core platforms, which can greatly enhance the system throughput and its service capacity.
Third, different from the traditional workload-balancing approach, we develop a thermal-balancing approach that can substantially improve energy efficiency and task-partitioning feasibility, especially when the system utilization is high or the temperature constraint is tight. The significance of our research is that not only can our proposed algorithms for throughput maximization and energy conservation significantly outperform existing work, as demonstrated in our extensive experimental results, but the theoretical results of our research are also very general and can greatly benefit other thermal-related research.
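For readers unfamiliar with the kind of thermal model such analyses build on, the following sketch simulates a generic first-order lumped RC model under a periodic two-speed schedule. It is only an illustration of why oscillating the frequency faster than the thermal time constant keeps the peak temperature down; the thermal and power constants are assumed values, not figures from this dissertation.

```python
import numpy as np

# Lumped RC thermal model: C * dT/dt = P(f) - (T - T_AMB) / R
R, C = 0.8, 340.0          # thermal resistance (K/W) and capacitance (J/K), assumed
T_AMB = 45.0               # ambient temperature (deg C), assumed
A_DYN, P_LEAK = 25.0, 8.0  # power model P(f) = A_DYN*f^3 + P_LEAK, f normalized, assumed

def power(freq):
    """Dynamic power scales roughly with f^3 (f * V^2 with V ~ f), plus leakage."""
    return A_DYN * freq**3 + P_LEAK

def simulate(schedule, period, dt=0.1, t_end=2000.0):
    """Integrate the RC model under a periodic frequency schedule.

    schedule: list of (duration_fraction, frequency) pairs covering one period."""
    temps, T, t = [], T_AMB, 0.0
    while t < t_end:
        phase = (t % period) / period
        acc, freq = 0.0, schedule[-1][1]
        for frac, f in schedule:
            acc += frac
            if phase < acc:
                freq = f
                break
        # Explicit Euler step of C*dT/dt = P - (T - T_AMB)/R
        T += dt * (power(freq) - (T - T_AMB) / R) / C
        temps.append(T)
        t += dt
    return np.array(temps)

# Same two-speed workload, oscillated with a short vs. a long period
fast = simulate([(0.5, 1.0), (0.5, 0.6)], period=10.0)
slow = simulate([(0.5, 1.0), (0.5, 0.6)], period=400.0)
print(f"peak with 10 s period:  {fast.max():.1f} C")
print(f"peak with 400 s period: {slow.max():.1f} C")
```

With a period much shorter than the RC time constant, the die temperature settles near the steady state of the average power rather than that of the high-power phase, which is the intuition behind frequency-oscillating schedules under a peak-temperature constraint.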
172

DÉVELOPPEMENT D'UNE MÉTHODE IMPLICITE SANS MATRICE POUR LA SIMULATION 2D-3D DES ÉCOULEMENTS COMPRESSIBLES ET FAIBLEMENT COMPRESSIBLES EN MAILLAGES NON-STRUCTURÉS / Development of a matrix-free implicit method for the 2D-3D simulation of compressible and weakly compressible flows on unstructured meshes

Kloczko, Thibaud 15 March 2006 (has links) (PDF)
Steady flow computations can be considered efficient if the steady state is reached within a reduced CPU time, but also if the memory footprint remains small; the latter requirement becomes essential for industrial applications in which the number of computational points is very large. The same holds for unsteady flows, which are now classically solved via a dual time-stepping approach in which the successive physical states are treated as steady states with respect to a fictitious time. The crucial need for implicit methods with a small memory footprint has led to the development of matrix-free treatments. For the applications of interest to the CEA, namely the simulation of multi-species reactive flows inside the containment of a pressurized-water nuclear reactor, the methods must be versatile enough to handle the whole range of flows, from nearly incompressible to highly compressible. Low-Mach preconditioning of the Navier-Stokes equations makes it possible to apply, in the incompressible regime, schemes originally designed for the simulation of compressible flows. The present work shows how to obtain a matrix-free implicit treatment for any flow regime when the implicit stage contains a preconditioning matrix; the intrinsic efficiency of the matrix-free implicit scheme coupled with a point-Jacobi (PJ) or Symmetric Gauss-Seidel (SGS) relaxation technique is studied by means of a von Neumann analysis, and comparisons with standard block implicit methods are then carried out. The matrix-free implicit method is finally implemented in the unstructured code CAST3M and applied to the modelling of a low-Mach-number mixing tee. The matrix-free implicit scheme constitutes a competitive alternative for the simulation of compressible and weakly compressible flows on unstructured meshes.
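As background on what "matrix-free implicit" means in practice, the sketch below takes backward-Euler steps for a 1D viscous Burgers discretization using only residual evaluations and a scalar diagonal estimate, i.e. a point-Jacobi-style relaxation with no stored Jacobian. It is a minimal generic illustration, not the low-Mach preconditioned scheme developed in the thesis; the grid, time step and diagonal estimate are assumptions.

```python
import numpy as np

# Minimal 1D illustration of a matrix-free implicit step (backward Euler) for
# u_t = -u*u_x + nu*u_xx on a periodic grid. Only residual evaluations and a
# cheap diagonal estimate are used -- no Jacobian matrix is ever assembled.

N, L, NU, DT = 128, 2 * np.pi, 0.05, 0.05
dx = L / N
x = np.linspace(0.0, L, N, endpoint=False)

def residual(u):
    """Spatial residual R(u) = -u*u_x + nu*u_xx (central differences, periodic)."""
    up, um = np.roll(u, -1), np.roll(u, 1)
    return -u * (up - um) / (2 * dx) + NU * (up - 2 * u + um) / dx**2

def implicit_step(un, sweeps=50):
    """One backward-Euler step: solve G(u) = (u - un)/DT - R(u) = 0
    with damped point-Jacobi sweeps that only call the residual."""
    G = lambda u: (u - un) / DT - residual(u)
    diag = 1.0 / DT + 2.0 * NU / dx**2   # scalar diagonal estimate (assumption)
    u = un.copy()
    for _ in range(sweeps):
        u -= G(u) / diag                  # point relaxation, no matrix stored
    return u

u = np.sin(x)
for _ in range(20):
    u = implicit_step(u)
print("max |u| after 20 implicit steps:", np.abs(u).max())
```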
173

Tuned and asynchronous stencil kernels for CPU/GPU systems

Venkatasubramanian, Sundaresan 18 May 2009 (has links)
We describe heterogeneous multi-CPU and multi-GPU implementations of Jacobi's iterative method for the 2-D Poisson equation on a structured grid, in both single- and double-precision. Properly tuned, our best implementation achieves 98% of the empirical streaming GPU bandwidth (66% of peak) on an NVIDIA C1060. Motivated to find a still faster implementation, we further consider "wildly asynchronous" implementations that can reduce or even eliminate the synchronization bottleneck between iterations. In these versions, which are based on the principle of chaotic relaxation (Chazan and Miranker, 1969), we simply remove or delay synchronization between iterations, thereby potentially trading off more flops (via more iterations to converge) for a higher degree of asynchronous parallelism. Our relaxed-synchronization implementations on a GPU can be 1.2-2.5x faster than our best synchronized GPU implementation while achieving the same accuracy. Looking forward, this result suggests research on similarly "fast-and-loose" algorithms in the coming era of increasingly massive concurrency and relatively high synchronization or communication costs.
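For reference, the baseline algorithm being tuned is the classical synchronous Jacobi sweep for the 5-point Poisson stencil; a plain NumPy version is sketched below. The tuned CUDA kernels and the asynchronous variants of the thesis are not reproduced here, and the grid size and source term are placeholders.

```python
import numpy as np

# Synchronous Jacobi for -u_xx - u_yy = f on the unit square with homogeneous
# Dirichlet boundaries (boundary rows/columns stay zero).

def jacobi_poisson_2d(f, h, tol=1e-6, max_iter=20000):
    """Solve the 5-point Poisson discretization with Jacobi sweeps."""
    u = np.zeros_like(f)
    for it in range(max_iter):
        # Each interior point becomes the average of its four neighbours
        # plus the source contribution.
        u_new = u.copy()
        u_new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                                    u[1:-1, :-2] + u[1:-1, 2:] +
                                    h * h * f[1:-1, 1:-1])
        if np.max(np.abs(u_new - u)) < tol:
            return u_new, it
        u = u_new
    return u, max_iter

n = 65                               # grid points per side, assumed
h = 1.0 / (n - 1)
f = np.ones((n, n))                  # unit source term, assumed
u, iters = jacobi_poisson_2d(f, h)
print(f"converged in {iters} Jacobi iterations, max u = {u.max():.4f}")
```

The asynchronous ("chaotic relaxation") variants discussed above essentially drop the barrier implied by waiting for the full u_new array before starting the next sweep, letting threads read whatever neighbour values happen to be current.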
174

Finite Element Approximations of 2D Incompressible Navier-Stokes Equations Using Residual Viscosity

Sjösten, William, Vadling, Victor January 2018 (has links)
Chorin’s method, the Incremental Pressure Correction Scheme (IPCS) and Crank-Nicolson’s method (CN) are three numerical methods that were investigated in this study. These methods were used here for solving the incompressible Navier-Stokes equations, which describe the motion of an incompressible fluid, in three different benchmark problems. The methods were stabilized using residual-based artificial viscosity, which was introduced to avoid instability, and were compared in terms of accuracy and computational time. Furthermore, a theoretical study of adaptivity was made, based on an a posteriori error estimate and an adjoint problem; the implementation of the adaptivity is left for future studies. In this study we consider the following three well-known benchmark problems: laminar 2D flow around a cylinder, the Taylor-Green vortex and the lid-driven cavity problem. The differences in computational time between the three methods were in general relatively small and depended on which problem was investigated. Furthermore, the accuracy of the methods also differed between the benchmark problems, but in general Crank-Nicolson’s method gave less accurate results. Moreover, the stabilization technique worked well when the kinematic viscosity of the fluid was relatively low, since it managed to stabilize the numerical methods. For higher viscosities, where the problems could be solved without stabilization, the stabilization in general affected the solution negatively.
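Residual-based artificial viscosity typically adds a viscosity proportional to the local PDE residual, capped by a first-order upwind-like limit, e.g. nu_art = min(C_max * h * |u|, C_E * h^2 * |R_h| / normalization). The sketch below is a generic nodewise version with assumed constants and made-up fields; it is not the exact formulation or implementation used in this thesis.

```python
import numpy as np

# Generic residual-based artificial viscosity (sketch). The constants C_E and
# C_MAX and the normalization are assumptions, not the thesis's exact choices.

C_E, C_MAX = 1.0, 0.5

def residual_viscosity(residual, velocity_mag, h, u_norm):
    """Nodewise nu_art = min(C_MAX*h*|u|, C_E*h^2*|R_h| / normalization)."""
    first_order_cap = C_MAX * h * velocity_mag
    residual_term = C_E * h * h * np.abs(residual) / max(u_norm, 1e-12)
    return np.minimum(first_order_cap, residual_term)

# Toy usage with made-up nodal values
h = 0.02
velocity_mag = np.array([1.0, 0.8, 1.2, 0.9])
residual = np.array([0.01, 2.5, 0.05, 40.0])   # large residual near a sharp layer
u_norm = 1.0                                   # e.g. a norm of (u - mean(u))
print(residual_viscosity(residual, velocity_mag, h, u_norm))
```

The effect is that the added viscosity stays negligible where the solution is smooth (small residual) and saturates at a first-order level only near under-resolved features, which is why the stabilization helps most at low kinematic viscosity.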
175

Mitteilungen des URZ 2/2002

Heik, Wegener, Ziegler, Jehmlich, Krause, Horbach, Riedel, Hübsch, Wegener, J., Dippmann, Brose, Heide, Fischer 30 August 2002 (has links)
Mitteilungen des URZ 2/2002 (for the contents, see the keywords)
176

Valorisation d’options américaines et Value At Risk de portefeuille sur cluster de GPUs/CPUs hétérogène / American option pricing and computation of the portfolio Value at risk on heterogeneous GPU-CPU cluster

Benguigui, Michaël 27 August 2015 (has links)
Le travail de recherche décrit dans cette thèse a pour objectif d'accélérer le temps de calcul pour valoriser des instruments financiers complexes, tels des options américaines sur panier de taille réaliste (par exemple de 40 sousjacents), en tirant partie de la puissance de calcul parallèle qu'offrent les accélérateurs graphiques (Graphics Processing Units). Dans ce but, nous partons d'un travail précédent, qui avait distribué l'algorithme de valorisation de J.Picazo, basé sur des simulations de Monte Carlo et l'apprentissage automatique. Nous en proposons une adaptation pour GPU, nous permettant de diviser par 2 le temps de calcul de cette précédente version distribuée sur un cluster de 64 cœurs CPU, expérimentée pour valoriser une option américaine sur 40 actifs. Cependant, le pricing de cette option de taille réaliste nécessite quelques heures de calcul. Nous étendons donc ce premier résultat dans le but de cibler un cluster de calculateurs, hétérogènes, mixant GPUs et CPUs, via OpenCL. Ainsi, nous accélérons fortement le temps de valorisation, même si les entrainements des différentes méthodes de classification expérimentées (AdaBoost, SVM) sont centralisés et constituent donc un point de blocage. Pour y remédier, nous évaluons alors l'utilisation d'une méthode de classification distribuée, basée sur l'utilisation de forêts aléatoires, rendant ainsi notre approche extensible. La dernière partie réutilise ces deux contributions dans le cas de calcul de la Value at Risk d’un portefeuille d'options, sur cluster hybride hétérogène. / The research work described in this thesis aims at speeding up the pricing of complex financial instruments, like an American option on a realistic size basket of assets (e.g. 40) by leveraging the parallel processing power of Graphics Processing Units. To this aim, we start from a previous research work that distributed the pricing algorithm based on Monte Carlo simulation and machine learning proposed by J. Picazo. We propose an adaptation of this distributed algorithm to take advantage of a single GPU. This allows us to get performances using one single GPU comparable to those measured using a 64 cores cluster for pricing a 40-assets basket American option. Still, on this realistic-size option, the pricing requires a handful of hours. Then we extend this first contribution in order to tackle a cluster of heterogeneous devices, both GPUs and CPUs programmed in OpenCL, at once. Doing this, we are able to drastically accelerate the option pricing time, even if the various classification methods we experiment with (AdaBoost, SVM) constitute a performance bottleneck. So, we consider instead an alternate, distributable approach, based upon Random Forests which allow our approach to become more scalable. The last part reuses these two contributions to tackle the Value at Risk evaluation of a complete portfolio of financial instruments, on a heterogeneous cluster of GPUs and CPUs.
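The thesis builds on Picazo's classification-based Monte Carlo algorithm for American basket options, distributed over heterogeneous GPU/CPU clusters. Purely as a point of reference for regression- or learning-based early-exercise pricing, here is a minimal single-asset Longstaff-Schwartz pricer in NumPy; it is a different (regression-based) method, not Picazo's algorithm, not a basket option and not GPU-accelerated, and all parameters are illustrative.

```python
import numpy as np

# Minimal Longstaff-Schwartz Monte Carlo for an American put on ONE asset.

S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0   # illustrative parameters
n_steps, n_paths = 50, 100_000
dt = T / n_steps
rng = np.random.default_rng(0)

# Simulate geometric Brownian motion paths
z = rng.standard_normal((n_paths, n_steps))
log_paths = np.cumsum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z, axis=1)
S = S0 * np.exp(np.hstack([np.zeros((n_paths, 1)), log_paths]))

# Backward induction: regress discounted continuation values on in-the-money paths
cashflow = np.maximum(K - S[:, -1], 0.0)
for t in range(n_steps - 1, 0, -1):
    cashflow *= np.exp(-r * dt)              # discount one step back to time t
    itm = K - S[:, t] > 0.0
    if itm.sum() > 0:
        x = S[itm, t]
        coeffs = np.polyfit(x, cashflow[itm], 3)   # cubic polynomial basis, an assumption
        continuation = np.polyval(coeffs, x)
        exercise = K - x
        cashflow[itm] = np.where(exercise > continuation, exercise, cashflow[itm])

price = np.exp(-r * dt) * cashflow.mean()
print(f"American put price (LSM estimate): {price:.3f}")
```

Picazo's approach replaces the regression of the continuation value with a classifier (e.g. AdaBoost or SVM, as experimented with in the thesis) that directly learns the exercise/continue decision boundary, which is what makes the training step the distributed bottleneck discussed above.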
177

Realizace superpočítače pomocí grafické karty / Realization of supercomputer using graphic card

Jasovský, Filip January 2014 (has links)
This master's thesis deals with the realization of a supercomputer using a graphics card with CUDA technology. The theoretical part of the thesis describes the functionality and capabilities of graphics cards and desktop computers, and the processes taking place during calculations on them. The practical part deals with the creation of a system for calculations on the graphics card using an artificial-intelligence algorithm, more specifically artificial neural networks. Subsequently, the generated program is used for data classification of a large input data file. Finally, the results are compared.
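As a loose illustration of offloading the dense arithmetic of a neural network to the GPU from Python (the thesis itself works directly in CUDA, and the layer sizes and untrained weights below are arbitrary assumptions), a CuPy sketch might look like this:

```python
import cupy as cp   # NumPy-like arrays that live on a CUDA GPU

# Forward pass of a tiny two-layer network entirely on the GPU; weights are
# random and untrained, so this only demonstrates where the computation runs.

n_samples, n_in, n_hidden, n_out = 10_000, 64, 128, 10

x = cp.random.randn(n_samples, n_in, dtype=cp.float32)        # input batch on the GPU
w1 = cp.random.randn(n_in, n_hidden, dtype=cp.float32) * 0.1  # random weights
w2 = cp.random.randn(n_hidden, n_out, dtype=cp.float32) * 0.1

h = cp.tanh(x @ w1)               # hidden layer, computed on the GPU
logits = h @ w2
pred = cp.argmax(logits, axis=1)  # class decision per sample

# Bring a small summary back to the host for printing
print("predicted class counts:", cp.asnumpy(cp.bincount(pred, minlength=n_out)))
```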
178

Mitteilungen des URZ 2/2002

Heik, Wegener, Ziegler, Jehmlich, Krause, Horbach, Riedel, Hübsch, Wegener, J., Dippmann, Brose, Heide, Fischer 30 August 2002 (has links)
Mitteilungen des URZ 2/2002 (for the contents, see the keywords)
179

Investigation of hierarchical deep neural network structure for facial expression recognition

Motembe, Dodi 01 1900 (has links)
Facial expression recognition (FER) is still a challenging concept, and machines struggle to comprehend effectively the dynamic shifts in the facial expressions of human emotions. The existing systems that have proven to be effective consist of deeper network structures that need powerful and expensive hardware; the deeper the network is, the longer the training and the testing take, and many systems use expensive GPUs to make the process faster. To remedy the above challenges while maintaining the main goal of improving the accuracy of the recognition, we create a generic hierarchical structure with variable settings. This generic structure has a hierarchy of three convolutional blocks, two dropout blocks and one fully connected block. From this generic structure we derived four different network structures to be investigated according to their performance, and from each network structure case we again derived six network structures in relation to the variable parameters. The variable parameters under analysis are the size of the filters of the convolutional maps and of the max-pooling, as well as the number of convolutional maps. In total, we have 24 network structures to investigate, six per case. After many repeated experiments, the results showed that case 1a emerged as the top performer of its group, while case 2a, case 3c and case 4c outperformed the others in their respective groups. The comparison of the winners of the four groups indicates that case 2a is the optimal structure with optimal parameters; the case 2a network structure outperformed the other group winners. The criteria considered when choosing the best network structure were the minimum, average and maximum accuracy after 15 repeated training runs and analyses of the results. All 24 proposed network structures were tested using two of the most widely used FER datasets, CK+ and JAFFE. After repeated simulations, the results demonstrate that our inexpensive optimal network architecture achieved 98.11 % accuracy on the CK+ dataset. We also tested our optimal network architecture on the JAFFE dataset, where the experimental results show 84.38 % accuracy using just a standard CPU and simpler procedures. We also compared the four group winners with the performance of other existing FER models recorded recently in two studies that used the same two datasets, CK+ and JAFFE. Three of our four group winners (case 1a, case 2a and case 4c) recorded only 1.22 % less than the accuracy of the top-performing model on the CK+ dataset, and two of our network structures, case 2a and case 3c, placed third, beating other models on the JAFFE dataset. / Electrical and Mining Engineering
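A PyTorch sketch of the generic hierarchical structure described above (three convolutional blocks, two dropout blocks, one fully connected block, with the filter size and the number of feature maps exposed as variable settings) could look roughly as follows; the input resolution, channel counts, kernel and pooling sizes, dropout rates and number of classes are assumptions, not the tuned values of cases 1a-4c.

```python
import torch
import torch.nn as nn

class HierarchicalFER(nn.Module):
    """Generic hierarchy: 3 conv blocks, 2 dropout blocks, 1 fully connected block.
    'maps' and 'kernel' stand in for the variable parameters studied in the thesis."""

    def __init__(self, n_classes=7, maps=(32, 64, 128), kernel=3, pool=2):
        super().__init__()
        c1, c2, c3 = maps
        self.features = nn.Sequential(
            # Convolutional block 1
            nn.Conv2d(1, c1, kernel, padding=kernel // 2), nn.ReLU(),
            nn.MaxPool2d(pool),
            # Convolutional block 2 + dropout block 1
            nn.Conv2d(c1, c2, kernel, padding=kernel // 2), nn.ReLU(),
            nn.MaxPool2d(pool),
            nn.Dropout(0.25),
            # Convolutional block 3 + dropout block 2
            nn.Conv2d(c2, c3, kernel, padding=kernel // 2), nn.ReLU(),
            nn.MaxPool2d(pool),
            nn.Dropout(0.25),
        )
        # Fully connected block (48x48 input -> 6x6 feature maps after 3 poolings)
        self.classifier = nn.Linear(c3 * 6 * 6, n_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

model = HierarchicalFER()
dummy = torch.randn(4, 1, 48, 48)   # a batch of 4 grayscale 48x48 face crops
print(model(dummy).shape)           # -> torch.Size([4, 7])
```

Varying the tuple passed as maps and the kernel/pool arguments reproduces the kind of parameter sweep the abstract describes (24 structures, six per case), while the shallow depth keeps the model trainable on a standard CPU.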
180

The Cost of Confidentiality in Cloud Storage

Henziger, Eric January 2018 (has links)
Cloud storage services allow users to store and access data in a secure and flexible manner. In recent years, cloud storage services have seen rapid growth in popularity as well as in technological progress, and hundreds of millions of users use these services to store thousands of petabytes of data. Additionally, the synchronization of data that is essential for these types of services accounts for a significant share of total internet traffic. In this thesis, seven cloud storage applications were tested under controlled experiments during the synchronization process to determine feature support and measure performance metrics. Special focus was put on comparing applications that perform client-side encryption of user data to applications that do not. The results show a great variation in feature support and performance between the different applications, and that client-side encryption introduces some limitations to other features but does not necessarily impact performance negatively. The results provide insights and enhance the understanding of the advantages and disadvantages that come with certain design choices of cloud storage applications. These insights will help future technological development of cloud storage services.
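To make concrete what "client-side encryption before synchronization" refers to, here is a minimal generic sketch using the Python cryptography library's Fernet recipe. The key handling, file names and scheme are illustrative assumptions and do not correspond to any of the seven applications tested in the thesis.

```python
from pathlib import Path
from cryptography.fernet import Fernet

# Client-side encryption sketch: the file is encrypted locally, so only
# ciphertext would ever be uploaded to the storage provider.

Path("document.txt").write_bytes(b"example file contents to be synchronized")

key = Fernet.generate_key()          # in practice derived from the user's secret
cipher = Fernet(key)

plaintext = Path("document.txt").read_bytes()
ciphertext = cipher.encrypt(plaintext)
Path("document.txt.enc").write_bytes(ciphertext)   # what the sync client uploads

# Round-trip check: the client can recover the original locally.
assert cipher.decrypt(ciphertext) == plaintext
print(f"plaintext {len(plaintext)} bytes -> ciphertext {len(ciphertext)} bytes")
```

Because even a small change to the plaintext yields a completely different ciphertext, client-side encryption tends to defeat server-side deduplication and delta-sync optimizations, which is one example of the feature limitations the thesis observes.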
