421

Jack Rabbit: an effective Cell BE programming system for high performance parallelism

Ellis, Apollo Isaac Orion 08 July 2011 (has links)
The Cell processor exemplifies the trade-offs made when designing a mass-market, power-efficient multi-core machine, but its machine-exposing architecture and raw communication mechanisms are hard for a programmer to manage. Cell's hardware design is kept simple, which pushes complexity into software in the areas of achieving low threading overhead, good bandwidth efficiency, and load balance. Several attempts have been made to produce efficient and effective programming systems for Cell, but they have been too specialized and thus fall short. We present Jack Rabbit, an efficient thread pool work queue implementation with load balancing mechanisms and double buffering. Our system incurs low threading overhead, achieves good load balance, and uses bandwidth efficiently. It represents a step towards an effective way to program Cell and any similar current or future processors.
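A rough, hypothetical sketch of the double-buffering idea described in the abstract, overlapping the transfer of the next work-queue chunk with computation on the current one. The `fetch` and `process` helpers are stand-ins, not Jack Rabbit's actual API (which targets the Cell SPEs in C); the Python threads here only model the overlap of communication and computation.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(chunk_id):
    """Stand-in for a DMA get: returns the data for one work-queue chunk."""
    return [chunk_id] * 1024  # placeholder payload

def process(data):
    """Stand-in for the compute kernel applied to one chunk."""
    return sum(data)

def worker(chunk_ids):
    results = []
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(fetch, chunk_ids[0])       # prime buffer 0
        for next_id in chunk_ids[1:]:
            current = pending.result()                 # wait for the in-flight transfer
            pending = io.submit(fetch, next_id)        # start filling the other buffer
            results.append(process(current))           # compute overlaps the transfer
        results.append(process(pending.result()))      # drain the last buffer
    return results

print(worker(list(range(8))))
```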
422

Transversal I/O Scheduling: from Applications to Devices (Ordonnancement de E/S transversal : des applications à des dispositifs)

Zanon Boito, Francieli 30 March 2015 (has links)
This thesis focuses on I/O scheduling as a tool to improve I/O performance on parallel file systems by alleviating interference effects. It is usual for High Performance Computing (HPC) systems to provide a shared storage infrastructure for applications. In this situation, when multiple applications access the shared parallel file system concurrently, their accesses affect each other, compromising the efficacy of I/O optimization techniques.

We conducted an extensive performance evaluation of five scheduling algorithms at a parallel file system's data servers. Experiments were executed on different platforms and under different access patterns. The results indicate that the schedulers' results are affected by the applications' access patterns, since the performance improvement obtained through a scheduling algorithm must surpass its overhead. At the same time, the schedulers' results are affected by the characteristics of the underlying I/O system, especially the storage devices. Different devices present different levels of sensitivity to access sequentiality and size, which affects how much performance can be improved through I/O scheduling.

For these reasons, the main objective of this thesis is to provide I/O scheduling with double adaptivity: to applications and to devices. We obtain information about applications' access patterns through trace files from previous executions, and we apply machine learning to build a classifier capable of identifying the spatiality and request-size aspects of an access pattern from a stream of previous requests. Furthermore, we propose an approach to efficiently obtain the sequential-to-random throughput ratio of storage devices by running benchmarks for a subset of the parameters and estimating the rest with linear regressions.

We use this information about application and storage-device characteristics to select the best-fitting scheduling algorithm through a decision tree. Our approach improves performance by up to 75% over an approach that uses the same scheduling algorithm in all situations, without adaptability. Moreover, our approach improves performance in up to 64% more situations and decreases performance in up to 89% fewer situations. Our results show that both aspects, applications and storage devices, are essential for making good scheduling choices. Although no single scheduling algorithm provides performance gains in all situations, we show that, with double adaptivity, I/O scheduling techniques can be applied to improve performance while avoiding the situations in which they would impair it.
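A minimal, hypothetical sketch of the double-adaptivity idea described above: estimate the device's sequential-to-random throughput ratio from a few benchmarked request sizes via linear regression, then combine it with the detected access pattern in a small decision rule. The numbers, thresholds, and scheduler names are illustrative only and do not reproduce the thesis's actual classifier, regressions, or decision tree.

```python
import numpy as np

# Benchmarked (request_size_kb, sequential-to-random throughput ratio) pairs.
sizes = np.array([8, 32, 128, 512], dtype=float)
ratios = np.array([5.1, 3.8, 2.2, 1.3])

# Fit the ratio as a linear function of log(size) to estimate unmeasured sizes.
slope, intercept = np.polyfit(np.log(sizes), ratios, 1)

def estimated_ratio(size_kb):
    return slope * np.log(size_kb) + intercept

def choose_scheduler(spatiality, request_size_kb):
    """Toy decision rule: aggressive reordering only pays off when the device
    strongly favors sequential access and requests arrive non-contiguously.
    Scheduler names are placeholders for whatever algorithms are available."""
    ratio = estimated_ratio(request_size_kb)
    if spatiality == "contiguous":
        return "time-order"    # little to gain from reordering
    if ratio > 2.0:
        return "aggregating"   # reorder/aggregate to recover sequentiality
    return "shortest-job"      # device barely cares; keep scheduling overhead low

print(choose_scheduler("non-contiguous", 64))
```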
423

Exploiting parallel features of modern computer architectures in bioinformatics : applications to genetics, structure comparison and large graph analysis

Chapuis, Guillaume 18 December 2013 (has links) (PDF)
The exponential growth in bioinformatics data generation and the stagnation of processor frequencies stress the need for efficient implementations that fully exploit the parallel capabilities offered by modern computers. This thesis focuses on parallel algorithms and implementations for bioinformatics problems. Various types of parallelism are described and exploited. The thesis presents applications in genetics, with a GPU-parallel tool for QTL detection; in protein structure comparison, with a multicore-parallel tool for finding similar regions between proteins; and in large graph analysis, with a multi-GPU parallel implementation of a novel algorithm for the All-Pairs Shortest Path problem.
424

High-performance memory system architectures using data compression

Baek, Seungcheol 22 May 2014 (has links)
The Chip Multi-Processor (CMP) paradigm has cemented itself as the archetypal philosophy of future microprocessor design. Rapidly diminishing technology feature sizes have enabled the integration of ever-increasing numbers of processing cores on a single chip die. This abundance of processing power has magnified the venerable processor-memory performance gap, known as the "memory wall". To bridge this gap, a high-performing memory structure is needed, and an attractive way to overcome it is to use compression in the memory hierarchy. In this thesis, to use compression techniques more efficiently, compressed-cacheline size information is studied, and size-aware cache management techniques and a hot-cacheline prediction technique for dynamic early decompression are proposed. The work in this thesis also attempts to mitigate the limitations of phase change memory (PCM), such as low write performance and limited long-term endurance. One promising solution is the deployment of hybridized memory architectures that fuse dynamic random access memory (DRAM) and PCM, combining the best attributes of each technology by using the DRAM as an off-chip cache. A dual-phase compression technique is proposed for high-performing DRAM/PCM hybrid environments, and a multi-faceted wear-leveling technique is proposed for the long-term endurance of compressed PCM. This thesis also includes a new compression-based hybrid multi-level cell (MLC)/single-level cell (SLC) PCM management technique that aims to combine the performance edge of SLCs with the higher capacity of MLCs in a hybrid environment.
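A toy illustration of what "size-aware cache management" can mean when cachelines are compressed: a set has a fixed byte budget, compressed lines of varying size are packed into it, and the victim is chosen by a score that weighs recency against the bytes a line occupies. This is a simplified sketch under assumed policies, not the thesis's actual management technique.

```python
from dataclasses import dataclass, field

@dataclass
class Line:
    tag: int
    size: int          # compressed size in bytes
    last_use: int = 0  # timestamp of last access

@dataclass
class CompressedSet:
    capacity: int = 64                       # byte budget of one physical set
    lines: list = field(default_factory=list)
    clock: int = 0

    def access(self, tag, compressed_size):
        self.clock += 1
        for line in self.lines:
            if line.tag == tag:
                line.last_use = self.clock
                return "hit"
        # Miss: evict until the new compressed line fits. The size-aware victim
        # choice prefers lines that are both old and large, freeing space cheaply.
        while self.lines and sum(l.size for l in self.lines) + compressed_size > self.capacity:
            victim = min(self.lines, key=lambda l: l.last_use / l.size)
            self.lines.remove(victim)
        self.lines.append(Line(tag, compressed_size, self.clock))
        return "miss"

s = CompressedSet()
print([s.access(t, sz) for t, sz in [(1, 16), (2, 48), (3, 24), (1, 16)]])
```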
425

ZIH-Info

01 August 2014 (has links) (PDF)
- Big Data: national competence centers - WebCMS with ZIH login - Terabit testbed for science - Construction work for data network modernization - Virtual desktops - ZIH colloquium - Informatik@Girls: Logisch passt das! - New ZIH publications - Events
426

ZIH-Info

01 August 2014 (has links) (PDF)
- German Data Center Award (Deutscher Rechenzentrumspreis) for TU Dresden - Problems with PKI certificates - Storage space for HPC projects - Morpheus modeling software - HPC maintenance - New SPEC benchmark - ZIH colloquium - Storage Summit 2014 - New ZIH publications - Events
427

ZIH-Info

01 August 2014 (has links) (PDF)
- Operating systems for new PC purchases - Use of Microsoft Office 365 - Half of the Altix shut down - ZIH at ISC'14 - ZIH colloquium - Lange Nacht der Wissenschaften 2014 (Long Night of Sciences) - Mobile phone contracts at TU Dresden - New ZIH publications - Events
428

ZIH-Info

01 August 2014 (has links) (PDF)
- Maintenance work on the Voice-over-IP system - Central SharePoint service - New Cloudstore at ZIH - Cooperation with Indiana University: round 2 - ZIH colloquium - International energy efficiency conference - Central backup service at ZIH - New ZIH publications - Events
429

ROSENET: a remote server-based network emulation system

Gu, Yan 08 January 2008 (has links)
Network emulation has been widely used to aid in the development and evaluation of real-time applications. Many of today's applications and protocols need to be tested and evaluated in large-scale network environments such as the Internet, which requires emulation tools that meet requirements of scale, accuracy, and timeliness. Due to physical resource constraints in network emulators, existing emulation tools fail to meet these requirements: they are either limited to small and static networks, use simplified network models, or fail to deliver timely emulation results. If more physical resources are devoted to network emulation by utilizing high performance computing facilities, the accuracy and scalability of network emulation can be greatly improved. However, for many users, high performance computing facilities may not be readily available in a local laboratory environment, and co-locating application code with a remote high performance computing facility may be cumbersome and inconvenient. This thesis proposes a network emulation approach called ROSENET (RemOte SErver-based Network EmulaTion) that uses a distributed server-based architecture in which local low-fidelity emulators provide real-time QoS predictions to distributed applications, coupled with a remote large-scale high-fidelity simulator that continuously updates and calibrates the local low-fidelity emulators. A library-based modeling approach based on online simulation data collection is proposed, and a system identification modeling technique is presented. Experimental results examining emulation end-to-end delay and loss show that ROSENET provides a promising approach to network emulation that supports accuracy and scale while meeting real-time constraints. Challenges in applying ROSENET to real-world applications are addressed through two case studies: a synthetic workload on DARPA's NMS network topology for large-scale network simulation, and Skype, a contemporary real-time distributed VoIP application.
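A hypothetical sketch of the local/remote split described in the abstract: a lightweight local emulator answers per-packet QoS queries in real time from a simple parametric model, while a remote high-fidelity simulator periodically refits (calibrates) those parameters from its detailed results. Class names, parameters, and the calibration data are illustrative, not ROSENET's actual interfaces.

```python
import random

class LocalEmulator:
    """Low-fidelity, real-time QoS model queried by the application under test."""
    def __init__(self, base_delay_ms=40.0, jitter_ms=5.0, loss_rate=0.01):
        self.base_delay_ms = base_delay_ms
        self.jitter_ms = jitter_ms
        self.loss_rate = loss_rate

    def qos(self):
        """Return an end-to-end delay prediction, or None if the packet is lost."""
        if random.random() < self.loss_rate:
            return None
        return self.base_delay_ms + random.uniform(-self.jitter_ms, self.jitter_ms)

    def calibrate(self, delays, loss_rate):
        """Update the parametric model from remote high-fidelity samples."""
        if delays:
            self.base_delay_ms = sum(delays) / len(delays)
            self.jitter_ms = (max(delays) - min(delays)) / 2
        self.loss_rate = loss_rate

# The remote simulator would run asynchronously and ship back fresh samples;
# here a single calibration round is faked with made-up measurements.
emu = LocalEmulator()
emu.calibrate(delays=[52.0, 61.0, 58.0], loss_rate=0.03)
print([emu.qos() for _ in range(3)])
```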
430

Efficient and Reliable Simulation of Quantum Molecular Dynamics

Kormann, Katharina January 2012 (has links)
The time-dependent Schrödinger equation (TDSE) models the quantum nature of molecular processes.  Numerical simulations based on the TDSE help in understanding and predicting the outcome of chemical reactions. This thesis is dedicated to the derivation and analysis of efficient and reliable simulation tools for the TDSE, with a particular focus on models for the interaction of molecules with time-dependent electromagnetic fields. Various time propagators are compared for this setting and an efficient fourth-order commutator-free Magnus-Lanczos propagator is derived. For the Lanczos method, several communication-reducing variants are studied for an implementation on clusters of multi-core processors. Global error estimation for the Magnus propagator is devised using a posteriori error estimation theory. In doing so, the self-adjointness of the linear Schrödinger equation is exploited to avoid solving an adjoint equation. Efficiency and effectiveness of the estimate are demonstrated for both bounded and unbounded states. The temporal approximation is combined with adaptive spectral elements in space. Lagrange elements based on Gauss-Lobatto nodes are employed to avoid nondiagonal mass matrices and ill-conditioning at high order. A matrix-free implementation for the evaluation of the spectral element operators is presented. The framework uses hybrid parallelism and enables significant computational speed-up as well as the solution of larger problems compared to traditional implementations relying on sparse matrices. As an alternative to grid-based methods, radial basis functions in a Galerkin setting are proposed and analyzed. It is found that considerably higher accuracy can be obtained with the same number of basis functions compared to the Fourier method. Another direction of research presented in this thesis is a new algorithm for quantum optimal control: The field is optimized in the frequency domain where the dimensionality of the optimization problem can drastically be reduced. In this way, it becomes feasible to use a quasi-Newton method to solve the problem. / eSSENCE
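To make the propagator concrete, here is a minimal numpy/scipy sketch of a single Lanczos step, approximating exp(-i*dt*H)psi in a small Krylov subspace. It shows only the Lanczos piece under assumed parameters (the Krylov dimension m=20 is just an example value); the fourth-order commutator-free Magnus splitting for time-dependent fields, the a posteriori error estimation, and the spectral-element discretization described in the abstract are omitted.

```python
import numpy as np
from scipy.linalg import expm

def lanczos_expm(H, psi, dt, m=20):
    """Approximate exp(-1j*dt*H) @ psi in an m-dimensional Krylov subspace."""
    norm0 = np.linalg.norm(psi)
    V = np.zeros((psi.size, m), dtype=complex)   # orthonormal Krylov basis
    alpha = np.zeros(m)                           # diagonal of the tridiagonal T
    beta = np.zeros(max(m - 1, 1))                # off-diagonal of T
    V[:, 0] = psi / norm0
    k = m
    for j in range(m):
        w = H @ V[:, j]
        alpha[j] = np.real(np.vdot(V[:, j], w))
        w = w - alpha[j] * V[:, j]
        if j > 0:
            w = w - beta[j - 1] * V[:, j - 1]
        if j < m - 1:
            beta[j] = np.linalg.norm(w)
            if beta[j] < 1e-12:                   # Krylov space exhausted early
                k = j + 1
                break
            V[:, j + 1] = w / beta[j]
    T = np.diag(alpha[:k]) + np.diag(beta[:k - 1], 1) + np.diag(beta[:k - 1], -1)
    # Exponentiate only the small k-by-k matrix and map back to the full space.
    return norm0 * (V[:, :k] @ expm(-1j * dt * T)[:, 0])

# Example: random Hermitian H as a stand-in for a discretized Hamiltonian.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
H = (A + A.T) / 2
psi0 = rng.standard_normal(50) + 0j
psi1 = lanczos_expm(H, psi0, dt=0.1)
print(np.linalg.norm(psi1) / np.linalg.norm(psi0))  # close to 1: unitary propagation
```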
