381

An Application-Attuned Framework for Optimizing HPC Storage Systems

Paul, Arnab Kumar, 19 August 2020
High performance computing (HPC) is routinely employed in diverse domains, such as life sciences and geology, to simulate and understand the behavior of complex phenomena. Big-data-driven scientific simulations are resource intensive and require both computing and I/O capabilities at scale. There is a crucial need to revisit the HPC I/O subsystem to better optimize for, and manage, the increased pressure that big data processing places on the underlying storage systems. Extant HPC storage systems are designed and tuned for a specific set of applications targeting a range of workload characteristics, but they lack the flexibility to adapt to ever-changing application behaviors. The complex nature of modern HPC storage systems, together with these ever-changing behaviors, presents unique opportunities and engineering challenges. In this dissertation, we design and develop a framework for optimizing HPC storage systems by making them application-attuned. We select three different kinds of HPC storage systems: in-memory data analytics frameworks, parallel file systems, and object storage. We first analyze HPC application I/O behavior by studying real-world I/O traces. Next, we optimize parallelism for applications running in memory; then we design data management techniques for HPC storage systems; and finally we focus on low-level I/O load balance to improve the efficiency of modern HPC storage systems. / Doctor of Philosophy / Clusters of multiple computers connected through a network are often deployed in industry and laboratories for large-scale data processing or computation that cannot be handled by standalone computers. In such a cluster, resources such as CPUs, memory, and disks are integrated to work together. With the increasing popularity of applications that read and write tremendous amounts of data, such clusters need a large number of disks that can interact effectively; these disks form the HPC storage system. Such HPC storage systems are used by a diverse set of applications from organizations in a vast range of domains, from earth sciences, financial services, and telecommunications to life sciences. The HPC storage system should therefore perform well under the different read and write (I/O) requirements of all these sets of applications, yet current HPC storage systems do not cater to such varied I/O requirements. To this end, this dissertation designs and develops a framework for HPC storage systems that is application-attuned and thus provides much better performance than state-of-the-art HPC storage systems that lack such optimizations.
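The low-level I/O load-balance idea is easiest to picture with a toy example. The Python sketch below assumes a hypothetical trace that reports outstanding bytes per storage target and simply places new work on the least-loaded targets; the function name, the `ost*` labels, and the trace format are illustrative assumptions, not interfaces from the dissertation.

```python
# Toy sketch of load-aware placement: pick the k least-loaded storage
# targets given outstanding I/O observed in a (hypothetical) trace.
from collections import Counter

def least_loaded(servers, pending_bytes, k):
    """Return the k targets with the fewest outstanding bytes."""
    load = Counter({s: 0 for s in servers})   # start every target at zero
    load.update(pending_bytes)                # add bytes queued per target
    return [s for s, _ in sorted(load.items(), key=lambda kv: kv[1])[:k]]

# ost0 is busy, ost2 mildly loaded, ost1/ost3 idle -> place on ost1 and ost3
print(least_loaded(["ost0", "ost1", "ost2", "ost3"],
                   {"ost0": 7 << 20, "ost2": 1 << 20}, k=2))
```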
382

Towards the development of a reliable reconfigurable real-time operating system on FPGAs

Hong, Chuan, January 2013
In the last two decades, Field Programmable Gate Arrays (FPGAs) have rapidly developed from simple “glue logic” into a powerful platform capable of implementing a System on Chip (SoC). Modern FPGAs achieve not only high performance compared with General Purpose Processors (GPPs), thanks to hardware parallelism and dedication, but also better programming flexibility than Application Specific Integrated Circuits (ASICs). Moreover, the hardware programming flexibility of FPGAs can be further harnessed for both performance and manipulability, which makes Dynamic Partial Reconfiguration (DPR) possible. DPR allows a part or parts of a circuit to be reconfigured at run-time without interrupting the rest of the chip’s operation. As a result, hardware resources can be exploited more efficiently, since chip resources can be reused by swapping hardware tasks in and out of the chip in a time-multiplexed fashion. In addition, DPR improves fault tolerance: transient errors such as Single Event Upsets (SEUs) and permanent damage can be mitigated by reconfiguring the FPGA to avoid error accumulation. Furthermore, power and heat can be reduced by removing finished or idle tasks from the chip. For all these reasons, DPR has significantly promoted Reconfigurable Computing (RC) and has become a very active research topic. However, since hardware integration is increasing at an exponential rate and applications are becoming more complex with growing user demands, high-level application design and low-level hardware implementation are increasingly separated and layered. As a consequence, users can obtain little advantage from DPR without the support of system-level middleware. To bridge the gap between high-level applications and low-level hardware implementation, this thesis presents important contributions towards a Reliable, Reconfigurable and Real-Time Operating System (R3TOS), which facilitates user exploitation of DPR from the application level by managing the complex hardware in the background. In R3TOS, hardware tasks behave just like software tasks: they can be created, scheduled, and mapped to different computing resources on the fly. The novel contributions of this work are: 1) a novel implementation of an efficient task scheduler and allocator; 2) implementation of a novel real-time scheduling algorithm (FAEDF) and two efficacious allocation algorithms (EAC and EVC), which schedule tasks in real time and circumvent emerging faults while maintaining more compact empty areas; 3) design and implementation of a fault-tolerant microprocessor by harnessing existing FPGA resources, such as Error Correction Code (ECC) and configuration primitives; 4) a novel symmetric multiprocessing (SMP)-based architecture that supports a shared-memory programming interface; and 5) two demonstrations of the integrated system, including a) a K-Nearest Neighbour classifier, a non-parametric classification algorithm widely used in various fields of data mining, and b) pairwise sequence alignment, namely the Smith-Waterman algorithm, used for identifying similarities between two biological sequences. R3TOS gives considerably higher flexibility to support scalable multi-user, multitasking applications, whereby resources can be dynamically managed with respect to user requirements and hardware availability. Benefiting from this, not only can hardware resources be used more efficiently, but system performance can also be significantly increased.
Results show that scheduling and allocation efficiency improve by up to 2x, and overall system performance improves further by ~2.5x. Future work includes the development of a Network on Chip (NoC), which is expected to further increase communication throughput, as well as the standardization and automation of the system design, to be carried out in line with the enablement of other high-level synthesis tools, so that application developers can benefit from the system more efficiently.
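To make the scheduling side concrete, here is a minimal earliest-deadline-first (EDF) sketch in Python. It shows only the deadline-driven core that FAEDF builds on; the fault-aware and area-compaction aspects of FAEDF, and the EAC/EVC allocators, are specific to the thesis and not reproduced here.

```python
# Minimal EDF sketch: always run the ready task with the earliest deadline.
# FAEDF extends this core with fault awareness on the FPGA fabric.
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Task:
    deadline: int                       # tasks compare by deadline only
    name: str = field(compare=False)
    runtime: int = field(compare=False)

def edf_schedule(tasks):
    """Return (name, finish_time, met_deadline) in execution order."""
    heap = list(tasks)
    heapq.heapify(heap)                 # min-heap keyed on deadline
    clock, order = 0, []
    while heap:
        task = heapq.heappop(heap)
        clock += task.runtime
        order.append((task.name, clock, clock <= task.deadline))
    return order

# b (deadline 4) runs first, then c (7), then a (9)
print(edf_schedule([Task(9, "a", 3), Task(4, "b", 2), Task(7, "c", 1)]))
```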
383

A comparative analysis of the performance and deployment overhead of parallelized Finite Difference Time Domain (FDTD) algorithms on a selection of high performance multiprocessor computing systems

Ilgner, Robert Georg
Thesis (PhD)--Stellenbosch University, 2013. / ENGLISH ABSTRACT: The parallel FDTD method as used in computational electromagnetics is implemented on a variety of different high performance computing platforms. These parallel FDTD implementations have regularly been compared in terms of performance or purchase cost, but very little systematic consideration has been given to how much effort is required to create a parallel FDTD for a specific computing architecture. The deployment effort for these platforms has changed dramatically with time: creating an FDTD implementation in the 1980s could take months, whereas today parallel FDTD methods can be implemented on a supercomputer in a matter of hours. This thesis compares the effort required to deploy the parallel FDTD on selected computing platforms in terms of the constituents of that effort, such as coding complexity and coding time. It uses the deployment and performance of the serial FDTD method on a single personal computer as a benchmark and examines deployments of the parallel FDTD using different parallelisation techniques. These FDTD deployments are then analysed and compared against one another in order to determine the common characteristics of FDTD implementations on various computing platforms with differing parallelisation techniques. Although subjective in some instances, these characteristics are quantified and compared in tabular form, using the research information produced by the parallel FDTD implementations. The deployment effort is of interest to scientists and engineers considering the creation or purchase of an FDTD-like solution on a high performance computing platform. Although the FDTD method has in the past been considered a brute-force approach to solving computational electromagnetic problems, this was very probably a consequence of the relatively weak computing platforms, which took very long periods to process small model sizes. This thesis describes current implementations of the parallel FDTD method, made up of a combination of several techniques. These techniques can be deployed easily and in a relatively short time frame on computing architectures ranging from IBM's BlueGene/P to the amalgamation of a multicore processor and a graphics processing unit, known as an accelerated processing unit.
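For readers unfamiliar with the method, the serial baseline the thesis benchmarks against boils down to a leapfrog update of staggered electric and magnetic field grids. A minimal 1D sketch in Python/NumPy (normalized units, Courant number 0.5, hypothetical grid sizes) looks like this:

```python
# Minimal 1D FDTD (Yee) sketch in normalized units -- the serial baseline,
# not code from the thesis. Parallel versions split the grid across
# processors and exchange one halo cell per step.
import numpy as np

nx, nt = 200, 500            # grid cells and time steps (arbitrary)
ez = np.zeros(nx)            # electric field at integer grid points
hy = np.zeros(nx - 1)        # magnetic field, staggered half a cell

for t in range(nt):
    hy += 0.5 * (ez[1:] - ez[:-1])        # update H from the curl of E
    ez[1:-1] += 0.5 * (hy[1:] - hy[:-1])  # update E from the curl of H
    ez[nx // 2] += np.exp(-((t - 30.0) / 10.0) ** 2)  # Gaussian source
```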
384

ZIH-Info

08 November 2016
- New IP addresses for the DNS servers - Housing concept for the Trefftz-Bau - Information event on identity management - ZIH presents itself at SC16 in Salt Lake City - ZIH colloquium - ZIH publications - Events
385

ZIH-Info

08 November 2016
- Maintenance work on the data network - Replacement of decentralized user administration systems - Things worth knowing about SharePoint - ZIH in a new web design - Germany-wide research data infrastructure - First Cyber Security Day a success - ZIH publications - Events
386

ZIH-Info

08 November 2016
- Update of the mailing list service - WLAN guest access: new provisioning process - Decision-making and data management - ZIH colloquium - Workshop on structural principles of Indian music - ICC 2016 at TU Dresden - Announcement from the Media Center: protected content in the WebCMS - ZIH publications - Events
387

ZIH-Info

06 April 2017
- Operational readiness over the 2016/17 turn of the year - Central firewall at TU Dresden - Black-building test in the LZR - New generation of digital certificates in the DFN - Federated Gitlab at TU Chemnitz - Performance engineering structures for HPC centers - ZIH publications - Events
388

ZIH-Info

06 April 2017
- Central firewall at TU Dresden - Extended display of the user profile in the IDM - Increased security through a dedicated WLAN password - Obtaining Microsoft Office via Campus Sachsen - Dresden as a forge for the digital industries of the future - Energy efficiency milestone reached in the LZR - Codename: Knights Landing (KNL) - ZIH colloquium - ZIH publications - Events
389

Accelerating Finite State Projection through General Purpose Graphics Processing

Trimeloni, Thomas, 07 April 2011
The finite state projection algorithm provides modelers with a new way of directly solving the chemical master equation. The algorithm utilizes the matrix exponential function, so its performance suffers when applied to large problems. Other work has reduced the size of the exponentiation through mathematical simplifications, but efficiently exponentiating a large matrix has not been explored. This work explores implementing the finite state projection algorithm on several different high-performance computing platforms as a means of efficiently calculating the matrix exponential function for large systems. It finds that general purpose graphics processing can accelerate the finite state projection algorithm by several orders of magnitude. Specific biological models and modeling techniques are discussed as a demonstration of the algorithm implemented on a general purpose graphics processor. The results show that general purpose graphics processing will be a key factor in modeling more complex biological systems.
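As an illustration of what the algorithm computes, the sketch below builds the truncated chemical master equation generator for a simple birth-death process and propagates the probability vector with a sparse matrix exponential. The rates, truncation size, and use of SciPy on the CPU are assumptions for illustration; the thesis's contribution is accelerating this exponential step on a GPU for much larger systems.

```python
# Finite state projection sketch for a birth-death process (birth rate k,
# degradation rate g), truncated to states 0..n-1. CPU/SciPy illustration
# only; the thesis accelerates the exponential step on a GPU.
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import expm_multiply

n, k, g, t = 64, 5.0, 1.0, 5.0
states = np.arange(n)
sub = np.full(n - 1, k)            # A[x+1, x] = k      (birth)
sup = g * states[1:]               # A[x-1, x] = g * x  (death)
diag = -(k + g * states)           # total outflow from each state
A = diags([sub, diag, sup], offsets=[-1, 0, 1], format="csc")

p0 = np.zeros(n)
p0[0] = 1.0                        # start with zero molecules
pt = expm_multiply(t * A, p0)      # p(t) = exp(tA) p0

# probability that leaked past the truncation bounds the FSP error
print("mean copy number:", states @ pt, " truncation error:", 1.0 - pt.sum())
```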
390

Une approche dynamique pour l'optimisation des communications concurrentes sur réseaux hautes performance

Brunet, Elisabeth, 08 December 2008
The aim of this thesis is to optimize the communications of high performance applications in the context of cluster computing. Given the massive use of multicore architectures, it is now crucial to handle a large number of concurrent communication flows. We highlighted and analyzed the shortcomings of existing solutions in such a context. We therefore designed a communication architecture centred on arbitrating access to the network hardware; its novelty consists in untying the activity of applications from that of the network cards. Our model takes advantage of the delay that exists between the submission of communication requests and the moment when the network cards become idle in order to apply optimizations opportunistically. NewMadeleine implements this model, making it possible to exploit the fastest networks of the moment. The proposed architecture is validated not only by synthetic tests but also by ports of representative MPI implementations and by real applications.
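The core idea, that requests deposited while the network card is busy form a window for optimization, can be illustrated with a toy aggregation strategy in Python. Everything here (the names, the byte-string payloads, coalescing by destination) is an illustrative assumption; NewMadeleine's actual optimization strategies are richer and operate inside the communication library.

```python
# Toy illustration of opportunistic aggregation: requests queue up while the
# NIC is busy; when it goes idle, messages to the same destination are
# coalesced into a single transfer.
from collections import defaultdict

def flush(queue):
    """Coalesce queued (dest, payload) requests into one send per dest."""
    grouped = defaultdict(list)
    for dest, payload in queue:       # deposited while the NIC was busy
        grouped[dest].append(payload)
    return [(dest, b"".join(parts)) for dest, parts in grouped.items()]

pending = [("nodeB", b"aa"), ("nodeC", b"x"), ("nodeB", b"bb")]
print(flush(pending))                 # [('nodeB', b'aabb'), ('nodeC', b'x')]
```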
