Global ETD Search

381	Towards the development of a reliable reconfigurable real-time operating system on FPGAs Hong, Chuan January 2013 (has links) In the last two decades, Field Programmable Gate Arrays (FPGAs) have been rapidly developed from simple “glue-logic” to a powerful platform capable of implementing a System on Chip (SoC). Modern FPGAs achieve not only the high performance compared with General Purpose Processors (GPPs), thanks to hardware parallelism and dedication, but also better programming flexibility, in comparison to Application Specific Integrated Circuits (ASICs). Moreover, the hardware programming flexibility of FPGAs is further harnessed for both performance and manipulability, which makes Dynamic Partial Reconfiguration (DPR) possible. DPR allows a part or parts of a circuit to be reconfigured at run-time, without interrupting the rest of the chip’s operation. As a result, hardware resources can be more efficiently exploited since the chip resources can be reused by swapping in or out hardware tasks to or from the chip in a time-multiplexed fashion. In addition, DPR improves fault tolerance against transient errors and permanent damage, such as Single Event Upsets (SEUs) can be mitigated by reconfiguring the FPGA to avoid error accumulation. Furthermore, power and heat can be reduced by removing finished or idle tasks from the chip. For all these reasons above, DPR has significantly promoted Reconfigurable Computing (RC) and has become a very hot topic. However, since hardware integration is increasing at an exponential rate, and applications are becoming more complex with the growth of user demands, highlevel application design and low-level hardware implementation are increasingly separated and layered. As a consequence, users can obtain little advantage from DPR without the support of system-level middleware. To bridge the gap between the high-level application and the low-level hardware implementation, this thesis presents the important contributions towards a Reliable, Reconfigurable and Real-Time Operating System (R3TOS), which facilitates the user exploitation of DPR from the application level, by managing the complex hardware in the background. In R3TOS, hardware tasks behave just like software tasks, which can be created, scheduled, and mapped to different computing resources on the fly. The novel contributions of this work are: 1) a novel implementation of an efficient task scheduler and allocator; 2) implementation of a novel real-time scheduling algorithm (FAEDF) and two efficacious allocating algorithms (EAC and EVC), which schedule tasks in real-time and circumvent emerging faults while maintaining more compact empty areas. 3) Design and implementation of a faulttolerant microprocessor by harnessing the existing FPGA resources, such as Error Correction Code (ECC) and configuration primitives. 4) A novel symmetric multiprocessing (SMP)-based architectures that supports shared memory programing interface. 5) Two demonstrations of the integrated system, including a) the K-Nearest Neighbour classifier, which is a non-parametric classification algorithm widely used in various fields of data mining; and b) pairwise sequence alignment, namely the Smith Waterman algorithm, used for identifying similarities between two biological sequences. R3TOS gives considerably higher flexibility to support scalable multi-user, multitasking applications, whereby resources can be dynamically managed in respect of user requirements and hardware availability. Benefiting from this, not only the hardware resources can be more efficiently used, but also the system performance can be significantly increased. Results show that the scheduling and allocating efficiencies have been improved up to 2x, and the overall system performance is further improved by ~2.5x. Future work includes the development of Network on Chip (NoC), which is expected to further increase the communication throughput; as well as the standardization and automation of our system design, which will be carried out in line with the enablement of other high-level synthesis tools, to allow application developers to benefit from the system in a more efficient manner. 621.39
382	A comparative analysis of the performance and deployment overhead of parallelized Finite Difference Time Domain (FDTD) algorithms on a selection of high performance multiprocessor computing systems Ilgner, Robert Georg 12 1900 (has links) Thesis (PhD)--Stellenbosch University, 2013. / ENGLISH ABSTRACT: The parallel FDTD method as used in computational electromagnetics is implemented on a variety of different high performance computing platforms. These parallel FDTD implementations have regularly been compared in terms of performance or purchase cost, but very little systematic consideration has been given to how much effort has been used to create the parallel FDTD for a specific computing architecture. The deployment effort for these platforms has changed dramatically with time, the deployment time span used to create FDTD implementations in 1980 ranging from months, to the contemporary scenario where parallel FDTD methods can be implemented on a supercomputer in a matter of hours. This thesis compares the effort required to deploy the parallel FDTD on selected computing platforms from the constituents that make up the deployment effort, such as coding complexity and time of coding. It uses the deployment and performance of the serial FDTD method on a single personal computer as a benchmark and examines the deployments of the parallel FDTD using different parallelisation techniques. These FDTD deployments are then analysed and compared against one another in order to determine the common characteristics between the FDTD implementations on various computing platforms with differing parallelisation techniques. Although subjective in some instances, these characteristics are quantified and compared in tabular form, by using the research information created by the parallel FDTD implementations. The deployment effort is of interest to scientists and engineers considering the creation or purchase of an FDTD-like solution on a high performance computing platform. Although the FDTD method has been considered to be a brute force approach to solving computational electromagnetic problems in the past, this was very probably a factor of the relatively weak computing platforms which took very long periods to process small model sizes. This thesis will describe the current implementations of the parallel FDTD method, made up of a combination of several techniques. These techniques can be easily deployed in a relatively quick time frame on computing architectures ranging from IBM’s Bluegene/P to the amalgamation of multicore processor and graphics processing unit, known as an accelerated processing unit. / AFRIKAANSE OPSOMMING: Die parallel Eindige Verskil Tyd Domein (Eng: FDTD) metode word gebruik in numeriese elektromagnetika en kan op verskeie hoë werkverrigting rekenaars geïmplementeer word. Hierdie parallele FDTD implementasies word gereeld in terme van werkverrigting of aankoop koste vergelyk, maar word bitter min sistematies oorweeg in terme van die hoeveelheid moeite wat dit geverg het om die parallele FDTD vir 'n spesifieke rekenaar argitektuur te skep. Mettertyd het die moeite om die platforms te ontplooi dramaties verander, in the 1980's het die ontplooings tyd tipies maande beloop waarteenoor dit vandag binne 'n kwessie van ure gedoen kan word. Hierdie tesis vergelyk die inspanning wat nodig is om die parallelle FDTD op geselekteerde rekenaar platforms te ontplooi deur te kyk na faktore soos die kompleksiteit van kodering en die tyd wat dit vat om 'n kode te implementeer. Die werkverrigting van die serie FDTD metode, geïmplementeer op 'n enkele persoonlike rekenaar word gebruik as 'n maatstaf om die ontplooing van die parallel FDTD met verskeie parallelisasie tegnieke te evalueer. Deur hierdie FDTD ontplooiings met verskillende parallelisasie tegnieke te ontleed en te vergelyk word die gemeenskaplike eienskappe bepaal vir verskeie rekenaar platforms. Alhoewel sommige gevalle subjektief is, is hierdie eienskappe gekwantifiseer en vergelyk in tabelvorm deur gebruik te maak van die navorsings inligting geskep deur die parallel FDTD implementasies. Die ontplooiings moeite is belangrik vir wetenskaplikes en ingenieurs wat moet besluit tussen die ontwikkeling of aankoop van 'n FDTD tipe oplossing op 'n höe werkverrigting rekenaar. Hoewel die FDTD metode in die verlede beskou was as 'n brute krag benadering tot die oplossing van elektromagnetiese probleme was dit waarskynlik weens die relatiewe swak rekenaar platforms wat lank gevat het om klein modelle te verwerk. Hierdie tesis beskryf die moderne implementering van die parallele FDTD metode, bestaande uit 'n kombinasie van verskeie tegnieke. Hierdie tegnieke kan maklik in 'n relatiewe kort tydsbestek ontplooi word op rekenaar argitekture wat wissel van IBM se BlueGene / P tot die samesmelting van multikern verwerkers en grafiese verwerkings eenhede, beter bekend as 'n versnelde verwerkings eenheid. Computational electromagnetics FDTD High performance computing GPU Computer electromagnetism Finite differences Time-domain analysis
383	ZIH-Info 08 November 2016 (has links) (PDF) - Neue IP-Adressen für DNS-Server - Housing-Konzept für den Trefftz-Bau - Informationsveranstaltung Identitätsmanagement - ZIH präsentiert sich auf der SC16 in Salt Lake City - ZIH-Kolloquium - ZIH-Publikationen - Veranstaltungen ZIH Rechenzentrum data processing center ddc:004 ddc:621.39 rvk:AL 51908 rvk:QX 840 rvk:AK 29000
384	ZIH-Info 08 November 2016 (has links) (PDF) - Wartungsarbeiten am Datennetz - Ablösung dezentraler Nutzerverwaltungen - Wissenswertes zu SharePoint - ZIH im neuen Web-Design - Deutschlandweite Forschungsdaten-Infrastruktur - Erster Cyber Security Day erfolgreich - ZIH-Publikationen - Veranstaltungen ZIH Rechenzentrum data processing center ddc:004 ddc:621.39 rvk:AL 51908 rvk:QX 840 rvk:AK 29000
385	ZIH-Info 08 November 2016 (has links) (PDF) - Aktualisierung Mailinglisten-Dienst - WLAN-Gastzugänge: Neuer Bereitstellungsprozess - Willensbildung und Datenmanagement - ZIH-Kolloquium - Workshop zu Strukturprinzipien indischer Musik - ICC 2016 an der TU Dresden Mitteilung aus dem Medienzentrum - Geschützte Inhalte im WebCMS - ZIH-Publikationen - Veranstaltungen ZIH Rechenzentrum data processing center ddc:004 ddc:621.39 rvk:AL 51908 rvk:QX 840 rvk:AK 29000
386	ZIH-Info 06 April 2017 (has links) (PDF) - Betriebsbereitschaft zum Jahreswechsel 2016/17 - Zentrale Firewall an der TU Dresden - Black-Building-Test im LZR - Neue Generation von digitalen Zertifikaten im DFN - Föderatives Gitlab an der TU Chemnitz - Performance-Engineering-Strukturen für HPC-Zentren - ZIH-Publikationen - Veranstaltungen ZIH Rechenzentrum data processing center ddc:004 ddc:621.39 rvk:AL 51908 rvk:QX 840 rvk:AK 29000
387	ZIH-Info 06 April 2017 (has links) (PDF) - Zentrale Firewall an der TU Dresden - Erweiterte Anzeige des Nutzerprofils im IDM - Erhöhte Sicherheit durch eigenes WLAN-Passwort - Bezug von Microsoft Office über Campus Sachsen - Dresden als Schmiede digitaler Zukunftsindustrien - Energieeffizienz-Meilenstein im LZR erreicht - Codename: Knights Landing (KNL) - ZIH-Kolloquium - ZIH-Publikationen - Veranstaltungen ZIH Rechenzentrum data processing center ddc:004 ddc:621.39 rvk:AL 51908 rvk:QX 840 rvk:AK 29000
388	Accelerating Finite State Projection through General Purpose Graphics Processing Trimeloni, Thomas 07 April 2011 (has links) The finite state projection algorithm provides modelers a new way of directly solving the chemical master equation. The algorithm utilizes the matrix exponential function, and so the algorithm’s performance suffers when it is applied to large problems. Other work has been done to reduce the size of the exponentiation through mathematical simplifications, but efficiently exponentiating a large matrix has not been explored. This work explores implementing the finite state projection algorithm on several different high-performance computing platforms as a means of efficiently calculating the matrix exponential function for large systems. This work finds that general purpose graphics processing can accelerate the finite state projection algorithm by several orders of magnitude. Specific biological models and modeling techniques are discussed as a demonstration of the algorithm implemented on a general purpose graphics processor. The results of this work show that general purpose graphics processing will be a key factor in modeling more complex biological systems. Finite State Projection Systems Biology High-Performance Computing Graphics Processing Modeling Gillespie Algorithm Chemical Master Equation Differential Equations Stochastic Simulation Engineering
389	Une approche dynamique pour l'optimisation des communications concurrentes sur réseaux hautes performance Brunet, Elisabeth 08 December 2008 (has links) Cette thèse cherche à optimiser les communications des applications de calcul intensif s'exécutant sur des grappes de PC. En raison de l'usage massif de processeurs multicoeurs, il est désormais impératif de gérer un grand nombre de flux de communication concurrents. Nous avons mis en évidence et analysé les performances décevantes des solutions actuelles dans un tel contexte. Nous avons ainsi proposé une architecture de communication centrée sur l'arbitrage de l'accès aux matériels. Son originalité réside dans la dissociation de l'activité de l'application de celle des cartes réseaux. Notre modèle exploite l'intervalle de temps introduit entre le dépot des requêtes de communication et la disponibilité des cartes réseaux pour appliquer des optimisations de manière opportuniste. NewMadeleine implémente ce concept et se révèle capable d'exploiter les réseaux les plus performants du moment. Des tests synthétiques et portages d'implémentations caractéristiques de MPI ont permis de valider l'architecture proposée. / The aim of this thesis is to optimize the communications of high performance applications, in the context of clusters computing. Given the massive use of multicore architectures, it is now crucial to handle a large number of concurrent communication flows. We highlighted and analyzed the shortcomings of existing solutions. We therefore designed a new way to schedule communication flows by focusing on the activity of the network cards. Its novelty consists in untying the activity of applications from that of the network cards. Our model takes advantage of the delay that exists between the deposal of the communication requests and the moment when the network cards become idle in order to apply some opportunistic optimizations. NewMadeleine implements this model, thus making possible to exploit last generation high speed networks. The approach of NewMadeleine is not only validated by synthetical tests but also by real applications. Informatique Calcul intensif Grappes de machines Communications haute performance Ordonnancement Optimisation MPI Réseau Parallélisme Computer science High performance computing Cluster computing High speed communication Scheduling Optimizations MPI Networking Parallelism
390	Un environnement pour le calcul intensif pair à pair / An environment for peer-to-peer high performance computing Nguyen, The Tung 16 November 2011 (has links) Le concept de pair à pair (P2P) a connu récemment de grands développements dans les domaines du partage de fichiers, du streaming vidéo et des bases de données distribuées. Le développement du concept de parallélisme dans les architectures de microprocesseurs et les avancées en matière de réseaux à haut débit permettent d'envisager de nouvelles applications telles que le calcul intensif distribué. Cependant, la mise en oeuvre de ce nouveau type d'application sur des réseaux P2P pose de nombreux défis comme l'hétérogénéité des machines, le passage à l'échelle et la robustesse. Par ailleurs, les protocoles de transport existants comme TCP et UDP ne sont pas bien adaptés à ce nouveau type d'application. Ce mémoire de thèse a pour objectif de présenter un environnement décentralisé pour la mise en oeuvre de calculs intensifs sur des réseaux pair à pair. Nous nous intéressons à des applications dans les domaines de la simulation numérique et de l'optimisation qui font appel à des modèles de type parallélisme de tâches et qui sont résolues au moyen d'algorithmes itératifs distribués or parallèles. Contrairement aux solutions existantes, notre environnement permet des communications directes et fréquentes entre les pairs. L'environnement est conçu à partir d'un protocole de communication auto-adaptatif qui peut se reconfigurer en adoptant le mode de communication le plus approprié entre les pairs en fonction de choix algorithmiques relevant de la couche application ou d'éléments de contexte comme la topologie au niveau de la couche réseau. Nous présentons et analysons des résultats expérimentaux obtenus sur diverses plateformes comme GRID'5000 et PlanetLab pour le problème de l'obstacle et des problèmes non linéaires de flots dans les réseaux. / The concept of peer-to-peer (P2P) has known great developments these years in the domains of file sharing, video streaming or distributed databases. Recent advances in microprocessors architecture and networks permit one to consider new applications like distributed high performance computing. However, the implementation of this new type of application on P2P networks gives raise to numerous challenges like heterogeneity, scalability and robustness. In addition, existing transport protocols like TCP and UDP are not well suited to this new type of application. This thesis aims at designing a decentralized and robust environment for the implementation of high performance computing applications on peer-to-peer networks. We are interested in applications in the domains of numerical simulation and optimization that rely on tasks parallel models and that are solved via parallel or distributed iterative algorithms. Unlike existing solutions, our environment allows frequent direct communications between peers. The environment is based on a self adaptive communication protocol that can reconfigure itself dynamically by choosing the most appropriate communication mode between any peers according to decisions concerning algorithmic choice made at the application level or elements of context at transport level, like topology. We present and analyze computational results obtained on several testeds like GRID’5000 and PlanetLab for the obstacle problem and nonlinear network flow problems. Calcul pair à pair Calcul distribué Calcul à haute performance Itérations asynchrones Environnement décentralisé de calcul Peer-to-peer Parallel computing High performance computing Communication protocol Self-adaptive Environment

Search results