About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21. Building and operating large-scale SpiNNaker machines

Heathcote, Jonathan David January 2016
SpiNNaker is an unconventional supercomputer architecture designed to simulate up to one billion biologically realistic neurons in real-time. To achieve this goal, SpiNNaker employs a novel network architecture which poses a number of practical problems in scaling up from desktop prototypes to machine-room-filling installations. SpiNNaker's hexagonal torus network topology has received mostly theoretical treatment in the literature. This thesis tackles some of the challenges encountered when building 'real-world' systems. Firstly, a scheme is devised for physically laying out hexagonal torus topologies in machine rooms which avoids long cables; this is demonstrated on a half-million-core SpiNNaker prototype. Secondly, to improve the performance of existing routing algorithms, a more efficient process is proposed for finding (logically) short paths through hexagonal torus topologies. This is complemented by a formula which provides routing algorithms with greater flexibility when finding paths, potentially resulting in more balanced network utilisation. The scale of SpiNNaker's network and the models intended for it also present their own challenges. Placement and routing algorithms are developed which assign processes to nodes and generate paths through SpiNNaker's network. These algorithms minimise congestion and tolerate network faults. The proposed placement algorithm is inspired by techniques used in chip design and is shown to enable larger applications to run on SpiNNaker than the previous state of the art. Likewise, the routing algorithm developed is able to tolerate network faults, which are inevitably present in large-scale systems, with little performance overhead.
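The thesis develops these routing techniques in full; purely as a hypothetical illustration of why hexagonal torus topologies admit short closed-form routes, here is a minimal C++ sketch of the widely used "median" trick for turning an (x, y) displacement into a minimal three-axis hexagonal vector. It assumes the common convention in which a unit step along the third axis moves one step against both x and y, so (a, b, c) corresponds to the 2D displacement (a - c, b - c); the thesis's exact formulation may differ.

```cpp
#include <algorithm>
#include <array>
#include <cstdio>

// Convert an (x, y) displacement on a hexagonal mesh into a minimal
// three-axis vector (a, b, c). Subtracting any t from x and y while
// setting c = -t leaves the endpoint (a - c, b - c) unchanged, and
// |a| + |b| + |c| is minimised at t = median(x, y, 0).
std::array<int, 3> minimal_hex_vector(int x, int y) {
    std::array<int, 3> s = {x, y, 0};
    std::sort(s.begin(), s.end());
    int t = s[1];                     // median of {x, y, 0}
    return {x - t, y - t, -t};
}

int main() {
    // Example: a (3, 2) displacement becomes (1, 0, -2): one hop along +x
    // plus two hops along the (1, 1) diagonal link, i.e. 3 hops instead of
    // the 5 a square mesh would need.
    auto v = minimal_hex_vector(3, 2);
    std::printf("(%d, %d, %d)\n", v[0], v[1], v[2]);
    return 0;
}
```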
22. Mitteilungen des URZ 2000 (Communications of the University Computing Centre, 2000)

Becher, Clauß, Heik, Hübsch, Müller, Richter, Riedel, Schier, Wolf, Ziegler, 23 November 2000
Contents: the CLIC project; X2X - a tool for displaying XML documents; software provision; the URZ backup service; support of the teaching pools by the URZ; current state of the network expansion; more convenient e-mail handling with IMAP; MagicPoint - presentations under Linux; 10 years of the "UNIX-Stammtisch" in Saxony
23. Implementace 2D ultrazvukových simulací / Implementation of 2D Ultrasound Simulations

Šimek, Dominik January 2018
This work deals with the design and implementation of 2D ultrasound simulations. Applications of ultrasound simulation can be found in medicine, biophysics and image reconstruction. One example is High-Intensity Focused Ultrasound (HIFU), which is used for diagnosing and treating cancer. The program is part of the k-Wave toolbox designed for supercomputer systems, specifically for machines with a shared-memory architecture. It is implemented in C++ and accelerated with OpenMP. Using the designed solution, it is possible to solve large-scale simulations in 2D space. A realistic example of use is ultrasound simulation for transcranial neuromodulation and neurostimulation in large domains with more than 16384x16384 grid points. A simulation of this size may take several days with the original MATLAB 2D k-Wave code; the speedup of the new implementation is up to 8x on the Anselm and Salomon supercomputers.
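k-Wave's actual 2D code is pseudospectral and considerably more involved; purely as a hypothetical sketch of the shared-memory strategy named above (C++ with OpenMP on a single node), consider one pointwise update swept over a large 2D grid, with the rows split across threads:

```cpp
#include <cstdio>
#include <vector>
#include <omp.h>

int main() {
    const int N = 2048;   // grid edge; the thesis's large runs use 16384
    std::vector<float> p(static_cast<size_t>(N) * N, 1.0f);    // acoustic pressure
    std::vector<float> rho(static_cast<size_t>(N) * N, 0.0f);  // density-like field
    const float c = 0.5f;  // hypothetical update coefficient

    // One pointwise update sweep. Row-major layout keeps the inner loop
    // contiguous in memory; OpenMP distributes the rows over the cores
    // of one shared-memory node.
    #pragma omp parallel for schedule(static)
    for (int y = 0; y < N; ++y) {
        for (int x = 0; x < N; ++x) {
            const size_t i = static_cast<size_t>(y) * N + x;
            rho[i] += c * p[i];   // stand-in for one term of a real time step
        }
    }

    std::printf("rho[0] = %f (max threads: %d)\n", rho[0], omp_get_max_threads());
    return 0;
}
```

Compiled with OpenMP enabled (e.g. `g++ -O2 -fopenmp`), the sweep scales with the node's core count without any code change.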
24. Paralelizace ultrazvukových simulací pomocí 2D dekompozice / Parallelization of Ultrasound Simulations Using 2D Decomposition

Nikl, Vojtěch January 2014
This thesis is a part of the k-Wave project, a toolbox for the simulation and reconstruction of acoustic wave fields; one of its main contributions is the planning of focused ultrasound surgeries (HIFU). One simulation can take tens of hours, and about 60% of the simulation time is spent calculating 3D fast Fourier transforms. Up until now, the 3D FFT has been calculated purely by the FFTW library and its 1D decomposition, whose major limitation is the maximum number of employable cores. We therefore introduce a new approach, the 2D hybrid decomposition of the 3D FFT (HybridFFT), which combines MPI processes and OpenMP threads to reach the best possible performance. On low core counts, on the order of a few hundred, we are about as fast as, or slightly faster than, FFTW and the pure-MPI 2D decomposition libraries PFFT and P3DFFT. One of the best results was achieved on a 512^3 FFT using 512 cores, where our hybrid version ran in 31 ms, FFTW in 39 ms and PFFT in 44 ms. The most significant performance advantage should appear when employing around 8-16 thousand cores; however, we have not had access to a machine with such resources. Almost linear scalability has been demonstrated for up to 2048 cores.
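To see why a 2D ("pencil") decomposition raises the core limit: a 1D (slab) decomposition of an N^3 grid can occupy at most N processes, since each must own whole planes, whereas a 2D decomposition arranges P = Pr x Pc processes in a grid and can occupy up to N^2. The following hypothetical C++ fragment (names are illustrative, and Pr and Pc are assumed to divide N) maps a rank to the pencil it owns:

```cpp
#include <cstdio>

struct Pencil {
    int y0, y1;   // owned range along Y (half-open)
    int z0, z1;   // owned range along Z (half-open)
};

// Each rank in a Pr x Pc process grid owns an (N/Pr) x (N/Pc) x N pencil.
// The X dimension stays whole, so local 1D FFTs along X need no
// communication; FFTs along Y and Z are reached via two global transposes.
Pencil pencil_for_rank(int rank, int Pr, int Pc, int N) {
    int r = rank / Pc;   // row in the process grid
    int c = rank % Pc;   // column in the process grid
    Pencil p;
    p.y0 = r * (N / Pr); p.y1 = (r + 1) * (N / Pr);
    p.z0 = c * (N / Pc); p.z1 = (c + 1) * (N / Pc);
    return p;
}

int main() {
    const int N = 512, Pr = 16, Pc = 32;   // 512 cores, as in the test above
    Pencil p = pencil_for_rank(100, Pr, Pc, N);
    std::printf("rank 100 owns y [%d,%d) x z [%d,%d)\n", p.y0, p.y1, p.z0, p.z1);
    return 0;
}
```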
25. Mitteilungen des URZ 3/4/1994 (Communications of the University Computing Centre, 3/4/1994)

Richter, Frank, Riedel, Wolfgang, Schier, Thomas, Schoeniger, Frank, Wagner, Jens, Ziegler, Christoph 22 August 1995
Contents: supercomputer in operation; Chemnitz student network inaugurated; new compute servers; new service: PC integration; the URZ's TeX service; software news; Advent, Advent - story time
26. A Survey of Barrier Algorithms for Coarse Grained Supercomputers

Hoefler, Torsten, Mehlan, Torsten, Mietke, Frank, Rehm, Wolfgang 28 June 2005
There are several different algorithms available to perform a synchronization of multiple processors. Some of them support only shared-memory architectures or very fine-grained supercomputers. This work gives an overview of all currently known algorithms which are suitable for distributed shared-memory architectures and message-passing-based computer systems (loosely coupled or coarse-grained supercomputers). No absolute decision can be made when choosing a barrier algorithm for a machine; several architectural aspects have to be taken into account. The overview of known barrier algorithms given in this work is targeted mostly at implementors of libraries supporting collective communication (such as MPI).
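As a concrete instance from the shared-memory end of the design space such surveys cover, here is a minimal C++11 sketch of the classic centralized sense-reversing barrier. It is reusable across episodes without re-initialisation, but every arrival contends on a single counter, which is precisely the bottleneck that tree- and dissemination-based algorithms avoid on larger machines.

```cpp
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

// Centralized sense-reversing barrier: one shared counter plus a global
// "sense" flag that flips each episode, making the barrier reusable.
class SenseBarrier {
public:
    explicit SenseBarrier(int n) : n_(n), count_(n), sense_(false) {}

    void wait(bool& local_sense) {
        local_sense = !local_sense;                  // sense this thread now expects
        if (count_.fetch_sub(1) == 1) {              // last thread to arrive
            count_.store(n_);                        // reset for the next episode
            sense_.store(local_sense);               // release all waiters
        } else {
            while (sense_.load() != local_sense) {}  // spin (real code would back off)
        }
    }

private:
    const int n_;
    std::atomic<int> count_;
    std::atomic<bool> sense_;
};

int main() {
    const int T = 4;
    SenseBarrier bar(T);
    std::vector<std::thread> threads;
    for (int t = 0; t < T; ++t)
        threads.emplace_back([&bar, t] {
            bool local_sense = false;                // must match the barrier's initial sense
            for (int round = 0; round < 3; ++round) {
                bar.wait(local_sense);
                std::printf("thread %d passed round %d\n", t, round);
            }
        });
    for (auto& th : threads) th.join();
    return 0;
}
```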
27. Optimizing Applications and Message-Passing Libraries for the QPACE Architecture

Wunderlich, Simon 27 March 2009
The goal of the QPACE project is to build a novel, cost-efficient, massively parallel supercomputer optimized for LQCD (Lattice Quantum Chromodynamics) applications. Unlike previous projects, which used custom ASICs, this is accomplished by using the general-purpose multi-core PowerXCell 8i processor tightly coupled with a custom network processor implemented on a modern FPGA. The heterogeneous architecture of the PowerXCell 8i processor, its core-independent OS-bypassing access to the custom network hardware, and the application-oriented 3D torus topology pose interesting challenges for the implementation of applications. This work describes and evaluates the implementation possibilities of two message-passing APIs, the more general MPI and the more QCD-oriented QMP, and their performance in PPE-centric and SPE-centric scenarios. These results are then employed to optimize HPL for the QPACE architecture. Finally, the developed approaches and concepts are briefly discussed regarding their applicability to heterogeneous node/network architectures, as in the "High-speed Network Interface with Collective Operation Support for Cell BE (NICOLL)" project.
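QPACE reaches its torus through the custom FPGA network processor rather than through MPI, so the following is only a hypothetical sketch of the communication pattern itself, not of the QPACE software stack: a nearest-neighbour exchange on a periodic 3D torus, expressed with standard MPI Cartesian-topology calls.

```cpp
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    // Arrange all ranks as a 3D torus (periodic in every dimension),
    // mirroring the nearest-neighbour layout LQCD codes rely on.
    int dims[3] = {0, 0, 0}, periods[3] = {1, 1, 1};
    int nprocs;
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Dims_create(nprocs, 3, dims);
    MPI_Comm torus;
    MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 1, &torus);

    int rank;
    MPI_Comm_rank(torus, &rank);

    // One halo exchange per dimension: send "up", receive from "down".
    double send = static_cast<double>(rank), recv = -1.0;
    for (int dim = 0; dim < 3; ++dim) {
        int down, up;
        MPI_Cart_shift(torus, dim, 1, &down, &up);
        MPI_Sendrecv(&send, 1, MPI_DOUBLE, up,   0,
                     &recv, 1, MPI_DOUBLE, down, 0,
                     torus, MPI_STATUS_IGNORE);
    }

    if (rank == 0)
        std::printf("torus %d x %d x %d, last halo came from rank %d\n",
                    dims[0], dims[1], dims[2], static_cast<int>(recv));

    MPI_Finalize();
    return 0;
}
```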
28. Metascheduling of HPC Jobs in Day-Ahead Electricity Markets

Murali, Prakash January 2014
High performance grid computing is a key enabler of large-scale collaborative computational science. With the promise of exascale computing, high performance grid systems are expected to incur electricity bills that grow super-linearly over time. To achieve cost effectiveness in these systems, it is essential for the scheduling algorithms to exploit the electricity price variations, both in space and time, that are prevalent in dynamic electricity markets. Typically, a job submitted to the batch queues used in these systems incurs a variable queue waiting time before the resources necessary for its execution become available. In variably-priced electricity markets, prices fluctuate over discrete intervals of time, so the electricity cost incurred during a job's execution depends on the job's start and end times. This thesis consists of two parts.

In the first part, we develop a method to predict the start and end time of a job at each system in the grid. In batch queue systems, similar jobs which arrive during similar queue and processor states experience similar queue waiting times. We have developed an adaptive algorithm for the prediction of queue waiting times on a parallel system, based on spatial clustering of the history of job submissions at the system. We represent each job as a point in a feature space using the job characteristics, the queue state, and the state of the compute nodes at the time of job submission. For each incoming job, we use an adaptive distance function which assigns a real-valued distance to each historical job submission based on its similarity to the incoming job. Using a spatial clustering algorithm and a simple empirical characterization of the system states, we identify an appropriate prediction model for the job from among the standard-deviation-minimization method, ridge regression, and the k-weighted average. We have evaluated our adaptive prediction framework using historical production workload traces of many supercomputer systems with varying system and job characteristics, including two Top500 systems. Across workloads, our predictions result in up to a 22% reduction in the average absolute error and up to a 56% reduction in the percentage prediction errors over existing techniques. To predict the execution time of a job, we use a simple model based on the runtime estimate provided by the user at the time of job submission.

In the second part of the thesis, we have developed a metascheduling algorithm that schedules jobs to the individual batch systems of a grid so as to reduce both the electricity cost for the systems and the response times for the users. We formulate the metascheduling problem as a Minimum Cost Maximum Flow problem and leverage execution-period and electricity-price predictions to accurately estimate the cost of job execution at a system. The network simplex algorithm is used to minimize the response time and electricity cost of job execution over an appropriate flow network. Using trace-based simulation with real and synthetic workload traces and real electricity price data sets, we demonstrate our approach on two currently operational grids, XSEDE and NorduGrid. Our experimental setup collectively constitutes more than 433K processors spread across 58 compute systems in 17 geographically distributed locations. Experiments show that our approach simultaneously optimizes the total electricity cost and the average response time of the grid, without being unfair to users of the local batch systems. Considering that currently operational HPC systems budget millions of dollars for annual operational costs, our approach, which can save $167K in annual electricity bills (compared to a baseline strategy) for one of the grids in our test suite with over 76,000 cores, is highly relevant for reducing grid operational costs in the coming years.
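Reproducing the full flow-network formulation is beyond a short listing, but one key ingredient, pricing a job's predicted execution window against interval-based day-ahead electricity prices, is easy to sketch. All names, weights and prices in this C++ fragment are hypothetical, not taken from the thesis.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical sketch: price a job's predicted execution window against
// hourly day-ahead electricity prices. The predicted start comes from a
// queue-waiting-time model; the end adds the user's runtime estimate.
double electricity_cost(double start_h, double end_h,
                        const std::vector<double>& price_per_hour,  // $/MWh per hour slot
                        double power_mw) {                          // job's draw in MW
    double cost = 0.0;
    for (int slot = 0; slot < static_cast<int>(price_per_hour.size()); ++slot) {
        // Overlap of the window [start_h, end_h) with hour slot [slot, slot+1).
        double lo = start_h > slot ? start_h : slot;
        double hi = end_h < slot + 1 ? end_h : slot + 1;
        if (hi > lo) cost += (hi - lo) * power_mw * price_per_hour[slot];
    }
    return cost;
}

int main() {
    // A job predicted to wait 1.5 h and run for 2 h on a system drawing
    // 0.8 MW, against four hourly prices. In the flow network, the edge
    // cost for this (job, system) pair would combine this value with a
    // response-time term, e.g. alpha * response + beta * cost.
    std::vector<double> prices = {30.0, 42.0, 55.0, 38.0};
    double c = electricity_cost(1.5, 3.5, prices, 0.8);
    std::printf("estimated electricity cost: $%.2f\n", c);
    return 0;
}
```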
29. Realizace superpočítače pomocí grafické karty / Realization of supercomputer using graphic card

Jasovský, Filip January 2014
This master's thesis deals with the realization of a supercomputer using a graphics card with CUDA technology. The theoretical part describes the capabilities of graphics cards and desktop computers and the processes involved in performing calculations on them. The practical part deals with the creation of a system for calculations on the graphics card using artificial-intelligence algorithms, specifically artificial neural networks. The resulting program is then used for the classification of a large input data file, and finally the results are compared.
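The abstract does not detail the network, so purely as a hypothetical illustration: classifying an input with a dense feed-forward layer boils down to a matrix-vector product plus a nonlinearity, and it is this loop, with one output neuron per GPU thread under CUDA, that a graphics card parallelizes. A CPU reference in C++:

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// One dense layer: out[j] = sigmoid(bias[j] + sum_i w[j][i] * in[i]).
// On a GPU each j (output neuron) maps naturally to one thread; here the
// same loop runs sequentially as a reference implementation.
std::vector<float> dense_forward(const std::vector<float>& in,
                                 const std::vector<std::vector<float>>& w,
                                 const std::vector<float>& bias) {
    std::vector<float> out(w.size());
    for (size_t j = 0; j < w.size(); ++j) {          // <- the parallel dimension
        float acc = bias[j];
        for (size_t i = 0; i < in.size(); ++i)
            acc += w[j][i] * in[i];
        out[j] = 1.0f / (1.0f + std::exp(-acc));     // sigmoid activation
    }
    return out;
}

int main() {
    std::vector<float> x = {0.2f, 0.7f};                           // one input sample
    std::vector<std::vector<float>> w = {{0.5f, -0.3f}, {0.8f, 0.1f}};
    std::vector<float> b = {0.0f, -0.2f};
    std::vector<float> y = dense_forward(x, w, b);
    std::printf("y = (%.3f, %.3f)\n", y[0], y[1]);
    return 0;
}
```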
30. Paralelizace ultrazvukových simulací s využitím lokální Fourierovy dekompozice / Parallelisation of Ultrasound Simulations Using Local Fourier Decomposition

Dohnal, Matěj January 2015
This thesis introduces a new method of 1D, 2D and 3D decomposition using local Fourier bases, its implementation, and a comparison with the currently used global 1D domain decomposition. The new method was designed, implemented and tested primarily for future use in the k-Wave toolbox, but it can be applied in many other spectral methods. Compared to the global 1D domain decomposition, the local Fourier decomposition is up to 3 times faster and more efficient thanks to lower inter-process communication; however, it is slightly less accurate. The final part of the thesis discusses the limitations of the new method and introduces best practices for using 3D local Fourier decomposition to achieve both speed and accuracy.
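The hallmark of local Fourier bases is that each subdomain is extended by a small overlap and smoothly tapered with "bell" functions so that FFTs can be taken locally, replacing global transposes with nearest-neighbour exchanges at a small accuracy cost. The following hypothetical 1D C++ sketch shows only the bookkeeping (block ranges and a raised-cosine bell), not the thesis's actual implementation:

```cpp
#include <cmath>
#include <cstdio>

const double PI = 3.14159265358979323846;

// Subdomain s of P owns [s*N/P, (s+1)*N/P) and is extended by `overlap`
// points on each side (ranges wrap modulo N on a periodic domain); a
// smooth bell tapers the extension so a local FFT of the extended block
// stays accurate while only neighbour exchange is needed.
void local_block(int s, int P, int N, int overlap, int* lo, int* hi) {
    *lo = s * (N / P) - overlap;
    *hi = (s + 1) * (N / P) + overlap;   // half-open extended range
}

// Raised-cosine bell across the overlap: near 0 at the outer edge,
// rising to 1 at the inner edge of the owned region.
double bell(int offset_into_overlap, int overlap) {
    double t = (offset_into_overlap + 0.5) / overlap;   // in (0, 1)
    return 0.5 * (1.0 - std::cos(PI * t));
}

int main() {
    const int N = 1024, P = 8, overlap = 16;
    int lo, hi;
    local_block(3, P, N, overlap, &lo, &hi);
    std::printf("subdomain 3 computes over [%d, %d): %d owned + 2*%d halo\n",
                lo, hi, N / P, overlap);
    std::printf("bell at overlap midpoint: %.3f\n", bell(overlap / 2, overlap));
    return 0;
}
```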
