51 |
Evaluation of publicly available Barrier-Algorithms and Improvement of the Barrier-Operation for large-scale Cluster-Systems with special Attention on InfiniBand Networks
Hoefler, Torsten 28 June 2005 (has links) (PDF)
The MPI_Barrier collective operation, part of the MPI-1.1
standard, is extremely important for all parallel applications using it.
The latency of this operation increases the application run time and
cannot be overlapped with computation. Thus, overall MPI performance can
be degraded by unsatisfactory barrier latency. The main goals of this work
are to lower the barrier latency for InfiniBand networks by analyzing
well-known barrier algorithms with regard to their suitability for
InfiniBand networks, to enhance the barrier operation by utilizing
standard InfiniBand operations as much as possible, and to design a
constant-time barrier for InfiniBand with special hardware support.
This partition into three main steps is retained throughout the whole
thesis. The first part evaluates publicly known models and proposes a
new, more accurate model (LoP) for InfiniBand. All barrier algorithms are
evaluated within the well-known LogP model and this new model. Two new
algorithms that promise better performance have been developed. A
constant-time barrier integrated into InfiniBand and a cheap
separate barrier network are proposed in the hardware section. All
results have been implemented inside the Open MPI framework. This work
led to three new Open MPI collective modules. The first one implements
different barrier algorithms, which are dynamically benchmarked and
selected during the startup phase to maximize performance. The
second one offers a special barrier implementation for InfiniBand with RDMA
and performs up to 40% better than the best solution that has been
published so far. The third implementation offers a constant-time
barrier in a separate network, leveraging commodity components, with a
latency of only 2.5 microseconds. Each component has its own strengths and
can be used to enhance barrier performance significantly.
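As a rough illustration of what evaluating a barrier algorithm within LogP can look like, the sketch below uses the dissemination barrier, a widely known algorithm chosen here as an assumption (the abstract does not name the algorithms studied). The function names and the per-round cost estimate of L + 2o (one latency plus a send and a receive overhead) are illustrative simplifications, not results from the thesis.

```python
def dissemination_rounds(p):
    """Rounds needed by a dissemination barrier: ceil(log2(p))."""
    if p < 1:
        raise ValueError("need at least one process")
    return (p - 1).bit_length()


def dissemination_peers(rank, p):
    """(send_to, recv_from) for each round: round k uses distance 2**k."""
    return [((rank + (1 << k)) % p, (rank - (1 << k)) % p)
            for k in range(dissemination_rounds(p))]


def barrier_completes(p):
    """Simulate the rounds and check every rank has heard from all ranks."""
    known = [{r} for r in range(p)]
    for k in range(dissemination_rounds(p)):
        step = 1 << k
        # Each rank merges what the rank 'step' positions behind it knows.
        known = [known[r] | known[(r - step) % p] for r in range(p)]
    return all(len(s) == p for s in known)


def logp_barrier_estimate(p, L, o):
    """Simplified LogP estimate (assumption): each round costs L + 2*o."""
    return dissemination_rounds(p) * (L + 2 * o)
```

Under this simplified estimate the cost grows logarithmically with the process count, which is the baseline a constant-time hardware barrier is meant to improve on.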
|
52 |
Global address spaces for efficient resource provisioning in the data center
Young, Jeffrey Scott 13 January 2014 (has links)
The rise of large data sets, or "Big Data", has coincided with the rise of clusters with large amounts of memory and GPU accelerators that can be used to process rapidly growing data footprints. However, the complexity and performance limitations of sharing memory and accelerators in a cluster limit the options for efficient management and allocation of resources for applications. The global address space (GAS) model, and specifically hardware-supported GAS, is proposed as a means to provide a high-performance resource management platform upon which resource sharing between nodes and resource aggregation across nodes can take place. This thesis builds on the initial concept of GAS with a model that is matched to "Big Data" computing and its data transfer requirements.
The proposed model, Dynamic Partitioned Global Address Spaces (DPGAS), is implemented using a commodity converged interconnect, HyperTransport over Ethernet (HToE), and a software framework, the Oncilla runtime and API. The DPGAS model and associated hardware and software components are used to investigate two application spaces: resource sharing for time-varying workloads and resource aggregation for GPU-accelerated data warehousing applications. This work demonstrates that hardware-supported GAS can be used to improve the performance and power consumption of memory-intensive applications, and that it can be used to simplify host and accelerator resource management in the data center.
|
53 |
Networking Subsystem Configuration Interface
Lichtner, Ondrej January 2014 (has links)
The goal of this diploma thesis is to design a network configuration library with an emphasis on portability between Linux-based and BSD-based operating systems and on the extensibility of the library's support. In the second chapter, the thesis examines the configuration interfaces available on both operating systems. It then analyzes in detail the properties of the Netlink socket interface, which is the primary configuration interface for networking elements on Linux, and of the ioctl system call, which is less capable on Linux but is the primary interface on BSD and other UNIX systems. Interfaces for configuring different firewalls are also examined. The third chapter focuses on specific types of network devices, the specifics of their configuration, and how they relate to the kernel interfaces described in the second chapter. The fourth chapter formulates the requirements for the configuration library: easy extensibility, portability to different operating systems, support for tracking changes and events, and extensibility with different types of user interfaces. Based on the research in the preceding two chapters, a design of the library is presented. The design defines the configuration interface as a hierarchy of abstract classes, separated from the implementation. This makes it possible to have several implementations of the same configuration interface at the same time, even within a single operating system. The class LibNCFG is defined as the entry point of the library; it is responsible for creating these configuration objects on the user's behalf. This makes the library easy to extend with new operating system interfaces and with support for configuring new network elements. Support for a new user interface can be implemented as a new service that wraps the library interface and exposes a different one. To support change tracking, the LibNCFG class provides methods for registering callbacks for defined events. In the fourth chapter, the thesis describes in detail the interface of the LibNCFG class, the Common module, and the classes NetDevice, EthDevice and BondDevice, which define the configuration interfaces of the respective types of network devices.
For these classes, the concrete classes NetlinkNetDevice, NetlinkEthDevice and sysfsBondDevice are implemented and their implementation details are described. The fifth chapter describes a sample application that was implemented to demonstrate how easy the configuration library is to use. Finally, the conclusion summarizes the results of the thesis and discusses possible improvements and the continuation of the project.
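The design described in this abstract (abstract device classes separated from their implementations, an entry-point class that creates the objects, and callbacks for change tracking) might look roughly like the Python sketch below. Only the names LibNCFG and NetDevice come from the abstract; every method name, the event name, and the in-memory backend are hypothetical stand-ins, and the actual library's API and implementation language may differ.

```python
from abc import ABC, abstractmethod


class NetDevice(ABC):
    """Abstract configuration interface (method names are hypothetical)."""

    @abstractmethod
    def get_mtu(self):
        ...

    @abstractmethod
    def set_mtu(self, mtu):
        ...


class InMemoryNetDevice(NetDevice):
    """Stand-in backend; a real one would talk to Netlink, ioctl or sysfs."""

    def __init__(self, name, notify):
        self.name = name
        self._mtu = 1500
        self._notify = notify

    def get_mtu(self):
        return self._mtu

    def set_mtu(self, mtu):
        self._mtu = mtu
        # Report the change so registered callbacks can observe it.
        self._notify("mtu_changed", self.name, mtu)


class LibNCFG:
    """Entry point that creates device objects on the user's behalf."""

    def __init__(self, backend=InMemoryNetDevice):
        self._backend = backend      # swappable implementation class
        self._callbacks = {}

    def register_callback(self, event, fn):
        self._callbacks.setdefault(event, []).append(fn)

    def _dispatch(self, event, *args):
        for fn in self._callbacks.get(event, []):
            fn(*args)

    def get_device(self, name):
        return self._backend(name, self._dispatch)
```

Because callers only see the abstract `NetDevice` interface, a different backend class can be passed to `LibNCFG` without changing user code, which is the extensibility property the abstract emphasizes.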
|
54 |
Improving the Performance of Selected MPI Collective Communication Operations on InfiniBand Networks
Viertel, Carsten 30 April 2007 (has links)
The performance of collective communication
operations is one of the deciding factors in
the overall performance of an MPI application.
Open MPI's component architecture offers an easy
way to implement new algorithms for collective
operations, but current implementations use the
point-to-point components to access the
InfiniBand network. This work therefore attempts to
improve the performance of a collective component
by accessing the InfiniBand network directly.
This should avoid overhead and make it possible
to tune the algorithms to this specific network.
The first part of this work gives a short overview
of the InfiniBand Architecture and Open MPI. In
the next part several models for parallel
computation are analyzed. Afterwards various
algorithms for the MPI_Scatter, MPI_Gather and
MPI_Allgather operations are presented. The
theoretical performance of the algorithms is
analyzed with the LogfP and LogGP models.
Selected algorithms are implemented
as part of an Open MPI collective component.
Finally, the performance of different algorithms and
different MPI implementations is compared. The test
results show that the performance of the
operations could be improved for several message
and communicator size ranges.
|
55 |
Analysis of Data Center Network Convergence Technologies
LeBlanc, Robert-Lee Daniel 01 July 2014 (has links) (PDF)
The networks in traditional data centers have remained unchanged for decades and have grown large, complex and costly. Many data centers have a general-purpose Ethernet network and one or more additional specialized networks for storage or high-performance, low-latency applications. Network convergence promises to lower the cost and complexity of the data center network by virtualizing the different networks onto a single wire. There is little evidence, aside from vendors' claims, that network convergence actually achieves these goals. This work defines a framework for creating a series of unbiased tests to validate converged technologies and compare them to traditional configurations. A case study involving two different converged network technologies was developed to validate the defined methodology and framework. The study also shows that these two technologies do indeed perform similarly to a non-virtualized network, reduce costs, cabling and power consumption, and are easy to operate.
|
56 |
High Data Rate Signal Processing Architectures and Compilation Strategies for Scalable, Multi-Gigabit Digital Systems
Nybo, Daniel Alexander 12 April 2024 (has links) (PDF)
In this study we present a high-performance computing architecture and hardware acceleration strategy for a heterogeneous multi-gigabit computing system. The system architecture integrates a BeeGFS distributed file system, capable of achieving 80 Gbps of sustained write throughput across five nodes, essential for managing the high data volumes generated by a 25 high performance computer (HPC) compute cluster. To ensure operational efficiency and scalability, the tasks performed on the Linux compute cluster consisting of 30 nodes are automated using Ansible, facilitating seamless deployment, management, and updates. We present compilation strategies for a hardware-accelerated Polyphase Filter Bank (PFB) channelization routine optimized for Xilinx UltraScale+ FPGAs, capable of simultaneously processing 2048 channels per 12 input streams. This setup shows the efficiency of High-Level Synthesis of FPGA-based signal processing in handling demanding data analysis tasks. We also present the implementation and verification of a 1.6 Gsps Direct Memory Access (DMA) transfer from DDR4 memory to a modern Radio Frequency System on Chip (RFSoC) digital-to-analog converter. The combination of a high-throughput file system, streamlined automation, and advanced signal processing capabilities shows this system's ability to meet the needs of complex, real-time data analysis and processing applications, advancing the field of computational research.
|
57 |
Designing High-Performance And Scalable Clustered Network Attached Storage With Infiniband
Noronha, Ranjit Mario 12 September 2008 (has links)
No description available.
|
58 |
Prédiction de performances d'applications de calcul haute performance sur réseau Infiniband
Vienne, Jérôme 01 July 2010 (has links) (PDF)
To respond as effectively as possible to calls for tender, manufacturers of compute clusters need tools and methods that support architectural design decisions. Our work therefore focused on estimating computation times and on studying congestion on the InfiniBand network. These two problems are often addressed globally. However, a global approach does not make it possible to understand the causes of the performance losses tied to architectural choices. Our approach therefore takes a finer-grained view. To evaluate computation times, the proposed method relies on a static or semi-static analysis of the source code to split it into blocks, and then micro-benchmarks these blocks on the target architecture. To estimate communication times, a bandwidth-sharing model for the InfiniBand network was developed, making it possible to predict the impact of concurrent communications. This model was then integrated into a simulator and validated on a set of synthetic communication graphs and on the Socorro application.
|
59 |
Evaluation of publicly available Barrier-Algorithms and Improvement of the Barrier-Operation for large-scale Cluster-Systems with special Attention on InfiniBand Networks
Hoefler, Torsten 01 April 2005 (has links)
The MPI_Barrier collective operation, part of the MPI-1.1
standard, is extremely important for all parallel applications using it.
The latency of this operation increases the application run time and
cannot be overlapped with computation. Thus, overall MPI performance can
be degraded by unsatisfactory barrier latency. The main goals of this work
are to lower the barrier latency for InfiniBand networks by analyzing
well-known barrier algorithms with regard to their suitability for
InfiniBand networks, to enhance the barrier operation by utilizing
standard InfiniBand operations as much as possible, and to design a
constant-time barrier for InfiniBand with special hardware support.
This partition into three main steps is retained throughout the whole
thesis. The first part evaluates publicly known models and proposes a
new, more accurate model (LoP) for InfiniBand. All barrier algorithms are
evaluated within the well-known LogP model and this new model. Two new
algorithms that promise better performance have been developed. A
constant-time barrier integrated into InfiniBand and a cheap
separate barrier network are proposed in the hardware section. All
results have been implemented inside the Open MPI framework. This work
led to three new Open MPI collective modules. The first one implements
different barrier algorithms, which are dynamically benchmarked and
selected during the startup phase to maximize performance. The
second one offers a special barrier implementation for InfiniBand with RDMA
and performs up to 40% better than the best solution that has been
published so far. The third implementation offers a constant-time
barrier in a separate network, leveraging commodity components, with a
latency of only 2.5 microseconds. Each component has its own strengths and
can be used to enhance barrier performance significantly.
|
60 |
Topology-Aware MPI Communication and Scheduling for High Performance Computing Systems
Subramoni, Hari 02 October 2013 (has links)
No description available.
|