701
Proposta e avaliação de desempenho de um algoritmo de balanceamento de carga para ambientes distribuídos heterogêneos escaláveis / Proposal and performance evaluation of a load balancing algorithm for heterogeneous scalable distributed environments. Rodrigo Fernandes de Mello, 27 November 2003 (has links)
Load balancing algorithms are applied in distributed systems to homogenize the occupation of the available computational resources. A homogeneous occupation of the environment allows resource allocation to be optimized and, consequently, application performance to increase. With the advent of large-scale distributed systems, research is needed on load balancing algorithms able to manage such systems efficiently. This efficiency is measured by the number of messages generated in the environment, the support for heterogeneous environments, the use of policies that consume few system resources, the stability under high load, the scalability of the system, and low average process response times. To meet the requirements of large-scale distributed systems, this Ph.D. thesis proposes, presents, and evaluates a new load balancing algorithm named TLBA (Tree Load Balancing Algorithm). TLBA arranges the computers of the system in a logical tree topology, over which the load balancing operations are executed. To evaluate TLBA, a simulator was built and submitted to tests that confirmed the algorithm's contributions: a small number of messages generated by the load balancing operations, stability under overload, and small average process response times. To validate the simulation results, a TLBA prototype was implemented; it confirmed the simulation results and, consequently, the contributions of the proposed algorithm.
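The tree-based balancing idea can be illustrated with a short, self-contained C++ sketch of one balancing pass over a logical tree. The load metric, the tolerance threshold, and the halving rule below are illustrative assumptions for the sketch, not the actual TLBA policies.

    // Minimal sketch of a tree-organized load balancing step, loosely
    // inspired by the TLBA idea described above. The load metric,
    // threshold, and migration rule are illustrative assumptions,
    // not the thesis's design.
    #include <cstdio>
    #include <memory>
    #include <vector>

    struct Node {
        double load = 0.0;                          // current occupation
        std::vector<std::unique_ptr<Node>> children;
    };

    // Average the load between a parent and each child whenever their
    // loads diverge by more than `tolerance`; recurse over the logical
    // tree, so balancing traffic stays local to parent-child links.
    void balance(Node& parent, double tolerance) {
        for (auto& child : parent.children) {
            double diff = parent.load - child->load;
            if (diff > tolerance) {                 // parent overloaded: push down
                parent.load -= diff / 2;
                child->load += diff / 2;
            } else if (-diff > tolerance) {         // child overloaded: pull up
                child->load -= -diff / 2;
                parent.load += -diff / 2;
            }
            balance(*child, tolerance);
        }
    }

    int main() {
        Node root;
        root.load = 8.0;
        root.children.push_back(std::make_unique<Node>());
        root.children.push_back(std::make_unique<Node>());
        root.children[0]->load = 2.0;
        root.children[1]->load = 5.0;
        balance(root, 0.5);
        std::printf("root=%.2f c0=%.2f c1=%.2f\n", root.load,
                    root.children[0]->load, root.children[1]->load);
        return 0;
    }

Because each node only exchanges load with its parent and children, the number of balancing messages grows with the number of tree edges rather than with all pairs of machines, which is the property the abstract highlights.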
702
Avaliação do impacto da comunicação intra e entre-nós em nuvens computacionais para aplicações de alto desempenho / Evaluation of impact from inter and intra-node communication in cloud computing for HPC applications. Thiago Kenji Okada, 07 November 2016 (has links)
With the advent of cloud computing, users no longer need to invest large amounts of money in computing equipment. Instead, processing and storage resources, and even complete systems, can be acquired on demand through one of the many services offered by cloud providers such as Amazon, Google, Microsoft, and USP itself. Cloud computing allows greater control of operating expenses and reduces costs in many cases. For example, high-performance computing users can benefit from this model by using a large number of resources for short periods of time, instead of acquiring a compute cluster with a high upfront cost. This work analyzes the feasibility of running high-performance applications in the cloud by comparing their performance on infrastructures with known behavior against the public cloud offered by Google. In particular, we focus on different parallel configurations involving internal communication between processes on the same node, called intra-node, and external communication between processes on different nodes, called inter-node. Our case study was the NAS Parallel Benchmarks, a popular benchmark suite for the performance analysis of parallel and high-performance systems. We tested applications with pure MPI implementations (for both intra- and inter-node communication) and hybrid implementations in which internal communication used OpenMP (intra-node) and external communication used MPI (inter-node).
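The intra-node/inter-node split described above corresponds to the standard hybrid MPI+OpenMP pattern. The following self-contained C++ sketch illustrates it, with OpenMP threads sharing memory within a process (intra-node) and MPI combining results across processes (inter-node); the problem size and the reduction are arbitrary placeholders, and this illustrates the pattern rather than reproducing code from the thesis.

    // Hybrid MPI+OpenMP sketch: OpenMP for intra-node work, MPI for
    // inter-node communication. Illustrative only; not from the thesis.
    #include <mpi.h>
    #include <omp.h>
    #include <cstdio>
    #include <vector>

    int main(int argc, char** argv) {
        // Ask for an MPI library that tolerates OpenMP threads; only the
        // main thread will issue MPI calls (MPI_THREAD_FUNNELED).
        int provided = 0;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

        int rank = 0, nprocs = 1;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        // Intra-node parallelism: threads of one process share this vector.
        const long long n = 1 << 20;            // arbitrary problem size
        std::vector<double> data(n, 1.0);
        double local_sum = 0.0;
    #pragma omp parallel for reduction(+ : local_sum)
        for (long long i = 0; i < n; ++i)
            local_sum += data[i];

        // Inter-node communication: combine per-process results with MPI.
        double global_sum = 0.0;
        MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0,
                   MPI_COMM_WORLD);

        if (rank == 0)
            std::printf("sum over %d processes = %f\n", nprocs, global_sum);

        MPI_Finalize();
        return 0;
    }

Built with something like mpicxx -fopenmp, one MPI rank per node with several OpenMP threads gives the hybrid configuration the abstract compares against the pure-MPI one.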
703
Designing a Modern Skeleton Programming Framework for Parallel and Heterogeneous Systems. Ernstsson, August. January 2020 (has links)
Today's society is increasingly software-driven and dependent on powerful computer technology. It is therefore important that advances in low-level processor hardware be made available for exploitation by a growing number of programmers of differing skill levels. However, as we approach the end of Moore's law, hardware designers are finding new and increasingly complex ways to increase the accessible processor performance. It is becoming more and more difficult to target these processing resources effectively without expert knowledge of parallelization, heterogeneous computation, communication, synchronization, and so on. To ensure that the software side can keep up, advanced programming environments and frameworks are needed to bridge the widening gap between hardware and software. One such example is the pattern-centric skeleton programming model, and in particular the SkePU project. The work presented in this thesis first redesigns the SkePU framework based on modern C++ variadic template metaprogramming and state-of-the-art compiler technology. It then explores new ways to improve performance: by providing new patterns, improving the data access locality of existing ones, and using both static and dynamic knowledge about program flow. The work combines novel ideas with practical evaluation of the approach on several applications. The advancements also include the first skeleton API that allows variadic skeletons, new data containers, and an approach that makes skeleton programming more customizable without compromising universal portability. / Additional research funders: EU H2020 project EXA2PRO (801015); SeRC.
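The phrase "variadic skeletons" can be made concrete with a small C++ sketch. The Map skeleton below uses variadic templates to accept any number of input containers; it is a minimal illustration of the mechanism only and deliberately does not reproduce SkePU's actual API.

    // Minimal illustration of a variadic "Map" skeleton in modern C++.
    // This is NOT SkePU's API; it only sketches how variadic template
    // metaprogramming lets one skeleton accept any number of inputs.
    #include <cstdio>
    #include <vector>

    template <typename Func>
    class Map {
        Func f;
    public:
        explicit Map(Func func) : f(func) {}

        // Apply f element-wise across any number of input vectors.
        template <typename Out, typename... Ins>
        void operator()(std::vector<Out>& out,
                        const std::vector<Ins>&... ins) {
            for (std::size_t i = 0; i < out.size(); ++i)
                out[i] = f(ins[i]...);  // pack expansion: one element per input
        }
    };

    int main() {
        auto saxpy = [](float a, float x, float y) { return a * x + y; };
        Map<decltype(saxpy)> map(saxpy);

        std::vector<float> a(4, 2.0f), x{1, 2, 3, 4}, y{10, 20, 30, 40},
                           out(4);
        map(out, a, x, y);                 // three inputs, one skeleton
        for (float v : out) std::printf("%.1f ", v);  // 12.0 24.0 36.0 48.0
        std::printf("\n");
        return 0;
    }

In a real framework the same pack expansion is what lets a single skeleton definition be instantiated for user functions of any arity, with backends (OpenMP, CUDA, OpenCL) substituted behind the same interface.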
704
Biaxial Behavior of Ultra-High Performance Concrete and Untreated UHPC Waffle Slab Bridge Deck Design and Testing. D'Alessandro, Kacie Caple. 28 August 2013 (has links)
Ultra-high performance concrete (UHPC) was evaluated as a potential material for future bridge deck designs. Material characterization tests were performed to identify potential challenges in mixing, placing, and curing UHPC. Biaxial testing was performed to evaluate the behavior of UHPC in combined tension-compression stress states. A UHPC bridge deck was designed to perform similarly to a conventional concrete bridge deck, and a single-unit bridge deck section was tested to evaluate the design methods used for untreated UHPC.
Material tests identified challenges with placing UHPC. A specified compressive strength was determined for structural design using untreated UHPC, which was identified as a cost-effective alternative to steam-treated UHPC.
UHPC was tested in biaxial tension-compression stress states. A biaxial test method was developed for UHPC to apply tension and compression directly. The influence of both curing method and fiber orientation was evaluated. The failure envelope developed for untreated UHPC with random fiber orientation is suggested as a conservative estimate for future analysis of UHPC. Digital image correlation (DIC) was also evaluated as a means of estimating surface strains of UHPC, and recommendations are provided to improve consistency in future tests using DIC methods.
A preliminary bridge deck design was completed for untreated UHPC using established material models. Prestressing steel was used as the primary reinforcement in the transverse direction. Preliminary testing was used to evaluate three different placement scenarios, and results showed that fiber settling was a potential placement problem resulting in reduced tensile strength. The UHPC bridge deck was redesigned to incorporate the preliminary test results, and two single-unit bridge deck sections were tested to evaluate the design methods for both upside-down and right-side-up placement techniques. Test results showed that the applied design methods would be conservative for either placement method. / Ph. D.
705
Algorithmes d'étiquetage en composantes connexes efficaces pour architectures hautes performances / Efficient Connected Component Labeling Algorithms for High Performance Architectures. Cabaret, Laurent. 28 September 2016 (links)
This Ph.D. work, in the field of algorithm-architecture matching for computer vision, addresses connected component labeling (CCL) on parallel high-performance architectures. While modern general-purpose architectures are overwhelmingly multi-core, CCL algorithms are mostly sequential and irregular, and they use a graph structure to represent the equivalences between labels, which makes their parallelization challenging. Starting from a binary image, CCL gathers all connected pixels under the same label; it thus bridges low-level processing, such as filtering, and high-level processing, such as shape recognition and decision-making. It is involved in a large number of processing chains that require the analysis of segmented images, so accelerating this step benefits a whole family of algorithms.
The work first focused on the comparative performance of the state-of-the-art algorithms, both for CCL and for the analysis of the features of the connected components (CCA), in order to establish a hierarchy and identify the critical components of the algorithms. To this end, a reproducible, application-domain-independent benchmarking method was proposed and applied to a representative set of state-of-the-art algorithms. The results show that the fastest sequential algorithm is LSL, which manipulates segments, unlike the other algorithms, which manipulate pixels.
Second, an OpenMP-based parallelization framework for direct algorithms was proposed, with the main objectives of computing the CCA on the fly and reducing the cost of communication between threads. The binary image is divided into bands processed in parallel on the cores of the architecture, and a pyramidal fusion step over the resulting pairwise-disjoint label sets produces the fully labeled image without concurrent data accesses between threads. The benchmarking procedure, applied to machines with various degrees of parallelism, shows that the proposed framework is effective and applies to all direct algorithms. LSL again proved to be the fastest and the only algorithm that scales with the number of cores, thanks to its segment-based design. On a 60-core architecture, LSL processes 42.4 billion pixels per second for 8192x8192 images, while the fastest pixel-based algorithm is limited by memory bandwidth and saturates at 5.8 billion pixels per second.
Finally, attention turned to iterative CCL algorithms, with the goal of developing algorithms for many-core and GPU architectures. Iterative algorithms rely on a local, neighbor-to-neighbor label propagation mechanism and need no structure other than the image itself, which allows a massively parallel implementation (MPAR). This work led to two new algorithms:
- an incremental improvement of MPAR that combines alternating scan directions, SIMD instructions, and an active-tile mechanism that distributes the load across the cores while restricting processing to the active areas of the image and their neighbors;
- an algorithm that encodes the equivalence relation directly in the image to reduce the number of iterations required for labeling; a GPU implementation based on atomic instructions with pre-labeling in local memory was realized and proved effective even for small images.
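For orientation, the following self-contained C++ sketch shows the classic two-pass CCL baseline with union-find equivalence management that the algorithms above improve upon. It is a textbook illustration, not LSL, MPAR, or any algorithm from the thesis, and the toy image is an arbitrary example.

    // Classic two-pass connected component labeling (4-connectivity) with
    // union-find for label equivalences. Textbook baseline only.
    #include <cstdio>
    #include <vector>

    static int find(std::vector<int>& parent, int x) {
        while (parent[x] != x) x = parent[x] = parent[parent[x]];  // path halving
        return x;
    }

    int main() {
        const int W = 6, H = 4;
        // 1 = foreground, 0 = background (toy binary image).
        int img[H][W] = {{1,1,0,0,1,1},
                         {1,0,0,0,0,1},
                         {1,0,0,1,0,1},
                         {1,1,1,1,0,0}};
        std::vector<std::vector<int>> label(H, std::vector<int>(W, 0));
        std::vector<int> parent{0};  // index 0 reserved for background

        // Pass 1: provisional labels, recording equivalences between the
        // labels of the north and west neighbors.
        for (int y = 0; y < H; ++y)
            for (int x = 0; x < W; ++x) {
                if (!img[y][x]) continue;
                int north = y ? label[y-1][x] : 0;
                int west  = x ? label[y][x-1] : 0;
                if (!north && !west) {
                    parent.push_back((int)parent.size());  // new label
                    label[y][x] = (int)parent.size() - 1;
                } else if (north && west) {
                    int a = find(parent, north), b = find(parent, west);
                    parent[b] = a;                         // merge equivalences
                    label[y][x] = a;
                } else {
                    label[y][x] = north ? north : west;
                }
            }

        // Pass 2: flatten equivalences to final labels and print.
        for (int y = 0; y < H; ++y) {
            for (int x = 0; x < W; ++x)
                std::printf("%d ", label[y][x] ? find(parent, label[y][x]) : 0);
            std::printf("\n");
        }
        return 0;
    }

LSL gains over this baseline by labeling whole runs (segments) instead of individual pixels, and the parallel variants in the thesis replace the global equivalence table with band-local tables merged pyramidally.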
706
Extraction and Purification of Biologically Active Metabolites from Rhodococcus sp. MTM3W5.2. Alenazi, Mohrah. 01 December 2018 (links)
Rhodococcus has been recognized as a potential antibiotic producer. Recently, a strain of Rhodococcus sp. MTM3W5.2 was isolated from a soil sample collected in Morristown, Tennessee, and was found to produce an inhibitory compound active against other related species. The purpose of this research is to extract, purify, and analyze the active metabolite. The compound was extracted from RM broth cultures and purified by preliminary fractionation of the crude extract through a Sephadex LH-20 column. Further purification was completed using semi-preparative reversed-phase column chromatography. Final purification was obtained using multiple rounds on an analytical C18 HPLC column. Based on the results of UV-Vis spectroscopy and high-resolution mass spectrometry, the two desired compounds, at retention times of 57 and 72 min, could be polyketides with the molecular formulas C52H78O13 and C19H32O1N1/C13H34O1N1, respectively.
707
Critical Velocity of High-Performance Yarn Transversely Impacted by Different Indenters. Boon Him Lim (6504827), 15 May 2019 (links)
Critical velocity is defined as the projectile striking velocity that causes instantaneous rupture of the specimen under transverse impact. The main goal of this dissertation was to determine the critical velocities of a Twaron® 2040 warp yarn impacted by different round indenters. Special attention was paid to developing models to predict the critical velocities under transverse impact by these indenters. An MTS 810 load frame was used to perform quasi-static transverse and uniaxial tension experiments to examine the stress concentration and the constitutive mechanical properties of the yarn, which served as inputs to the models. A gas/powder gun was used to perform ballistic experiments to evaluate the critical velocities of a Twaron® 2040 warp yarn impacted by four different types of round projectiles, with radii of curvature of 2 μm, 20 μm, 200 μm, and 2 mm. The results showed that the critical velocity increased with the projectile radius of curvature. However, the experimental critical velocities were markedly lower than those predicted by the classical theory. Post-mortem analysis via scanning electron microscopy of the recovered specimens revealed that the fiber failure surfaces changed from shear to fibrillation as the radius of curvature of the projectile increased. To improve the prediction capability, two additional models, an Euler-Bernoulli beam model and a Hertzian contact model, were developed to predict the critical velocity. For the Euler-Bernoulli beam model, the critical velocity was obtained by assuming that the specimen ruptures instantaneously when the maximum flexural strain reaches the ultimate tensile strain of the yarn upon impact. For the Hertzian contact model, the yarn was assumed to fail when the indentation depth equals the diameter of the yarn. Unlike Smith theory, the Euler-Bernoulli beam model underestimated the critical velocity in all cases. The Hertzian model was capable of predicting the critical velocities of a Twaron® 2040 yarn transversely impacted by the 2 μm and 20 μm round projectiles.
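For reference, one commonly quoted form of the classical (Smith) relation between the critical velocity, the longitudinal wave speed, and the ultimate tensile strain is transcribed below in LaTeX. This is a hedged transcription from the general literature on transverse yarn impact; the dissertation's own notation and refinements may differ.

    % Smith's classical critical velocity for transverse impact on a yarn.
    % c: longitudinal wave speed, E: tensile modulus, rho: mass density,
    % eps_u: ultimate tensile strain. Common literature form, transcribed
    % here as an assumption; the dissertation's notation may differ.
    \[
      c = \sqrt{\frac{E}{\rho}}, \qquad
      V_c = c\,\sqrt{\,2\varepsilon_u\sqrt{\varepsilon_u\left(1+\varepsilon_u\right)}
            - \varepsilon_u^{2}\,}
    \]

Because this relation depends only on the yarn's bulk properties and not on the indenter geometry, it cannot capture the radius-of-curvature effect reported above, which is what motivates the beam and Hertzian contact models.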
708
Isoporous Block Copolymer Membranes: Novel Modification Routes and Selected Applications. Shevate, Rahul. 11 1900 (links)
The primary aim of this work is to explore potential applications of isoporous block copolymer membranes. Block copolymers (BCPs) have demonstrated their versatility in the formation of isoporous membranes. However, the application spectrum of these isoporous membranes can be broadened further by addressing technical aspects such as the desired surface chemistry, a well-defined pore size, an appropriate pore density, stimuli-responsive behavior, and the introduction of desired functionalities through chemical modification. We believe that, by exploring these possibilities, isoporous membranes hold tremendous potential as high-performance, next-generation separation membranes. Motivated by these attractive prospects, we systematically investigated novel routes for the modification of isoporous membranes and their implications for the properties and performance of the membranes in various applications.
In this work, polystyrene-block-poly(4-vinyl pyridine) (PS-b-P4VP) was selected to fabricate isoporous membranes by non-solvent induced phase separation (NIPS). We selected PS-b-P4VP because its well-defined isoporous morphology has been studied in detail and it is extensively characterized. To widen the application bandwidth of BCP membranes further, it is desirable to integrate different functionalities into the BCP architecture through a straightforward approach, such as post-membrane modification or the fabrication of composite membranes that impart the anticipated functionalities. The most critical challenge in this approach is to retain the well-defined nanoporous morphology of the BCP membranes.
We focused on exploring new routes for the chemical functionalization of isoporous PS-b-P4VP membranes via various in-situ and post-fabrication approaches. To date, most of the work reported in the literature on PS-b-P4VP has presented different routes to fabricate isoporous membranes and their conventional performance in liquid separations. Few efforts have been dedicated to altering the chemistry of PS-b-P4VP membranes by tuning the reactivity of the chemically active P4VP block or the surface chemistry to enhance membrane performance for specific applications. During this Ph.D. study, we primarily focused on (i) a post-modification approach, (ii) surface modification, and (iii) an in-situ membrane modification approach for the fabrication of mixed-matrix nanoporous membranes, all without altering the isoporous morphology of the membrane. The membranes fabricated using these routes were tested in different applications, including stimuli-responsive separations, self-cleaning membranes, protein separations, and high-performance humidity sensors.
709
Design of an Optimized Supervisor Module for Tomographic Adaptive Optics Systems of Extremely Large Telescopes. Doucet, Nicolas. 08 January 2020 (links)
The recent advent of the next generation of ground-based telescopes, code-named Extremely Large Telescopes (ELT), marks the beginning of a forced march toward an era of instruments capable of exploiting starlight captured by mirrors at an unprecedented scale. This confronts the astronomy community with both a daunting challenge and a unique opportunity. The challenge arises from the mismatch between the complexity of current instruments and its expected scaling with the square of future telescope diameters, a scaling on which astronomy applications rely to produce better science. To deliver on the promise of tomorrow's ELT, astronomers must design new technologies that can effectively enhance the performance of the instrument at scale while compensating for atmospheric turbulence in real time; this is an unsolved problem. It is also an opportunity, because the astronomy community is now compelled to rethink essential components of the optical systems and their traditional hardware/software ecosystems in order to achieve high optical performance with a near real-time computational response. To realize the full potential of such instruments, we investigate a technique supporting Adaptive Optics (AO), a dedicated concept relying on turbulence tomography. A critical part of an AO system is the supervisor module, which is responsible for providing the system with a Tomographic Reconstructor (ToR) at a regular pace as the atmospheric turbulence evolves over an observation window. In this thesis, we implement an optimized supervisor module and assess it under real configurations of the future European ELT (E-ELT), with its 39 m diameter the largest and most complex optical telescope ever conceived. This requires manipulating large matrices (up to 100k × 100k) that contain measurements captured by multiple wavefront sensors. To address this complexity bottleneck, we employ high-performance computing software solutions based on cutting-edge numerical algorithms using asynchronous, fine-grained computations as well as approximation techniques that leverage the resulting matrix data structure. Furthermore, GPU-based hardware accelerators are used in conjunction with the software solutions to ensure a reasonable time-to-solution that copes with the rapidly evolving atmospheric turbulence. The proposed software/hardware solution reconstructs images with high accuracy. We demonstrate the validity of the AO system with a third-party testbed simulating at the E-ELT scale, which is intended to pave the way for a first prototype installed on-site.
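To make the supervisor's core computation concrete: in tomographic AO the reconstructor is often obtained as a minimum-mean-square-error estimator built from covariance matrices of wavefront-sensor measurements, which is the kind of large dense solve described above. The expression below is a generic form from the AO literature, given here as a hedged assumption rather than the thesis's exact formulation.

    % Generic MMSE tomographic reconstructor. C_tm: covariance between the
    % target directions and the sensor measurements; C_mm: covariance of
    % the measurements; C_eta: measurement noise covariance. Assumed
    % generic form from the AO literature, not necessarily this thesis's.
    \[
      R = C_{tm}\left(C_{mm} + C_{\eta}\right)^{-1}
    \]

Computing R at the E-ELT scale amounts to factorizing a dense matrix of order roughly 100k, which is what motivates the asynchronous, approximation-based, GPU-accelerated solvers the abstract describes.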
710
Účinnost odstranění vybraných léčiv z vody různými sorpčními materiály / Removal efficiency of selected drugs by various sorptive materials from water. Štofko, Jakub. January 2019 (links)
This thesis deals with the sorption of selected drugs from model water by various sorption materials. Contamination of water resources by the pharmaceutical industry is a major problem today. Wastewater treatment plants, whose technological processes cannot remove these substances completely, contribute significantly to their release into the environment. At present, attention is turning to alternative materials capable of eliminating these substances. One promising sorption material is biochar, one of the main products of pyrolysis. This work focused on assessing the sorption properties of different types of biochar and of commercially used activated carbon. The sorption properties of the individual materials were compared with respect to the non-steroidal anti-inflammatory drug ibuprofen and the sulphonamide antibiotic sulfamethoxazole. Samples from the vial experiments were analysed on a liquid chromatograph with mass spectrometric detection.