Global ETD Search

101	Efficient Implementation of 3D Finite Difference Schemes on Recent Processor Architectures / Effektiv implementering av finita differensmetoder i 3D på senaste processorarkitekturer Ceder, Frederick January 2015 Efficient Implementation of 3D Finite Difference Schemes on Recent Processors Abstract In this paper a solver is introduced that solves a problem set modelled by the Burgers equation using the finite difference method: forward in time and central in space. The solver is parallelized and optimized for Intel Xeon Phi 7120P as well as Intel Xeon E5-2699v3 processors to investigate differences in performance between the two architectures. Optimized data access and layout have been implemented to ensure good cache utilization. Loop tiling strategies are used to adjust data access with respect to the L2 cache size. Compiler hints describing aligned memory access are used to support vectorization on both processors. Additionally, prefetching strategies and streaming stores have been evaluated for the Intel Xeon Phi. Parallelization was done using OpenMP and MPI. The parallelization for native execution on Xeon Phi is based on OpenMP and yielded a raw performance of nearly 100 GFLOP/s, reaching a speedup of almost 50 at an 83% parallel efficiency. An OpenMP implementation on the E5-2699v3 (Haswell) processors produced up to 292 GFLOP/s, reaching a speedup of almost 31 at an 85% parallel efficiency. For comparison, a mixed implementation that interleaves communication with computation reached 267 GFLOP/s at a speedup of 28 with an 87% parallel efficiency. Running a pure MPI implementation on PDC's Beskow supercomputer with 16 nodes yielded a total performance of 1450 GFLOP/s, and for a larger problem set it yielded a total of 2325 GFLOP/s, reaching a speedup and parallel efficiency of 170 and 33.3%, and 290 and 56%, respectively. 
An analysis based on the roofline performance model shows that the computations were memory bound by the L2 cache bandwidth, suggesting good L2 cache utilization for both the Haswell and the Xeon Phi architectures. Xeon Phi performance can probably be improved by also using MPI. Keeping technological progress for computational cores in the Haswell processor in mind for the comparison, both processors perform well. Rewriting the stencil computations in a more compiler-friendly form might improve performance further, as the compiler could then optimize more for the target platform. The experiments on the Cray system Beskow showed an increased efficiency from 33.3% to 56% for the larger problem, illustrating good weak scaling. This suggests that problem sizes should increase accordingly for larger numbers of nodes in order to achieve high efficiency. Frederick Ceder / Effektiv implementering av finita differensmetoder i 3D på moderna processorarkitekturer Summary This thesis discusses the implementation of a program that numerically solves problems modelled by the Burgers equation. The program is built from scratch, uses finite difference methods, and applies the FTCS (Forward in Time, Central in Space) scheme. The implementation is parallelized and optimized for the Intel Xeon Phi 7120P coprocessor and the Intel Xeon E5-2699v3 processor to investigate performance differences between the two models. We optimized the program with regard to data access and memory layout to achieve good cache utilization. Loop blocking strategies are also used to split the working memory into smaller parts so that the parts are confined to the L2 cache. To fully exploit vectorization, compiler directives describing the memory access are used, which help the compiler determine which data accesses are aligned. We also implemented prefetching strategies and streaming stores on the Xeon Phi and discuss their value. The parallelization was done with OpenMP and MPI. 
The parallelization for the Xeon Phi is based on OpenMP only and was executed natively on the chip. This gave a raw performance of nearly 100 GFLOP/s and reached a speedup of 50 at an efficiency of 83%. An OpenMP implementation on the E5-2699v3 (Haswell) processor achieved up to 292 GFLOP/s and reached a speedup of 31 at an efficiency of 85%. In comparison, a hybrid implementation achieved 267 GFLOP/s and reached a speedup of 28 at an efficiency of 87%. A pure MPI implementation on PDC's Beskow supercomputer with 16 nodes gave a total performance of 1450 GFLOP/s, and for a larger problem it gave a total of 2325 GFLOP/s, with speedup and efficiency of 170 and 33%, and 290 and 56%, respectively. An analysis based on the roofline model showed that the computations were memory bound by the L2 cache bandwidth, which indicates good L2 cache usage on both the Haswell and Xeon Phi architectures. The Xeon Phi's performance can probably be improved by also using MPI. Bearing in mind the technical progress in compute cores in recent years, both architectures perform well. The computational kernel of the implementation can probably be adapted to a more compiler-friendly variant, which may allow the compiler to optimize more for each platform. The experiments on the Cray system Beskow showed an increase in efficiency from 33.3% to 56% for larger problems, which indicates good weak scaling. This suggests that efficiency can be maintained if the problem size grows with the number of compute nodes. Frederick Ceder hpc high performance computing intel haswell xeon phi knights corner burgers equation finite difference methods fdm ftcs parallel programming vectorization simd Computer Sciences Datavetenskap (datalogi)
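The forward-in-time, central-in-space scheme named in entry 101 can be illustrated with a minimal sketch. This is not the thesis code: it assumes the 1D viscous Burgers equation u_t + u u_x = nu u_xx with periodic boundaries instead of the 3D problem, and the function names, grid size, viscosity and time step are all hypothetical choices made for illustration.

```python
import math


def ftcs_burgers_step(u, dt, dx, nu):
    """Advance u by one FTCS step (forward Euler in time, central in space)."""
    n = len(u)
    new_u = [0.0] * n
    for i in range(n):
        left = u[i - 1]          # index -1 wraps around: periodic boundary
        right = u[(i + 1) % n]   # periodic boundary on the right
        advection = u[i] * (right - left) / (2.0 * dx)
        diffusion = nu * (right - 2.0 * u[i] + left) / (dx * dx)
        new_u[i] = u[i] + dt * (diffusion - advection)
    return new_u


def solve(u0, dt, dx, nu, steps):
    """Apply the FTCS step repeatedly, starting from the initial state u0."""
    u = list(u0)
    for _ in range(steps):
        u = ftcs_burgers_step(u, dt, dx, nu)
    return u


# Hypothetical setup: 64 points on [0, 2*pi), viscosity 0.1.
n = 64
dx = 2.0 * math.pi / n
nu = 0.1
dt = 0.2 * dx * dx / nu  # diffusion number nu*dt/dx^2 = 0.2, below the 0.5 limit
u0 = [1.0 + 0.5 * math.sin(i * dx) for i in range(n)]
u_final = solve(u0, dt, dx, nu, 200)
```

The time step is tied to dx squared because the explicit scheme is only conditionally stable. The optimizations described in the abstract (tiling, alignment hints, streaming stores) target the inner loop of such a stencil update, which reads a few neighbours per point and is therefore limited by memory traffic.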
102	Supporting Applications Involving Dynamic Data Structures and Irregular Memory Access on Emerging Parallel Platforms Ren, Bin 09 September 2014 No description available. Computer Science Irregular Data Structure Fine Grained Parallelism SIMD MIMD SSE GPUs Xeon Phi Static Analysis Runtime Analysis Offloading Python Redundancy Elimination
103	FORECASTER WORKLOAD AND TASK ANALYSIS IN THE 2016 PROBABILISTIC HAZARD INFORMATION SYSTEM HAZARDOUS WEATHER TESTBED James, Joseph J. 14 September 2018 No description available. Systems Design Industrial Engineering Mechanical Engineering
104	Residue Associations In Protein Family Alignments Ozer, Hatice Gulcin 24 June 2008 No description available. Bioinformatics Biophysics Family Alignment Positional Dependency Amino Acid Correlation Residue Correlation Residue Association Protein Sequence Protein Structure Pfam database Bioinformatics Fisher Exact test Phi coefficient
105	Parallélisation de simulations interactives de champs ultrasonores pour le contrôle non destructif / Parallelization of ultrasonic field simulations for non destructive testing Lambert, Jason 03 July 2015 Simulation is increasingly used in the industrial field of Non-Destructive Testing. It is employed throughout the inspection process, whether to speed up its development or to understand its results. The work carried out in this thesis presents a method for the fast computation of the ultrasonic field radiated by a phased array probe in an isotropic part, enabling interactive use of simulations. To take advantage of commonly available parallel architectures, a regular model (which minimizes divergent branching) derived from the generic model of the CIVA software platform was developed. A first reference implementation made it possible to validate the model against CIVA results and to analyze its performance behaviour. The code was then ported and optimized for three classes of parallel architectures available today in workstations: the general-purpose central processor (GPP), the manycore coprocessor (Intel MIC) and the graphics card (nVidia GPU). For the general-purpose processor and the manycore coprocessor, the algorithm was reorganized and the code implemented to exploit the two available levels of parallelism, multithreading and vector instructions. On the graphics card, the different steps of the field simulation were split into a series of CUDA kernels. Finally, computing libraries specific to these architectures, Intel MKL and nVidia cuFFT, were used to perform the Fast Fourier Transforms. The performance and suitability of the resulting codes were analyzed in detail for each architecture. 
In several cases, on realistic inspection configurations, performance allowing interactivity was achieved. Prospects for handling more complex configurations are outlined. Finally, the issue of industrializing this type of code within the CIVA software platform is studied. / The Non Destructive Testing field increasingly uses simulation. It is used at every step of the whole control process of an industrial part, from speeding up control development to helping experts understand results. During this thesis, a simulation tool dedicated to the fast computation of an ultrasonic field radiated by a phased array probe in an isotropic specimen has been developed. Its performance enables interactive usage. To benefit from the commonly available parallel architectures, a regular model (aimed at removing divergent branching) derived from the generic CIVA model has been developed. First, a reference implementation was developed to validate this model against CIVA results, and to analyze its performance behaviour before optimization. The resulting code has been optimized for three kinds of parallel architectures commonly available in workstations: general purpose processors (GPP), manycore coprocessors (Intel MIC) and graphics processing units (nVidia GPU). On the GPP and the MIC, the algorithm was reorganized and implemented to benefit from both parallelism levels, multithreading and vector instructions. On the GPU, the multiple steps of field computing have been divided into multiple successive CUDA kernels. Moreover, libraries dedicated to each architecture were used to speed up Fast Fourier Transforms: Intel MKL on GPP and MIC, and nVidia cuFFT on GPU. Performance and hardware suitability of the produced algorithms were thoroughly studied for each architecture. On multiple realistic control configurations, interactive performance was reached. Perspectives to address more complex configurations were drawn. 
Finally, the integration and the industrialization of this code in the commercial NDT platform CIVA is discussed. Contrôle non destructif Programmation parallèle Simulation de champ ultrasonore Processeurs généralistes multicoeurs Processeurs graphiques GPGPU SIMD Parallélisme (informatique) Xeon Phi CUDA Manycore Non destructive testing Parallel programming Ultrasonic field simulation Multicore general purpose processors Graphic processing units GPGPU SIMD Parallelism Xeon Phi CUDA Manycore
106	Die Na+/H+-Austauscher-abhängige pH-Regulation in Vorhof- und Ventrikelmyozyten / The Na+/H+-exchanger (NHE-1)-dependent pHi regulation in atrial and ventricular myocytes Yan, Hui 26 October 2011 No description available. 610 Medizin, Gesundheit Medicine Der Na+/H+-Austauscher (NHE) Vorhof- und Ventrikelmyozyten HOE642 (Cariporide) 5-Carboxy-SNARF®-1 Die NHE-1- the Na+/H+-exchanger (NHE) HOE642 (Cariporide) 5-Carboxy-SNARF®-1 the rate of H+ efflux at pH 6.9 (JpH6,9) expression 44.85 MED 411: Kardiologie
107	Escapando de predadores: múltiplas abordagens para a compreensão das decisões econômicas de fuga / Escaping from predators: multiple approaches to understanding the economic escape decisions Samia, Diogo Soares Menezes 17 August 2015 Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / Optimal escape theory states that animals should counterbalance the costs and benefits of flight when escaping from a potential predator. However, in apparent contradiction with this well-established optimality model, birds and mammals generally initiate escape soon after beginning to monitor an approaching threat; a phenomenon codified as the “Flush Early and Avoid the Rush” (FEAR) hypothesis. Typically, the FEAR hypothesis is tested using correlational statistics and is supported when there is a strong relationship between the distance at which an individual first responds behaviorally to an approaching predator (alert distance; AD), and its flight initiation distance (the distance at which it flees the approaching predator; FID). 
However, such correlational statistics are both inadequate to analyze relationships constrained by an envelope (such as that in the AD-FID relationship) and sensitive to outliers with high leverage, which can lead to erroneous conclusions. To overcome these statistical concerns we develop the phi index (Φ), a distribution-free metric to evaluate the goodness of fit of a 1:1 relationship in a constraint envelope (the prediction of the FEAR hypothesis). Using both simulation and empirical data, we conclude that Φ is superior to traditional correlational analyses because it explicitly tests the FEAR prediction, is robust to outliers, and controls for the disproportionate influence of observations from large predictor values (caused by the constrained envelope in the AD-FID relationship). Importantly, by analyzing the empirical data we corroborate the strong effect that alertness has on flight as stated by the FEAR hypothesis. / Optimal escape theory states that animals should weigh the costs and benefits of fleeing when escaping from a predator. However, in apparent contradiction with this well-established optimality model, birds and mammals generally flee soon after they begin monitoring the potential predator; a phenomenon termed the “Flush Early and Avoid the Rush” hypothesis (the FEAR hypothesis). The FEAR hypothesis is usually tested using correlational statistics, and it is supported by a strong positive relationship between the distance at which an individual responds behaviorally to an approaching predator (alert distance, AD) and the flight initiation distance (FID). However, using correlational statistics to test the FEAR hypothesis can lead to erroneous conclusions, since correlational statistics are inadequate for analyzing envelope-constrained relationships (such as the relationship between AD and FID) and are sensitive to outliers with high leverage. 
Therefore, we developed the phi index (Φ), a non-parametric metric designed to evaluate the goodness of fit of a 1:1 relationship constrained by an envelope (such as that observed in the FEAR hypothesis). Using numerical simulations and empirical data, we conclude that Φ is a superior metric to traditional correlation analyses because it explicitly tests the prediction of the FEAR hypothesis, is robust to outliers, and also controls for the disproportionate influence of high AD values (caused by the envelope relationship between AD and FID). As predicted by the FEAR hypothesis, the analysis of the empirical data corroborated the strong effect that alert distance has on the prey's decision to flee. Distância de alerta Distância de detecção Distância do início da fuga Flush early and avoid the rush Índice phi Alert distance Detection distance Fear, metrics to quantify predation risk Flight initiation distance Flush early and avoid the rush Phi index CIENCIAS BIOLOGICAS::ECOLOGIA
108	Solving dense linear systems on accelerated multicore architectures / Résoudre des systèmes linéaires denses sur des architectures composées de processeurs multicœurs et d’accélerateurs Rémy, Adrien 08 July 2015 In this doctoral thesis, we study algorithms and implementations to accelerate the solution of dense linear systems using architectures composed of multicore processors and accelerators. We focus on methods based on the LU factorization. Our code was developed in the context of the MAGMA library. First, we study different hybrid CPU/GPU solvers based on the LU factorization. These aim to reduce the communication overhead due to pivoting. The first is based on a "communication avoiding" pivoting strategy (CALU), while the second uses a random preconditioning of the original system to avoid pivoting (RBT). We show that both methods outperform the solver using LU factorization with partial pivoting when used on hybrid multicore/GPU architectures. Next, we develop solvers using randomization techniques applied to hybrid architectures with Nvidia GPUs or Intel Xeon Phi coprocessors. With this method, we can avoid the substantial overhead of pivoting while remaining numerically stable in most cases. The highly parallel architecture of these accelerators allows us to randomize our linear system at a very low computational cost compared to the factorization time. Finally, we study the impact of non-uniform memory accesses (NUMA) on the solution of dense linear systems using an LU factorization algorithm. 
In particular, we illustrate how an appropriate placement of threads and data on a NUMA architecture can improve the performance of the panel factorization and significantly accelerate the overall LU factorization. We show how these placements can improve performance when applied to hybrid multicore/GPU solvers. / In this PhD thesis, we study algorithms and implementations to accelerate the solution of dense linear systems by using hybrid architectures with multicore processors and accelerators. We focus on methods based on the LU factorization and our code development takes place in the context of the MAGMA library. We study different hybrid CPU/GPU solvers based on the LU factorization which aim at reducing the communication overhead due to pivoting. The first one is based on a communication avoiding strategy of pivoting (CALU) while the second uses a random preconditioning of the original system to avoid pivoting (RBT). We show that both of these methods outperform the solver using LU factorization with partial pivoting when implemented on hybrid multicore/GPU architectures. We also present new solvers based on randomization for hybrid architectures with Nvidia GPUs or Intel Xeon Phi coprocessors. With this method, we can avoid the high cost of pivoting while remaining numerically stable in most cases. The highly parallel architecture of these accelerators allows us to perform the randomization of our linear system at a very low computational cost compared to the time of the factorization. Finally we investigate the impact of non-uniform memory accesses (NUMA) on the solution of dense general linear systems using an LU factorization algorithm. In particular we illustrate how an appropriate placement of the threads and data on a NUMA architecture can improve the performance of the panel factorization and consequently accelerate the global LU factorization. 
We show how these placements can improve the performance when applied to hybrid multicore/GPU solvers. Systèmes linéaires denses Factorisation LU Bibliothèque MAGMA Calcul hybride multicœur/GPU Processeurs graphiques Intel Xeon Phi ccNUMA Communication-avoiding Randomisation Placement des processus légers Dense linear systems LU factorization Dense linear algebra libraries MAGMA library Hybrid multicore/GPU computing Graphics process units Intel Xeon Phi ccNUMA Communication-avoiding algorithms Randomization Thread placement
109	Optimální odhad stavu modelu navigačního systému / Optimal state estimation of a navigation model system Papež, Milan January 2013 This thesis investigates the possibility of using fixed-point arithmetic in inertial navigation systems that use the local-level navigation frame mechanization equations. Two square root filtering methods, Potter's square root Kalman filter and the UD-factorized Kalman filter, are compared with the conventional Kalman filter and its Joseph stabilized form. The effect of rounding errors on the optimality of the Kalman filter and on the conditioning of the covariance matrix or its factors is evaluated for various lengths of the fractional part of the fixed-point computational word. The main contribution of this research lies in an evaluation of the minimal fixed-point arithmetic word length for the Phi-angle error model with noise statistics that correspond to tactical-grade inertial measurement units.
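To illustrate why entry 109 compares the conventional covariance update with the Joseph stabilized form under rounding, here is a minimal scalar sketch. It is an assumption-laden illustration and not the thesis code: the thesis works with matrix filters (Potter and UD factorizations), while this example uses a single state, and the helper names and the numbers are hypothetical.

```python
def optimal_gain(p_prior, h, r):
    """Kalman gain for a scalar state measured as z = h*x + noise of variance r."""
    return p_prior * h / (h * p_prior * h + r)


def update_standard(p_prior, h, k):
    """Short form P = (1 - k*h) * P; only correct when k is the exact optimal gain."""
    return (1.0 - k * h) * p_prior


def update_joseph(p_prior, h, r, k):
    """Joseph form P = (1 - k*h) * P * (1 - k*h) + k * r * k; correct for any gain."""
    a = 1.0 - k * h
    return a * p_prior * a + k * r * k


def quantize(x, frac_bits):
    """Round x to a fixed-point grid with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


# Hypothetical numbers: prior variance 4, unit measurement model, unit noise.
p_prior, h, r = 4.0, 1.0, 1.0
k_exact = optimal_gain(p_prior, h, r)   # 4 / (4 + 1)
k_coarse = quantize(k_exact, 2)         # gain rounded to 2 fractional bits
p_joseph_coarse = update_joseph(p_prior, h, r, k_coarse)
p_standard_bad = update_standard(p_prior, h, 1.5)    # badly perturbed gain
p_joseph_bad = update_joseph(p_prior, h, r, 1.5)
```

With the exact gain the two forms agree. Once the gain is perturbed, for example by a short fractional word length, the short form no longer gives the true posterior variance and can even go negative, whereas the Joseph form is a sum of squares and stays non-negative.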
110	Mesure de la section efficace différentielle de production du boson Z se désintégrant en paires électron-position, dans l'expérience ATLAS Doan, Thi Kieu Oanh 28 November 2012 The first measurement of the phi_eta spectrum of the Z boson at 7 TeV was carried out in this thesis. This variable allows the dynamics of Z production to be probed in detail. The full data sample recorded by ATLAS in 2011 was used, corresponding to 4.7/fb of integrated luminosity. The results of this measurement are published in Ref. [18], based on the internal note Ref. [69]. The differential cross section of Z->ee as a function of phi_eta was measured and compared with fixed-order perturbative calculations, with and without resummation for the low-phi_eta region. The RESBOS code provides the best description of the data; however, it is unable to reproduce the detailed shape of the measured cross section to better than 4%. The differential cross section was also compared with the predictions of various Monte Carlo generators interfaced with a parton shower algorithm. The best descriptions of the measured phi_eta spectrum are given by the SHERPA and POWHEG+PYTHIA8 generators. The precise measurement of the differential cross section in phi_eta provides valuable information for tuning Monte Carlo codes. The typical experimental precision of this measurement (~0.5%) is ten times better than that of the theoretical calculations, and it is therefore also valuable for constraining theory. The ptZ spectrum was also measured to quantify the systematic uncertainty of that measurement, using the large statistics of the data sample. This makes it possible to compare two measurements that deal with the transverse momentum of the Z boson. Over most of the phi_eta range, the systematic uncertainty of the ptZ measurement is twice as large as that of the phi_eta measurement. 
This comparison confirms the value of the phi_eta variable. The results presented in this thesis have many implications for future studies. Tuning Monte Carlo generators using the results of the precise measurement of the phi_eta spectrum will minimize the uncertainty on their parameters. A measurement of the doubly differential cross section in ptZ and phi_eta is of interest for better understanding the correlation between these two variables. The precise measurement of the ptZ spectrum using the phi_eta variable can be applied to the ptW spectrum, and finer measurements of ptW are known to be important for a precise determination of the W boson mass. In addition, a precise understanding of the ptZ spectrum is important for understanding the kinematic properties of Higgs boson production. [SPI:OTHER] Engineering Sciences/Other Modèle Standard Boson Z Électron-positon Section efficace différentielle Angle Phi_eta Impulsion transverse PtZ 7 TeV ATLAS LHC QCD RESBOS FEWZ Générateurs Monte Carlo QED FSR PHOTOS SHERPA Mesure précise Incertitude systématiqu
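The abstract of entry 110 does not define phi_eta. For orientation, the definition commonly used in the literature for this angular variable is given below; that the thesis uses exactly this form is an assumption, and the symbols (Δφ for the azimuthal opening angle of the two leptons, η⁻ and η⁺ for the pseudorapidities of the negatively and positively charged lepton) are introduced here and not taken from the entry.

```latex
\phi^{*}_{\eta} = \tan\!\left(\frac{\phi_{\mathrm{acop}}}{2}\right)\sin\theta^{*}_{\eta},
\qquad
\phi_{\mathrm{acop}} = \pi - \Delta\phi,
\qquad
\cos\theta^{*}_{\eta} = \tanh\!\left(\frac{\eta^{-} - \eta^{+}}{2}\right)
```

Because the variable depends only on lepton directions, which are measured more precisely than lepton momenta, it is consistent with the abstract's statement that the phi_eta measurement has a smaller systematic uncertainty than the ptZ measurement.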
