Global ETD Search

41	Compilation pour machines à mémoire répartie : une approche multipasse / Compilation for distributed memory machines: a multipass approach
Lossing, Nelson, 03 April 2017
Scientific and simulation programs are commonly run on compute grids and clusters, which are distributed architectures. To make the most of the available resources, programmers must acquire new skills: they have to learn to write parallel code and, possibly, to manage distributed memory. This thesis proposes a compilation chain that automatically generates distributed, task-parallel code from sequential code, using the source-to-source compiler PIPS. The approach has two main advantages: 1) it applies a succession of simple, modular transformations, so users can understand the transformations applied, modify them, reuse them in other contexts, and add new ones; 2) a proof of correctness is given for each transformation, guaranteeing that the generated code is equivalent to the initial code. This automatic generation of distributed task-parallel code also gives users a simple programming interface: a parallel version of the code is generated automatically from annotated sequential code. Experiments on two parallel machines, using Polybench kernels, show a linear to super-linear average speedup on small problem sizes and an average speedup equal to half the number of processes on large ones.
Keywords: Parallel languages; Compilation; Distributed memory; Distributed parallel architecture; Task parallelisation; Automatic code generation; Code verification
621.39 004.5
42	Effiziente parallele Sortier- und Datenumverteilungsverfahren für Partikelsimulationen auf Parallelrechnern mit verteiltem Speicher [Efficient parallel sorting and data redistribution methods for particle simulations on distributed-memory parallel computers]
Hofmann, Michael, 09 March 2012
Particle simulations are a class of data- and compute-intensive simulation applications used in many areas of science and industrial research. The high computational cost of the solution methods and the large volumes of data needed to model realistic problems make parallel computing indispensable. Distributed-memory parallel computers are a widespread architecture in which a large number of compute nodes working in parallel exchange data over an interconnection network. Computing the interactions between particles is often the dominant cost of a particle simulation and is carried out with fast solution methods such as the Barnes-Hut algorithm or the Fast Multipole Method. Efficient parallel implementations of these algorithms require the particles to be sorted by spatial position. Sorting is needed both for efficient access to the particle data and as part of optimizations that increase the locality of memory accesses, minimize communication, and improve the load balance of parallel computations. This dissertation develops an efficient parallel sorting method, together with the communication operations for data redistribution that it requires, for use in particle simulations. To this end, a large number of existing parallel sorting methods for distributed memory are analyzed and compared against the requirements of particle simulation applications. Particular challenges arise in partitioning the particle data across distributed memory, weighting the data to be sorted for better load balancing, handling duplicate key values, and the availability and use of memory-efficient communication operations. To meet these requirements, a new parallel sorting method is developed and integrated into the application programs considered. In addition, a new in-place algorithm for the MPI_Alltoallv communication operation is presented, which substantially reduces the memory consumption of the data redistribution required within parallel sorting. The behavior of all the methods developed is examined both in isolation and in practical use within several application programs, on a range of parallel computers including highly scalable ones.
info:eu-repo/classification/ddc/005 ddc:005
43	Optimization of memory management on distributed machine / Optimisation de la gestion mémoire sur machines distribuées
Ha, Viet Hai, 05 October 2012
An easy-to-use parallel programming language remains an important open challenge for exploiting parallel computing architectures such as grids, clusters, multi-processors and, more recently, clouds and multi-cores. From the programmer's point of view, OpenMP is very easy to use thanks to its support for incremental parallelization, its ability to set the number of threads dynamically, and its scheduling strategies. However, because it was initially designed for shared-memory systems, OpenMP on distributed-memory systems is usually limited to intra-node computations. Many attempts have been made to port OpenMP to distributed systems. The most advanced approaches focus on exploiting the capabilities of a special network architecture and therefore cannot provide an open solution. Others build on an existing software solution such as DMS, MPI or Global Array and, as a consequence, struggle to deliver a fully compliant, high-performance implementation of OpenMP.
CAPE, which stands for Checkpointing Aided Parallel Execution, is an alternative approach to building a compliant OpenMP implementation for distributed-memory systems. The idea is as follows: on reaching a parallel section, the image of the master thread is dumped and sent to the slaves; each slave then executes a different thread; at the end of the parallel section, each slave thread extracts the list of all modifications performed locally and returns it to the master thread; the master integrates these modifications and resumes execution. To prove the feasibility of this approach, the first version of CAPE was implemented using complete checkpoints. However, a preliminary analysis showed that the large amount of data transferred between threads, and the extraction of the list of modifications from complete checkpoints, led to poor performance. Furthermore, this version was restricted to parallel problems satisfying Bernstein's conditions, i.e. it did not handle shared data.
This thesis presents the approaches we proposed to improve CAPE's performance and to overcome the restrictions on shared data. First, we developed DICKPT (Discontinuous Incremental Checkpointing), an incremental checkpointing technique that can take incremental checkpoints discontinuously during the execution of a process. With DICKPT, the execution speed of the new version of CAPE increased significantly. For example, the time to compute a large matrix-matrix product on a desktop cluster became very similar to the execution time of an optimized MPI program. Moreover, the speedup of this new version for various numbers of threads is roughly linear across different problem sizes. For shared data, we proposed UHLRC (Updated Home-based Lazy Release Consistency), a modified version of the Home-based Lazy Release Consistency (HLRC) memory model that is better suited to the characteristics of CAPE. Prototypes and algorithms implementing synchronization and the OpenMP data-sharing clauses and directives are also specified. Together, these two contributions enable CAPE to respect OpenMP's shared-data behavior.
Keywords: CAPE; Checkpointing aided parallel execution; OpenMP; OpenMP compliance; DICKPT; Discontinuous incremental checkpointing; UHLRC; Distributed memory; Distributed memory system; Parallelism; Parallel computing
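The CAPE cycle described in this abstract (dump the master image, let each slave work on a copy, return only the local modifications, merge them on the master) can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not CAPE's implementation: the names `extract_modifications` and `run_parallel_section` are hypothetical, program state is modeled as a flat dictionary, the "slaves" run one after another in a loop, and each slave is assumed to write a disjoint set of entries (Bernstein's conditions).

```python
import copy


def extract_modifications(before, after):
    """Return only the entries that were added or changed relative to `before`."""
    return {k: v for k, v in after.items() if k not in before or before[k] != v}


def run_parallel_section(master_state, chunks, work):
    """Simulate one parallel section (hypothetical helper, sequential stand-in).

    Assumes the chunks write disjoint keys, so merge order does not matter.
    """
    snapshot = copy.deepcopy(master_state)   # image of the master at section entry
    diffs = []
    for chunk in chunks:                     # each iteration stands in for one slave
        local = copy.deepcopy(snapshot)      # slave starts from the master's image
        work(local, chunk)                   # slave executes its share of the work
        diffs.append(extract_modifications(snapshot, local))
    for diff in diffs:                       # master integrates the modifications
        master_state.update(diff)
    return master_state


def square_chunk(state, indices):
    """Example workload: each index writes only its own entry."""
    for i in indices:
        state[i] = i * i + 1
```

For example, `run_parallel_section({i: 0 for i in range(4)}, [[0, 1], [2, 3]], square_chunk)` leaves every entry `i` holding `i * i + 1`, while each simulated slave returns only the two entries it changed rather than the whole state.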
44	Simulation de la dynamique des dislocations à très grande échelle / Hybrid parallelism on large scale dislocation dynamic simulation
Etcheverry, Arnaud, 23 November 2015
The work carried out in this thesis aims to give a 3D dislocation dynamics simulation code the essential components it needs to scale on modern supercomputers. We address several aspects of numerical simulation, starting with algorithmic considerations. To keep large simulations efficient in terms of algorithmic complexity, we examine the constraints of the different simulation steps and provide an analysis of, and improvements to, the algorithms. Next, particular attention is paid to data structures. Taking the new algorithms into account, we propose a data structure that provides efficient access across the memory hierarchy. The structure is modular so that it can serve two kinds of algorithms: mesh management, which requires dynamic memory management, and compute-intensive phases, which require fast access. This modular structure is complemented by an octree that handles domain decomposition as well as hierarchical algorithms such as stress-field computation and collision detection. Finally, we present the parallel aspects of the code. We introduce a hybrid approach that combines fine-grained, thread-based parallelism (task-based OpenMP threads) with coarse-grained MPI parallelism, which requires domain decomposition and load balancing. These contributions are then tested to validate their benefits for numerical simulation. Two case studies are presented to observe and analyze the behavior of the different building blocks of the simulation: first a highly dynamic simulation composed of Frank-Read sources in a zirconium crystal, then some results on a target simulation containing a high density of irradiation defects.
Keywords: 3D dislocation dynamics; Simulation; Scalability; MPI; Distributed memory; Shared memory; OpenMP task; Hybrid parallelism; Fast multipole method; Memory hierarchy; Cache efficient; Data structure; N-body problem
45	Environnement d'exécution parallèle : conception et architecture [Parallel execution environment: design and architecture]
Costa, Celso Maciel da, January 1993
This thesis studies an execution environment for parallel machines without shared memory. It defines a parallel programming model based on message passing that offers a restricted form of shared memory. In this model, process communication is indirect, via ports, and processes use barriers for synchronization. All the entities of the system (processes, ports and barriers) are created dynamically and placed explicitly on any processor of the processor network. We propose an implementation of this model as a systematic realization of a client/server architecture. The implementation was carried out on a Supernode parallel machine. Its basis is a parallel micro kernel whose principal component is a minimal remote procedure call mechanism.
Keywords: Parallel machines; Distributed memory parallel machine; Parallel programming; Parallel programming model; Execution environment; Client/server model; Remote procedure call; Parallel micro kernel
48	Realisierung einer Schedulingumgebung für gemischt-parallele Anwendungen und Optimierung von layer-basierten Schedulingalgorithmen [Implementation of a scheduling environment for mixed-parallel applications and optimization of layer-based scheduling algorithms]
Kunis, Raphael, 20 January 2011
One challenge of parallel processing is achieving scalability of large parallel applications across different parallel systems. The central problem is that an application may run very well on one parallel system, while porting it to another system usually gives poor results. Using the programming model of parallel tasks with dependencies can significantly improve scalability for many parallel algorithms. Programming with parallel tasks yields task graphs with dependencies that represent a parallel application, which is also called a mixed-parallel application. Efficient execution of a mixed-parallel application rests on a suitable schedule, which prescribes an efficient mapping of the parallel tasks onto the processors of the parallel system. Scheduling algorithms are used to compute such a schedule. A central difficulty in determining a schedule for mixed-parallel applications is that scheduling is NP-hard even for single-processor tasks with dependencies on a parallel system with two processors. Consequently, only approximation algorithms and heuristics exist for computing a schedule. One option is layer-based scheduling algorithms, which first form layers of independent parallel tasks and then compute the schedule for each layer separately. A weakness of these algorithms is the assembly of the individual schedules into the global schedule. The Move-blocks algorithm presented here offers an elegant way to improve this assembly by merging the schedules of consecutive layers.
Although many scheduling algorithms exist for mixed-parallel applications, there has so far been no comprehensive support for scheduling in programming tools. In particular, there is no scheduling environment that brings together a large number of scheduling algorithms. The second focus of this dissertation is the flexible, component-based and extensible scheduling environment SEParAT. SEParAT supports various usage scenarios that go well beyond pure scheduling, e.g. comparing scheduling algorithms and extending and implementing new scheduling algorithms. In addition to the usage scenarios, both the internal processing of a scheduling run and the component-based software architecture are presented in detail.
info:eu-repo/classification/ddc/004 ddc:004
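The first step of a layer-based scheduling algorithm, grouping the tasks of a dependency graph into layers of mutually independent tasks, can be sketched as follows. This is a minimal illustration only; it is neither the Move-blocks algorithm nor code from SEParAT. The function name `build_layers` and the dictionary encoding of the task graph are assumptions, and each task is simply placed in the earliest layer that follows all of its predecessors.

```python
def build_layers(deps):
    """Partition a task graph into layers of mutually independent tasks.

    `deps` maps each task to the collection of tasks it depends on
    (a hypothetical encoding chosen for this sketch).
    """
    # Copy the predecessor sets and register tasks that appear only as predecessors.
    remaining = {task: set(preds) for task, preds in deps.items()}
    for preds in deps.values():
        for p in preds:
            remaining.setdefault(p, set())

    layers = []
    while remaining:
        # Tasks with no unscheduled predecessor are independent of one another.
        ready = sorted(t for t, preds in remaining.items() if not preds)
        if not ready:
            raise ValueError("task graph contains a cycle")
        layers.append(ready)
        for t in ready:
            del remaining[t]
        for preds in remaining.values():
            preds.difference_update(ready)
    return layers
```

With `deps = {"C": {"A", "B"}, "D": {"C"}, "E": {"A"}}` the function returns three layers: `A` and `B`, then `C` and `E`, then `D`. A layer-based scheduler would then compute a schedule for each layer separately and concatenate the results, which is the assembly step that the abstract identifies as a weakness.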
49	Parallel distributed-memory particle methods for acquisition-rate segmentation and uncertainty quantifications of large fluorescence microscopy images
Afshar, Yaser, 08 November 2016
Modern fluorescence microscopy modalities, such as light-sheet microscopy, are capable of acquiring large three-dimensional images at high data rates. This creates a bottleneck in computational processing and analysis of the acquired images, as the rate of acquisition outpaces the speed of processing. Moreover, images can be so large that they do not fit in the main memory of a single computer. Another issue is the information loss during image acquisition due to limitations of the optical imaging systems. Analysis of the acquired images may, therefore, find multiple solutions (or no solution) due to imaging noise, blurring, and other uncertainties introduced during image acquisition.
In this thesis, we first address the computational processing time and memory issues by developing a distributed parallel algorithm for segmentation of large fluorescence-microscopy images. The method is based on the versatile Discrete Region Competition algorithm (Cardinale et al., 2012), which has previously proven useful in microscopy image segmentation. The present distributed implementation decomposes the input image into smaller sub-images that are distributed across multiple computers. Using network communication, the computers orchestrate the collective solving of the global segmentation problem. This not only enables segmentation of large images (we test images of up to 10^10 pixels) but also accelerates segmentation to match the time scale of image acquisition. Such acquisition-rate image segmentation is a prerequisite for the smart microscopes of the future and enables online data inspection and interactive experiments.
Second, we estimate the segmentation uncertainty on large images that do not fit in the main memory of a single computer. We therefore develop a distributed parallel algorithm for efficient Markov-chain Monte Carlo Discrete Region Sampling (Cardinale, 2013). The parallel algorithm provides a measure of segmentation uncertainty in a statistically unbiased way. It approximates the posterior probability densities over the high-dimensional space of segmentations around the previously found segmentation.
Keywords: parallel computing; distributed-memory; image segmentation; segmentation uncertainty; particle method; acquisition-rate; fluorescence microscopy images; large images; three-dimensional images; distributed parallel algorithm; Discrete Region Competition; Discrete Region Sampling; Markov-chain Monte Carlo
ddc:004 rvk:ST 330
50	Parallel distributed-memory particle methods for acquisition-rate segmentation and uncertainty quantifications of large fluorescence microscopy images Afshar, Yaser 17 October 2016 (has links) Modern fluorescence microscopy modalities, such as light-sheet microscopy, are capable of acquiring large three-dimensional images at high data rate. This creates a bottleneck in computational processing and analysis of the acquired images, as the rate of acquisition outpaces the speed of processing. Moreover, images can be so large that they do not fit the main memory of a single computer. Another issue is the information loss during image acquisition due to limitations of the optical imaging systems. Analysis of the acquired images may, therefore, find multiple solutions (or no solution) due to imaging noise, blurring, and other uncertainties introduced during image acquisition. In this thesis, we address the computational processing time and memory issues by developing a distributed parallel algorithm for segmentation of large fluorescence-microscopy images. The method is based on the versatile Discrete Region Competition (Cardinale et al., 2012) algorithm, which has previously proven useful in microscopy image segmentation. The present distributed implementation decomposes the input image into smaller sub-images that are distributed across multiple computers. Using network communication, the computers orchestrate the collective solving of the global segmentation problem. This not only enables segmentation of large images (we test images of up to 10^10 pixels) but also accelerates segmentation to match the time scale of image acquisition. Such acquisition-rate image segmentation is a prerequisite for the smart microscopes of the future and enables online data inspection and interactive experiments. Second, we estimate the segmentation uncertainty on large images that do not fit the main memory of a single computer. 
We therefore develop a distributed parallel algorithm for efficient Markov-chain Monte Carlo Discrete Region Sampling (Cardinale, 2013). The parallel algorithm provides a measure of segmentation uncertainty in a statistically unbiased way. It approximates the posterior probability densities over the high-dimensional space of segmentations around the previously found segmentation. / Modern fluorescence microscopy techniques, such as light-sheet microscopy, allow the acquisition of high-resolution three-dimensional images. This leads to a bottleneck in the processing and analysis of the acquired images, because the acquisition rate exceeds the data-processing rate. In addition, these images can be so large that they exceed the memory capacity of a single computer. A further issue is the loss of information during image acquisition that results from limitations of the optical imaging system. Image noise, blur, and other measurement uncertainties can cause analysis algorithms to find several solutions, or none, for image-processing tasks. In this thesis, we develop a distributed parallel algorithm for the segmentation of memory-intensive fluorescence microscopy images. The method is based on the versatile "Discrete Region Competition" algorithm (Cardinale et al., 2012), which has already proven useful for segmenting microscopy images in other applications. The procedure presented here divides the input image into smaller sub-images, which are distributed across the memories of several computers. The global segmentation problem is coordinated through network communication. This allows the segmentation of very large images; we demonstrate the algorithm on images of up to 10^10 pixels. 
In addition, the segmentation speed is increased and thus becomes comparable to the acquisition rate of the microscope. This is a basic prerequisite for the smart microscopes of the future, and it allows online inspection of the acquired data as well as interactive experiments. We determine the uncertainty of the segmentation algorithm when it is applied to images whose size exceeds the memory of a single computer. To this end, we develop a distributed parallel algorithm for efficient Markov-chain Monte Carlo "Discrete Region Sampling" (Cardinale, 2013). This algorithm quantifies the segmentation uncertainty in a statistically unbiased way. It does so by approximating the posterior probability density over the high-dimensional space of segmentations in the neighborhood of the previously found segmentation. info:eu-repo/classification/ddc/004 ddc:004

Search results