21

The Value Proposition of Campus High-Performance Computing Facilities to Institutional Productivity - A Production Function Model

Preston M Smith (13119846) 21 July 2022 (has links)
This dissertation measures the ROI of the institution's investment in HPC facilities and applies a production function to create a model that will measure the HPC facility investment's impact on the financial, academic, and reputational outputs of the institution.
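As a rough illustration of the modeling approach, a Cobb-Douglas-style production function can relate institutional outputs to HPC investment alongside other inputs. The log-linear form and variable names below are assumptions for exposition, not necessarily the dissertation's exact specification.

```latex
% Hedged sketch of a log-linear (Cobb-Douglas) institutional production function.
% Y_i : institutional output (e.g., publications, external research funding)
% H_i : investment in campus HPC facilities (assumed input)
% L_i, K_i : research labor and other capital (assumed inputs)
\ln Y_i = \beta_0 + \beta_H \ln H_i + \beta_L \ln L_i + \beta_K \ln K_i + \varepsilon_i
```

Under this form, \beta_H reads as the elasticity of institutional output with respect to HPC investment, which is one way a facility's contribution to productivity could be quantified.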
22

Towards a Resource Efficient Framework for Distributed Deep Learning Applications

Han, Jingoo 24 August 2022 (has links)
Distributed deep learning has achieved tremendous success in solving scientific problems in research and discovery over the past years. Deep learning training is quite challenging because it requires training on large-scale, massive datasets, especially with graphics processing units (GPUs) in the latest high-performance computing (HPC) supercomputing systems. HPC architectures exhibit training-throughput trends that differ from those reported in existing studies. Multiple GPUs and high-speed interconnects are used for distributed deep learning on HPC systems. Extant distributed deep learning systems are designed for non-HPC systems without considering efficiency, leading to under-utilization of expensive HPC hardware. In addition, increasing resource heterogeneity has a negative effect on resource efficiency in distributed deep learning methods, including federated learning. Thus, it is important to address the increasing demand for both high performance and high resource efficiency in distributed deep learning systems, including the latest HPC systems and federated learning systems. In this dissertation, we explore and design novel methods and frameworks to improve the resource efficiency of distributed deep learning training. We address the following five important topics: performance analysis of deep learning on supercomputers, GPU-aware deep learning job scheduling, topology-aware virtual GPU training, heterogeneity-aware adaptive scheduling, and a token-based incentive algorithm. In the first part (Chapter 3), we focus on analyzing the performance trends of distributed deep learning on the latest HPC systems, such as the Summitdev supercomputer at Oak Ridge National Laboratory. We provide insights through a comprehensive performance study of how deep learning workloads affect the performance of HPC systems with large-scale parallel processing capabilities. In the second part (Chapter 4), we design and develop MARBLE, a novel deep learning job scheduler that accounts for the non-linear scalability of GPUs within a single node and improves GPU utilization by sharing GPUs among multiple deep learning training workloads. The third part of this dissertation (Chapter 5) proposes TOPAZ, a topology-aware virtual GPU training system specifically designed for distributed deep learning on recent HPC systems. In the fourth part (Chapter 6), we explore a holistic federated learning scheduler that employs a heterogeneity-aware adaptive selection method to improve resource efficiency and accuracy, coupled with resource usage profiling and accuracy monitoring to achieve multiple goals. In the fifth part of this dissertation (Chapter 7), we focus on how to provide incentives to participants in proportion to their contribution to the performance of the final federated model, with tokens used as a means of paying for the services provided by participants and the training infrastructure. / Doctor of Philosophy / Distributed deep learning is widely used for solving critical scientific problems with massive datasets. However, to accelerate scientific discovery, resource efficiency is also important when these applications are deployed on real-world systems, such as high-performance computing (HPC) systems. Deploying existing deep learning applications on these distributed systems may lead to underutilization of HPC hardware resources. In addition, extreme resource heterogeneity has negative effects on distributed deep learning training. However, much of the prior work has not focused on the specific challenges of optimizing resource utilization in distributed deep learning on HPC systems and heterogeneous federated systems. This dissertation addresses the challenges of improving the resource efficiency of distributed deep learning applications through performance analysis of deep learning on supercomputers, GPU-aware deep learning job scheduling, topology-aware virtual GPU training, and heterogeneity-aware adaptive federated learning scheduling and incentive algorithms.
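As a hedged illustration of heterogeneity-aware adaptive selection in federated learning, the sketch below ranks clients by a weighted score of device throughput and recent accuracy gain; the scoring rule and field names are assumptions for exposition, not the dissertation's actual scheduler.

```python
import random

# Hedged sketch of heterogeneity-aware client selection for federated learning.
# The scoring rule and profile fields are illustrative assumptions; the
# dissertation couples resource usage profiling with accuracy monitoring.

def select_clients(profiles, num_selected, w_speed=0.5, w_accuracy_gain=0.5):
    """Pick clients by a weighted score of device speed and recent accuracy gain."""
    def score(p):
        return w_speed * p["throughput"] + w_accuracy_gain * p["accuracy_gain"]
    ranked = sorted(profiles, key=score, reverse=True)
    return ranked[:num_selected]

if __name__ == "__main__":
    # Synthetic client profiles: samples/sec and last-round accuracy improvement.
    clients = [{"id": i,
                "throughput": random.uniform(10, 200),        # samples/sec
                "accuracy_gain": random.uniform(0.0, 0.05)}   # last-round delta
               for i in range(20)]
    chosen = select_clients(clients, num_selected=5)
    print([c["id"] for c in chosen])
```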
23

Relational Computing Using HPC Resources: Services and Optimizations

Soundarapandian, Manikandan 15 September 2015 (has links)
Computational epidemiology involves processing, analysing and managing large volumes of data. Such massive datasets cannot be handled efficiently by traditional standalone database management systems, owing to their limited computational efficiency and bandwidth when scaling to large volumes of data. In this thesis, we address the management and processing of large volumes of data for modeling, simulation and analysis in epidemiological studies. Traditionally, compute-intensive tasks are processed using high performance computing resources and supercomputers, whereas data-intensive tasks are delegated to standalone databases and some custom programs. The DiceX framework is a one-stop solution for distributed database management and processing, and its main mission is to leverage supercomputing resources for data-intensive computing, in particular relational data processing. While standalone databases are always on and a user can submit queries at any time, supercomputing resources must be acquired and are available only for a limited time period. These resources are relinquished either upon completion of execution or at the expiration of the allocated time period. This reservation-based usage style poses critical challenges, including building and launching a distributed data engine on the supercomputer, saving the engine and resuming from the saved image, devising efficient optimization upgrades to the data engine, and enabling other applications to seamlessly access the engine. These challenges and requirements cause us to align our approach more closely with the cloud computing paradigms of Infrastructure as a Service (IaaS) and Platform as a Service (PaaS). In this thesis, we propose cloud-computing-like workflows that use supercomputing resources to manage and process relational data-intensive tasks. We propose and implement several services, including database freeze, migrate, and resume, ad-hoc resource addition, and table redistribution. These services assist in carrying out the defined workflows. We also propose an optimization upgrade to the query planning module of postgres-XC, the core relational data processing engine of the DiceX framework. Using knowledge of domain semantics, we have devised a more robust data distribution strategy that forcefully pushes down the most time-consuming SQL operations to the postgres-XC data nodes, bypassing the query planner's default shippability criteria without compromising correctness. Forcing query push-down reduces query processing time by almost 40%-60% for certain complex spatio-temporal queries on our epidemiology datasets. As part of this work, a generic broker service has also been implemented, which acts as an interface to the DiceX framework by exposing RESTful APIs that applications can use to query and retrieve results irrespective of programming language or environment. / Master of Science
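The abstract above mentions a generic broker service that exposes RESTful APIs to the DiceX framework. The sketch below is a hedged illustration of how a language-agnostic client might use such a service; the base URL, endpoint paths, and JSON fields are hypothetical, since the thesis only states that RESTful APIs are exposed.

```python
import requests

# Hedged sketch of a client for a DiceX-style broker service.
# BROKER_URL, the endpoint paths, and the JSON fields are hypothetical.
BROKER_URL = "http://broker.example.org:8080"

def submit_query(sql):
    """Submit a SQL query to the broker and return a job identifier."""
    resp = requests.post(f"{BROKER_URL}/queries", json={"sql": sql})
    resp.raise_for_status()
    return resp.json()["job_id"]

def fetch_results(job_id):
    """Retrieve the result rows for a completed query job."""
    resp = requests.get(f"{BROKER_URL}/queries/{job_id}/results")
    resp.raise_for_status()
    return resp.json()["rows"]

if __name__ == "__main__":
    job = submit_query("SELECT county, COUNT(*) FROM infections GROUP BY county")
    print(fetch_results(job))
```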
24

Remote High Performance Visualization of Big Data for Immersive Science

Abidi, Faiz Abbas 15 June 2017 (has links)
Remote visualization has emerged as a necessary tool in the analysis of big data. High-performance computing clusters can provide several benefits in scaling to larger data sizes, from parallel file systems to larger RAM profiles to parallel computation among many CPUs and GPUs. For scalable data visualization, remote visualization tools and infrastructure are critical: only pixels and interaction events are sent over the network instead of the data. In this work, we present our pipeline using VirtualGL, TurboVNC, and ParaView to render over 40 million points using remote HPC clusters and project over 26 million pixels in a CAVE-style system. We benchmark the system by varying the video stream compression parameters supported by TurboVNC and establish some best practices for typical usage scenarios. This work will help research scientists and academicians in scaling their big data visualizations for real-time interaction. / Master of Science
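As a rough, hedged illustration of the bandwidth considerations behind streaming pixels rather than data, the sketch below estimates the compressed pixel-stream bandwidth for a display of roughly 26 million pixels; the frame rate, bytes per pixel, and compression ratios are assumed values, not measurements from this work.

```python
# Back-of-envelope estimate of pixel-stream bandwidth for remote visualization.
# PIXELS comes from the abstract; FPS, BYTES_PER_PIXEL, and the compression
# ratios are assumed values for illustration only.

PIXELS = 26_000_000      # pixels projected in the CAVE-style display
BYTES_PER_PIXEL = 3      # assumed RGB framebuffer before compression
FPS = 15                 # assumed interactive frame rate

for compression_ratio in (10, 20, 50):   # assumed lossy stream compression ratios
    mb_per_s = PIXELS * BYTES_PER_PIXEL * FPS / compression_ratio / 1e6
    print(f"compression {compression_ratio:2d}:1 -> ~{mb_per_s:6.1f} MB/s")
```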
25

Towards Using Free Memory to Improve Microarchitecture Performance

Panwar, Gagandeep 18 May 2020 (has links)
A computer system's memory is designed to accommodate the worst-case workloads with the highest memory requirement; as such, memory is underutilized when a system runs workloads with common-case memory requirements. Through a large-scale study of four production HPC systems, we find that the memory underutilization problem in HPC systems is very severe. As unused memory is wasted memory, we propose exposing a compute node's unused memory to its CPU(s) through a user-transparent CPU-OS codesign. This can enable many new microarchitecture techniques that transparently leverage unused memory locations to help improve microarchitecture performance. We refer to these techniques as Free-memory-aware Microarchitecture Techniques (FMTs). In the context of HPC systems, we present a detailed example of an FMT called Free-memory-aware Replication (FMR). FMR replicates in-use data to unused memory locations to effectively reduce average memory read latency. On average across five HPC benchmark suites, FMR provides a 13% performance and 8% system-level energy improvement. / M.S. / Random-access memory (RAM), or simply memory, stores the temporary data of applications that run on a computer system. Its size is determined by the worst-case application workload that the computer system is supposed to run. Through our memory utilization study of four large multi-node high-performance computing (HPC) systems, we find that memory is severely underutilized in these systems. Unused memory is a wasted resource that does nothing. In this work, we propose techniques that can make use of this wasted memory to boost computer system performance. We call these techniques Free-memory-aware Microarchitecture Techniques (FMTs). We then present in detail an FMT for HPC systems called Free-memory-aware Replication (FMR), which provides a performance improvement of over 13%.
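As a toy illustration (not the dissertation's evaluation methodology) of why replicating in-use data to otherwise idle memory can lower average read latency, the Monte Carlo sketch below assumes exponentially distributed per-channel queueing delays and serves each read from whichever of two copies is less delayed; all parameters are assumptions.

```python
import random

# Toy model: with one copy, a read waits on its channel's queueing delay; with a
# replica on a second, otherwise-unused channel, the read is served by the less
# delayed copy. Exponential delays and the constants below are assumptions.

random.seed(0)
BASE_NS = 50.0        # assumed fixed DRAM access component (ns)
MEAN_QUEUE_NS = 40.0  # assumed mean queueing delay per channel (ns)
TRIALS = 100_000

single = 0.0
replicated = 0.0
for _ in range(TRIALS):
    q1 = random.expovariate(1.0 / MEAN_QUEUE_NS)
    q2 = random.expovariate(1.0 / MEAN_QUEUE_NS)
    single += BASE_NS + q1               # one copy: wait on its channel
    replicated += BASE_NS + min(q1, q2)  # two copies: served by the faster one

print(f"avg latency, single copy: {single / TRIALS:.1f} ns")
print(f"avg latency, replicated:  {replicated / TRIALS:.1f} ns")
```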
26

Le choix des architectures hybrides, une stratégie réaliste pour atteindre l'échelle exaflopique. / The choice of hybrid architectures, a realistic strategy to reach the Exascale.

Loiseau, Julien 14 September 2018 (has links)
La course à l'Exascale est entamée et tous les pays du monde rivalisent pour présenter un supercalculateur exaflopique à l'horizon 2020-2021. Ces superordinateurs vont servir à des fins militaires, pour montrer la puissance d'une nation, mais aussi pour des recherches sur le climat, la santé, l'automobile, la physique, l'astrophysique et bien d'autres domaines d'application. Ces supercalculateurs de demain doivent respecter une enveloppe énergétique de 1 MW pour des raisons à la fois économiques et environnementales. Pour arriver à produire une telle machine, les architectures classiques doivent évoluer vers des machines hybrides équipées d'accélérateurs tels que les GPU, Xeon Phi, FPGA, etc. Nous montrons que les benchmarks actuels ne nous semblent pas suffisants pour cibler ces applications qui ont un comportement irrégulier. Cette étude met en place une métrique ciblant les aspects limitants des architectures de calcul : le calcul et les communications avec un comportement irrégulier. Le problème mettant en avant la complexité de calcul est le problème académique de Langford. Pour la communication, nous proposons notre implémentation du benchmark Graph500. Ces deux métriques mettent clairement en avant l'avantage de l'utilisation d'accélérateurs, comme des GPU, dans ces circonstances spécifiques et limitantes pour le HPC. Pour valider notre thèse, nous proposons l'étude d'un problème réel mettant en jeu à la fois le calcul, les communications et une irrégularité extrême. En réalisant des simulations de physique et d'astrophysique, nous montrons une nouvelle fois l'avantage de l'architecture hybride et sa scalabilité. / The countries of the world are already competing for Exascale, and the first exaflopic supercomputer should be released by 2020-2021. These supercomputers will be used for military purposes, to show the power of a nation, but also for research on climate, health, physics, astrophysics and many other areas of application. These supercomputers of tomorrow must respect an energy envelope of 1 MW for reasons both economic and environmental. In order to create such a machine, conventional architectures must evolve into hybrid machines equipped with accelerators such as GPUs, Xeon Phi, FPGAs, etc. We show that the current benchmarks do not seem sufficient to target these applications, which have irregular behavior. This study sets up metrics targeting the walls of computational architectures: the computation and communication walls with irregular behavior. The problem chosen for the computation wall is Langford's academic combinatorial problem. We propose our implementation of the Graph500 benchmark in order to target the communication wall. These two metrics clearly highlight the advantage of using accelerators, such as GPUs, in these specific and representative HPC problems. In order to validate our thesis, we propose the study of a real problem that brings computation, communication and extreme irregularity into play at the same time. By performing physics and astrophysics simulations, we show once again the advantage of the hybrid architecture and its scalability.
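To make the compute-oriented metric concrete, here is a minimal sequential backtracking counter for the Langford pairing problem L(2, n) cited above; it is an illustrative sketch of the combinatorial problem itself, not the thesis's parallel or accelerator-based implementation.

```python
# Langford pairing problem L(2, n): place two copies of each number 1..n in a
# row of length 2n so that the two copies of k are separated by exactly k other
# numbers (their positions differ by k + 1). Plain backtracking, counting all
# arrangements including mirror images.

def count_langford(n):
    seq = [0] * (2 * n)

    def place(k):
        if k == 0:
            return 1
        total = 0
        for i in range(2 * n - k - 1):
            j = i + k + 1              # second copy sits k numbers away
            if seq[i] == 0 and seq[j] == 0:
                seq[i] = seq[j] = k
                total += place(k - 1)
                seq[i] = seq[j] = 0
        return total

    return place(n)                    # place large numbers first

if __name__ == "__main__":
    for n in (3, 4, 7, 8):             # solutions exist only when n % 4 in (0, 3)
        print(n, count_langford(n))    # expected: 3->2, 4->2, 7->52, 8->300
```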
27

Métodos bacteriológicos aplicados à tuberculose bovina: comparação de três métodos de descontaminação e de três protocolos para criopreservação de isolados / Bacteriologic methods applied to bovine tuberculosis: comparison of three decontamination methods and three protocols for cryopreservation of isolates

Ambrosio, Simone Rodrigues 09 December 2005 (has links)
Dada a importância do Programa Nacional de Controle e Erradicação da Brucelose e Tuberculose (PNCEBT), a necessidade de uma eficiente caracterização bacteriológica dos focos como ponto fundamental do sistema de vigilância e as dificuldades encontradas pelos laboratórios quanto aos métodos de isolamento de Mycobacterium bovis fizeram crescer o interesse do meio científico por estudos, sobretudo moleculares, de isolados de M. bovis. Para a realização dessas técnicas moleculares, é necessária abundância de massa bacilar, obtida através da manutenção dos isolados em laboratório e repiques em meios de cultura. Entretanto, o crescimento fastidioso do M. bovis em meios de cultura traz grandes dificuldades para essas operações. Assim sendo, o presente estudo teve por objetivos: 1º) Comparar três métodos de descontaminação para homogeneizados de órgãos, etapa que precede a semeadura em meios de cultura. Para isso, 60 amostras de tecidos com lesões granulomatosas, provenientes de abatedouros bovinos do Estado de São Paulo, foram colhidas, imersas em solução saturada de borato de sódio e transportadas para o Laboratório de Zoonoses Bacterianas do VPS-FMVZ-USP, onde foram processadas até 60 dias após a colheita. Essas amostras foram submetidas a três métodos de descontaminação: Básico (NaOH 4%), Ácido (H2SO4 12%) e 1-Hexadecylpyridinium chloride a 1,5% (HPC); o quarto método foi representado pela simples diluição com solução salina (controle). Os resultados foram submetidos à comparação de proporções, pelo teste de χ², na qual verificou-se que o método HPC foi o que apresentou menor proporção de contaminação (3%) e maior proporção de sucesso para isolamento de BAAR (40%). 2º) Comparar três diferentes meios criopreservantes para M. bovis; foram utilizados 16 isolados identificados pela técnica de spoligotyping. Cada um desses isolados foi solubilizado em três meios (solução salina, 7H9 original e 7H9 modificado), armazenado em três diferentes temperaturas (-20ºC, -80ºC e -196ºC) e descongelado em três diferentes tempos (45, 90 e 120 dias de congelamento). Antes do congelamento e após o descongelamento foram feitos cultivos quantitativos em meios de Stonebrink Leslie. Os porcentuais de redução de Unidades Formadoras de Colônias (UFC) nas diferentes condições foram calculados e comparados entre si através de métodos paramétricos e não-paramétricos. Os resultados obtidos foram: na análise da variável tempo, em 90 dias de congelamento foi observada uma maior proporção de perda de M. bovis, quando comparado ao tempo de 120 dias (p=0,0002); na análise da variável temperatura, foi observada uma diferença estatística significativa entre as proporções de perda média nas temperaturas de -20ºC e -80ºC (p<0,05); na análise da variável meio, foi observada uma diferença significativa (p=0,044) entre os meios A e C, para 45 dias de congelamento e -20ºC de temperatura de criopreservação. Embora as medianas dos porcentuais de perdas de UFC tenham sido sempre inferiores a 4,2%, os resultados permitiram sugerir que o melhor protocolo de criopreservação de isolados de M. bovis é solubilizá-los em 7H9 modificado e mantê-los à temperatura de -20ºC / In the context of the National Program of Control and Eradication of Brucellosis and Tuberculosis (PNCEBT), the necessity of an efficient bacteriologic characterization of the infected herds as a cornerstone of the monitoring system and the difficulties faced by the laboratories regarding the methods for Mycobacterium bovis isolation led to a growing interest in scientific studies, especially molecular, of M. bovis isolates. To use these molecular techniques it is necessary to have an abundant bacillary mass, obtained through the maintenance of isolates in the laboratory and replication in culture media. However, the fastidious growth of M. bovis in culture media brings out great difficulties for these activities. Thus, the present study had the following objectives. First, to compare three decontamination methods for organ homogenates, the phase that precedes sowing in culture media: 60 samples of tissues with granulomatous lesions, proceeding from bovine slaughterhouses in the State of São Paulo, were obtained, immersed in saturated sodium borate solution and transported to the Laboratório de Zoonoses Bacterianas of the VPS-FMVZ-USP, where they were processed up to 60 days after sampling. These samples were submitted to three methods of decontamination: basic (NaOH 4%), acid (H2SO4 12%) and 1-Hexadecylpyridinium chloride (HPC) 1.5%, plus a simple dilution with saline solution (control method). The results were analysed by means of the χ² test to compare proportions, and it was verified that the HPC method presented the smallest proportion of contamination (3%) and the greatest proportion of success for M. bovis isolation (40%). Second, to compare three different cryopreservation media for M. bovis, 16 isolates identified by the spoligotyping technique were used. Each of these isolates was solubilized in three media (saline solution, original 7H9 and modified 7H9), stored at three different temperatures (-20ºC, -80ºC and -196ºC), and defrosted at three different time points (45, 90 and 120 days of freezing). Before freezing and after thawing, quantitative cultivations in Stonebrink Leslie media were carried out. The proportions of Colony-Forming Unit (CFU) loss under the different conditions were calculated and compared with one another through parametric and non-parametric methods. The results obtained were: in the analysis of the variable time, at 90 days of freezing a larger proportion of CFU loss was observed when compared to 120 days (p=0.0002); in the analysis of the variable temperature, a statistically significant difference was observed between the average proportions of CFU loss at the temperatures of -20ºC and -80ºC (p<0.05); in the analysis of the variable medium, a significant difference was observed (p=0.044) between media A and C, for 45 days of freezing and a -20ºC cryopreservation temperature. Although the medians of the percentages of CFU loss were always below 4.2%, the results suggest that the best protocol for cryopreservation of M. bovis isolates is to solubilize them in modified 7H9 medium and keep them at a temperature of -20ºC
28

Analyse statistique et interprétation automatique de données diagraphiques pétrolières différées à l’aide du calcul haute performance / Statistical analysis and automatic interpretation of oil logs using high performance computing

Bruned, Vianney 18 October 2018 (has links)
Dans cette thèse, on s'intéresse à l’automatisation de l’identification et de la caractérisation de strates géologiques à l’aide des diagraphies de puits. Au sein d’un puits, on détermine les strates géologiques grâce à la segmentation des diagraphies assimilables à des séries temporelles multivariées. L’identification des strates de différents puits d’un même champ pétrolier nécessite des méthodes de corrélation de séries temporelles. On propose une nouvelle méthode globale de corrélation de puits utilisant les méthodes d’alignement multiple de séquences issues de la bio-informatique. La détermination de la composition minéralogique et de la proportion des fluides au sein d’une formation géologique se traduit en un problème inverse mal posé. Les méthodes classiques actuelles sont basées sur des choix d’experts consistant à sélectionner une combinaison de minéraux pour une strate donnée. En raison d’un modèle à la vraisemblance non calculable, une approche bayésienne approximée (ABC) aidée d’un algorithme de classification basé sur la densité permet de caractériser la composition minéralogique de la couche géologique. La classification est une étape nécessaire afin de s’affranchir du problème d’identifiabilité des minéraux. Enfin, le déroulement de ces méthodes est testé sur une étude de cas. / In this thesis, we investigate the automation of the identification and characterization of geological strata using well logs. For a single well, geological strata are determined through the segmentation of the logs, which are comparable to multivariate time series. The identification of strata across different wells of the same field requires correlation methods for time series. We propose a new global well-correlation method using multiple sequence alignment algorithms from bioinformatics. The determination of the mineralogical composition and the percentage of fluids inside a geological stratum results in an ill-posed inverse problem. Current methods are based on experts' choices: the selection of a subset of minerals for a given stratum. Because the model has a non-computable likelihood, an approximate Bayesian computation (ABC) approach assisted by a density-based clustering algorithm can characterize the mineral composition of the geological layer. The clustering step is necessary to deal with the identifiability issue of the minerals. Finally, the workflow is tested on a case study.
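As a hedged illustration of the likelihood-free inference idea mentioned above, the sketch below runs approximate Bayesian computation (ABC) by plain rejection sampling on a toy two-mineral forward model; the model, the observed value, and the tolerance are assumptions for exposition and are not the thesis's petrophysical model.

```python
import random

# Hedged sketch of ABC rejection sampling. The toy forward model (a density log
# as a noisy mix of two mineral responses) and all numbers are assumptions.

random.seed(1)

def simulate_log(quartz_fraction):
    """Toy forward model: log reading as a noisy mix of two mineral densities."""
    response = quartz_fraction * 2.65 + (1 - quartz_fraction) * 2.71
    return response + random.gauss(0, 0.01)

observed = 2.67      # assumed observed log value
tolerance = 0.01
accepted = []
for _ in range(20000):
    theta = random.random()              # prior: quartz fraction ~ Uniform(0, 1)
    if abs(simulate_log(theta) - observed) < tolerance:
        accepted.append(theta)           # keep parameters whose simulation matches

print(f"accepted {len(accepted)} samples; "
      f"posterior mean quartz fraction ~ {sum(accepted) / len(accepted):.2f}")
```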
29

Modélisation, prédiction et optimisation de la consommation énergétique d'applications MPI à l'aide de SimGrid / Modeling, Prediction and Optimization of Energy Consumption of MPI Applications using SimGrid

Heinrich, Franz 21 May 2019 (has links)
Les changements technologiques dans la communauté du calcul haute performance (HPC) sont importants, en particulier dans le secteur du parallélisme massif avec plusieurs milliers de cœurs de calcul sur un GPU unique ou accélérateur, et aussi des nouveaux réseaux complexes. La consommation d’énergie de ces machines continuera de croître dans les années à venir, faisant de l’énergie l’un des principaux facteurs de coût. Cela explique pourquoi même la métrique classique "flop/s", généralement utilisée pour évaluer les applications HPC et les machines, est progressivement remplacée par une métrique centrée sur l’énergie en "flop/watt". Une approche pour prédire la consommation d'énergie se fait par simulation ; cependant, une prédiction précise de la performance est cruciale pour estimer l’énergie. Dans cette thèse, nous contribuons à la prédiction de performance et d'énergie des architectures HPC. Nous proposons un modèle énergétique qui a été implémenté dans un simulateur open source, SimGrid. Nous validons ce modèle avec soin en le comparant systématiquement avec des expériences réelles. Nous utilisons cette contribution pour évaluer les projets existants et nous proposons de nouveaux governors DVFS spécialement conçus pour le contexte HPC. / The High-Performance Computing (HPC) community is currently undergoing disruptive technology changes in almost all fields, including a switch towards massive parallelism with several thousand compute cores on a single GPU or accelerator, and new, complex networks. The energy consumption of these machines will continue to grow in the future, making energy one of the principal cost factors of machine ownership. This explains why even the classic metric "flop/s", generally used to evaluate HPC applications and machines, is widely regarded as due to be replaced by an energy-centric metric "flop/watt". One approach to predict energy consumption is through simulation; however, a precise performance prediction is crucial to estimate the energy faithfully. In this thesis, we contribute to the performance and energy prediction of HPC architectures. We propose an energy model which we have implemented in the open source SimGrid simulator. We validate this model by carefully and systematically comparing it with real experiments. We leverage this contribution to both evaluate existing and propose new DVFS governors that are particularly designed to suit the HPC context.
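As a hedged sketch of the kind of utilization-based power model commonly used for this purpose (the notation below is an assumption for exposition, not necessarily the exact model implemented in the thesis or in SimGrid):

```latex
% Hedged sketch: linear, per-pstate power model and resulting energy.
% u(t) in [0,1] is CPU utilization, f(t) the current DVFS pstate.
P(u, f) = P_{\mathrm{idle}}(f) + u \,\bigl(P_{\mathrm{full}}(f) - P_{\mathrm{idle}}(f)\bigr),
\qquad
E = \int_{0}^{T} P\bigl(u(t), f(t)\bigr)\, dt
```

Under such a model, a DVFS governor trades off the lower power of a reduced frequency against the longer execution time it induces, which is why accurate performance prediction is a prerequisite for accurate energy prediction.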
30

Efficient Big Data Processing on Large-Scale Shared Platforms: managing I/Os and Failure / Sur l'efficacité des traitements Big Data sur les plateformes partagées à grande échelle : gestion des entrées-sorties et des pannes

Yildiz, Orcun 08 December 2017 (has links)
En 2017, nous vivons dans un monde régi par les données. Les applications d’analyse de données apportent des améliorations fondamentales dans de nombreux domaines tels que les sciences, la santé et la sécurité. Cela a stimulé la croissance des volumes de données (le déluge du Big Data). Pour extraire des informations utiles à partir de cette quantité énorme d’informations, différents modèles de traitement des données ont émergé tels que MapReduce, Hadoop et Spark. Les traitements Big Data sont traditionnellement exécutés à grande échelle (les systèmes HPC et les Clouds) pour tirer parti de leur puissance de calcul et de stockage. Habituellement, ces plateformes à grande échelle sont utilisées simultanément par plusieurs utilisateurs et de multiples applications afin d’optimiser l’utilisation des ressources. Bien qu’il y ait beaucoup d’avantages à partager ces plateformes, plusieurs problèmes sont soulevés dès lors qu’un nombre important d’utilisateurs et d’applications les utilisent en même temps, parmi lesquels la gestion des E/S et des défaillances sont les principaux pouvant avoir un impact sur le traitement efficace des données. Nous nous concentrons tout d’abord sur les goulots d’étranglement liés aux performances des E/S pour les applications Big Data sur les systèmes HPC. Nous commençons par caractériser les performances des applications Big Data sur ces systèmes. Nous identifions les interférences et la latence des E/S comme les principaux facteurs limitant les performances. Ensuite, nous nous intéressons de manière plus détaillée aux interférences des E/S afin de mieux comprendre les causes principales de ce phénomène. De plus, nous proposons un système de gestion des E/S pour réduire les dégradations de performance que les applications Big Data peuvent subir sur les systèmes HPC. Par ailleurs, nous introduisons des modèles d’interférence pour les applications Big Data et HPC en fonction des résultats que nous obtenons dans notre étude expérimentale concernant les causes des interférences d’E/S. Enfin, nous exploitons ces modèles afin de minimiser l’impact des interférences sur les performances des applications Big Data et HPC. Deuxièmement, nous nous concentrons sur l’impact des défaillances sur la performance des applications Big Data en étudiant la gestion des pannes dans les clusters MapReduce partagés. Nous présentons un ordonnanceur qui permet un recouvrement rapide des pannes, améliorant ainsi les performances des applications Big Data. / As of 2017, we live in a data-driven world where data-intensive applications are bringing fundamental improvements to our lives in many different areas such as business, science, health care and security. This has boosted the growth of data volumes (i.e., the deluge of Big Data). To extract useful information from this huge amount of data, different data processing frameworks have emerged, such as MapReduce, Hadoop, and Spark. Traditionally, these frameworks run on large-scale platforms (i.e., HPC systems and clouds) to leverage their computation and storage power. Usually, these large-scale platforms are used concurrently by multiple users and multiple applications with the goal of better resource utilization. Though there are benefits to sharing these platforms, several challenges are raised when many users and applications share them, among which I/O and failure management are the major ones that can impact efficient data processing. To this end, we first focus on I/O-related performance bottlenecks for Big Data applications on HPC systems. We start by characterizing the performance of Big Data applications on these systems. We identify I/O interference and latency as the major performance bottlenecks. Next, we zoom in on the I/O interference problem to further understand the root causes of this phenomenon. Then, we propose an I/O management scheme to mitigate the high latencies that Big Data applications may encounter on HPC systems. Moreover, we introduce interference models for Big Data and HPC applications based on the findings we obtain in our experimental study regarding the root causes of I/O interference. Finally, we leverage these models to minimize the impact of interference on the performance of Big Data and HPC applications. Second, we focus on the impact of failures on the performance of Big Data applications by studying failure handling in shared MapReduce clusters. We introduce a failure-aware scheduler which enables fast failure recovery while optimizing data locality, thus improving application performance.
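As a hedged, purely illustrative sketch of an I/O interference model of the kind alluded to above, the snippet below assumes that applications sharing a parallel file system receive a fair share of bandwidth degraded by a contention penalty; the functional form and parameters (peak bandwidth, penalty coefficient) are assumptions, not the thesis's fitted model.

```python
# Toy I/O interference model: k concurrent applications see less than a fair
# 1/k share of file-system bandwidth once contention overheads kick in.
# PEAK_BW_GBPS and ALPHA are assumed values for illustration only.

PEAK_BW_GBPS = 100.0   # assumed aggregate parallel file system bandwidth
ALPHA = 0.15           # assumed per-extra-writer contention penalty

def effective_bandwidth(num_apps):
    """Per-application I/O bandwidth under fair sharing plus a contention penalty."""
    fair_share = PEAK_BW_GBPS / num_apps
    penalty = 1.0 + ALPHA * (num_apps - 1)   # grows with the number of competitors
    return fair_share / penalty

if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        slowdown = effective_bandwidth(1) / effective_bandwidth(k)
        print(f"{k} concurrent apps: {effective_bandwidth(k):5.1f} GB/s each, "
              f"slowdown x{slowdown:.1f}")
```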
