1 |
Patterns of somatic genome rearrangement in human cancer
Roberts, Nicola Diane, January 2018
Cancer development is driven by somatic genome alterations, ranging from single point mutations to larger structural variants (SV) affecting kilobases to megabases of one or more chromosomes. Studies of somatic rearrangement have previously been limited by a paucity of whole genome sequencing data, and a lack of methods for comprehensive structural classification and downstream analysis. The ICGC project on the Pan-Cancer Analysis of Whole Genomes provides an unprecedented opportunity to analyse somatic SVs at base-pair resolution in more than 2500 samples from 30 common cancer types. In this thesis, I build on a recently developed SV classification pipeline to present a census of rearrangement across the pan-cancer cohort, including chromoplexy, replicative two-jumps, and templated insertions connecting as many as eight distant loci. By identifying the precise structure of individual breakpoint junctions and separating out complex clusters, the classification scheme empowers detailed exploration of all simple SV properties and signatures. After illustrating the various SV classes and their frequency across cancer types and samples, Chapter 2 focuses on structural properties including event size and breakpoint homology. Then, in Chapter 3, I consider the SV distribution across the genome, and show patterns of association with various genome properties. Upon examination of rearrangement hotspot loci, I describe tissue-specific fragile site deletion patterns, and a variety of SV profiles around known cancer genes, including recurrent templated insertion cycles affecting TERT and RB1. Turning to co-occurring alteration patterns, Chapter 4 introduces the Hierarchical Dirichlet Process as a non-parametric Bayesian model of mutational signatures. 
After developing methods for consensus signature extraction, I detour to the domain of single nucleotide variants to test the HDP method on real and simulated data, and to illustrate its utility for simultaneous signature discovery and matching. Finally, I return to the PCAWG SV dataset, and extract SV signatures delineated by structural class, size, and replication timing. In Chapter 5, I move on to the complex SV clusters (largely set aside throughout Chapters 2 to 4), and develop an improved breakpoint clustering method to subdivide the complex rearrangement landscape. I propose a raft of summary metrics for groups of five or more breakpoint junctions, and explore their utility for preliminary classification of chromothripsis and other complex phenomena. This comprehensive study of somatic genome rearrangement provides detailed insight into SV patterns and properties across event classes, genome regions, samples, and cancer types. To extrapolate from the progress made in this thesis, Chapter 6 suggests future strategies for addressing unanswered questions about complex SV mechanisms, annotation of functional consequences, and selection analysis to discover novel drivers of the cancer phenotype.
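As a rough illustration of how a Hierarchical Dirichlet Process lets groups (here, tumour samples) share a common pool of components (here, signatures) while mixing them in different proportions, the sketch below draws truncated stick-breaking weights. It is not the thesis's implementation: the truncation level, the concentration values, and the function names `stick_breaking` and `group_weights` are illustrative assumptions.

```python
import random


def stick_breaking(gamma, truncation, rng):
    """Truncated GEM(gamma) weights: break a unit stick `truncation - 1` times."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        frac = rng.betavariate(1.0, gamma)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)  # leftover mass goes to the last component
    return weights


def group_weights(alpha, global_weights, rng):
    """Finite approximation of a DP(alpha, global_weights) draw,
    sampled as Dirichlet(alpha * global_weights) via normalised gamma draws."""
    # The small floor keeps the gamma shape parameter strictly positive.
    draws = [rng.gammavariate(max(alpha * b, 1e-9), 1.0) for b in global_weights]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)
# Top level: global weights shared by every group (assumed gamma = 2, 8 components).
beta = stick_breaking(gamma=2.0, truncation=8, rng=rng)
# Lower level: each group re-weights the same 8 components (assumed alpha = 5).
samples = [group_weights(alpha=5.0, global_weights=beta, rng=rng) for _ in range(3)]
```

Every list in `samples` indexes the same components as `beta`, which is what allows a component found in one group to be matched to the same component in another.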
|
2 |
Heterogeneous Sensor Data based Online Quality Assurance for Advanced Manufacturing using Spatiotemporal Modeling
Liu, Jia, 21 August 2017
Online quality assurance is crucial for elevating product quality and boosting process productivity in advanced manufacturing. However, the inherent complexity of advanced manufacturing, including nonlinear process dynamics, multiple process attributes, and low signal/noise ratio, poses severe challenges for both maintaining stable process operations and establishing efficacious online quality assurance schemes.
To address these challenges, this dissertation investigates four advanced manufacturing processes for online quality assurance: fused filament fabrication (FFF), binder jetting, chemical mechanical planarization (CMP), and the slicing process in wafer production. The work draws on various sensors, such as thermocouples, infrared temperature sensors, and accelerometers. The overarching goal is to develop integrated methodologies that are tailored to these individual processes yet address their common challenges, achieving satisfactory online quality assurance performance from heterogeneous sensor data. Specifically, three new methodologies are created and validated using actual sensor data:
(1) Real-time process monitoring methods using Dirichlet process (DP) mixture model for timely detection of process changes and identification of different process states for FFF and CMP. The proposed methodology is capable of tackling non-Gaussian data from heterogeneous sensors in these advanced manufacturing processes for successful online quality assurance.
(2) Spatial Dirichlet process (SDP) for modeling complex multimodal wafer thickness profiles and exploring their clustering effects. The SDP-based statistical control scheme can effectively detect out-of-control wafers and achieve wafer thickness quality assurance for the slicing process with high accuracy.
(3) Augmented spatiotemporal log Gaussian Cox process (AST-LGCP) quantifying the spatiotemporal evolution of porosity in binder jetting parts, capable of predicting high-risk areas on consecutive layers. This work fills the long-standing research gap of lacking rigorous layer-wise porosity quantification for parts made by additive manufacturing (AM), and provides the basis for facilitating corrective actions for product quality improvements in a prognostic way.
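The Dirichlet process models in (1) and (2) do not fix the number of process states or clusters in advance. A minimal sketch of that property, assuming nothing about the dissertation's actual code, is the Chinese restaurant process below, which samples a partition from the Dirichlet process prior. The function name, the value of `alpha`, and the item count are illustrative assumptions, and a full mixture model would also weight each choice by the likelihood of the sensor data.

```python
import random


def chinese_restaurant_process(n_items, alpha, rng):
    """Sample cluster assignments from a Dirichlet process prior.

    Item i joins existing cluster k with probability counts[k] / (i + alpha)
    and opens a new cluster with probability alpha / (i + alpha).
    """
    assignments, counts = [], []
    for _ in range(n_items):
        weights = counts + [alpha]  # the last slot stands for "new cluster"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts


rng = random.Random(42)
assignments, counts = chinese_restaurant_process(n_items=200, alpha=1.5, rng=rng)
n_clusters = len(counts)  # determined by the draw, not set in advance
```

Larger values of `alpha` tend to produce more clusters, so in a monitoring setting it acts as a prior on how readily a new process state is declared.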
These methodologies overcome some common challenges of advanced manufacturing that paralyze traditional online quality assurance methods, and they embody key components for implementing effective online quality assurance with various sensor data. They could potentially be extended to other manufacturing processes in the future.
/ Ph. D. /
This dissertation develops novel online quality assurance methodologies for advanced manufacturing using various sensor data. Four advanced manufacturing processes are investigated: fused filament fabrication, binder jetting, chemical mechanical planarization, and wafer slicing. The developed methodologies address some common challenges in these processes, such as nonlinear process dynamics and high variety in sensor data dimensions, which have severely hindered the effectiveness of traditional online quality assurance methods. Consequently, the proposed research achieves satisfactory performance in defect detection and quality prediction for the advanced manufacturing processes.
In this dissertation, the research methodologies are constructed in both space and time domains based on different types of sensor data. Sensor data representation and integration for a variety of data formats (e.g., online data stream, profile data, image data) with the dimensionality covering a wide range (from ~10^0 to ~10^5) are researched to extract effective features that are sensitive to manufacturing process defects; the devised methods, based on the extracted features, utilize spatiotemporal analysis to realize timely detection and accurate prediction of process defects. These integrated methodologies have a promising potential to be extended to other advanced manufacturing processes for efficacious process monitoring and quality assurance.
The accomplished work in this dissertation is an effective effort towards sustainable operations of advanced manufacturing. The achieved performance not only enables improvement in defect detection and quality prediction, but also lays the foundation for future implementation of corrective actions that can automatically mitigate the process defects.
|
3 |
Système complet d’acquisition vidéo, de suivi de trajectoires et de modélisation comportementale pour des environnements 3D naturellement encombrés : application à la surveillance apicole / Full process of acquisition, multi-target tracking, behavioral modeling for naturally crowded environments: application to beehive monitoring
Chiron, Guillaume, 28 November 2014
This manuscript proposes a methodological approach for building a complete video surveillance chain for naturally cluttered environments. We identify and overcome a number of methodological and technological barriers inherent to: 1) the acquisition of video sequences in natural conditions, 2) image processing, 3) multi-target tracking, 4) the discovery and modeling of recurring behavioral patterns, and 5) data fusion. The application context of our work is beehive monitoring, and in particular the study of the trajectories of bees in flight in front of the hive. This thesis therefore also serves as a feasibility and prototyping study within the two interdisciplinary projects EPERAS and RISQAPI (projects carried out in collaboration with INRA Magneraud and the Muséum National d’Histoire Naturelle). For us as computer scientists, and for the biologists who worked with us, this is a completely new area of investigation, in which the domain knowledge that is usually essential for such applications has yet to be defined. Unlike existing approaches to insect tracking, we propose to tackle the problem in three-dimensional space by using a high-frequency stereo vision camera. In this context, we detail our new target detection method, called HIDS segmentation. For the computation of trajectories, we explore several target tracking approaches, relying on varying amounts of prior knowledge, that can cope with the extreme conditions of the application (e.g. numerous, small targets with chaotic motion). Once the trajectories are collected, we organize them in a hierarchical data structure and apply a nonparametric Bayesian approach to discover emergent behaviors within the insect colony. The exploratory analysis of the trajectories from the cluttered scene is performed by unsupervised classification, simultaneously at different semantic levels, where the number of clusters at each level is not defined a priori but is estimated from the data. This approach is first validated using a pseudo ground truth generated by a Multi-Agent System, and then applied to real data.
|
4 |
Modèle bayésien non paramétrique pour la segmentation jointe d'un ensemble d'images avec des classes partagées / Bayesian nonparametric model for joint segmentation of a set of images with shared classes
Sodjo, Jessica, 18 September 2018
This work concerns the joint segmentation of a set of images in a Bayesian framework. The proposed model combines the hierarchical Dirichlet process (HDP) and the Potts random field. For a group of images, each image is divided into homogeneous regions, and similar regions across images are grouped into classes. On the one hand, thanks to the HDP, it is not necessary to define a priori the number of regions per image or the number of classes, whether shared or not. On the other hand, the Potts field ensures spatial homogeneity. The resulting prior and posterior distributions are complex, which makes it impossible to compute estimators analytically. A Gibbs algorithm is therefore proposed to generate samples from the posterior distribution. In addition, a generalized Swendsen-Wang algorithm is developed for better exploration of the posterior distribution. Finally, a sequential Monte Carlo sampler is defined to estimate the hyperparameters of the model. These methods have been evaluated on test images and on natural images. The best partition is chosen by minimizing a criterion that does not depend on the labeling. The performance of the algorithm is assessed using metrics that are well known in statistics but rarely used in image segmentation.
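To make the role of the Potts field concrete, the sketch below runs single-site Gibbs sweeps over a small label grid using only the Potts prior, where each pixel favours the labels of its four neighbours. It is a simplified illustration rather than the thesis's sampler: it omits the HDP, the image likelihood, and the Swendsen-Wang moves, and the grid size, the coupling `beta`, the fixed number of classes, and the function names are assumptions made for the example.

```python
import math
import random


def potts_gibbs_sweep(labels, n_classes, beta, rng):
    """One in-place Gibbs sweep over a 2D label grid under a Potts prior."""
    rows, cols = len(labels), len(labels[0])
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                labels[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            ]
            # P(label = k | neighbours) is proportional to
            # exp(beta * number of neighbours currently labelled k).
            weights = [math.exp(beta * neighbours.count(k)) for k in range(n_classes)]
            labels[r][c] = rng.choices(range(n_classes), weights=weights)[0]
    return labels


def neighbour_agreement(labels):
    """Fraction of horizontally or vertically adjacent pairs sharing a label."""
    rows, cols = len(labels), len(labels[0])
    same = total = 0
    for r in range(rows):
        for c in range(cols):
            if r + 1 < rows:
                total += 1
                same += labels[r][c] == labels[r + 1][c]
            if c + 1 < cols:
                total += 1
                same += labels[r][c] == labels[r][c + 1]
    return same / total


rng = random.Random(1)
grid = [[rng.randrange(3) for _ in range(12)] for _ in range(12)]
before = neighbour_agreement(grid)
for _ in range(20):
    potts_gibbs_sweep(grid, n_classes=3, beta=1.5, rng=rng)
after = neighbour_agreement(grid)
```

Comparing `before` and `after` shows how the coupling pulls neighbouring pixels toward the same label, which is the spatial homogeneity the abstract attributes to the Potts field.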
|