Machine learning methods for genomic high-content screen data analysis applied to deduce organization of endocytic network

High-content screens are widely used to gain insight into the mechanistic organization of biological systems. Chemical and/or genomic perturbations are used to modulate the molecular machinery; light microscopy and quantitative image analysis then yield a large number of parameters describing the phenotype. However, extracting functional information from such high-content datasets (e.g. links between cellular processes or functions of unknown genes) remains challenging. This work is devoted to the analysis of a multi-parametric image-based genomic screen of endocytosis, the process whereby cells take up cargo (signals and nutrients) and distribute it into different subcellular compartments. The complexity of the quantitative endocytic data was approached using several machine learning techniques, namely clustering methods, Bayesian networks, principal and independent component analysis, and artificial neural networks. The main goals of this analysis are to predict possible modes of action of the screened genes and to find candidate genes that may be involved in a process of interest. The number of degrees of freedom of the multidimensional phenotypic space was identified from the data distributions, and the high-content data were then deconvolved into separate signals from different cellular modules. Some of these basic signals (phenotypic traits) were straightforward to interpret in terms of known molecular processes; the other components gave insight into interesting directions for further research. The phenotypic profiles of perturbations of individual genes are sparse in the coordinates of the basic signals and therefore intrinsically suggest the genes' functional roles in cellular processes. Because endocytosis is a fundamental process that is specifically modulated by a variety of pathways in the cell, endocytic phenotyping can also be used to analyze non-endocytic modules. The proposed approach can also be generalized to the analysis of other high-content screens.

Contents
Objectives
Chapter 1 Introduction
1.1 High-content biological data
1.1.1 Different perturbation types for HCS
1.1.2 Types of observations in HTS
1.1.3 Goals and outcomes of MP HTS
1.1.4 An overview of the classical methods of analysis of biological HT- and HCS data
1.2 Machine learning for systems biology
1.2.1 Feature selection
1.2.2 Unsupervised learning
1.2.3 Supervised learning
1.2.4 Artificial neural networks
1.3 Endocytosis as a system process
1.3.1 Endocytic compartments and main players
1.3.2 Relation to other cellular processes
Chapter 2 Experimental and analytical techniques
2.1 Experimental methods
2.1.1 RNA interference
2.1.2 Quantitative multiparametric image analysis
2.2 Detailed description of the endocytic HCS dataset
2.2.1 Basic properties of the endocytic dataset
2.2.2 Control subset of genes
2.3 Machine learning methods
2.3.1 Latent variables models
2.3.2 Clustering
2.3.3 Bayesian networks
2.3.4 Neural networks
Chapter 3 Results
3.1 Selection of labeled data for training and validation based on KEGG information about genes pathways
3.2 Clustering of genes
3.2.1 Comparison of clustering techniques on control dataset
3.2.2 Clustering results
3.3 Independent components as basic phenotypes
3.3.1 Algorithm for identification of the best number of independent components
3.3.2 Application of ICA on the full dataset and on separate assays of the screen
3.3.3 Gene annotation based on revealed phenotypes
3.3.4 Searching for genes with target function
3.4 Bayesian network on endocytic parameters
3.4.1 Prediction of pathway based on parameters values using Naïve Bayesian Classifier
3.4.2 General Bayesian Networks
3.5 Neural networks
3.5.1 Autoencoders as nonlinear ICA
3.5.2 siRNA sequence motives discovery with deep NN
3.6 Biological results
3.6.1 Rab11 ZNF-specific phenotype found by ICA
3.6.2 Structure of BN revealed dependency between endocytosis and cell adhesion
Chapter 4 Discussion
4.1 Machine learning approaches for discovery of phenotypic patterns
4.1.1 Functional annotation of unknown genes based on phenotypic profiles
4.1.2 Candidate genes search
4.2 Adaptation to other HCS data and generalization
Chapter 5 Outlook and future perspectives
5.1 Handling sequence-dependent off-target effects with neural networks
5.2 Transition between machine learning and systems biology models
Acknowledgements
References
Appendix
A.1 Full list of cellular and endocytic parameters
A.2 Description of independent components of the full dataset
A.3 Description of independent components extracted from separate assays of the HCS

Identifier: oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:86459
Date: 13 July 2023
Creators: Nikitina, Kseniia
Contributors: Sbalzarini, Ivo; von Toussaint, Udo; Technische Universität Dresden
Source Sets: Hochschulschriftenserver (HSSS) der SLUB Dresden
Language: English
Detected Language: English
Type: info:eu-repo/semantics/publishedVersion, doc-type:doctoralThesis, info:eu-repo/semantics/doctoralThesis, doc-type:Text
Rights: info:eu-repo/semantics/openAccess
