1 |
Machine learning methods for genomic high-content screen data analysis applied to deduce organization of endocytic network
Nikitina, Kseniia 13 July 2023
High-content screens are widely used to gain insight into the mechanistic organization of biological systems. Chemical and/or genomic perturbations are used to modulate the molecular machinery, and light microscopy combined with quantitative image analysis then yields a large number of parameters describing the phenotype. However, extracting functional information from such high-content datasets (e.g. links between cellular processes or functions of unknown genes) remains challenging. This work is devoted to the analysis of a multi-parametric image-based genomic screen of endocytosis, the process whereby cells take up cargo (signals and nutrients) and distribute it into different subcellular compartments. The complexity of the quantitative endocytic data was approached using different machine learning techniques, namely clustering methods, Bayesian networks, principal and independent component analysis, and artificial neural networks. The main goal of such an analysis is to predict possible modes of action of the screened genes and to find candidate genes that may be involved in a process of interest. The degrees of freedom of the multidimensional phenotypic space were identified from the data distributions, and the high-content data were then deconvolved into separate signals from different cellular modules. Some of those basic signals (phenotypic traits) were straightforward to interpret in terms of known molecular processes; the other components gave insight into interesting directions for further research. The phenotypic profiles of perturbations of individual genes are sparse in the coordinates of the basic signals and therefore intrinsically suggest the functional roles of these genes in cellular processes. Because endocytosis is a very fundamental process, it is specifically modulated by a variety of different pathways in the cell; endocytic phenotyping can therefore be used to analyze non-endocytic modules in the cell. The proposed approach can also be generalized to the analysis of other high-content screens.
Contents
Objectives
Chapter 1 Introduction
1.1 High-content biological data
1.1.1 Different perturbation types for HCS
1.1.2 Types of observations in HTS
1.1.3 Goals and outcomes of MP HTS
1.1.4 An overview of the classical methods of analysis of biological HT- and HCS data
1.2 Machine learning for systems biology
1.2.1 Feature selection
1.2.2 Unsupervised learning
1.2.3 Supervised learning
1.2.4 Artificial neural networks
1.3 Endocytosis as a system process
1.3.1 Endocytic compartments and main players
1.3.2 Relation to other cellular processes
Chapter 2 Experimental and analytical techniques
2.1 Experimental methods
2.1.1 RNA interference
2.1.2 Quantitative multiparametric image analysis
2.2 Detailed description of the endocytic HCS dataset
2.2.1 Basic properties of the endocytic dataset
2.2.2 Control subset of genes
2.3 Machine learning methods
2.3.1 Latent variables models
2.3.2 Clustering
2.3.3 Bayesian networks
2.3.4 Neural networks
Chapter 3 Results
3.1 Selection of labeled data for training and validation based on KEGG information about genes pathways
3.2 Clustering of genes
3.2.1 Comparison of clustering techniques on control dataset
3.2.2 Clustering results
3.3 Independent components as basic phenotypes
3.3.1 Algorithm for identification of the best number of independent components
3.3.2 Application of ICA on the full dataset and on separate assays of the screen
3.3.3 Gene annotation based on revealed phenotypes
3.3.4 Searching for genes with target function
3.4 Bayesian network on endocytic parameters
3.4.1 Prediction of pathway based on parameters values using Naïve Bayesian Classifier
3.4.2 General Bayesian Networks
3.5 Neural networks
3.5.1 Autoencoders as nonlinear ICA
3.5.2 siRNA sequence motives discovery with deep NN
3.6 Biological results
3.6.1 Rab11 ZNF-specific phenotype found by ICA
3.6.2 Structure of BN revealed dependency between endocytosis and cell adhesion
Chapter 4 Discussion
4.1 Machine learning approaches for discovery of phenotypic patterns
4.1.1 Functional annotation of unknown genes based on phenotypic profiles
4.1.2 Candidate genes search
4.2 Adaptation to other HCS data and generalization
Chapter 5 Outlook and future perspectives
5.1 Handling sequence-dependent off-target effects with neural networks
5.2 Transition between machine learning and systems biology models
Acknowledgements
References
Appendix
A.1 Full list of cellular and endocytic parameters
A.2 Description of independent components of the full dataset
A.3 Description of independent components extracted from separate assays of the HCS
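The abstract notes that gene perturbation profiles are sparse in the coordinates of the basic signals, so the few non-zero coordinates hint at a gene's functional role. The following is a minimal sketch of that idea only, not the method used in the thesis: the gene names, component names (`IC1` to `IC3`), scores and the threshold of 2.0 are all hypothetical, and the function `dominant_components` is introduced here purely for illustration.

```python
from typing import Dict, List


def dominant_components(profile: Dict[str, float], threshold: float = 2.0) -> List[str]:
    """Return components whose absolute score exceeds the threshold, strongest first."""
    hits = [(abs(score), name) for name, score in profile.items() if abs(score) > threshold]
    return [name for _, name in sorted(hits, reverse=True)]


# Hypothetical profiles: one score per independent component for each gene.
profiles = {
    "GENE_A": {"IC1": 0.3, "IC2": -4.1, "IC3": 0.2},
    "GENE_B": {"IC1": 2.8, "IC2": 0.1, "IC3": -3.5},
    "GENE_C": {"IC1": 0.4, "IC2": -0.6, "IC3": 0.1},
}

# A sparse profile maps each gene to the few components it perturbs strongly.
annotations = {gene: dominant_components(p) for gene, p in profiles.items()}

for gene, components in annotations.items():
    print(gene, components or "no strong phenotype")
```

In this toy setting a gene with a single dominant component would be a candidate for the process that component represents, while a gene with no component above the threshold carries no clear phenotype.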
2 |
A representation format for the standardized description and knowledge-based modeling of genomic expression data
Schober, Daniel 08 June 2006
The analysis of microarray data often begins with information retrieval approaches that aim to reduce the mass of data to a manageable set of genes or probe set IDs that is particularly interesting with respect to a given question. A prerequisite for efficient searching of the data, however, is a semantic definition and formalization of the data formats used. Here, an ontology is presented as a standardized and semantically defined representation construct that allows expert knowledge to be formalized in an interactive knowledge model, which can be queried comprehensively, interpreted consistently and, where appropriate, processed further automatically. Using a molecular biology ontology of 1200 hierarchically structured terms and the example of the Toll-like receptor signaling pathway, it is shown how such an object-oriented descriptive vocabulary can be used to annotate genes on Affymetrix microarrays. The annotation terms are modeled in the knowledge base editor Protégé-2000 as ontological concepts, their properties and their semantic connections (relational slots). Annotation here means formally embedding a gene in a defined functional context. When the knowledge base is used, an annotation corresponds to a "drag and drop" of genes onto ontological concepts that describe the function of these genes. Further contextual annotation is achieved by linking genes to other concepts or genes. The resulting networked knowledge model (the knowledgebase) enables content-based, associative and context-guided "knowledge browsing". Ontologically annotated gene data also permit the application of automatic data-driven visualization strategies, as shown using the example of semantic networks.
An ontological query interface also allows semantically complex queries on the data with increased recall and precision. / Functional gene annotations provide important search targets and cluster criteria. We introduce an annotation system that exploits the possibilities of modern knowledge management tools, i.e. ontological querying, inference, networking of annotations and automatic data-driven visualization of the annotated model. The Gandr (gene annotation data representation) knowledgebase is an ontological framework for laboratory-specific gene annotation and knowledge management. Gandr uses Protégé-2000 for editing, querying and visualizing microarray data and annotations. Genes can be annotated with provided, newly created or imported ontological concepts. Annotated genes can inherit assigned concept properties and can be related to each other. The resulting knowledgebase can be visualized as an interactive semantic network of nodes and edges representing genes with annotations and their functional relationships. This allows for immediate and associative exploration of gene context. Ontological query techniques allow for powerful data access. Genes can be annotated with formal conceptual descriptions by 'drag and drop' of one or more gene instances onto an annotating concept. Compared with unstructured annotation systems, the annotation process itself becomes faster and leads to annotation schemes of better quality, owing to the enforcement of constraints provided by the ontology. GandrKB enables lab-bench scientists to query for implicit domain knowledge inferred from the ontological domain model. Full access to data semantics through queries for properties and relationships ensures a more complete and adequate reply from the system.
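The abstract describes genes annotated with hierarchical concepts, inheritance of concept properties, and queries that exploit the hierarchy. The following is a minimal sketch of those three ideas in plain Python. It is not the Gandr or Protégé-2000 API: the `Concept` class, the `genes_under` function, the concept names and the property values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    """A hypothetical ontology concept with an optional parent and its own properties."""
    name: str
    parent: Optional["Concept"] = None
    properties: Dict[str, str] = field(default_factory=dict)

    def ancestors(self) -> List["Concept"]:
        """Return this concept followed by its parents up to the root."""
        chain, node = [], self
        while node is not None:
            chain.append(node)
            node = node.parent
        return chain

    def inherited_properties(self) -> Dict[str, str]:
        """Merge properties from root to leaf so specific concepts override general ones."""
        merged: Dict[str, str] = {}
        for concept in reversed(self.ancestors()):
            merged.update(concept.properties)
        return merged


def genes_under(concept_name: str, annotations: Dict[str, List[Concept]]) -> List[str]:
    """Return genes annotated with the named concept or any of its sub-concepts."""
    return sorted(
        gene
        for gene, concepts in annotations.items()
        if any(a.name == concept_name for c in concepts for a in c.ancestors())
    )


# Illustrative hierarchy and annotations (assumed, not taken from the thesis).
signaling = Concept("signal transduction", properties={"process": "signaling"})
tlr = Concept("Toll-like receptor pathway", parent=signaling,
              properties={"context": "innate immunity"})
cytoskeleton = Concept("cytoskeleton")

annotations = {"TLR4": [tlr], "MYD88": [tlr], "ACTB": [cytoskeleton]}

print(genes_under("signal transduction", annotations))
print(tlr.inherited_properties())
```

Querying for the general concept returns genes annotated only with the more specific one, which is the kind of implicit knowledge a hierarchy-aware query can surface and a flat keyword search cannot.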