Global ETD Search

1	Probabilistic Modeling for Whole Metagenome Profiling. Burks, David. 05 1900. To address the shortcomings of existing Markov model implementations in handling large amounts of metagenomic data, while achieving comparable or better classification accuracy, we developed a new algorithm based on a pseudo-count-supplemented standard Markov model (SMM). The algorithm leverages the power of higher-order models to classify reads more robustly at different taxonomic levels. Assessment on simulated metagenomic datasets demonstrated that, overall, SMM classified reads to their respective taxa more accurately than the interpolated methods at all ranks. Higher-order SMMs (9th order or greater) also outperformed BLAST alignments in assigning taxonomic labels to metagenomic reads at the genus level and above. These tests masked each read's species of origin (its genome model) in the database. Masking at other taxonomic ranks gave similar results; this simulates the plausible scenario in which the source of a read is not represented at a given taxonomic level in the genome database. The performance gap widened at higher taxonomic levels. To eliminate contamination in datasets and further improve our alignment-free approach, we developed a new framework based on a genome segmentation and clustering algorithm. This framework enabled the removal of adapter sequences and contaminant DNA. It also generated clusters of similar segments, which were then used to sample representative read fragments for the training datasets. The parameters of a logistic regression model were learned from these training datasets using a Bayesian optimization procedure, which allowed us to establish thresholds for classifying metagenomic reads by SMM. This led to the development of a Python-based frontend, named POSMM (Python Optimized Standard Markov Model), that combines our SMM algorithm with the logistic regression optimization.
POSMM provides a much-needed alternative to existing metagenome profiling programs. Because our algorithm builds the genome models on the fly, it obviates the need to build a database. It complements alignment-based classification and can therefore be used in concert with alignment-based classifiers to raise the bar in metagenome profiling. Keywords: Bioinformatics; Metagenomics; Markov; Alignment-Free; POSMM; Taxonomy Classification; Markovian Jensen-Shannon Divergence; Segmentation; Clustering.
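The core SMM classification step can be illustrated with a minimal Python sketch. It assumes reads over the alphabet ACGT, one fixed model order shared by all genome models, and additive (Laplace-style) pseudo-counts. The names `train_smm`, `log_likelihood`, `classify` and `assignment_confidence` are hypothetical and are not POSMM's interface. The logistic weights `w0` and `w1`, and the choice of feature, are also hypothetical placeholders. That feature is the per-base log-likelihood margin between the two best models. The thesis instead learns its logistic-regression parameters by Bayesian optimization.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def _is_acgt(s: str) -> bool:
    """True if every character is an unambiguous nucleotide."""
    return all(c in ALPHABET for c in s)


def train_smm(genome: str, order: int, pseudocount: float = 1.0) -> dict:
    """Count context -> next-base transitions for a fixed-order Markov model."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(order, len(genome)):
        context, base = genome[i - order:i], genome[i]
        if _is_acgt(context + base):
            counts[context][base] += 1
    return {"order": order, "pseudocount": pseudocount, "counts": counts}


def log_likelihood(model: dict, read: str) -> float:
    """Sum of ln P(base | context); pseudo-counts keep unseen k-mers finite."""
    k, a, counts = model["order"], model["pseudocount"], model["counts"]
    total = 0.0
    for i in range(k, len(read)):
        context, base = read[i - k:i], read[i]
        if not _is_acgt(context + base):
            continue  # skip windows containing N or other ambiguity codes
        row = counts.get(context, {})  # .get avoids inserting unseen contexts
        n_context = sum(row.values())
        p = (row.get(base, 0) + a) / (n_context + a * len(ALPHABET))
        total += math.log(p)
    return total


def classify(models: dict, read: str) -> str:
    """Assign the read to the genome model with the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(models[name], read))


def assignment_confidence(models: dict, read: str,
                          w0: float = -1.0, w1: float = 2.0) -> float:
    """Logistic score on the per-base margin between the two best models.

    Requires at least two models. w0 and w1 are HYPOTHETICAL placeholders,
    not parameters learned by Bayesian optimization as in the thesis.
    """
    best, second = sorted((log_likelihood(m, read) for m in models.values()),
                          reverse=True)[:2]
    margin = (best - second) / max(len(read), 1)
    return 1.0 / (1.0 + math.exp(-(w0 + w1 * margin)))


if __name__ == "__main__":
    models = {
        "genome_a": train_smm("ACGT" * 50, order=2),
        "genome_b": train_smm("AATT" * 50, order=2),
    }
    print(classify(models, "ACGTACGTACGT"))  # → genome_a
    print(round(assignment_confidence(models, "ACGTACGTACGT"), 2))
```

Each model is only a table of k-mer transition counts, so it can be built directly from a genome sequence with no precomputed database. Raising the order sharpens discrimination but leaves more contexts unseen, and the pseudo-count is what keeps those contexts from driving the likelihood to zero.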
2	Development of novel Classical and Quantum Information Theory Based Methods for the Detection of Compensatory Mutations in MSAs. Gültas, Mehmet. 18 September 2013. Multiple sequence alignments (MSAs) of homologous proteins are useful tools for characterizing compensatory mutations between non-conserved residues. Identifying these residues in MSAs is an important task for better understanding the structural basis and molecular mechanisms of protein function. There is a large body of literature on compensatory mutations and on sequence conservation analysis for detecting important residues. Despite this, previous methods have mostly ignored the biochemical properties of amino acids, which can be crucial for detecting compensatory mutation signals. Moreover, compensatory mutation signals in MSAs are often distorted by noise. A further bioinformatics problem is therefore to separate significant signals from phylogenetic noise and unrelated pair signals. The aim of this thesis is to develop methods that deal with this noise and that integrate biochemical properties of amino acids, such as their similarities and dissimilarities, into the identification of compensatory mutations. To this end, we develop several methods based on classical and quantum information theory as well as on multiple testing procedures. Our first method is based on classical information theory. It mainly considers BLOSUM62-dissimilar pairs of amino acids as a model of compensatory mutations and integrates them into the identification of important residues. To complement this method, we develop a second method using the foundations of quantum information theory.
This new method differs from the first by simultaneously modeling similar and dissimilar signals in the compensatory mutation analysis. Furthermore, to separate significant signals from noise, we develop an MSA-specific statistical model based on multiple testing procedures. We apply our methods to two human proteins, namely epidermal growth factor receptor (EGFR) and glucokinase (GCK). The results show that the MSA-specific statistical model can separate significant signals from phylogenetic noise and from unrelated pair signals. By considering only BLOSUM62-dissimilar pairs of amino acids, the first method successfully identifies the disease-associated important residues of both proteins. In contrast, the second method simultaneously models similar and dissimilar signals of amino acid pairs, which makes it more sensitive in identifying catalytic and allosteric residues. 510. Keywords: False discovery rate (FDR); Coupled Mutation Finder; Quantum Coupled Mutation Finder; Beta distribution; Mutual Information; Quantum Jensen-Shannon divergence; Quantum information theory; Information theory; Jensen-Shannon divergence; MSA; EGFR; GCK. Subject: Informatik (computer science) (PPN619939052).
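The pair-signal and multiple-testing ideas can be illustrated with a minimal Python sketch. It scores the co-variation of two MSA columns with plain mutual information and estimates a permutation p-value. It then applies the Benjamini-Hochberg procedure to control the false discovery rate. This is a generic baseline under stated assumptions: the columns are gap-free and of equal length, and the null model shuffles one column. Unlike the thesis's MSA-specific model, that null ignores phylogenetic structure. The sketch does not implement the BLOSUM62 weighting, the quantum Jensen-Shannon divergence or the Beta-distribution model, and all function names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Set


def mutual_information(col_x: Sequence[str], col_y: Sequence[str]) -> float:
    """Plug-in mutual information (bits) between two aligned MSA columns."""
    if len(col_x) != len(col_y):
        raise ValueError("columns must cover the same sequences")
    n = len(col_x)
    count_x, count_y = Counter(col_x), Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    # Sum over observed residue pairs of p(a,b) * log2(p(a,b) / (p(a) p(b))).
    return sum((c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
               for (a, b), c in count_xy.items())


def permutation_pvalue(col_x: Sequence[str], col_y: Sequence[str],
                       n_perm: int = 999, seed: int = 0) -> float:
    """P-value of the observed MI under a null that shuffles one column."""
    rng = random.Random(seed)
    observed = mutual_information(col_x, col_y)
    shuffled = list(col_y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mutual_information(col_x, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues: List[float], q: float = 0.05) -> Set[int]:
    """Indices of hypotheses rejected by the BH step-up procedure at FDR q."""
    m = len(pvalues)
    ranked = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(ranked, start=1):
        if pvalues[i] <= q * rank / m:
            cutoff = rank  # largest rank k with p_(k) <= q * k / m
    return set(ranked[:cutoff])


if __name__ == "__main__":
    # Hypothetical toy columns: 0 and 1 co-vary perfectly; 2 is conserved.
    columns = ["AAAAALLLLL", "EEEEEKKKKK", "GGGGGGGGGG"]
    pairs = [(0, 1), (0, 2), (1, 2)]
    pvals = [permutation_pvalue(columns[i], columns[j]) for i, j in pairs]
    print([pairs[k] for k in benjamini_hochberg(pvals, q=0.05)])
```

A fully conserved column carries no mutual information with any other column. This is one reason conservation analysis and co-variation analysis detect different kinds of important residues.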
3	Probabilistic Models to Detect Important Sites in Proteins. Dang, Truong Khanh Linh. 24 September 2020. No description available. 510. Keywords: Protein structural transition; Graph algorithms; Generalized Viterbi algorithm; Jensen–Shannon divergence; Random Forest; DNA-binding sites. Subject: Informatik (computer science) (PPN619939052).
4	Computational Analysis of Thalamocortical Communication of Auditory Information using Pairwise Spike Recordings / Beräkningsanalys av thalamokortikal kommunikation av auditorisk information med hjälp av parvisa neuronala registreringar av aktionspotentialer. Guo, Xinxing. January 2022. Investigating the properties and mechanisms of coordination among neurons plays an important role in understanding how the brain encodes information and how thalamocortical processing works in the auditory system. The main question of this project is whether coordinated neuronal spikes in the auditory thalamus enhance thalamocortical communication with the auditory cortex (AC). Most previous research has investigated V1 and V2 in the visual system, or corticocortical circuits in the auditory system, using pairwise neuronal correlations. In this project, by contrast, we explored coordination among neurons in thalamocortical circuits. We applied the Jensen-Shannon divergence to measure the similarity between two distributions. With it, we analyzed the coordination between thalamic neurons and different parts of the AC in the ascending and descending pathways of the auditory system, respectively. We also designed an algorithm to calculate spiking coordination. The results show that the coordination pattern differs between the two pathways depending on whether sound stimulation and basal forebrain (BF) stimulation are on or off. In the ascending pathway, coordination among thalamic neurons precedes information to the AC when the brain is silent, that is, with both sound and BF stimulation off. In the descending pathway, coordination, mainly in the superficial area of the AC, precedes information to the thalamus. Coordination is lower when sound stimulation is on. In the future, our method and algorithm could be applied to more data from rats to investigate coordinated spikes in the auditory system. Keywords: Thalamocortical basal forebrain stimulation; Jensen-Shannon divergence; auditory system; coordination. Subject: Computer and Information Sciences.
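The Jensen-Shannon divergence used in this project can be computed from two discrete distributions, as in the minimal Python sketch below. The abstract does not say which distributions were compared. Binning spike times into per-bin spike-count histograms (`spike_count_histogram`, with times and bin width in the same units) is therefore an assumption for illustration, as are all function names. The coordination algorithm designed in the thesis is not reproduced here. Base-2 logarithms keep the divergence between 0 (identical distributions) and 1 (disjoint supports).

```python
import math
from typing import List, Sequence


def kl_divergence_bits(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """JSD = 0.5 KL(p || m) + 0.5 KL(q || m), m = (p + q) / 2; lies in [0, 1]."""
    if len(p) != len(q):
        raise ValueError("distributions must share the same support")
    total_p, total_q = sum(p), sum(q)
    p_norm = [x / total_p for x in p]  # raw counts -> probabilities
    q_norm = [x / total_q for x in q]
    m = [(a + b) / 2 for a, b in zip(p_norm, q_norm)]
    return 0.5 * kl_divergence_bits(p_norm, m) + 0.5 * kl_divergence_bits(q_norm, m)


def spike_count_histogram(spike_times: Sequence[float], duration: float,
                          bin_width: float, max_count: int) -> List[int]:
    """Histogram of per-bin spike counts; counts above max_count are pooled."""
    n_bins = int(duration // bin_width)
    per_bin = [0] * n_bins
    for t in spike_times:
        b = int(t // bin_width)
        if 0 <= b < n_bins:  # ignore spikes outside the analysis window
            per_bin[b] += 1
    hist = [0] * (max_count + 1)
    for c in per_bin:
        hist[min(c, max_count)] += 1
    return hist


if __name__ == "__main__":
    # Hypothetical spike times (ms) for one thalamic and one cortical neuron.
    thalamic = [3, 14, 15, 41, 44, 47, 72, 88]
    cortical = [5, 22, 58, 61, 93]
    h1 = spike_count_histogram(thalamic, duration=100, bin_width=10, max_count=3)
    h2 = spike_count_histogram(cortical, duration=100, bin_width=10, max_count=3)
    print(h1, h2, round(jensen_shannon_divergence(h1, h2), 3))
```

Unlike the Kullback-Leibler divergence, this measure is symmetric and stays finite when one distribution has zero mass where the other does not. Both properties are convenient when comparing sparse spike statistics.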
5	Méthodes non-paramétriques pour l'apprentissage et la détection de dissimilarité statistique multivariée / Nonparametric methods for learning and detecting multivariate statistical dissimilarity. Lhéritier, Alix. 23 November 2015. In this thesis, we study problems related to learning and detecting multivariate statistical dissimilarity, which are of paramount importance for many statistical learning methods now used in an increasing number of fields. The thesis makes three contributions to these problems. The first contribution introduces a notion of multivariate nonparametric effect size that sheds light on the nature of the dissimilarity detected between two datasets. Our two-step method first decomposes a dissimilarity measure (the Jensen-Shannon divergence) to localize the dissimilarity in the data embedding space. It then groups points that show high discrepancy and lie close together into clusters. The second contribution presents the first sequential nonparametric two-sample test. Instead of working on two sets of observations of fixed size, the test treats observations one at a time. It can be stopped as soon as sufficiently strong evidence has been found, which yields a more flexible procedure while keeping guaranteed type I error control. Additionally, under certain conditions, the probability of type II error vanishes as the number of observations tends to infinity. The third contribution is a sequential change detection test, with type I error guarantees, that performs a two-sample test on two sliding windows. Our test has a controlled memory footprint and a constant time complexity per observation, which makes it suitable for streaming data. State-of-the-art methods that also provide type I error control lack this constant per-observation cost.
Keywords: Statistics; Information theory; Jensen-Shannon divergence; Data analysis; Data comparison; Point clouds; Nonparametric two-sample test; Effect size; Divergence estimation; Nonparametric estimation; Regression; Topological persistence; Conditional probability estimation.
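The idea of a sequential two-sample test with type I error control can be illustrated with a minimal Python sketch. It follows one standard construction and is not necessarily the thesis's own. A k-nearest-neighbour vote over past points predicts each incoming point's sample label. The running product of predicted-label probabilities, each divided by 1/2, is then compared against 1/alpha. If labels are independent fair coin flips, this product is a nonnegative martingale under the null hypothesis, so Ville's inequality bounds the false-rejection probability by alpha at any stopping time. Several pieces are assumptions for illustration: the function names, the k-NN predictor, the Laplace smoothing and the hypothetical `simulate` data generator. This naive version also re-sorts the full history at every step, so it lacks the constant per-observation cost of the thesis's change detector.

```python
import math
import random
from typing import Iterable, List, Tuple

Point = Tuple[float, ...]


def knn_label_probability(history: List[Tuple[Point, int]], x: Point,
                          label: int, k: int = 5) -> float:
    """Laplace-smoothed P(label | x) from the k nearest past observations."""
    if not history:
        return 0.5
    nearest = sorted(history, key=lambda item: math.dist(item[0], x))[:k]
    hits = sum(1 for _, lab in nearest if lab == label)
    return (hits + 1) / (len(nearest) + 2)  # stays strictly inside (0, 1)


def sequential_two_sample_test(stream: Iterable[Tuple[Point, int]],
                               alpha: float = 0.05,
                               k: int = 5) -> Tuple[bool, int]:
    """Return (rejected, observations_used).

    Assumes each label (0 or 1) is an independent fair coin flip saying which
    sample the point came from. Under H0 the product of q / 0.5 has mean 1, so
    by Ville's inequality it reaches 1 / alpha with probability <= alpha.
    """
    threshold = math.log(1.0 / alpha)
    log_wealth = 0.0
    history: List[Tuple[Point, int]] = []
    used = 0
    for x, label in stream:
        used += 1
        # The prediction uses only past points; the label just scores it.
        q = knn_label_probability(history, x, label, k)
        log_wealth += math.log(q / 0.5)
        if log_wealth >= threshold:
            return True, used  # stop early: enough evidence against H0
        history.append((x, label))
    return False, used


def simulate(shift: float, n: int, rng: random.Random) -> List[Tuple[Point, int]]:
    """Hypothetical 1-D data: sample 1 is sample 0 shifted by `shift`."""
    data = []
    for _ in range(n):
        label = 1 if rng.random() < 0.5 else 0
        data.append(((rng.gauss(shift * label, 1.0),), label))
    return data


if __name__ == "__main__":
    rng = random.Random(1)
    print(sequential_two_sample_test(simulate(3.0, 500, rng), alpha=0.01))
```

The test halts at the first observation where the accumulated evidence crosses log(1/alpha). This is what lets a sequential procedure stop early when the two samples clearly differ while keeping its error guarantee.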
