Global ETD Search

71	Model Likelihoods and Bayes Factors for Switching and Mixture Models Frühwirth-Schnatter, Sylvia January 2002 (has links) (PDF) In the present paper we discuss the problem of estimating model likelihoods from the MCMC output for a general mixture and switching model. Estimation is based on the method of bridge sampling (Meng and Wong, 1996), where the MCMC sample is combined with an iid sample from an importance density. The importance density is constructed in an unsupervised manner from the MCMC output using a mixture of complete data posteriors. Whereas the importance sampling estimator as well as the reciprocal importance sampling estimator are sensitive to the tail behaviour of the importance density, we demonstrate that the bridge sampling estimator is far more robust in this concern. Our case studies range from computing marginal likelihoods for a mixture of multivariate normal distributions, testing for the inhomogeneity of a discrete time Poisson process, to testing for the presence of Markov switching and order selection in the MSAR model. (author's abstract) / Series: Report Series SFB "Adaptive Information Systems and Modelling in Economics and Management Science"
72	Extensions of principal components analysis Brubaker, S. Charles 29 June 2009 (has links) Principal Components Analysis is a standard tool in data analysis, widely used in data-rich fields such as computer vision, data mining, bioinformatics, and econometrics. For a set of vectors in n dimensions and a natural number k less than n, the method returns a subspace of dimension k whose average squared distance to that set is as small as possible. Besides saving computation by reducing the dimension, projecting to this subspace can often reveal structure that was hidden in high dimension. This thesis considers several novel extensions of PCA, which provably reveals hidden structure where standard PCA fails to do so. First, we consider Robust PCA, which prevents a few points, possibly corrupted by an adversary, from having a large effect on the analysis. When applied to learning noisy logconcave mixture models, the algorithm requires only slightly more separation between component means than is required for the noiseless case. Second, we consider Isotropic PCA, which can go beyond the first two moments in identifying ``interesting' directions in data. The method leads to the first affine-invariant algorithm that can provably learn mixtures of Gaussians in high dimensions, improving significantly on known results. Thirdly, we define the ``Subgraph Parity Tensor' of order r of a graph and reduce the problem of finding planted cliques in random graphs to the problem of finding the top principal component of this tensor. Principal components analysis Planted cliques Random tensors Mixture models Principal components analysis Algorithms Mathematical statistics Eigenvectors
73	Asymptotic methods for tests of homogeneity for finite mixture models Stewart, Michael Ian January 2002 (has links) We present limit theory for tests of homogeneity for finite mixture models. More specifically, we derive the asymptotic distribution of certain random quantities used for testing that a mixture of two distributions is in fact just a single distribution. Our methods apply to cases where the mixture component distributions come from one of a wide class of one-parameter exponential families, both continous and discrete. We consider two random quantities, one related to testing simple hypotheses, the other composite hypotheses. For simple hypotheses we consider the maximum of the standardised score process, which is itself a test statistic. For composite hypotheses we consider the maximum of the efficient score process, which is itself not a statistic (it depends on the unknown true distribution) but is asymptotically equivalent to certain common test statistics in a certain sense. We show that we can approximate both quantities with the maximum of a certain Gaussian process depending on the sample size and the true distribution of the observations, which when suitably normalised has a limiting distribution of the Gumbel extreme value type. Although the limit theory is not practically useful for computing approximate p-values, we use Monte-Carlo simulations to show that another method suggested by the theory, involving using a Studentised version of the maximum-score statistic and simulating a Gaussian process to compute approximate p-values, is remarkably accurate and uses a fraction of the computing resources that a straight Monte-Carlo approximation would.
74	Trajectories of Cannabis Use Disorder: Risk and Developmental Factors, Clinical Characteristics, and Outcomes Kosty, Derek 18 August 2015 (has links) Efforts to objectively inform cannabis discourses include research on the epidemiology of cannabis abuse and dependence disorders or, collectively, cannabis use disorder (CUD). For my dissertation I identified classes of individuals based on intraindividual CUD trajectory patterns and contrasted trajectory classes with respect to clinical characteristics of CUD, developmental risk factors, and psychosocial outcomes. Identifying differences between trajectory classes provides evidence for the validity of trajectory-based CUD constructs and informs the development of comprehensive models of CUD epidemiology and trajectory-specific intervention approaches. My dissertation used data from the Oregon Adolescent Depression Project, a prospective epidemiological study of the psychiatric and psychosocial functioning of a representative community-based sample randomly selected from nine high schools across western Oregon. Four waves of data collection occurred between mid-adolescence and early adulthood and included diagnostic interviews and self-report questionnaires. Onset and offset ages of all CUD episodes were recorded. The reference sample included 816 participants who completed all diagnostic interviews. A series of latent class growth models revealed three distinct CUD trajectory classes through age 30: (1) a persistent increasing risk class; (2) a maturing out class, marked by increasing risk through age 20 and then a decreasing risk through early adulthood; and (3) a stable low risk class. Rates of cannabis dependence were similar across the persistent increasing and the maturing out classes. Trajectory classes characterized by a history of CUD were associated with a variety of childhood risk factors and measures of psychosocial functioning during early adulthood. Participants who were male, had externalizing disorders, and had psychotic experiences during early adulthood discriminated between the persistent increasing and the maturing out classes. Future research based on more diverse samples is indicated, as are well-controlled tests of associations between risk factors, trajectory class membership, and psychosocial outcomes. A better understanding of these relationships will inform etiological theories of CUD and the development of effective intervention programs that target problematic cannabis use at specific developmental stages. Designing targeted versus undifferentiated interventions for those at greatest risk for adult psychosocial impairment could be a cost-effective way to mitigate the consequences of CUD. cannabis Cannabis use disorder Growth mixture models Latent class analysis Trajectories
75	Beyond One-Size Fits All: Using Heterogeneous Models to Estimate School Performance in Mathematics Melton, Joshua 01 May 2017 (has links) This dissertation explored the academic growth in mathematics of a longitudinal cohort of 21,567 Oregon students during middle school on a state accountability test. The student test scores were used to calculate estimates of school performance based on four different accountability models (percent proficient [PP], change in PP, multilevel growth, and growth mixture). On average, 72% of Oregon eighth graders were proficient in mathematics in 2012, 71% in the average school, and 6% more students in this cohort demonstrated mathematics proficiency compared to 2011. The two-level unconditional multilevel growth model estimated the average intercept (Grade 6) to be 228.4 (SE = 0.07) scale score points with an average middle school growth rate of 5.40 scale points per year (SE = 0.02) on the state mathematics test. Student demographic characteristics were a statistically significant improvement on the unconditional model. A major shortcoming of this research, however, was the inability to find successful model convergence for any three-level growth model or any growth mixture model. A latent class growth analysis was used to uncover groups of students who shared common growth trajectories. A five-latent class solution best represented the data with the lowest BIC and a significant LMR p. Two of the latent classes were students who had high achievement in Grade 6 and demonstrated high growth across middle school and a second group with low sixth grade achievement that had below average growth in middle school. Student-level demographic predictors had statistically significant relations with growth characteristics and latent class membership. In comparing school performance based on the four different models, it was found that, although statistically correlated, the models of school performance ranked schools differently. A school’s percentage of proficient students in Grade 8 correlated moderately (r = [.60, .70]) with growth over the middle school years as estimated by the growth and LCGA models. About 70% to 80% of schools ranked more than 10 percentiles differently for every pairwise comparison of models. These results, like previous research call into question whether currently used models of school performance produce consistent and valid descriptions of school performance using state test scores. Achievement gaps Growth mixture models Latent class growth analysis Longitudinal modeling School performance Student growth
76	Continuous reinforcement learning with incremental Gaussian mixture models / Aprendizagem por reforço contínua com modelos de mistura gaussianas incrementais Pinto, Rafael Coimbra January 2017 (has links) A contribução original desta tese é um novo algoritmo que integra um aproximador de funções com alta eficiência amostral com aprendizagem por reforço em espaços de estados contínuos. A pesquisa completa inclui o desenvolvimento de um algoritmo online e incremental capaz de aprender por meio de uma única passada sobre os dados. Este algoritmo, chamado de Fast Incremental Gaussian Mixture Network (FIGMN) foi empregado como um aproximador de funções eficiente para o espaço de estados de tarefas contínuas de aprendizagem por reforço, que, combinado com Q-learning linear, resulta em performance competitiva. Então, este mesmo aproximador de funções foi empregado para modelar o espaço conjunto de estados e valores Q, todos em uma única FIGMN, resultando em um algoritmo conciso e com alta eficiência amostral, i.e., um algoritmo de aprendizagem por reforço capaz de aprender por meio de pouquíssimas interações com o ambiente. Um único episódio é suficiente para aprender as tarefas investigadas na maioria dos experimentos. Os resultados são analisados a fim de explicar as propriedades do algoritmo obtido, e é observado que o uso da FIGMN como aproximador de funções oferece algumas importantes vantagens para aprendizagem por reforço em relação a redes neurais convencionais. / This thesis’ original contribution is a novel algorithm which integrates a data-efficient function approximator with reinforcement learning in continuous state spaces. The complete research includes the development of a scalable online and incremental algorithm capable of learning from a single pass through data. This algorithm, called Fast Incremental Gaussian Mixture Network (FIGMN), was employed as a sample-efficient function approximator for the state space of continuous reinforcement learning tasks, which, combined with linear Q-learning, results in competitive performance. Then, this same function approximator was employed to model the joint state and Q-values space, all in a single FIGMN, resulting in a concise and data-efficient algorithm, i.e., a reinforcement learning algorithm that learns from very few interactions with the environment. A single episode is enough to learn the investigated tasks in most trials. Results are analysed in order to explain the properties of the obtained algorithm, and it is observed that the use of the FIGMN function approximator brings some important advantages to reinforcement learning in relation to conventional neural networks. Informática : Educação Aprendizagem : Computador na educação Redes neurais Reinforcement learning Neural networks Gaussian mixture models
77	An incremental gaussian mixture network for data stream classification in non-stationary environments / Uma rede de mistura de gaussianas incrementais para classificação de fluxos contínuos de dados em cenários não estacionários Diaz, Jorge Cristhian Chamby January 2018 (has links) Classificação de fluxos contínuos de dados possui muitos desafios para a comunidade de mineração de dados quando o ambiente não é estacionário. Um dos maiores desafios para a aprendizagem em fluxos contínuos de dados está relacionado com a adaptação às mudanças de conceito, as quais ocorrem como resultado da evolução dos dados ao longo do tempo. Duas formas principais de desenvolver abordagens adaptativas são os métodos baseados em conjunto de classificadores e os algoritmos incrementais. Métodos baseados em conjunto de classificadores desempenham um papel importante devido à sua modularidade, o que proporciona uma maneira natural de se adaptar a mudanças de conceito. Os algoritmos incrementais são mais rápidos e possuem uma melhor capacidade anti-ruído do que os conjuntos de classificadores, mas têm mais restrições sobre os fluxos de dados. Assim, é um desafio combinar a flexibilidade e a adaptação de um conjunto de classificadores na presença de mudança de conceito, com a simplicidade de uso encontrada em um único classificador com aprendizado incremental. Com essa motivação, nesta dissertação, propomos um algoritmo incremental, online e probabilístico para a classificação em problemas que envolvem mudança de conceito. O algoritmo é chamado IGMN-NSE e é uma adaptação do algoritmo IGMN. As duas principais contribuições da IGMN-NSE em relação à IGMN são: melhoria de poder preditivo para tarefas de classificação e a adaptação para alcançar um bom desempenho em cenários não estacionários. Estudos extensivos em bases de dados sintéticas e do mundo real demonstram que o algoritmo proposto pode rastrear os ambientes em mudança de forma muito próxima, independentemente do tipo de mudança de conceito. / Data stream classification poses many challenges for the data mining community when the environment is non-stationary. The greatest challenge in learning classifiers from data stream relates to adaptation to the concept drifts, which occur as a result of changes in the underlying concepts. Two main ways to develop adaptive approaches are ensemble methods and incremental algorithms. Ensemble method plays an important role due to its modularity, which provides a natural way of adapting to change. Incremental algorithms are faster and have better anti-noise capacity than ensemble algorithms, but have more restrictions on concept drifting data streams. Thus, it is a challenge to combine the flexibility and adaptation of an ensemble classifier in the presence of concept drift, with the simplicity of use found in a single classifier with incremental learning. With this motivation, in this dissertation we propose an incremental, online and probabilistic algorithm for classification as an effort of tackling concept drifting. The algorithm is called IGMN-NSE and is an adaptation of the IGMN algorithm. The two main contributions of IGMN-NSE in relation to the IGMN are: predictive power improvement for classification tasks and adaptation to achieve a good performance in non-stationary environments. Extensive studies on both synthetic and real-world data demonstrate that the proposed algorithm can track the changing environments very closely, regardless of the type of concept drift. Banco : Dados Algoritmos Incremental learning Gaussian mixture models Concept drift Data streams classification
78	Méthodes bioinformatiques pour l'analyse de données de séquençage dans le contexte du cancer / Bioinformatics methods for cancer sequencing data analysis Rudewicz, Justine 30 June 2017 (has links) Le cancer résulte de la prolifération excessive de cellules qui dérivent toutes de la même cellule initiatrice et suivent un processus Darwinien de diversification et de sélection. Ce processus est défini par l'accumulation d'altérations génétiques et épigénétiques dont la caractérisation est un élément majeur pour pouvoir proposer une thérapie ciblant spécifiquement les cellules tumorales. L'avènement des nouvelles technologies de séquençage haut débit permet cette caractérisation à un niveau moléculaire. Cette révolution technologique a entraîné le développement de nombreuses méthodes bioinformatiques. Dans cette thèse, nous nous intéressons particulièrement au développement de nouvelles méthodes computationnelles d'analyse de données de séquençage d'échantillons tumoraux permettant une identification précise d'altérations spécifiques aux tumeurs et une description fine des sous populations tumorales. Dans le premier chapitre, il s'agît d'étudier des méthodes d'identification d'altérations ponctuelles dans le cadre de séquençage ciblé, appliquées à une cohorte de patientes atteintes du cancer du sein. Nous décrivons deux nouvelles méthodes d'analyse, chacune adaptée à une technologie de séquençage, spécifiquement Roche 454 et Pacifique Biosciences.Dans le premier cas, nous avons adapté des approches existantes au cas particulier de séquences de transcrits. Dans le second cas, nous avons été confronté à un bruit de fond élevé entraînant un fort taux de faux positifs lors de l'utilisation d'approches classiques. Nous avons développé une nouvelle méthode, MICADo, basée sur les graphes de De Bruijn et permettant une distinction efficace entre les altérations spécifiques aux patients et les altérations communes à la cohorte, ce qui rend les résultats exploitables dans un contexte clinique. Le second chapitre aborde l'identification d'altérations de nombre de copies. Nous décrivons l'approche mise en place pour leur identification efficace à partir de données de très faible couverture. L'apport principal de ce travail consiste en l'élaboration d'une stratégie d'analyse statistique afin de mettre en évidence des changements locaux et globaux au niveau du génome survenus durant le traitement administré à des patientes atteintes de cancer du sein. Notre méthode repose sur la construction d'un modèle linéaire permettant d'établir des scores de différences entre les échantillons avant et après traitement. Dans le troisième chapitre, nous nous intéressons au problème de reconstruction clonale. Cette problématique récente est actuellement en plein essor, mais manque cependant d'un cadre formel bien établi. Nous proposons d'abord une formalisation du problème de reconstruction clonale. Ensuite nous utilisons ce formalisme afin de mettre en place une méthode basée sur les modèles de mélanges Gaussiens. Cette méthode utilise les altérations ponctuelles et de nombre de copies - comme celles abordées dans les deux chapitres précédents - afin de caractériser et quantifier les différentes populations clonales présentes dans un échantillon tumoral. / Cancer results from the excessive proliferation of cells decending from the same founder cell and following a Darwinian process of diversification and selection. This process is defined by the accumulation of genetic and epigenetic alterations whose characterization is a key element for establishing a therapy that would specifically target tumor cells. The advent of new high-throughput sequencing technologies enables this characterization at the molecular level. This technological revolution has led to the development of numerous bioinformatics methods. In this thesis, we are particularly interested in the development of new computational methods for the analysis of sequencing data of tumor samples allowing precise identification of tumor-specific alterations and an accurate description of tumor subpopulations. In the first chapter, we explore methods for identifying single nucleotide alterations in targeted sequencing data and apply them to a cohort of breast cancer patients. We introduce two new methods of analysis, each tailored to a particular sequencing technology, namely Roche 454 and Pacific Biosciences. In the first case, we adapted existing approaches to the particular case of transcript sequencing. In the second case, when using conventional approaches, we were confronted with a high background noise resulting in a high rate of false positives. We have developed a new method, MICADo, based on the De Bruijn graphs and making possible an effective distinction between patient-specific alterations and alterations common to the cohort, which makes the results usable in a clinical context. Second chapter deals with the identification of copy number alterations. We describe the approach put in place for their efficient identification from very low coverage data. The main contribution of this work is the development of a strategy for statistical analysis in order to emphasise local and global changes in the genome that occurred during the treatment administered to patients with breast cancer. Our method is based on the construction of a linear model to establish scores of differences between samples before and after treatment. In the third chapter, we focus on the problem of clonal reconstruction. This problem has recently gathered a lot of interest, but it still lacks a well-established formal framework. We first propose a formalization of the clonal reconstruction problem. Then we use this formalism to put in place a method based on Gaussian mixture models. Our method uses single nucleotide and copy number alterations - such as those discussed in the previous two chapters - to characterize and quantify different clonal populations present in a tumor sample. Cancer Bioinformatique NGS TGS Graphes de de Bruijn Modèles de mélanges Cancer Bioinformatics NGS TGS De Bruijn graphs Mixture models
79	Continuous reinforcement learning with incremental Gaussian mixture models / Aprendizagem por reforço contínua com modelos de mistura gaussianas incrementais Pinto, Rafael Coimbra January 2017 (has links) A contribução original desta tese é um novo algoritmo que integra um aproximador de funções com alta eficiência amostral com aprendizagem por reforço em espaços de estados contínuos. A pesquisa completa inclui o desenvolvimento de um algoritmo online e incremental capaz de aprender por meio de uma única passada sobre os dados. Este algoritmo, chamado de Fast Incremental Gaussian Mixture Network (FIGMN) foi empregado como um aproximador de funções eficiente para o espaço de estados de tarefas contínuas de aprendizagem por reforço, que, combinado com Q-learning linear, resulta em performance competitiva. Então, este mesmo aproximador de funções foi empregado para modelar o espaço conjunto de estados e valores Q, todos em uma única FIGMN, resultando em um algoritmo conciso e com alta eficiência amostral, i.e., um algoritmo de aprendizagem por reforço capaz de aprender por meio de pouquíssimas interações com o ambiente. Um único episódio é suficiente para aprender as tarefas investigadas na maioria dos experimentos. Os resultados são analisados a fim de explicar as propriedades do algoritmo obtido, e é observado que o uso da FIGMN como aproximador de funções oferece algumas importantes vantagens para aprendizagem por reforço em relação a redes neurais convencionais. / This thesis’ original contribution is a novel algorithm which integrates a data-efficient function approximator with reinforcement learning in continuous state spaces. The complete research includes the development of a scalable online and incremental algorithm capable of learning from a single pass through data. This algorithm, called Fast Incremental Gaussian Mixture Network (FIGMN), was employed as a sample-efficient function approximator for the state space of continuous reinforcement learning tasks, which, combined with linear Q-learning, results in competitive performance. Then, this same function approximator was employed to model the joint state and Q-values space, all in a single FIGMN, resulting in a concise and data-efficient algorithm, i.e., a reinforcement learning algorithm that learns from very few interactions with the environment. A single episode is enough to learn the investigated tasks in most trials. Results are analysed in order to explain the properties of the obtained algorithm, and it is observed that the use of the FIGMN function approximator brings some important advantages to reinforcement learning in relation to conventional neural networks. Informática : Educação Aprendizagem : Computador na educação Redes neurais Reinforcement learning Neural networks Gaussian mixture models
80	An incremental gaussian mixture network for data stream classification in non-stationary environments / Uma rede de mistura de gaussianas incrementais para classificação de fluxos contínuos de dados em cenários não estacionários Diaz, Jorge Cristhian Chamby January 2018 (has links) Classificação de fluxos contínuos de dados possui muitos desafios para a comunidade de mineração de dados quando o ambiente não é estacionário. Um dos maiores desafios para a aprendizagem em fluxos contínuos de dados está relacionado com a adaptação às mudanças de conceito, as quais ocorrem como resultado da evolução dos dados ao longo do tempo. Duas formas principais de desenvolver abordagens adaptativas são os métodos baseados em conjunto de classificadores e os algoritmos incrementais. Métodos baseados em conjunto de classificadores desempenham um papel importante devido à sua modularidade, o que proporciona uma maneira natural de se adaptar a mudanças de conceito. Os algoritmos incrementais são mais rápidos e possuem uma melhor capacidade anti-ruído do que os conjuntos de classificadores, mas têm mais restrições sobre os fluxos de dados. Assim, é um desafio combinar a flexibilidade e a adaptação de um conjunto de classificadores na presença de mudança de conceito, com a simplicidade de uso encontrada em um único classificador com aprendizado incremental. Com essa motivação, nesta dissertação, propomos um algoritmo incremental, online e probabilístico para a classificação em problemas que envolvem mudança de conceito. O algoritmo é chamado IGMN-NSE e é uma adaptação do algoritmo IGMN. As duas principais contribuições da IGMN-NSE em relação à IGMN são: melhoria de poder preditivo para tarefas de classificação e a adaptação para alcançar um bom desempenho em cenários não estacionários. Estudos extensivos em bases de dados sintéticas e do mundo real demonstram que o algoritmo proposto pode rastrear os ambientes em mudança de forma muito próxima, independentemente do tipo de mudança de conceito. / Data stream classification poses many challenges for the data mining community when the environment is non-stationary. The greatest challenge in learning classifiers from data stream relates to adaptation to the concept drifts, which occur as a result of changes in the underlying concepts. Two main ways to develop adaptive approaches are ensemble methods and incremental algorithms. Ensemble method plays an important role due to its modularity, which provides a natural way of adapting to change. Incremental algorithms are faster and have better anti-noise capacity than ensemble algorithms, but have more restrictions on concept drifting data streams. Thus, it is a challenge to combine the flexibility and adaptation of an ensemble classifier in the presence of concept drift, with the simplicity of use found in a single classifier with incremental learning. With this motivation, in this dissertation we propose an incremental, online and probabilistic algorithm for classification as an effort of tackling concept drifting. The algorithm is called IGMN-NSE and is an adaptation of the IGMN algorithm. The two main contributions of IGMN-NSE in relation to the IGMN are: predictive power improvement for classification tasks and adaptation to achieve a good performance in non-stationary environments. Extensive studies on both synthetic and real-world data demonstrate that the proposed algorithm can track the changing environments very closely, regardless of the type of concept drift. Banco : Dados Algoritmos Incremental learning Gaussian mixture models Concept drift Data streams classification

Search results