61

Aspects of probabilistic modelling for data analysis

Delannay, Nicolas 23 October 2007
Computer technologies have revolutionised the processing of information and the search for knowledge. With ever-increasing computational power, it is becoming possible to tackle new data analysis applications as diverse as mining Internet resources, analysing drug effects on the organism or assisting wardens with autonomous video detection techniques. Fundamentally, the principle of any data analysis task is to fit a model which encodes well the dependencies (or patterns) present in the data. However, the difficulty is precisely to define such a model when data are noisy, dependencies are highly stochastic and there is no simple physical rule to represent them. The aim of this work is to discuss the principles, advantages and weaknesses of the probabilistic modelling framework for data analysis. The main idea of the framework is to model the dispersion of data, as well as uncertainty about the model itself, by probability distributions. Three data analysis tasks are presented, and for each of them the discussion is based on experimental results from real datasets. The first task considers the problem of linear subspace identification. We show how one can replace a Gaussian noise model by a Student-t noise model to make the identification more robust to atypical samples while keeping the learning procedure simple. The second task is regression, applied more specifically to near-infrared spectroscopy datasets. We show how spectra should be pre-processed before entering the regression model. We then analyse the validity of the Bayesian model selection principle for this application (in particular within the Gaussian Process formulation) and compare this principle to the resampling selection scheme. The final task considered is collaborative filtering, which is related to applications such as recommendation for e-commerce and text mining. This task illustrates how intuitive considerations can guide the design of the model and the choice of the probability distributions appearing in it. We compare the intuitive approach with a simpler matrix factorisation approach.
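The robust subspace idea above can be sketched in a few lines. The following is a minimal illustration, not the thesis code: it exploits the Gaussian scale-mixture form of the Student-t to fit a robust covariance by EM and then takes its leading eigenvectors as the subspace; the fixed degrees of freedom `nu`, the iteration count and the toy data are assumptions.

```python
import numpy as np

def student_t_subspace(X, k, nu=4.0, n_iter=50):
    """Robust k-dimensional subspace via EM for a multivariate Student-t."""
    N, p = X.shape
    mu, Sigma = X.mean(0), np.cov(X.T)
    for _ in range(n_iter):
        diff = X - mu
        delta = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)
        u = (nu + p) / (nu + delta)               # E-step: latent precision weights
        mu = (u[:, None] * X).sum(0) / u.sum()    # M-step: weighted mean
        diff = X - mu
        Sigma = (u[:, None] * diff).T @ diff / N  # M-step: weighted covariance
    vals, vecs = np.linalg.eigh(Sigma)
    return vecs[:, -k:]                           # top-k robust subspace basis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
X[:5] += 50 * rng.normal(size=(5, 5))             # a few atypical samples
print(student_t_subspace(X, k=2).shape)           # (5, 2)
```

Down-weighting each sample by `u` is what makes the fit insensitive to the outlying rows, while the update equations stay almost as simple as the Gaussian case.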
62

Role of Semantics in the Reconsolidation of Episodic Memories

Kumar, Shikhar January 2012
Evidence suggests that when memories are reactivated they become labile and can be updated or even erased. Reactivation induces plasticity in memory representations, rendering them fragile, much as they were after initial acquisition. When a memory has been reactivated it must be re-stabilized, which requires reconsolidation. A recent set of studies established the phenomenon of memory reconsolidation for episodic memory (Hupbach et al., 2007, 2008, 2011). That reconsolidation effects apply to explicit memory, which requires conscious recollection, has far-reaching implications. The Hupbach et al. studies investigated the ability of subtle reminders to trigger reconsolidation; these reminders consisted of the same spatial context, the same experimenter and a reminder question. Given that we live in a predictable world, episodes are not random occurrences of events in time and space, but instead consist of statistical and semantic regularities. This leaves open the question of whether semantic relations and statistical regularities between episodes can trigger a reactivation of episodic memory, and if so, how this affects the status of the reactivated memory. This dissertation explored the role of semantic relatedness between the elements of different episodes in memory reactivation and subsequent updating, focusing in particular on categorical and contextual aspects of semantic relations. A series of experiments considered different kinds of semantic relations between elements of episodes, providing evidence of memory reactivation and updating as a consequence of basic-level category relations between items in two separate episodes. We also tested the predictions of the Temporal Context Model (TCM) (Sederberg et al., 2011) for our experimental paradigm and show that the current TCM is not able to account for all the effects of semantic relatedness in the reconsolidation paradigm. Finally, we explore an alternative approach that seeks to explain memory reconsolidation as Bayesian inference. Our results support this Bayesian framework, showing its potential for exploring different aspects of memory organization.
63

Measuring the Mass of a Galaxy: An evaluation of the performance of Bayesian mass estimates using statistical simulation

Eadie, Gwendolyn 27 March 2013
This research uses a Bayesian approach to study the biases that may occur when kinematic data are used to estimate the mass of a galaxy. Data are simulated from the Hernquist (1990) distribution functions (DFs) for velocity dispersions of the isotropic, constant-anisotropy, and anisotropic Osipkov (1979) and Merritt (1985) types, and then analysed using the isotropic Hernquist model. Biases are explored when i) the model and data come from the same DF, ii) the model and data come from the same DF but tangential velocities are unknown, iii) the model and data come from different DFs, and iv) the model and data come from different DFs and the tangential velocities are unknown. Mock observations are also created from the Gauthier (2006) simulations and analysed with the isotropic Hernquist model. No bias was found in situation (i), a slight positive bias was found in (ii), a negative bias was found in (iii), and a large positive bias was found in (iv). The mass estimate of the Gauthier system when tangential velocities were unknown was nearly correct, but the mass profile was not described well by the isotropic Hernquist model. When the Gauthier data were analysed with the tangential velocities, the mass of the system was overestimated. The code created for the research runs three parallel Markov chains for each data set, uses the Gelman-Rubin statistic to assess convergence, and combines the converged chains into a single sample of the posterior distribution for each data set. The code also includes two ways to deal with nuisance parameters: one is to marginalize over the nuisance parameter at every step in the chain, and the other is to sample the nuisance parameters using a hybrid Gibbs sampler. When tangential velocities, v(t), are unobserved in the analyses above, they are sampled as nuisance parameters in the Markov chain. The v(t) estimates from the Markov chains did a poor job of recovering the true tangential velocities. However, the posterior samples of v(t) proved useful, as the estimates of the tangential velocities helped explain the biases discovered in situations (i)-(iv) above. / Thesis (Master, Physics, Engineering Physics and Astronomy) -- Queen's University, 2013.
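As a point of reference for the convergence check mentioned above, a basic Gelman-Rubin statistic across parallel chains can be computed as follows; this is a generic sketch on synthetic chains, not the thesis code.

```python
import numpy as np

def gelman_rubin(chains):
    """chains: array of shape (m, n) -- m parallel chains of length n."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)           # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()     # mean within-chain variance
    var_hat = (n - 1) / n * W + B / n         # pooled variance estimate
    return np.sqrt(var_hat / W)               # potential scale reduction R-hat

rng = np.random.default_rng(1)
chains = rng.normal(size=(3, 5000))           # three well-mixed stand-in chains
print(gelman_rubin(chains))                   # values near 1.0 suggest convergence
```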
64

Random finite sets for multitarget tracking with applications

Wood, Trevor M. January 2011
Multitarget tracking is the process of jointly determining the number of targets present and their states from noisy sets of measurements. The difficulty of the multitarget tracking problem is that the number of targets present can change as targets appear and disappear while the sets of measurements may contain false alarms and measurements of true targets may be missed. The theory of random finite sets was proposed as a systematic, Bayesian approach to solving the multitarget tracking problem. The conceptual solution is given by Bayes filtering for the probability distribution of the set of target states, conditioned on the sets of measurements received, known as the multitarget Bayes filter. A first-moment approximation to this filter, the probability hypothesis density (PHD) filter, provides a more computationally practical, but theoretically sound, solution. The central thesis of this work is that the random finite set framework is theoretically sound, compatible with the Bayesian methodology and amenable to immediate implementation in a wide range of contexts. In advancing this thesis, new links between the PHD filter and existing Bayesian approaches for manoeuvre handling and incorporation of target amplitude information are presented. A new multitarget metric which permits incorporation of target confidence information is derived and new algorithms are developed which facilitate sequential Monte Carlo implementations of the PHD filter. Several applications of the PHD filter are presented, with a focus on applications for tracking in sonar data. Good results are presented for implementations on real active and passive sonar data. The PHD filter is also deployed in order to extract bacterial trajectories from microscopic visual data in order to aid ongoing work in understanding bacterial chemotaxis. A performance comparison between the PHD filter and conventional multitarget tracking methods using simulated data is also presented, showing favourable results for the PHD filter.
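To make the PHD filter concrete, the following is a hedged sketch of a single measurement-update step of a sequential Monte Carlo (particle) implementation in one dimension; the detection probability, uniform clutter intensity and Gaussian measurement likelihood are toy assumptions, not the settings used in the thesis.

```python
import numpy as np
from scipy.stats import norm

def phd_update(particles, weights, measurements, p_detect=0.9,
               clutter_density=0.01, meas_std=1.0):
    """One PHD measurement update; the weight sum estimates target number."""
    # Likelihood of each measurement under each particle state.
    g = norm.pdf(measurements[:, None], loc=particles[None, :], scale=meas_std)
    missed = (1.0 - p_detect) * weights              # undetected-target term
    detected = np.zeros_like(weights)
    for z_like in g:                                 # one row per measurement
        num = p_detect * z_like * weights
        detected += num / (clutter_density + num.sum())
    return missed + detected

rng = np.random.default_rng(2)
particles = rng.uniform(0, 100, size=2000)
weights = np.full(2000, 2.0 / 2000)                  # PHD mass 2: two targets
z = np.array([25.0, 70.0, 90.0])                     # two detections + one false alarm
w = phd_update(particles, weights, z)
print(w.sum())                                       # expected number of targets
```

The first-moment nature of the filter shows up directly here: no explicit data association is performed, yet the total particle weight tracks the expected target count.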
65

Bayesian pathway analysis in epigenetics

Wright, Alan January 2013
A typical gene expression data set consists of measurements of a large number of gene expressions on a relatively small number of subjects, classified according to two or more outcomes, for example cancer or non-cancer. The identification of associations between gene expressions and outcome is a huge multiple-testing problem. Early approaches to this problem involved the application of thousands of univariate tests with corrections for multiplicity. Over the past decade, numerous studies have demonstrated that analyzing gene expression data structured into predefined gene sets can produce benefits in terms of statistical power and robustness when compared to alternative approaches. This thesis presents the results of research on gene set analysis. In particular, it examines the properties of some existing methods for the analysis of gene sets, and it introduces novel Bayesian methods for gene set analysis. A distinguishing feature of these methods is that the model is specified conditionally on the expression data, whereas other methods of gene set analysis and IGA generally make inferences conditionally on the outcome. Computer simulation is used to compare three common established methods for gene set analysis; in this simulation study a new procedure for simulating gene expression data is introduced. The simulation studies are used to identify situations in which the established methods perform poorly. The Bayesian approaches developed in this thesis apply reversible jump Markov chain Monte Carlo (RJMCMC) techniques to model gene expression effects on phenotype. The reversible jump step in the modelling procedure allows posterior probabilities for the activeness of a gene set to be produced. These mixture models reverse the generally accepted conditionality and model outcome given gene expression, which is a more intuitive assumption when modelling the pathway to phenotype. It is demonstrated that the two models proposed may be superior to the established methods studied. There is considerable scope for further development of this line of research, which is appealing in its use of mixture-model priors that reflect the belief that a relatively small number of genes, restricted to a small number of gene sets, are associated with the outcome.
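The reversible-jump samplers themselves are too involved for a short sketch, but the quantity they target, the posterior probability that a gene set is active, can be computed exactly in a toy conjugate Gaussian model, as below. The single summary score per gene set, the known noise variance and the equal prior odds are all assumptions made for illustration, not the thesis model.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)
n = 60
score = rng.normal(size=n)               # summary expression score of a gene set
y = 0.8 * score + rng.normal(size=n)     # phenotype truly depends on the set

sigma2, tau2 = 1.0, 1.0                  # noise and effect-prior variances (assumed)
# Marginal likelihood under the null model: y ~ N(0, sigma2 I).
m_null = multivariate_normal.logpdf(y, mean=np.zeros(n), cov=sigma2 * np.eye(n))
# Marginal likelihood under the active model, integrating out the effect:
# y ~ N(0, sigma2 I + tau2 * score score').
cov_act = sigma2 * np.eye(n) + tau2 * np.outer(score, score)
m_act = multivariate_normal.logpdf(y, mean=np.zeros(n), cov=cov_act)

# Equal prior odds on the two models -> posterior probability of activeness.
post_active = 1.0 / (1.0 + np.exp(m_null - m_act))
print(f"P(gene set active | data) = {post_active:.3f}")
```

An RJMCMC sampler estimates the same kind of posterior model probability by jumping between models of different dimension when, as in realistic settings, the marginal likelihoods are not available in closed form.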
66

New Advancements of Scalable Statistical Methods for Learning Latent Structures in Big Data

Zhao, Shiwen January 2016
Constant technology advances have caused a data explosion in recent years. Accordingly, modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This is particularly true for analyzing biological data. For example, DNA sequence data can be viewed as categorical variables, with each nucleotide taking one of four categories, while gene expression data, depending on the quantification technology, may be continuous numbers or counts. With the advancement of high-throughput technology, such data have become unprecedentedly abundant. Efficient statistical approaches are therefore crucial in this big data era.

Previous statistical methods for big data often aim to find low-dimensional structures in the observed data. For example, a factor analysis model assumes a latent Gaussian-distributed multivariate vector, and with this assumption produces a low-rank estimate of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents, which assumes mixture proportions of topics represented by a Dirichlet-distributed variable. This dissertation proposes several novel extensions of these statistical methods to address challenges in big data. The novel methods are applied in multiple real-world applications, including the construction of condition-specific gene co-expression networks, estimating shared topics among newsgroups, analysis of promoter sequences, analysis of political-economic risk data and estimating population structure from genotype data. / Dissertation
67

Bayesian Emulation for Sequential Modeling, Inference and Decision Analysis

Irie, Kaoru January 2016
The advances in three related areas of state-space modeling, sequential Bayesian learning, and decision analysis are addressed, with the statistical challenges of scalability and associated dynamic sparsity. The key theme that ties the three areas together is Bayesian model emulation: solving challenging analysis and computational problems using creative model emulators. This idea defines theoretical and applied advances in non-linear, non-Gaussian state-space modeling, dynamic sparsity, decision analysis and statistical computation, across linked contexts of multivariate time series and dynamic network studies. Examples and applications in financial time series and portfolio analysis, macroeconomics and internet studies from computational advertising demonstrate the utility of the core methodological innovations.

Chapter 1 summarizes the three areas/problems and the key idea of emulation in those areas. Chapter 2 discusses the sequential analysis of latent threshold models, using emulating models that allow analytical filtering to enhance the efficiency of posterior sampling. Chapter 3 examines the emulator model in decision analysis, or the synthetic model, which is equivalent to the loss function in the original minimization problem, and shows its performance in the context of sequential portfolio optimization. Chapter 4 describes a method for modeling streaming count data observed on a large network, which relies on emulating the whole, dependent network model by independent, conjugate sub-models customized to each set of flows. Chapter 5 reviews these advances and makes concluding remarks. / Dissertation
68

An Introduction to the Theory and Applications of Bayesian Networks

Jaitha, Anant 01 January 2017
Bayesian networks are a means to study data. A Bayesian network gives structure to data by modelling it as a graphical system of variables and developing probability distributions over those variables. It explores the variables in the problem space, examines the probability distributions related to them, and conducts statistical inference over those distributions to draw meaning from them. Bayesian networks are an efficient means of exploring a large set of data to make inferences. A number of real-world applications already exist and are being actively researched. This paper discusses the theory and applications of Bayesian networks.
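A small worked example of the kind of inference the paper discusses: the classic rain/sprinkler/wet-grass network, queried by exact enumeration. The network and its probabilities are the standard teaching example, not taken from the paper itself.

```python
from itertools import product

P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: {True: 0.01, False: 0.99},    # P(S | Rain)
               False: {True: 0.4, False: 0.6}}
P_wet = {(True, True): 0.99, (True, False): 0.9,   # P(Wet=True | S, Rain)
         (False, True): 0.8, (False, False): 0.0}

def joint(r, s, w):
    """Joint probability from the network's conditional tables."""
    pw = P_wet[(s, r)]
    return P_rain[r] * P_sprinkler[r][s] * (pw if w else 1 - pw)

# Query P(Rain | WetGrass = True) by summing out the sprinkler.
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
print(f"P(Rain | WetGrass) = {num / den:.3f}")    # about 0.358
```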
69

Polarised neutron diffraction measurements of PrBa2Cu3O6+x and the Bayesian statistical analysis of such data

Markvardsen, Anders Johannes January 2000
The physics of the series PryY1-yBa2Cu3O6+x, and the ability of Pr to suppress superconductivity, have been subjects of frequent discussion in the literature for more than a decade. This thesis describes a polarised neutron diffraction (PND) experiment performed on PrBa2Cu3O6.24, designed to probe the electronic structure. This experiment pushed the limits of what can be done using the PND technique. The problem is one of a limited number of measured Fourier components that must be inverted to form a real-space image. To accomplish this inversion, the maximum entropy technique has been employed. In some cases the maximum entropy technique can increase the resolution of ‘inverted’ data immensely, but this ability is found to depend critically on the choice of constants used in the method. To investigate this, a Bayesian robustness analysis of the maximum entropy method is carried out, resulting in an improvement of the maximum entropy technique for analysing PND data. Some results for nickel in the literature have been re-analysed and a comparison is made between different maximum entropy algorithms. Equipped with an improved data analysis technique and carefully measured PND data for PrBa2Cu3O6.24, a number of new and interesting features are observed, putting constraints on existing theoretical models of PryY1-yBa2Cu3O6+x and leaving room for more questions to be answered.
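The inversion problem described above can be sketched as follows: recover a positive density from a handful of noisy cosine (Fourier) components by maximizing alpha*S(f) - chi^2/2, where S is the entropy relative to a default model. The signal, the noise level and, notably, the fixed regularization constant alpha are toy assumptions; the thesis is precisely concerned with treating such constants more carefully.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
n, n_obs, sigma, alpha = 64, 12, 0.1, 1.0
x = np.arange(n)
truth = np.exp(-0.5 * ((x - 20) / 3.0) ** 2)           # the unknown density
F = np.cos(2 * np.pi * np.outer(np.arange(1, n_obs + 1), x) / n)
data = F @ truth + sigma * rng.normal(size=n_obs)      # few noisy components

m = np.full(n, truth.mean())                           # flat default model

def neg_objective(u):                                  # u = log f keeps f > 0
    f = np.exp(u)
    S = np.sum(f - m - f * np.log(f / m))              # entropy vs default model
    chi2 = np.sum((F @ f - data) ** 2) / sigma**2      # misfit to measurements
    return -(alpha * S - 0.5 * chi2)

res = minimize(neg_objective, np.log(m), method='L-BFGS-B')
f_hat = np.exp(res.x)
print(np.corrcoef(f_hat, truth)[0, 1])                 # agreement with truth
```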
70

Probabilistic annotation of metabolite profiles obtained by liquid chromatography coupled to mass spectrometry

Silva, Ricardo Roberto da 16 April 2014
Metabolomics is an emerging field in the post-genomic era which aims at the comprehensive analysis of small organic molecules in biological systems. Liquid chromatography coupled to mass spectrometry (LC-MS) is the most widespread sampling approach in metabolomics studies. Metabolite detection by LC-MS produces complex data sets that require a series of preprocessing steps to ensure that information can be extracted efficiently and accurately. For untargeted metabolic profiling to be effectively related to alterations in the metabolism of interest, the sampled metabolites must be annotated reliably and their relationships interpreted under the assumption of a connected sample of the metabolism. Facing this challenge, this thesis developed a software framework whose central component is a probabilistic method for metabolite annotation that allows the incorporation of independent sources of spectral information and prior knowledge about metabolism. After the probabilistic classification, a new method to represent the posterior probability distribution in the form of a graph is proposed. A library of methods for the R environment, called ProbMetab (Probabilistic Metabolomics), was created and made available as open-source software. Using ProbMetab to analyze a benchmark data set with compound identities known beforehand, we demonstrated that up to 90% of the correct metabolite identities were present among the three highest-probability candidates, emphasizing the value of displaying the posterior probability distribution rather than the simplistic classification, usually adopted in metabolomics, that reports only the most probable candidate. In an application to real data, changes in a metabolic pathway known to be related to abiotic stresses in plants (flavone and flavonol biosynthesis) were automatically detected in sugar cane data, demonstrating the importance of a view centered on the posterior distribution of the metabolite annotation network.
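A simplified illustration (not ProbMetab itself) of the annotation principle: candidate compounds near the observed mass receive a posterior that combines a mass-accuracy likelihood with a prior reflecting metabolic-network connectivity. The candidate names, masses, tolerance and priors below are all invented for the example.

```python
import numpy as np

observed_mz = 180.0634                    # measured ion mass (hypothetical)
ppm_sigma = 2e-6 * observed_mz            # ~2 ppm mass accuracy (assumed)

# Hypothetical candidates: (exact mass, degree in a metabolic network).
candidates = {
    'candidate_A': (180.0634, 4),
    'candidate_B': (180.0641, 1),
    'candidate_C': (180.0618, 2),
}

posterior = {}
for name, (mass, degree) in candidates.items():
    # Gaussian mass-accuracy likelihood times a connectivity-based prior.
    likelihood = np.exp(-0.5 * ((observed_mz - mass) / ppm_sigma) ** 2)
    posterior[name] = likelihood * degree

total = sum(posterior.values())
for name in sorted(posterior, key=posterior.get, reverse=True):
    print(f"{name}: {posterior[name] / total:.3f}")
```

Reporting the whole normalized distribution, rather than only the top-ranked name, is exactly the practice the abstract argues for.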
