Non-parametric Bayesian models for structured output prediction

Bratières, Sébastien January 2018
Structured output prediction is a machine learning task in which an input object is not just assigned a single class, as in classification, but multiple, interdependent labels. This means that the presence or value of a given label affects the other labels, for instance in text labelling problems, where output labels are applied to each word, and their interdependencies must be modelled. Non-parametric Bayesian (NPB) techniques are probabilistic modelling techniques which have the interesting property of allowing model capacity to grow, in a controllable way, with data complexity, while maintaining the advantages of Bayesian modelling. In this thesis, we develop NPB algorithms to solve structured output problems. We first study a map-reduce implementation of a stochastic inference method designed for the infinite hidden Markov model, applied to a computational linguistics task, part-of-speech tagging. We show that mainstream map-reduce frameworks do not easily support highly iterative algorithms. The main contribution of this thesis consists in a conceptually novel discriminative model, GPstruct. It is motivated by labelling tasks, and combines attractive properties of conditional random fields (CRF), structured support vector machines, and Gaussian process (GP) classifiers. In probabilistic terms, GPstruct combines a CRF likelihood with a GP prior on factors; it can also be described as a Bayesian kernelized CRF. To train this model, we develop a Markov chain Monte Carlo algorithm based on elliptical slice sampling and investigate its properties. We then validate it on real data experiments, and explore two topologies: sequence output with text labelling tasks, and grid output with semantic segmentation of images. The latter case poses scalability issues, which are addressed using likelihood approximations and an ensemble method which allows distributed inference and prediction. The experimental validation demonstrates: (a) the model is flexible and its constituent parts are modular and easy to engineer; (b) predictive performance and, most crucially, the probabilistic calibration of predictions are better than or equal to those of competitor models; and (c) model hyperparameters can be learnt from data.
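Elliptical slice sampling, the MCMC kernel named in this abstract for training GPstruct, has a compact standard form. Below is a minimal sketch of one update for a latent vector with a zero-mean GP prior; the names (`log_lik`, `chol_sigma`) are illustrative, not taken from the thesis code.

```python
import numpy as np

def elliptical_slice(f, chol_sigma, log_lik, rng):
    """One elliptical slice sampling update for f ~ N(0, Sigma).

    f          : current latent vector (e.g. GPstruct factor values)
    chol_sigma : lower Cholesky factor of the GP prior covariance
    log_lik    : function returning the log likelihood of f (here, a CRF)
    """
    nu = chol_sigma @ rng.standard_normal(f.shape)   # prior draw defining the ellipse
    log_y = log_lik(f) + np.log(rng.uniform())       # slice threshold
    theta = rng.uniform(0.0, 2.0 * np.pi)            # initial proposal angle
    lo, hi = theta - 2.0 * np.pi, theta
    while True:
        f_new = f * np.cos(theta) + nu * np.sin(theta)
        if log_lik(f_new) > log_y:                   # on the slice: accept
            return f_new
        if theta < 0.0:                              # shrink the bracket toward 0
            lo = theta
        else:
            hi = theta
        theta = rng.uniform(lo, hi)
```

Each update stays on an ellipse through the current state and a fresh prior draw, so no step size needs tuning and the shrinking bracket guarantees the loop terminates.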

Robust variational Bayesian clustering for underdetermined speech separation

Zohny, Zeinab Y. January 2016
The main focus of this thesis is the enhancement of the statistical framework employed for underdetermined T-F masking blind separation of speech. While humans are capable of extracting a speech signal of interest in the presence of other interference and noise, actual speech recognition systems and hearing aids cannot match this psychoacoustic ability: they perform well in noise-free and reverberation-free environments but suffer in realistic environments. Time-frequency masking algorithms based on computational auditory scene analysis attempt to separate multiple sound sources from only two reverberant stereo mixtures. They essentially rely on the sparsity that binaural cues exhibit in the time-frequency domain to generate masks which extract individual sources from their corresponding spectrogram points, thereby solving the problem of underdetermined convolutive speech separation. Statistically, this can be interpreted as a classical clustering problem. Owing to its analytical simplicity, a finite mixture of Gaussian distributions is commonly used in T-F masking algorithms for modelling interaural cues. Such a model is, however, sensitive to outliers; therefore, a robust probabilistic model based on the Student's t-distribution is first proposed to improve the robustness of the statistical framework. This heavy-tailed distribution, as compared to the Gaussian distribution, can potentially better capture outlier values and thereby lead to more accurate probabilistic masks for source separation. This non-Gaussian approach is applied to the state-of-the-art MESSL algorithm, and comparative studies are undertaken to confirm the improved separation quality. A Bayesian clustering framework that can better model uncertainties in reverberant environments is then exploited to replace the conventional expectation-maximization (EM) algorithm within a maximum likelihood estimation (MLE) framework. A variational Bayesian (VB) approach is applied to the MESSL algorithm to cluster interaural phase differences, thereby avoiding the drawbacks of MLE, specifically the possible presence of singularities; experimental results confirm an improvement in the separation performance. Finally, the joint modelling of the interaural phase and level differences, and the integration of their non-Gaussian modelling within a variational Bayesian framework, is proposed. This approach combines the advantages of the robust estimation provided by the Student's t-distribution and the robust clustering inherent in the Bayesian approach. In other words, this general framework avoids the difficulties associated with MLE and makes use of the heavy-tailed Student's t-distribution to improve the estimation of the soft probabilistic masks at various reverberation times, particularly for sources in close proximity. Through an extensive set of simulation studies comparing the proposed approach with other T-F masking algorithms under different scenarios, a significant improvement in terms of objective and subjective performance measures is achieved.
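The robustness the abstract attributes to the Student's t-distribution is easiest to see through its Gaussian scale-mixture form: in EM, each observation receives a latent weight that shrinks as its distance from the component mean grows, so outliers barely influence the parameter updates. A minimal one-dimensional sketch follows; the numbers are illustrative and this is not MESSL's interaural-cue model.

```python
import numpy as np

# In EM for a Student's t component, each point receives a latent precision
# weight u = (nu + 1) / (nu + d2), where d2 is the squared standardized
# distance to the component mean. Outliers (large d2) get small weights, so
# they barely move the mean update -- unlike the Gaussian case, where u = 1.
nu = 3.0                                        # degrees of freedom
mu, sigma = 0.0, 1.0                            # current component parameters
x = np.array([0.1, -0.3, 0.5, 8.0])             # 8.0 is an outlier
d2 = ((x - mu) / sigma) ** 2
u = (nu + 1.0) / (nu + d2)                      # E-step weights
mu_t = np.sum(u * x) / np.sum(u)                # robust t-based mean update
mu_gauss = x.mean()                             # Gaussian mean update
print(mu_t, mu_gauss)                           # the t update stays near 0
```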

Optimisation and Bayesian optimality

Joyce, Thomas January 2016
This doctoral thesis will present the results of work into optimisation algorithms. We first give a detailed exploration of the problems involved in comparing optimisation algorithms. In particular we provide extensions and refinements to no free lunch results, exploring algorithms with arbitrary stopping conditions, optimisation under restricted metrics, parallel computing and free lunches, and head-to-head minimax behaviour. We also characterise no free lunch results in terms of order statistics. We then ask what really constitutes understanding of an optimisation algorithm. We argue that one central part of understanding an optimiser is knowing its Bayesian prior and cost function. We then pursue a general Bayesian framing of optimisation, and prove that this Bayesian perspective is applicable to all optimisers, and that even seemingly non-Bayesian optimisers can be understood in this way. Specifically, we prove that arbitrary optimisation algorithms can be represented as a prior and a cost function. We examine the relationship between the Kolmogorov complexity of the optimiser and the Kolmogorov complexity of its corresponding prior. We also extend our results from deterministic optimisers to stochastic optimisers and forgetful optimisers, and we show that selecting a prior uniformly at random is not equivalent to selecting an optimisation behaviour uniformly at random. Lastly, we consider how best to gain a Bayesian understanding of real optimisation algorithms. We use the developed Bayesian framework to explore the effects of some common approaches to constructing meta-heuristic optimisation algorithms, such as on-line parameter adaptation. We conclude by exploring an approach to uncovering the probabilistic beliefs of optimisers with a “shattering” method.
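The no free lunch results this abstract builds on can be checked directly on a toy search space: averaged over every function from a small finite domain to a finite codomain, any two non-revisiting search strategies find equally good values. A minimal sketch, assuming a 4-point domain and 3-value codomain; the two strategies below are arbitrary examples, not algorithms from the thesis.

```python
import itertools
import numpy as np

X, Y, budget = range(4), range(3), 3   # tiny domain, codomain, evaluation budget

def run(alg, f, budget):
    """Run a non-revisiting search on function f; return the best value found."""
    seen = {}
    for _ in range(budget):
        x = alg(seen)                  # choose an unvisited point given history
        seen[x] = f[x]
    return min(seen.values())          # performance: lowest value observed

def scan(seen):                        # strategy 1: fixed left-to-right sweep
    return next(x for x in X if x not in seen)

def greedy(seen):                      # strategy 2: adaptive jump after a bad start
    unvisited = [x for x in X if x not in seen]
    if seen and min(seen.values()) > 0:
        return unvisited[-1]
    return unvisited[0]

perf = {alg: [] for alg in (scan, greedy)}
for f in itertools.product(Y, repeat=len(X)):   # every function f: X -> Y
    for alg in perf:
        perf[alg].append(run(alg, f, budget))
print(np.mean(perf[scan]), np.mean(perf[greedy]))   # identical averages
```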

Model Selection and Parameter Estimation in the Chemotherapy Treatment of Tumours via Bayesian Inference

MATA, A. M. M. 21 July 2017
Cancer is a disease caused by the disordered growth of cells. Antineoplastic chemotherapy is commonly used in the treatment of the most common cancers. In this context, research has turned to mathematical models that describe the growth of tumour cells under the action of a chemotherapeutic drug. Given the variety of models available in the literature for this purpose, a method for selecting the most suitable model is needed. This dissertation studies mathematical models of tumour treatment and applies Approximate Bayesian Computation (ABC) to select the model that best represents the observed data. The ABC algorithm used was deterministic, prioritizing model selection. A SIR particle filter was then applied to the selected model, allowing the parameter estimates to be refined. Tumour growth models based on ordinary differential equations were studied, with the parameters assumed constant. The models were built on two-compartment pharmacokinetics, which permits the study of antineoplastic drugs administered orally. In addition, well-known tumour growth formulations were used, adding a term for the influence of a single dose of a chemotherapeutic drug.
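ABC model selection as described here needs only a prior over models, a simulator for each, and a distance tolerance. The sketch below applies ABC rejection to two classic tumour-growth ODEs, logistic and Gompertz, chosen as illustrative stand-ins; the dissertation's specific models, priors, and tolerance are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(model, theta, n0=0.1, t=np.linspace(0, 10, 50)):
    """Euler integration of a scalar tumour-growth ODE (illustrative models)."""
    n, out = n0, []
    for dt in np.diff(t, prepend=t[0]):
        growth = theta * n * (1 - n) if model == "logistic" else -theta * n * np.log(n)
        n = max(n + dt * growth, 1e-9)
        out.append(n)
    return np.array(out)

observed = simulate("gompertz", 0.4) + rng.normal(0, 0.02, 50)  # synthetic data

# ABC rejection for model choice: sample (model, theta) from the prior and
# keep draws whose simulated trajectory lands within eps of the data.
accepted = []
for _ in range(20000):
    model = rng.choice(["logistic", "gompertz"])     # uniform model prior
    theta = rng.uniform(0.0, 1.0)                    # flat prior on the growth rate
    dist = np.sqrt(np.mean((simulate(model, theta) - observed) ** 2))
    if dist < 0.05:                                  # tolerance eps
        accepted.append(model)
print({m: accepted.count(m) / len(accepted) for m in set(accepted)})
```

The accepted-sample frequencies approximate the posterior model probabilities; the selected model's accepted parameter draws would then seed a refinement step such as the SIR particle filter.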

Bayesian M/EEG source localization with possible joint skull conductivity estimation

Costa, Facundo Hernan 02 March 2017
M/EEG mechanisms allow determining changes in brain activity, which is useful in diagnosing brain disorders such as epilepsy. They consist of measuring the electric potential at the scalp and the magnetic field around the head. The measurements are related to the underlying brain activity by a linear model that depends on the lead-field matrix. Localizing the sources, or dipoles, of M/EEG measurements consists of inverting this linear model. However, the non-uniqueness of the solution (due to the fundamental laws of physics) and the low number of dipoles make the inverse problem ill-posed. Solving such a problem requires some sort of regularization to reduce the search space. The literature abounds with methods and techniques to solve this problem, especially with variational approaches. This thesis develops Bayesian methods to solve ill-posed inverse problems, with application to M/EEG. The main idea underlying this work is to constrain sources to be sparse. This hypothesis is valid in many applications, such as certain types of epilepsy. We develop different hierarchical models to account for the sparsity of the sources. Theoretically, enforcing sparsity is equivalent to minimizing a cost function penalized by an l0 pseudo-norm of the solution. However, since the l0 regularization leads to NP-hard problems, the l1 approximation is usually preferred. Our first contribution consists of combining the two norms in a Bayesian framework, using a Bernoulli-Laplace prior. A Markov chain Monte Carlo (MCMC) algorithm is used to estimate the parameters of the model jointly with the source location and intensity. Comparing the results, in several scenarios, with those obtained with sLoreta and the weighted l1-norm regularization shows interesting performance, at the price of a higher computational complexity. Our Bernoulli-Laplace model solves the source localization problem at one instant of time. However, it is biophysically well known that brain activity follows spatiotemporal patterns, so exploiting the temporal dimension is interesting to further constrain the problem. Our second contribution consists of formulating a structured sparsity model to exploit this biophysical phenomenon. Precisely, a multivariate Bernoulli-Laplacian distribution is proposed as an a priori distribution for the dipole locations. A latent variable is introduced to handle the resulting complex posterior, and an original Metropolis-Hastings sampling algorithm is developed. The results show that the proposed sampling technique improves convergence significantly. A comparative analysis is performed between the proposed model, an l21 mixed-norm regularization, and the Multiple Sparse Priors (MSP) algorithm. Various experiments are conducted with synthetic and real data, and the results show that our model has several advantages, including a better recovery of the dipole locations. The previous two algorithms consider a fully known lead-field matrix. However, this is seldom the case in practical applications; instead, this matrix is the result of approximation methods that lead to significant uncertainties. Our third contribution consists of handling the uncertainty of the lead-field matrix. The proposed method expresses this matrix as a function of the skull conductivity using a polynomial matrix interpolation technique, the conductivity being considered the main source of uncertainty in the lead-field matrix. Our multivariate Bernoulli-Laplacian model is then extended to estimate the skull conductivity jointly with the brain activity. The resulting model is compared to other methods, including the techniques of Vallaghé et al and Guttierez et al. Our method provides results of better quality without requiring knowledge of the active dipole positions and is not limited to a single dipole activation.
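The Bernoulli-Laplace prior at the heart of the first two contributions couples an l0-type activation cost with an l1-type intensity penalty. Below is a minimal sketch of the prior itself, showing that draws from it are sparse and how its log density (up to constants) decomposes into the two penalties; `w` and `b` are illustrative hyperparameters, and the thesis's Gibbs sampler is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_bernoulli_laplace(n, w, b, rng):
    """Draw a sparse source vector: each dipole is inactive with probability
    1 - w, and active dipoles get a Laplace(0, b) intensity."""
    active = rng.uniform(size=n) < w
    return np.where(active, rng.laplace(0.0, b, size=n), 0.0)

def log_prior(x, w, b):
    """Log density up to constants: each activation pays log(w / (1 - w)),
    an l0-type cost, and active coordinates pay the l1 penalty |x| / b."""
    active = x != 0.0
    return (active.sum() * np.log(w / (1.0 - w))
            - np.abs(x[active]).sum() / b
            - active.sum() * np.log(2.0 * b))

x = sample_bernoulli_laplace(1000, w=0.05, b=1.0, rng)
print((x != 0).sum(), log_prior(x, 0.05, 1.0))   # roughly 50 active dipoles
```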

Statistical methods & algorithms for autonomous immunoglobulin repertoire analysis

Norwood, Katherine Frances 13 January 2021 (has links)
Investigating the immunoglobulin repertoire is a means of understanding the adaptive immune response to infectious disease or vaccine challenge. The data examined are typically generated using high-throughput sequencing on samples of immunoglobulin variable-region genes present in blood or tissue collected from human or animal subjects. The analysis of these large, diverse collections provides a means of gaining insight into the specific molecular mechanisms involved in generating and maintaining a protective immune response. It involves the characterization of distinct clonal populations, specifically through the inference of founding alleles for germline gene segment recombination, as well as the lineage of accumulated mutations acquired during the development of each clone. Germline gene segment inference is currently performed by aligning immunoglobulin sequencing reads against an external reference database and assigning each read to the entry that provides the best score according to the metric used. The problem with this approach is that allelic diversity is greater than can be usefully accommodated in a static database. The absence of the alleles used from the database often leads to the misclassification of single-nucleotide polymorphisms as somatic mutations acquired during affinity maturation. This trend is especially evident with the rhesus macaque, but also affects the comparatively well-catalogued human databases, whose collections are biased towards samples from individuals of European descent. Our project presents novel statistical methods for immunoglobulin repertoire analysis which allow for the de novo inference of germline gene segment libraries directly from next-generation sequencing data, without the need for external reference databases. These methods follow a Bayesian paradigm, which uses an information-theoretic modelling approach to iteratively improve upon internal candidate gene segment libraries. Both candidate libraries and trial analyses given those libraries are incorporated as components of the machine learning evaluation procedure, allowing for the simultaneous optimization of model accuracy and simplicity. Finally, the proposed methods are evaluated using synthetic data designed to mimic known mechanisms for repertoire generation, with pre-designated parameters. We also apply these methods to known biological sources with unknown repertoire generation parameters, and conclude with a discussion on how this method can be used to identify potential novel alleles.
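The trade-off the abstract describes, simultaneously optimizing library accuracy and simplicity, can be illustrated with a toy minimum-description-length score: a larger candidate allele library explains reads with fewer mismatches but costs more to state. The bit costs and sequences below are hypothetical, and the thesis's actual Bayesian objective and inference procedure are more elaborate.

```python
def mismatches(read, allele):
    return sum(a != b for a, b in zip(read, allele))

def description_length(reads, library, per_base_bits=2.0, per_mismatch_bits=10.0):
    """MDL-style score (a schematic stand-in for an information-theoretic
    objective): the cost of stating the library itself, plus the cost of
    explaining each read as its best allele plus point mutations."""
    lib_cost = sum(len(a) for a in library) * per_base_bits
    read_cost = sum(min(mismatches(r, a) for a in library) * per_mismatch_bits
                    for r in reads)
    return lib_cost + read_cost

# Adding a second allele pays for itself once enough reads stop being
# "explained" as heavily mutated copies of the first.
reads = ["ACGTACGT", "ACGTACGA", "TTGTACGT", "TTGTACGT"]
lib_small = ["ACGTACGT"]
lib_big = ["ACGTACGT", "TTGTACGT"]
print(description_length(reads, lib_small), description_length(reads, lib_big))
```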

Adaptive methods for Bayesian time-to-event point-of-care clinical trials

Leatherman, Sarah Michelle 22 January 2016 (has links)
Point-of-care clinical trials are randomized clinical trials designed to maximize pragmatic design features. The goal is to integrate research into standard care such that the burden of research is minimized for patient and physician, including recruitment, randomization, and study visits. When possible, these studies employ Bayesian adaptive methods and data collection through the medical record. Due to the passive and adaptive nature of these trials, a number of unique challenges may arise over the course of a study. In this dissertation, adaptive methodology for Bayesian time-to-event clinical trials is developed and evaluated for studies with limited censoring. Use of a normal approximation to the study parameter likelihood is proposed for trials in which the likelihood is not normally distributed, and is assessed with respect to frequentist type I and type II errors. A previously developed method for choosing a normal prior distribution for analysis is applied, with modifications to allow for adaptive randomization. This method of prior selection, in conjunction with the normal parameter likelihood, is used to estimate future data for the purpose of predicting study success. A previously published method for future event estimation is modified to allow for adaptive randomization and the inclusion of prior information. The accuracy of this method is evaluated against final study numbers under a range of study designs and parameter likelihood assumptions. With these future estimates, we predict study conclusions by calculating predicted probabilities of study outcome and compare them to actual study conclusions. The reliability of this method is evaluated considering prior distribution choice, study design, and use of an incorrect likelihood for analysis. The normal approximation to non-normally distributed data performs well here and is reliable when the underlying likelihood is known. The choice of analytic prior distribution agrees with previously published results when equal allocation is forced, but changes depending on the severity of adaptive allocation. The performance of event estimation and prediction varies, but the method can provide reliable estimates after only 25 subjects have been observed. Analysis and prediction can reliably be carried out in point-of-care studies when care is taken to ensure assumptions are reasonable.
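The workflow this abstract describes, a normal approximation to the likelihood combined with simulation of not-yet-observed data to predict study success, can be sketched in a few lines. All numbers below (prior, interim estimate, variances, success rule) are illustrative assumptions, not values from the dissertation.

```python
import numpy as np

rng = np.random.default_rng(2)

def posterior(prior_mean, prior_var, est, est_var):
    """Conjugate normal update for a parameter whose likelihood is
    approximated as normal (e.g. a log hazard ratio)."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / est_var)
    post_mean = post_var * (prior_mean / prior_var + est / est_var)
    return post_mean, post_var

# Interim data: estimated log hazard ratio and its variance (illustrative).
m, v = posterior(prior_mean=0.0, prior_var=1.0, est=-0.25, est_var=0.04)

# Predict study success: simulate the remaining data from the current
# posterior, re-update, and count how often the final posterior shows benefit.
rem_var = 0.03            # variance contributed by the not-yet-observed events
success = 0
for _ in range(10000):
    theta = rng.normal(m, np.sqrt(v))                   # plausible true effect
    future_est = rng.normal(theta, np.sqrt(rem_var))    # future data estimate
    fm, fv = posterior(m, v, future_est, rem_var)
    success += (fm + 1.96 * np.sqrt(fv)) < 0.0          # 97.5% bound below 0
print("predicted probability of success:", success / 10000)
```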

Bayesian Network Analysis for Diagnostics and Prognostics of Engineering Systems

Banghart, Marc D 11 August 2017 (has links)
Bayesian networks have been applied to many different domains to perform prognostics, reduce risk, and ultimately improve decision making. However, these methods have not been applied to military field and human performance data sets in an industrial environment. Existing methods frequently rely on a clear understanding of the causal connections leading to an undesirable event and a detailed understanding of the system behavior. They may also require large teams of analysts and domain experts, coupled with manual data cleansing and classification. The research performed utilized machine learning algorithms (such as Bayesian networks) and two existing data sets. The primary objective of the research was to develop a diagnostic and prognostic tool utilizing Bayesian networks that does not require detailed causal understanding of the underlying system. The research yielded a predictive method with substantial benefits over reactive methods. It indicated that Bayesian networks can be trained and utilized to predict failure of several important components, including potential malfunction codes and downtime, on a real-world Navy data set. The research also considered potential error within the training data set. The results lend credence to the utilization of Bayesian networks on real field data, which will always contain error that is not easily quantified. The research should be replicated with additional field data sets from other aircraft. Future research should solicit and incorporate domain expertise into subsequent models, and should also consider incorporating text-based analytics for text fields, which was considered out of scope for this research project.
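A Bayesian network for this kind of diagnostics is simply a factored joint distribution over operating conditions, failure, and observed symptoms. The toy network below, whose structure and probability tables are invented for illustration and not drawn from the Navy data set, computes the posterior probability of component failure given a logged malfunction code by straightforward enumeration.

```python
import itertools

# Toy structure: Usage -> Failure <- Age, and Failure -> Code.
p_usage = {"high": 0.3, "low": 0.7}
p_age = {"old": 0.4, "new": 0.6}
p_fail = {("high", "old"): 0.30, ("high", "new"): 0.10,
          ("low", "old"): 0.08, ("low", "new"): 0.02}
p_code_given_fail = {True: 0.9, False: 0.1}   # P(malfunction code logged)

def joint(usage, age, fail, code):
    """Product of the network's conditional probability tables."""
    pf = p_fail[(usage, age)] if fail else 1.0 - p_fail[(usage, age)]
    pc = p_code_given_fail[fail] if code else 1.0 - p_code_given_fail[fail]
    return p_usage[usage] * p_age[age] * pf * pc

# Inference by enumeration: P(Failure = True | code observed).
num = den = 0.0
for usage, age, fail in itertools.product(p_usage, p_age, [True, False]):
    p = joint(usage, age, fail, code=True)
    den += p
    num += p if fail else 0.0
print("P(failure | code) =", num / den)
```

In practice the tables would be learned from field data rather than specified by hand, which is what lets the approach sidestep a full causal analysis.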

Using Box-Scores to Determine a Position's Contribution to Winning Basketball Games

Page, Garritt L. 16 August 2005
Basketball is a sport that has become increasingly popular worldwide. At the professional level it is a game in which each of the five positions has a specific responsibility that requires unique skills. It therefore seems likely that it would be valuable for coaches to know which skills for each position are most conducive to winning. Knowing which skills to develop for each position could help coaches optimize each player's ability by customizing practice to contain drills that develop the most important skills for that position, which would in turn improve the team's overall ability. Through the use of Bayesian hierarchical modeling and NBA box-score performance categories, this project will determine how each position needs to perform in order for their team to be successful.
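Bayesian hierarchical modeling of this kind shares strength across positions by shrinking each position's estimated contribution toward a league-level mean. A minimal Gibbs-sampler sketch with known variances follows; the data, variances, and "contribution" scores are simplified placeholders, not the project's actual box-score model.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical per-game contribution scores for the five positions over an
# 82-game season; theta_j is position j's underlying contribution.
y = {j: rng.normal(loc, 1.0, size=82)
     for j, loc in zip("PG SG SF PF C".split(), [2.0, 1.5, 1.0, 0.8, 1.2])}
sigma2, tau2 = 1.0, 0.5     # within-game and between-position variances (fixed)

mu, theta = 0.0, {j: 0.0 for j in y}
draws = {j: [] for j in y}
for sweep in range(2000):   # Gibbs: sample each theta_j | mu, then mu | theta
    for j, yj in y.items():
        prec = len(yj) / sigma2 + 1.0 / tau2
        mean = (yj.sum() / sigma2 + mu / tau2) / prec
        theta[j] = rng.normal(mean, np.sqrt(1.0 / prec))
        draws[j].append(theta[j])
    mu = rng.normal(np.mean(list(theta.values())), np.sqrt(tau2 / len(theta)))
for j in draws:
    print(j, round(np.mean(draws[j][500:]), 2))   # posterior means, post burn-in
```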

Graphical and Bayesian Analysis of Unbalanced Patient Management Data

Righter, Emily Stewart 01 March 2007
The International Normalized Ratio (INR) measures the speed at which blood clots. Healthy people have an INR of about one. Some people are at greater risk of blood clots, and their physician prescribes a target INR range, generally 2-3. The farther a patient is above or below their prescribed range, the more dangerous their situation. A variety of point-of-care (POC) devices has been developed to monitor patients. The purpose of this research was to develop innovative graphics to help describe a highly unbalanced dataset and to carry out Bayesian analyses to determine which of five devices best manages patients. An initial Bayesian analysis compared a machine-identical beta-binomial model to a machine-specific beta-binomial model; the response variable was the number of in-range visits. A second Bayesian analysis compared a machine-identical lognormal model, a machine-specific lognormal model, and a machine-specific lognormal model with the lower therapeutic bound as a predictor; the response variable was INR. Machines were compared using posterior predictive distributions of the absolute distance outside a patient's therapeutic range. For the beta-binomial models, the machine-identical model had the lower DIC, meaning that the POC device was not a strong predictor of success in keeping a patient in range. The machine-specific lognormal model with a term for the lower therapeutic bound had the lowest DIC of the three lognormal models, implying that the additional information of distance out of range revealed differences among the POC devices. Three of the machines had more favorable out-of-range distributions than the other two. Both Bayesian analyses provided useful information for medical practice in managing INR.
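The DIC comparison used here penalizes a model's posterior mean deviance by its effective number of parameters, pD = D-bar minus the deviance at the posterior mean. A minimal sketch for the machine-identical case, a binomial likelihood with a beta prior on a single shared in-range probability; the counts and the flat prior are invented for illustration.

```python
import numpy as np
from scipy.stats import binom, beta

# Hypothetical counts: y in-range visits out of n total visits per patient.
y = np.array([8, 7, 9, 5, 6]); n = np.array([10, 10, 12, 8, 9])

# Machine-identical model: one in-range probability p ~ Beta(1, 1) shared by
# all patients, so the posterior is Beta(1 + sum(y), 1 + sum(n - y)).
p_draws = beta.rvs(1 + y.sum(), 1 + (n - y).sum(), size=5000, random_state=42)

def deviance(p):
    return -2.0 * binom.logpmf(y, n, p).sum()

d_bar = np.mean([deviance(p) for p in p_draws])   # posterior mean deviance
d_hat = deviance(p_draws.mean())                  # deviance at the posterior mean
p_d = d_bar - d_hat                               # effective number of parameters
print("DIC =", d_bar + p_d)                       # lower is better across models
```

Repeating the computation for a machine-specific variant, one p per device, and comparing the two DIC values is the kind of comparison the analysis describes.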
