421 |
Bayesian Phylogenetics and the Evolution of Gall Wasps. Nylander, Johan A. A. January 2004 (has links)
This thesis concerns the phylogenetic relationships and evolution of the gall-inducing wasps belonging to the family Cynipidae. Several previous studies have used morphological data to reconstruct the evolution of the family. Here, DNA sequences from several mitochondrial and nuclear genes were obtained, and the first molecular, and combined molecular and morphological, analyses of higher-level relationships in the Cynipidae are presented. A Bayesian approach to data analysis is adopted, and models allowing combined analysis of heterogeneous data, such as multiple DNA data sets and morphology, are developed. The performance of these models is evaluated using methods that allow the estimation of posterior model probabilities, thus allowing selection of the most probable models for use in phylogenetics. The use of Bayesian model averaging in phylogenetics, as opposed to model selection, is also discussed. It is shown that Bayesian MCMC analysis deals efficiently with complex models and that morphology can influence combined-data analyses, despite being outnumbered by DNA data. This emphasizes the utility and potential importance of using morphological data in statistical analyses of phylogeny. The DNA-based and combined-data analyses of cynipid relationships differ from previous studies in two important respects. First, it was previously believed that there was a monophyletic clade of woody rosid gallers, but the new results place the non-oak gallers in this assemblage (tribes Pediaspidini, Diplolepidini, and Eschatocerini) outside the rest of the Cynipidae. Second, earlier studies lent strong support to the monophyly of the inquilines (tribe Synergini), gall wasps that develop inside the galls of other species. The new analyses suggest that the inquilines either originated several times independently, or that some inquilines secondarily regained the ability to induce galls.
Possible reasons for the incongruence between morphological and DNA data are discussed in terms of heterogeneity in evolutionary rates among lineages and convergent evolution of morphological characters.
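The idea of comparing models via posterior model probabilities can be illustrated in a toy conjugate setting (a hypothetical sketch, not the phylogenetic substitution models compared in the thesis): two candidate Beta priors for a binomial success probability play the role of competing models, and their marginal likelihoods yield the posterior model probabilities used for model selection or model averaging.

```python
import numpy as np
from scipy.special import betaln, comb

def log_marginal(y, n, a, b):
    """Log marginal likelihood of y successes in n trials under a
    Beta(a, b) prior on the success probability (beta-binomial conjugacy)."""
    return np.log(comb(n, y)) + betaln(y + a, n - y + b) - betaln(a, b)

def posterior_model_probs(y, n, priors):
    """Posterior probability of each model, assuming equal prior model
    probabilities; each 'model' is one choice of Beta prior."""
    logm = np.array([log_marginal(y, n, a, b) for a, b in priors])
    w = np.exp(logm - logm.max())   # stabilized exponentiation
    return w / w.sum()

# Model 1: vague Beta(1, 1) prior; Model 2: prior concentrated near 0.5.
probs = posterior_model_probs(y=70, n=100, priors=[(1.0, 1.0), (50.0, 50.0)])
```

Under Bayesian model averaging, predictions would be weighted by `probs` rather than conditioning on the single highest-probability model.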
|
422 |
Semi-automated search for abnormalities in mammographic X-ray images. Barnett, Michael Gordon 24 October 2006
Breast cancer is the most commonly diagnosed cancer among Canadian women; x-ray mammography is the leading screening technique for early detection. This work introduces a semi-automated technique for analyzing mammographic x-ray images to measure their degree of suspiciousness for containing abnormalities. The designed system applies the discrete wavelet transform to parse the images and extracts statistical features that characterize an image's content, such as the mean intensity and the skewness of the intensity. A naïve Bayesian classifier uses these features to classify the images, achieving sensitivities as high as 99.5% for a data set containing 1714 images. To generate confidence levels, multiple classifiers are combined in three possible ways: a sequential series of classifiers, a vote-taking scheme of classifiers, and a network of classifiers tuned to detect particular types of abnormalities. The third method offers sensitivities of 99.85% or higher with specificities above 60%, making it an ideal candidate for pre-screening images. Two confidence level measures are developed: first, a real confidence level measures the true probability that an image is suspicious; second, a normalized confidence level assumes that normal and suspicious images are equally likely to occur. The second measure allows for more flexibility and could be combined with other factors, such as patient age and family history, to give a better true confidence level than assuming a uniform incidence rate. The system achieves sensitivities exceeding those of other current approaches while maintaining reasonable specificity, especially for the sequential series of classifiers and for the network of tuned classifiers.
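A toy version of this pipeline — wavelet-derived statistical features fed to a Gaussian naive Bayes classifier — can be sketched on synthetic 1-D "image profiles". The feature choices, Haar transform, and data below are illustrative assumptions, not the thesis's actual system:

```python
import numpy as np

def haar_detail(x):
    """Detail coefficients of one level of the 1-D Haar wavelet transform."""
    x = x[: len(x) // 2 * 2]
    return (x[0::2] - x[1::2]) / np.sqrt(2)

def features(signal):
    """Mean intensity, skewness of intensity, and detail-coefficient spread."""
    m, s = signal.mean(), signal.std() + 1e-12
    skew = ((signal - m) ** 3).mean() / s ** 3
    return np.array([m, skew, haar_detail(signal).std()])

class GaussianNB:
    """Minimal Gaussian naive Bayes: one normal per class and feature."""
    def fit(self, X, y):
        self.cls = np.unique(y)
        self.mu = np.array([X[y == c].mean(0) for c in self.cls])
        self.var = np.array([X[y == c].var(0) + 1e-9 for c in self.cls])
        self.prior = np.log([np.mean(y == c) for c in self.cls])
        return self
    def predict(self, X):
        ll = -0.5 * (((X[:, None, :] - self.mu) ** 2 / self.var)
                     + np.log(2 * np.pi * self.var)).sum(-1) + self.prior
        return self.cls[ll.argmax(1)]

rng = np.random.default_rng(1)
def sample(n, suspicious):
    """Synthetic profiles: 'suspicious' ones have elevated mean intensity."""
    base = rng.normal(2.0 if suspicious else 0.0, 1.0, (n, 256))
    return np.array([features(s) for s in base])

X = np.vstack([sample(100, False), sample(100, True)])
y = np.repeat([0, 1], 100)
clf = GaussianNB().fit(X, y)
```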
|
423 |
Bayesian signal processing techniques for GNSS receivers: from multipath mitigation to positioning. Closas Gómez, Pau 15 June 2009 (has links)
This dissertation deals with the design of satellite-based navigation receivers. The term Global Navigation Satellite Systems (GNSS) refers to those navigation systems based on a constellation of satellites which emit ranging signals useful for positioning. Although the American GPS is probably the most popular, the European contribution, Galileo, will be operational soon. Other global and regional systems exist, all with the same objective: to aid the user's positioning. Initially, the thesis provides the state of the art in GNSS: navigation signal structure and receiver architecture. The design of a GNSS receiver consists of a number of functional blocks. From the antenna to the final position calculation, the design poses challenges in many research areas.
Although the radio-frequency chain of the receiver is also discussed in the thesis, the main focus of the dissertation is on the signal processing algorithms applied after digitization of the received signal. These algorithms can be divided into two classes: synchronization and positioning. This classification corresponds to the two main processes typically performed by a GNSS receiver. First, the relative distance between the receiver and each of the visible satellites is estimated. These distances are calculated by estimating the delay suffered by the signal traveling from its emission at the corresponding satellite to its reception at the receiver's antenna. Estimation and tracking of these parameters is performed by the synchronization algorithm. After the relative distances to the satellites are estimated, the positioning algorithm starts its operation. Positioning is typically performed by a process referred to as trilateration: the intersection of a set of spheres centered at the visible satellites, with radii given by the corresponding relative distances. Synchronization and positioning are therefore performed sequentially and continuously. The thesis contributes to both topics, as expressed by the subtitle of the dissertation. On the one hand, the thesis delves into the use of Bayesian filtering for the tracking of the synchronization parameters (time-delays, Doppler shifts, and carrier phases) of the received signal. One of the main sources of error in high-precision GNSS receivers is the presence of multipath replicas of the line-of-sight signal (LOSS). The algorithms proposed in this part of the thesis therefore aim at mitigating the effect of multipath on the synchronization estimates. The dissertation provides an introduction to the basics of Bayesian filtering, including a compendium of the most popular algorithms. In particular, Particle Filters (PF) are studied as one of the most promising alternatives for dealing with nonlinear/non-Gaussian systems.
PF are simulation-based algorithms, built on Monte Carlo methods, that provide a discrete characterization of the posterior distribution of the system. In contrast to other simulation-based methods, PF are supported by convergence results that make them attractive in cases where the optimal solution cannot be found analytically. In that vein, a PF is proposed that incorporates a set of features to enhance its performance and robustness with a reduced number of particles. First, the linear part of the system is handled optimally by a Kalman Filter (KF), a procedure referred to as Rao-Blackwellization. This reduces the variance of the particles and, thus, the number of particles required to attain a given accuracy when characterizing the posterior distribution. A second feature is the design of an importance density function (from which particles are generated) close to the optimal one, which is not available in general; the selection of this function is typically a key issue in PF design. The dissertation proposes an approximation of the optimal importance function based on Laplace's method. In parallel, Extended Kalman Filter (EKF) and Unscented Kalman Filter (UKF) algorithms are considered and compared with the proposed PF by computer simulations. On the other hand, a novel point of view on the positioning problem constitutes one of the original contributions of the thesis. Whereas conventional receivers operate in a two-step procedure (synchronization followed by positioning), the thesis proposes Direct Position Estimation (DPE) from the digitized signal. Considering the novelty of the approach, the dissertation provides both qualitative and quantitative motivations for the use of DPE instead of the conventional two-step approach.
DPE is studied following the Maximum Likelihood (ML) principle, and an algorithm based on Accelerated Random Search (ARS) is considered for a practical implementation of the derived estimator. Computer simulation results show the robustness of DPE in scenarios where the conventional approach fails, for instance in multipath-rich scenarios. One of the conclusions of the thesis is that joint processing of the satellites' signals provides enhanced positioning performance, since each satellite link is affected by an independent propagation channel. The dissertation also presents the extension of DPE to the Bayesian framework: Bayesian DPE (BDPE). BDPE maintains DPE's philosophy while accounting for sources of side or prior information; examples include the use of inertial measurement systems and atmospheric models, although the list is limited only by imagination and the particular application in which BDPE is implemented. Finally, the dissertation studies the theoretical lower bounds on the accuracy of GNSS receivers. Some of these limits were already known; others are derived as a result of the research reported in the dissertation. The Cramér-Rao Bound (CRB) is the theoretical lower bound on the variance of any unbiased estimator of a parameter. The dissertation recalls the CRB of the synchronization parameters, a result already known. A novel contribution of the thesis is the derivation of the CRB of the position estimator for both the conventional and DPE approaches; these results provide an asymptotic comparison of the two GNSS positioning approaches. Similarly, the CRB of the synchronization parameters for the Bayesian case (Posterior Cramér-Rao Bound, PCRB) is given, and is used as a fundamental limit for the Bayesian filters proposed in the thesis.
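The Bayesian filtering machinery discussed above can be illustrated with a minimal bootstrap particle filter for a scalar linear-Gaussian state — deliberately far simpler than the Rao-Blackwellized filter with Laplace-approximated importance function developed in the thesis; the model and parameters below are illustrative assumptions:

```python
import numpy as np

def bootstrap_pf(y, n_particles=1000, a=0.9, q=0.5, r=0.5, rng=None):
    """Bootstrap particle filter for x_t = a*x_{t-1} + N(0, q^2),
    y_t = x_t + N(0, r^2). Returns posterior-mean state estimates."""
    rng = np.random.default_rng(rng)
    x = rng.normal(0.0, 1.0, n_particles)            # initial particle cloud
    est = []
    for yt in y:
        x = a * x + rng.normal(0.0, q, n_particles)  # propagate particles
        logw = -0.5 * ((yt - x) / r) ** 2            # Gaussian likelihood
        w = np.exp(logw - logw.max())
        w /= w.sum()                                 # normalized weights
        est.append(np.sum(w * x))                    # posterior mean estimate
        x = rng.choice(x, n_particles, p=w)          # multinomial resampling
    return np.array(est)
```

In a GNSS synchronization tracker, the state would instead collect time-delays, Doppler shifts, and carrier phases, with the received signal model as the likelihood.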
|
424 |
Prediction of recurrent events. Fredette, Marc January 2004 (has links)
In this thesis, we study issues related to prediction problems, with an emphasis on those arising when recurrent events are involved. The first chapter defines the basic concepts of frequentist and Bayesian statistical prediction. In the second chapter, we study frequentist prediction intervals and their associated predictive distributions, and present an approach based on asymptotically uniform pivotals that is shown to dominate the plug-in approach under certain conditions. The following three chapters consider the prediction of recurrent events. The third chapter presents different prediction models for the case where these events can be modeled using homogeneous Poisson processes; among these models, those using random effects are shown to possess interesting features. In the fourth chapter, the time-homogeneity assumption is relaxed and we present prediction models for non-homogeneous Poisson processes, studying their behavior for prediction problems with a finite horizon. In the fifth chapter, we apply these concepts to a warranty dataset from the automobile industry; because the number of processes in this dataset is very large, we focus on methods providing computationally fast prediction intervals. Finally, we discuss possibilities for future research in the last chapter.
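For the homogeneous Poisson case, one random-effects-flavored prediction interval has a closed form: with a Gamma prior on the event rate, the posterior predictive count in a future window is negative binomial. The sketch below is a generic illustration under assumed prior parameters, not one of the thesis's specific models:

```python
import numpy as np
from scipy.stats import nbinom

def predictive_interval(n_events, t_obs, s_future, a=1.0, b=1.0, level=0.95):
    """Bayesian prediction interval for the number of events of a homogeneous
    Poisson process in a future window of length s_future, after observing
    n_events over exposure t_obs, with a Gamma(a, b) prior on the rate."""
    r = a + n_events                            # posterior Gamma shape
    p = (b + t_obs) / (b + t_obs + s_future)    # negative binomial probability
    alpha = (1.0 - level) / 2.0
    lo = int(nbinom.ppf(alpha, r, p))
    hi = int(nbinom.ppf(1.0 - alpha, r, p))
    return lo, hi

# 30 events observed over 10 time units; predict the count in the next 5.
lo, hi = predictive_interval(30, 10.0, 5.0)
```

The interval is cheap to compute (two quantile lookups), which is the kind of speed that matters when a warranty portfolio contains a very large number of processes.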
|
425 |
Recursive Residuals and Model Diagnostics for Normal and Non-Normal State Space Models. Frühwirth-Schnatter, Sylvia January 1994 (has links) (PDF)
Model diagnostics for normal and non-normal state space models are based on recursive residuals, which are defined from the one-step-ahead predictive distribution. Routine calculation of these residuals is discussed in detail. Various diagnostic tools are suggested to check, for example, for misspecified observation distributions and for autocorrelation. The paper also covers model diagnostics for discrete time series, model diagnostics for generalized linear models, and model discrimination via Bayes factors. (author's abstract) / Series: Forschungsberichte / Institut für Statistik
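For the normal local level model, recursive residuals are simply the standardized one-step-ahead Kalman prediction errors. A minimal sketch, assuming known variances (unlike the general treatment in the paper):

```python
import numpy as np

def recursive_residuals(y, q=1.0, r=1.0, m0=0.0, p0=1e6):
    """Standardized one-step-ahead prediction errors (recursive residuals)
    for the local level model x_t = x_{t-1} + N(0, q), y_t = x_t + N(0, r),
    where q and r are the state and observation noise variances."""
    m, p, res = m0, p0, []
    for yt in y:
        p = p + q                        # predict: variance of x_t | y_{1:t-1}
        f = p + r                        # innovation (predictive) variance
        res.append((yt - m) / np.sqrt(f))
        k = p / f                        # Kalman gain
        m = m + k * (yt - m)             # update mean
        p = (1 - k) * p                  # update variance
    return np.array(res)
```

Under a correctly specified model these residuals are approximately iid standard normal, so departures (autocorrelation, heavy tails) signal model misfit — the diagnostic idea of the paper.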
|
428 |
Essays on econometric modeling of subjective perceptions of risks in environment and human health. Nguyen, To Ngoc 15 May 2009
A large body of literature studies the option price and other ex-ante welfare measures under microeconomic theory to value reductions of risks inherent in the environment and human health. However, it does not offer a careful discussion of how to estimate risk-reduction values from data, especially how to model and estimate individual perceptions of risk in econometric models. The central theme of my dissertation is the approaches taken for the empirical estimation of probabilistic risks under alternative assumptions about the individual perceptions of risk involved: the objective probability, the Savage subjective probability, and the subjective distribution of probability. Each of these three risk specifications is covered in one of the three essays.

The first essay addresses the problem of empirically estimating individual willingness to pay for recreation access to public land under uncertainty. In this essay I developed an econometric model and applied it to the case of lottery-rationed hunting permits. Empirically, the model correctly predicts the responses of 84% of the respondents in the Maine moose hunting survey.

The second essay addresses the estimation of a logit model for individual binary choices that involve heterogeneity in subjective probabilities. For this problem, I introduce the use of hierarchical Bayes methods to estimate, among other things, the parameters of the distribution of subjective probabilities. A Monte Carlo study finds the estimator asymptotically unbiased and efficient.

The third essay addresses the problem of modeling perceived mortality risks from arsenic concentrations in drinking water. I estimated a formal model that allows for ambiguity about risk. The empirical findings revealed that perceived risk was positively associated with exposure levels and also related to individual characteristics, in particular smoking habits and one's current health status. Further evidence was found that the variance of the perceived-risk distribution is non-zero.

In all, the three essays contribute methodological approaches and provide empirical examples for developing empirical models and estimating the value of risk reductions in the environment and human health, given the assumptions about individual perceptions of risk and, accordingly, reasonable specifications of the risks involved in the models.
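The second essay's target — a distribution of subjective probabilities underlying binary choices — can be illustrated with a much cruder tool than hierarchical Bayes: a method-of-moments fit of a Beta distribution to simulated repeated choices. All parameters and the data below are illustrative assumptions:

```python
import numpy as np

def fit_beta_moments(successes, n_trials):
    """Method-of-moments fit of Beta(a, b) to the latent choice
    probabilities, given per-individual success counts out of n_trials
    repeated binary choices per individual."""
    phat = successes / n_trials
    m = phat.mean()
    # Approximately remove the binomial sampling noise from the variance.
    v = phat.var() - m * (1 - m) / n_trials
    common = m * (1 - m) / v - 1.0
    return m * common, (1 - m) * common       # (a, b)

rng = np.random.default_rng(0)
p = rng.beta(2.0, 3.0, size=2000)    # latent subjective probabilities
succ = rng.binomial(20, p)           # 20 observed choices per individual
a_hat, b_hat = fit_beta_moments(succ, 20)
```

A hierarchical Bayes sampler would instead place priors on (a, b) and draw the individual probabilities as latent variables, but the estimand — the population distribution of subjective probabilities — is the same.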
|
429 |
A Clustering-based Approach to Document-Category Integration. Cheng, Tsang-Hsiang 04 September 2003
E-commerce applications generate and consume a tremendous amount of online information that is typically available as textual documents. Observations of textual document management practices by organizations and individuals suggest the popularity of using categories (or category hierarchies) to organize, archive, and access documents. At the same time, an organization (or individual) constantly acquires new documents from various Internet sources. Consequently, the integration of relevant categorized documents into the existing categories of the organization (or individual) becomes an important issue in the e-commerce era. The existing categorization-based approach to document-category integration (specifically, the Enhanced Naïve Bayes classifier) incurs several limitations, including a homogeneity assumption on the categorization schemes used by the master and source catalogs and the requirement of large master categories as training data. In this study, we developed a Clustering-based Category Integration (CCI) technique for integrating two document catalogs, each of which is organized non-hierarchically (i.e., as a flat set). Using the Enhanced Naïve Bayes classifier as a benchmark, the empirical evaluation showed that the proposed CCI technique improved document-category integration accuracy in different integration scenarios and appeared less sensitive to the size of the master categories than the categorization-based approach.
Furthermore, to integrate document categories that are organized hierarchically, we proposed a Clustering-based category-Hierarchy Integration (CHI) technique that extends CCI to category-hierarchy integration. The empirical evaluation showed that CHI appeared to improve the effectiveness of hierarchical document-category integration over that attained by CCI under homogeneous and comparable scenarios.
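The centroid flavor of clustering-based integration can be sketched in a few lines: represent each category by the normalized mean of its document vectors and assign each source category to the most similar master category by cosine similarity. This is a hypothetical miniature, not the CCI algorithm itself:

```python
import numpy as np

def integrate_categories(master, source):
    """Map each source category name to the most similar master category,
    using cosine similarity between category centroids. Both arguments are
    dicts: category name -> list of document term-frequency vectors."""
    def centroid(docs):
        c = np.mean(np.asarray(docs, dtype=float), axis=0)
        return c / (np.linalg.norm(c) + 1e-12)   # unit-normalize
    names = list(master)
    M = np.array([centroid(d) for d in master.values()])
    return {src: names[int(np.argmax(M @ centroid(d)))]
            for src, d in source.items()}

# Toy 3-term vocabulary: [sports-term, finance-term, other-term].
master = {"sports":  [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
          "finance": [[0.0, 1.0, 0.1], [0.0, 0.8, 0.2]]}
source = {"soccer": [[1.0, 0.1, 0.0]], "stocks": [[0.1, 1.0, 0.0]]}
mapping = integrate_categories(master, source)
```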
|
430 |
Bayesian Semiparametric Models for Nonignorable Missing Data Mechanisms in Logistic Regression. Ozturk, Olcay 01 May 2011 (has links) (PDF)
In this thesis, Bayesian semiparametric models are developed for the missing data mechanisms of nonignorably missing covariates in logistic regression. In the missing data literature, a fully parametric approach is used to model nonignorable missing data mechanisms: a probit or logit link of the conditional probability of the covariate being missing is modeled as a linear combination of all variables, including the missing covariate itself. However, nonignorably missing covariates may not be linearly related to the probit (or logit) of this conditional probability. In our study, the relationship between the probit of the probability of the covariate being missing and the missing covariate itself is modeled using a penalized spline regression based semiparametric approach. An efficient Markov chain Monte Carlo (MCMC) sampling algorithm to estimate the parameters is established, and WinBUGS code is constructed to sample from the full conditional posterior distributions of the parameters using Gibbs sampling. Monte Carlo simulation experiments under different true missing data mechanisms are conducted to compare the bias and efficiency properties of the resulting estimators with those of the fully parametric approach. These simulations show that estimators for logistic regression using semiparametric missing data models have better bias and efficiency properties than those using fully parametric missing data models when the true relationship between the missingness and the missing covariate is nonlinear; the two approaches are comparable when this relationship is linear.
|