81

On the Construction of an Automatic Traffic Sign Recognition System

Jonsson, Fredrik January 2017 (has links)
This thesis proposes an automatic road sign recognition system, covering all steps from the initial detection of road signs in a digital image to the final recognition step that determines the class of the sign. We develop a Bayesian approach for image segmentation in the detection step, using colour information in the HSV (Hue, Saturation and Value) colour space. The segmentation relies on a probability model built from manually extracted road-sign colour data collected from real images. We show how the colour data are fitted with mixtures of multivariate normal distributions, using Gibbs sampling for parameter estimation. The fitted models are then used to compute the posterior probability that a pixel colour belongs to a road sign. Following the image segmentation, regions of interest (ROIs) are detected with the Maximally Stable Extremal Region (MSER) algorithm and then classified by a cascade of classifiers. The classifiers are trained on synthetic images, generated by applying various random distortions to a set of template images covering most road signs in Sweden, and we demonstrate that such synthetic training data yield satisfactory recognition rates. We consider a large set of signs on the Swedish road network, comprising almost 200 road signs, and use classification models such as the Support Vector Machine (SVM) and Random Forest (RF) with Histogram of Oriented Gradients (HOG) features.
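
As an illustration of the segmentation step, the sketch below shows how two fitted colour mixtures can be combined via Bayes' rule into a per-pixel sign probability. It is a minimal stand-in, not the thesis code: scikit-learn's EM fitting replaces the thesis's Gibbs sampler, and the data, component counts and class prior are hypothetical.

```python
# Minimal sketch of the Bayesian pixel-classification step: two Gaussian
# mixtures, fitted to HSV colours of sign and background pixels, are combined
# via Bayes' rule into a per-pixel posterior probability of "sign".
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical training data: rows are HSV triples from labelled pixels.
hsv_sign = np.random.rand(500, 3)         # stand-in for extracted sign colours
hsv_background = np.random.rand(2000, 3)  # stand-in for background colours

gmm_sign = GaussianMixture(n_components=4, covariance_type="full").fit(hsv_sign)
gmm_bg = GaussianMixture(n_components=8, covariance_type="full").fit(hsv_background)

def sign_posterior(pixels_hsv, prior_sign=0.05):
    """P(sign | colour) via Bayes' rule; prior_sign is an assumed class prior."""
    log_p_sign = gmm_sign.score_samples(pixels_hsv)  # log p(colour | sign)
    log_p_bg = gmm_bg.score_samples(pixels_hsv)      # log p(colour | background)
    num = prior_sign * np.exp(log_p_sign)
    den = num + (1.0 - prior_sign) * np.exp(log_p_bg)
    return num / den

# Pixels whose posterior exceeds a threshold are passed on to MSER detection.
mask = sign_posterior(np.random.rand(10, 3)) > 0.5
```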
82

Long-term changes in abundances of Sonoran Desert lizards reveal complex responses to climatic variation

Flesch, Aaron D., Rosen, Philip C., Holm, Peter 17 August 2017 (has links)
Understanding how climatic variation affects animal populations and communities is essential for addressing threats posed by climate change, especially in systems where impacts are projected to be high. We evaluated abundance dynamics of five common species of diurnal lizards over 25 years in a Sonoran Desert transition zone where precipitation decreased and temperature increased across time, and assessed hypotheses for the influence of climatic flux on spatiotemporal variation in abundances. We repeatedly surveyed lizards in spring and summer of each year at up to 32 sites, and used hierarchical mixture models to estimate detection probabilities, abundances, and population growth rates. Among terrestrial species, abundances of a short-lived, winter-spring breeder increased markedly by an estimated 237–285% across time, while two larger spring-summer breeders with higher thermal preferences declined by up to 64%. Abundances of two arboreal species that occupy shaded and thus sheltered microhabitats fluctuated but did not decline systematically. Abundances of all species increased with precipitation at short lag times (1–1.5 yrs), likely due to enhanced food availability, but often declined after periods of high precipitation at longer lag times (2–4 yrs), likely due to predation and other biotic pressures. Although rising maximum daily temperatures (Tmax) are expected to drive global declines of lizards, associations with Tmax were variable and weak for most species. Instead, abundances of all species declined with rising daily minimum temperatures, suggesting degradation of cool refugia imposed widespread metabolic or other costs. Our results suggest climate warming and drying are having major impacts on lizard communities by driving declines of species with traits that augment exposure to abiotic extremes and by modifying species interactions. The complexity of the patterns we report indicates that evaluating and responding to the influence of climate change on biodiversity must consider a broad array of ecological processes.
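
The abstract does not give the model specification; the sketch below illustrates one standard hierarchical mixture of the kind described, a binomial N-mixture model in which repeated counts at a site arise from a latent Poisson abundance and a detection probability. All numbers are hypothetical.

```python
# Sketch of a binomial N-mixture likelihood (not the authors' exact model):
# counts y[i, t] at site i on visit t are Binomial(N_i, p), with the latent
# abundance N_i ~ Poisson(lambda) marginalised out by direct summation.
import numpy as np
from scipy import stats

def n_mixture_loglik(y, lam, p, n_max=200):
    """Log-likelihood of counts y (sites x visits) given rate lam and detection p."""
    ll = 0.0
    N = np.arange(n_max + 1)
    log_prior = stats.poisson.logpmf(N, lam)          # p(N | lambda)
    for counts in y:
        # log p(counts | N) for every candidate abundance, then marginalise
        log_obs = stats.binom.logpmf(counts[:, None], N[None, :], p).sum(axis=0)
        ll += np.logaddexp.reduce(log_prior + log_obs)
    return ll

y = np.array([[3, 2, 4], [0, 1, 0], [5, 6, 4]])  # hypothetical repeated counts
print(n_mixture_loglik(y, lam=4.0, p=0.6))
```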
83

Model selection by a genuinely Bayesian significance test: Multivariate normal mixtures and separated hypotheses

Marcelo de Souza Lauretto 03 October 2007 (has links)
In this thesis we propose the Full Bayesian Significance Test (FBST), introduced by Pereira and Stern in 1999, as a tool for the analysis of multivariate normal mixture models. We extend the mixture-model framework to another classical problem in Statistics, the problem of separate models. For both proposals, we perform numerical experiments inspired by important biological problems: the unsupervised classification of genes based on their expression levels, and the discrimination between the Weibull and Gompertz models, two classical distributions widely used in survival analysis.
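
For readers unfamiliar with the FBST, the sketch below computes its e-value by Monte Carlo in the simplest possible setting, a point null on a normal mean, rather than the mixture models treated in the thesis.

```python
# Illustrative Monte Carlo FBST sketch: test H0: mu = 0 for a normal mean with
# known sigma = 1 and a flat prior, so the posterior is N(xbar, sigma^2 / n).
import numpy as np
from scipy import stats

x = np.random.normal(loc=0.4, scale=1.0, size=30)   # hypothetical data
post = stats.norm(loc=x.mean(), scale=1.0 / np.sqrt(len(x)))

# Supremum of the posterior density over H0 (a point null, so just mu = 0).
f0 = post.pdf(0.0)

# Evidence against H0 is the posterior mass of the tangential set
# {mu : p(mu | x) > f0}; the e-value supporting H0 is its complement.
draws = post.rvs(size=100_000)
ev_against = np.mean(post.pdf(draws) > f0)
e_value = 1.0 - ev_against
print(f"e-value supporting H0: {e_value:.3f}")
```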
84

Growth and the college readiness of Iowa students : a longitudinal study linking growth to college outcomes

Fina, Anthony 01 December 2014 (has links)
As current educational policies continue to emphasize the importance of college readiness and growth, it is essential to understand the degree to which test scores collected throughout middle school and high school can provide information for making valid inferences about students' college readiness. This thesis sought to summarize the college readiness of Iowa students, describe the nature of student growth, and clarify the relationship between student growth and college readiness. Together, the results support the validity argument that scores from a general achievement test can be used for measuring student growth and making on-track interpretations about college readiness. Results of analyses on the use of benchmarks as indicators of college readiness are presented first. These analyses showed that the state's general achievement test was just as accurate as the ACT when the criterion was defined by grades in domain-specific, credit-bearing courses. Next, latent growth models and growth mixture models were used to summarize and evaluate longitudinal changes in student achievement and their relationship with college outcomes. A calibration sample representing potential college-bound students was used to set the growth trajectories, and a cohort representing the full student population was then used to provide validity evidence in support of those trajectories. It was shown that students in the highest-performing group could be considered college ready. Several applications of the growth models are also presented: typical performance on a variety of college outcomes is reported for each developmental group in the validation sample, and a second application illustrates how individual patterns of growth in Grade 8 can be used to predict future class membership in Grade 11. This thesis was predicated on the notion that understanding and documenting the nature of student growth, the college readiness of Iowa students, and the relationship between the two is an important step in improving the college readiness of Iowa students and meeting the future needs of an aligned K-16 educational system. As this study is among the first to examine the relationship between college readiness and student growth using modern latent variable modeling techniques with actual college outcomes, guidelines for future research are presented.
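
The sketch below is a simplified two-stage stand-in for the growth mixture modelling described above (the thesis estimates latent classes and trajectories jointly): per-student growth coefficients are estimated by least squares and then clustered with a Gaussian mixture. All data are simulated.

```python
# Two-stage stand-in for a growth mixture model: fit each student's score
# trajectory by OLS, then cluster the (intercept, slope) pairs.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
grades = np.array([6, 7, 8, 9, 10, 11])   # grades at which tests are given
n_students = 300

# Hypothetical scores: three latent trajectory groups with different slopes.
group = rng.integers(0, 3, n_students)
intercepts = np.array([200.0, 220.0, 240.0])[group]
slopes = np.array([5.0, 8.0, 12.0])[group]
scores = (intercepts[:, None] + slopes[:, None] * (grades - 6)
          + rng.normal(0, 5, (n_students, len(grades))))

# Stage 1: per-student OLS growth coefficients (intercept and slope).
X = np.column_stack([np.ones_like(grades), grades - 6]).astype(float)
coefs, *_ = np.linalg.lstsq(X, scores.T, rcond=None)  # shape (2, n_students)

# Stage 2: Gaussian mixture over the coefficients; the highest-growth class
# would be the candidate "on track for college readiness" group.
gmm = GaussianMixture(n_components=3, random_state=0).fit(coefs.T)
labels = gmm.predict(coefs.T)
```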
85

Statistical analysis of high-throughput biological data

Aubert, Julie 07 February 2017 (has links)
The technological progress of the last twenty years has enabled the emergence of high-throughput biology, based on large-scale data obtained automatically. Statisticians have an important role to play in the modelling and analysis of these data, which are numerous, noisy, sometimes heterogeneous and collected at various scales. This role can take several forms: the statistician may propose new concepts or methods inspired by the questions raised by this biology, may propose a fine-grained modelling of the phenomena observed with these technologies, and, when methods already exist and only require adaptation, may act as an expert who knows the methods, their limits and their advantages. The work presented in this thesis sits at the interface between applied mathematics and biology and mainly concerns the second and third of these roles.
In the first part, I introduce different methods developed with my co-authors for the analysis of high-throughput biological data, based on latent variable models. These models explain an observed phenomenon with the help of hidden (latent) variables. The simplest latent variable model is the mixture model, and the first two methods presented are examples of it: the first in a multiple-testing context and the second for defining a hybridization threshold for DNA microarray data. I also present a coupled hidden Markov chain model for the detection of copy-number variations in genomics, taking into account the dependence between individuals due, for example, to genetic proximity. For this model we propose an approximate inference based on a variational approximation, exact inference being intractable as the number of individuals grows. We also define a latent block model describing an underlying structure of blocks of rows and columns, suited to count data from microbial ecology. Metabarcoding and metagenomic data give the abundance of each unit of interest (for example, a micro-organism) of a microbial community within an environment (plant rhizosphere, human digestive tract or ocean, for example). These data typically show stronger dispersion than expected under the most classical models (over-dispersion). Biclustering is a way to study the interactions between the structure of microbial communities and the biological samples from which they are derived. We proposed to model this phenomenon with a Poisson-Gamma distribution and developed another variational approximation for this particular latent block model, as well as a model selection criterion. The flexibility and performance of the model are illustrated on three real datasets.
A second part is devoted to the analysis of transcriptomic data from DNA microarrays and RNA sequencing. The first section concerns data normalization (detection and correction of technical biases) and presents two new methods that I proposed with my co-authors, together with a comparison of methods to which I contributed. The second section, devoted to experimental design, presents a method for analysing so-called dye-switch designs.
In the last part, I show through two collaborations, an analysis of differentially expressed genes from microarray data and an analysis of the sea-urchin translatome from RNA-sequencing data, how statistical skills are mobilized and the added value that statistics bring to genomics projects.
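
The over-dispersion motivating the Poisson-Gamma choice is easy to demonstrate; the sketch below, with hypothetical parameters, shows that Gamma-mixed Poisson counts have variance well above their mean, unlike pure Poisson counts.

```python
# Small sketch of over-dispersion: mixing a Poisson rate with a Gamma gives
# counts (negative binomial) whose variance exceeds the mean, whereas a plain
# Poisson sample has variance approximately equal to its mean.
import numpy as np

rng = np.random.default_rng(1)
shape, scale = 2.0, 3.0                        # hypothetical Gamma parameters
rates = rng.gamma(shape, scale, size=100_000)  # latent per-observation rates
counts = rng.poisson(rates)                    # Gamma-Poisson counts

print("mean:", counts.mean())                  # ~ shape * scale = 6
print("variance:", counts.var())               # ~ mean * (1 + scale) = 24 > mean
print("Poisson variance at same mean:", counts.mean())
```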
86

A Gamma-Poisson topic model for short text

Mazarura, Jocelyn Rangarirai January 2020 (has links)
Most topic models are constructed under the assumption that documents follow a multinomial distribution. The Poisson distribution is an alternative distribution for describing the probability of count data. For topic modelling, the Poisson distribution describes the number of occurrences of a word in documents of fixed length. The Poisson distribution has been successfully applied in text classification, but its application to topic modelling is not well documented, specifically in the context of a generative probabilistic model. Furthermore, the few Poisson topic models in the literature are admixture models, making the assumption that a document is generated from a mixture of topics. In this study, we focus on short text. Many studies have shown that the simpler assumption of a mixture model fits short text better. With mixture models, as opposed to admixture models, the generative assumption is that a document is generated from a single topic. One topic model that makes this one-topic-per-document assumption is the Dirichlet-multinomial mixture model. The main contributions of this work are a new Gamma-Poisson mixture model (GPM), as well as a collapsed Gibbs sampler for the model. A benefit of the collapsed Gibbs sampler derivation is that the model is able to automatically select the number of topics contained in the corpus. The results show that the Gamma-Poisson mixture model performs better than the Dirichlet-multinomial mixture model at selecting the number of topics in labelled corpora. Furthermore, the Gamma-Poisson mixture produces better topic coherence scores than the Dirichlet-multinomial mixture model, making it a viable option for the challenging task of topic modelling of short text. The application of the GPM was then extended to a further real-world task: distinguishing between semantically similar and dissimilar texts. The objective was to determine whether the GPM could produce semantic representations that allow the user to determine the relevance of new, unseen documents to a corpus of interest. The challenge of addressing this problem in short text from small corpora was of key interest. Corpora of small size are not uncommon; for example, at the start of the Coronavirus pandemic limited research was available on the topic. Handling short text is challenging not only because of its sparsity, but also because some corpora, such as chats between people, tend to be noisy. The performance of the GPM was compared to that of word2vec under these challenging conditions on labelled corpora. The GPM was found to produce better results based on accuracy, precision and recall in most cases. In addition, unlike word2vec, the GPM was shown to be applicable to unlabelled datasets, and a methodology for this is also presented. Finally, a relevance index metric is introduced, which translates the similarity distance between a corpus of interest and a test document into the probability that the test document is semantically similar to the corpus of interest. / Thesis (PhD (Mathematical Statistics))--University of Pretoria, 2020. / Statistics / PhD (Mathematical Statistics) / Unrestricted
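
The sketch below conveys the one-topic-per-document idea with a simplified EM algorithm for a mixture of independent Poissons; the thesis instead derives a collapsed Gibbs sampler with Gamma priors, which additionally infers the number of topics, whereas K is fixed here. The corpus is hypothetical.

```python
# Simplified EM for a Poisson mixture over word counts: each document's count
# vector is drawn from a single topic's Poisson rates (one topic per document).
import numpy as np
from scipy.special import logsumexp

def poisson_mixture_em(X, K, n_iter=100, seed=0):
    """X: (docs x vocab) count matrix. Returns weights, rates, responsibilities."""
    rng = np.random.default_rng(seed)
    D, V = X.shape
    pi = np.full(K, 1.0 / K)
    lam = rng.gamma(1.0, 1.0, size=(K, V)) + 1e-3  # topic-wise Poisson rates
    for _ in range(n_iter):
        # E-step: log p(doc | topic), dropping the X!-term (constant in k)
        log_r = np.log(pi) + X @ np.log(lam).T - lam.sum(axis=1)
        log_r -= logsumexp(log_r, axis=1, keepdims=True)
        R = np.exp(log_r)                           # responsibilities (D x K)
        # M-step: update weights and rates from soft counts
        Nk = R.sum(axis=0) + 1e-12
        pi = Nk / D
        lam = (R.T @ X) / Nk[:, None] + 1e-12
    return pi, lam, R

X = np.random.default_rng(2).poisson(1.0, size=(50, 200))  # hypothetical corpus
pi, lam, R = poisson_mixture_em(X, K=5)
```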
87

Human and animal classification using Doppler radar

Van Eeden, Willem Daniel January 2017 (has links)
South Africa is currently struggling to deal with a significant poaching and livestock theft problem. This work is concerned with the detection and classification of ground-based targets using radar micro-Doppler signatures, to aid in the monitoring of borders, nature reserves and farmlands. The research starts off by investigating the state of the art of ground target classification. Different radar systems are investigated with respect to their ability to classify targets at different operating frequencies. Finally, a Gaussian mixture model-hidden Markov model (GMM-HMM) based classification approach is presented and tested in an operational environment. The GMM-HMM method is compared to methods in the literature and is shown to achieve reasonable (up to 95%) classification accuracy, marginally outperforming existing ground target classification methods. / Dissertation (MEng)--University of Pretoria, 2017. / Electrical, Electronic and Computer Engineering / MEng / Unrestricted
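
A minimal sketch of the classification scheme described above follows, using the hmmlearn library (an assumption; the dissertation does not name an implementation): one GMM-HMM is trained per target class, and a new feature sequence is assigned to the class whose model scores it highest. Features and dimensions are hypothetical.

```python
# One GMM-HMM per class; classify a sequence by maximum log-likelihood.
import numpy as np
from hmmlearn.hmm import GMMHMM

rng = np.random.default_rng(0)

def train_class_model(sequences, n_states=3, n_mix=4):
    """sequences: list of (T_i x n_features) micro-Doppler feature arrays."""
    X = np.concatenate(sequences)
    lengths = [len(s) for s in sequences]
    model = GMMHMM(n_components=n_states, n_mix=n_mix,
                   covariance_type="diag", n_iter=50)
    return model.fit(X, lengths)

# Hypothetical per-class training data (e.g. spectrogram-derived features).
classes = ["human", "animal"]
train = {c: [rng.normal(size=(100, 12)) for _ in range(20)] for c in classes}
models = {c: train_class_model(train[c]) for c in classes}

def classify(sequence):
    # score() returns the log-likelihood of the sequence under each model
    return max(classes, key=lambda c: models[c].score(sequence))

print(classify(rng.normal(size=(80, 12))))
```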
88

An application of sparse representation in Gaussian mixture models used in speech recognition tasks

Jakovljević Nikša 10 March 2014 (has links)
This thesis proposes a model which approximates the full covariance matrices in a Gaussian mixture model (GMM) with a reduced number of parameters and computations required for likelihood evaluation. In the proposed model, the inverse covariance (precision) matrices are approximated using sparsely represented eigenvectors. A maximum likelihood algorithm for parameter estimation and its practical implementation are presented. Experimental results on a speech recognition task show that, while keeping the word error rate close to that obtained by GMMs with full covariance matrices, the proposed model reduces the number of parameters by 45%.
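
The core idea, representing precision-matrix eigenvectors sparsely, can be illustrated in a few lines of numpy; the sketch below uses naive hard-thresholding on a synthetic matrix rather than the maximum likelihood estimation algorithm developed in the thesis.

```python
# Illustration only: approximate a precision matrix by zeroing small entries of
# its eigenvectors, so each eigenvector can be stored sparsely.
import numpy as np

rng = np.random.default_rng(0)
d = 39                                  # e.g. a typical ASR feature dimension
A = rng.normal(size=(d, d))
precision = A @ A.T + d * np.eye(d)     # synthetic full precision matrix

eigvals, eigvecs = np.linalg.eigh(precision)

threshold = 0.12                        # hypothetical sparsity threshold
sparse_vecs = np.where(np.abs(eigvecs) > threshold, eigvecs, 0.0)
sparse_vecs /= np.linalg.norm(sparse_vecs, axis=0)   # re-normalise columns

approx = sparse_vecs @ np.diag(eigvals) @ sparse_vecs.T

kept = np.count_nonzero(sparse_vecs)
print(f"nonzero eigenvector entries: {kept}/{d*d} "
      f"({1 - kept / (d*d):.0%} fewer parameters)")
print("relative Frobenius error:",
      np.linalg.norm(approx - precision) / np.linalg.norm(precision))
```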
89

Spurious Heavy Tails

Segerfors, Ted January 2015 (has links)
Since the financial crisis which started in 2007, risk awareness in the financial sector is greater than ever. Financial institutions such as banks and insurance companies are heavily regulated in order to create a harmonious and resilient global economic environment. Sufficiently large capital buffers may protect institutions from bankruptcy due to adverse financial events leading to an undesirable outcome for the company. In many regulatory frameworks, institutions are obliged to estimate high quantiles of their loss distributions. This is relatively unproblematic when large samples of relevant historical data are available, but serious statistical problems appear when only small samples of relevant data are available. One possible solution is to pool two or more samples that appear to have the same distribution, in order to create a larger sample. This thesis identifies the advantages and risks of pooling small samples. For some mixtures of normally distributed samples whose variances appear equal, the pooled data may indicate heavy tails; since a finite mixture of normally distributed samples has light tails, this is an example of spurious heavy tails. Even though two samples may appear to have the same distribution function, it is not necessarily better to pool them in order to obtain a larger sample for more accurate quantile estimation. For two normally distributed samples of sizes m and n and standard deviations s and v, we find that when v/s is approximately 2, n+m is less than 100 and m/(m+n) is approximately 0.75, there is a considerable risk of believing that the two samples have equal variance and that the pooled sample has heavy tails.
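
The danger zone identified above is easy to reproduce by simulation; the sketch below pools two normal samples with the stated size ratio and standard deviation ratio and shows the pooled sample's positive excess kurtosis, i.e. apparently heavy tails.

```python
# Pool m = 72 draws from N(0, s^2) with n = 24 draws from N(0, (2s)^2):
# m/(m+n) = 0.75, m+n = 96 < 100, v/s = 2. Both components are normal (light
# tails), yet the pooled sample looks heavy-tailed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
s = 1.0
m, n = 72, 24
pooled = np.concatenate([rng.normal(0, s, m), rng.normal(0, 2 * s, n)])

# Population excess kurtosis of this mixture is about 1.65; a normal sample
# would give roughly 0, so the pooled data suggest heavier-than-normal tails.
print("excess kurtosis of pooled sample:", stats.kurtosis(pooled))
```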
90

Understanding usage of Volvo trucks

Dahl, Oskar, Johansson, Fredrik January 2019 (has links)
Trucks are designed, configured and marketed for various working environments. There is a concern whether trucks are used as intended by the manufacturer, as usage may impact the longevity, efficiency and productivity of the trucks. In this thesis we propose a framework, divided into two separate parts, that aims to extract customers' driving behaviours from Logged Vehicle Data (LVD) in order to (a) evaluate whether they align with so-called Global Transport Application (GTA) parameters and (b) evaluate the usage in terms of performance. A Gaussian mixture model (GMM) is employed to cluster and classify various driving behaviours. Association rule mining is applied to the resulting clusters to validate that the usage follows the GTA configuration. Furthermore, the correlation coefficient (CC) is used to find linear relationships between usage and performance in terms of fuel consumption (FC). It is found that the vast majority of the trucks seemingly follow GTA parameters and are thus used as marketed. Likewise, fuel economy is found to be linearly related to drivers' varying performance. The LVD lacks detail, such as Global Positioning System (GPS) information, needed to capture the usage in such a way that more definitive conclusions can be drawn. / This thesis later resulted in a scientific paper submitted to the ICIMP 2020 conference; the paper was accepted on 23 September 2019 and will be presented in January 2020.
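
A condensed sketch of the proposed two-part pipeline is given below; the feature set, cluster count and data are hypothetical stand-ins for the LVD.

```python
# Cluster per-trip usage features with a Gaussian mixture, then measure linear
# association between each usage feature and fuel consumption.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Stand-in LVD features per trip: [avg speed, avg load, brake events, idle share]
usage = rng.normal(size=(1000, 4))
fuel_consumption = usage @ np.array([0.5, 0.8, 0.3, 0.4]) + rng.normal(0, 0.5, 1000)

X = StandardScaler().fit_transform(usage)
gmm = GaussianMixture(n_components=4, random_state=0).fit(X)
behaviour = gmm.predict(X)           # driving-behaviour cluster per trip

# Correlation coefficient between each usage feature and fuel consumption.
for j, name in enumerate(["speed", "load", "braking", "idling"]):
    cc = np.corrcoef(usage[:, j], fuel_consumption)[0, 1]
    print(f"{name}: correlation with FC = {cc:+.2f}")
```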
