51

Robust estimation of the number of components for mixtures of linear regression

Meng, Li January 1900
Master of Science / Department of Statistics / Weixin Yao / In this report, we investigate robust estimation of the number of components in mixtures of regression models using a trimmed information criterion. Compared with the traditional information criterion, the trimmed criterion is robust and not sensitive to outliers. The superiority of the trimmed methods over the traditional information criterion methods is illustrated through a simulation study. A real data application is also used to illustrate the effectiveness of the trimmed model selection methods.
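As an illustration of the idea of a trimmed criterion, the sketch below computes a BIC-style score after discarding the observations with the lowest log-likelihood contributions. This is only one simple reading of "trimmed"; the function name `trimmed_bic`, the trimming rule and the use of the retained sample size in the penalty are assumptions made for illustration, not the report's actual definition.

```python
import math

def trimmed_bic(loglik_terms, n_params, trim_fraction=0.0):
    """BIC-style score on the best-fitting (1 - trim_fraction) share of the data.

    loglik_terms: per-observation log-likelihood contributions under a fitted model.
    n_params: number of free parameters in that model.
    trim_fraction: share of observations (the worst-fitting ones) to discard.
    NOTE: hypothetical illustration, not the criterion defined in the report.
    """
    if not 0.0 <= trim_fraction < 1.0:
        raise ValueError("trim_fraction must be in [0, 1)")
    if not loglik_terms:
        raise ValueError("need at least one observation")
    n_trim = int(trim_fraction * len(loglik_terms))
    # Keep the largest contributions; likely outliers have very low log-likelihood.
    kept = sorted(loglik_terms, reverse=True)[: len(loglik_terms) - n_trim]
    return -2.0 * sum(kept) + n_params * math.log(len(kept))

# One gross outlier dominates the untrimmed score but not the trimmed one.
terms = [-1.0, -1.0, -1.0, -100.0]
plain = trimmed_bic(terms, n_params=2)
robust = trimmed_bic(terms, n_params=2, trim_fraction=0.25)
```

Comparing such scores across candidate numbers of components, and picking the smallest, would then play the role of the usual information-criterion comparison.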
52

Machine learning approaches for assessing moderate-to-severe diarrhea in children < 5 years of age, rural western Kenya 2008-2012

Ayers, Tracy L 13 May 2016
Worldwide, diarrheal disease is a leading cause of morbidity and mortality in children less than five years of age. Incidence and disease severity remain highest in sub-Saharan Africa. Kenya has an estimated 400,000 severe diarrhea episodes and 9,500 diarrhea-related deaths per year in children. Current statistical methods for estimating etiological and exposure risk factors for moderate-to-severe diarrhea (MSD) in children are constrained by the inability to assess a large number of parameters due to limitations of sample size, complex relationships, correlated predictors, and model assumptions of linearity. This dissertation examines machine learning statistical methods to address weaknesses associated with using traditional logistic regression models. The studies presented here investigate data from a 4-year, prospective, matched case-control study of MSD among children less than five years of age in rural Kenya from the Global Enteric Multicenter Study. Three machine learning approaches were used to examine associations with MSD: the least absolute shrinkage and selection operator, classification trees, and random forests. A principal finding in all three studies was that machine learning approaches are useful and feasible to implement in epidemiological studies. All provided additional information and understanding of the data beyond what logistic regression models alone offer. The results from all three machine learning approaches were supported by comparable logistic regression results, indicating their usefulness as epidemiological tools. This dissertation offers an exploration of methodological alternatives that should be considered more frequently in diarrheal disease epidemiology, and in public health in general.
53

Fully Bayesian Analysis of Multivariate Latent Class Models with an Application to Metric Conjoint Analysis

Frühwirth-Schnatter, Sylvia, Otter, Thomas, Tüchler, Regina January 2000
In this paper we pursue a fully Bayesian analysis of the latent class model with an a priori unknown number of classes. Estimation is carried out by means of Markov chain Monte Carlo (MCMC) methods. We deal explicitly with the consequences that the unidentifiability of this type of model has for MCMC estimation. Joint Bayesian estimation of all latent variables, model parameters, and parameters determining the probability law of the latent process is carried out by a new MCMC method called permutation sampling. In a first run we use the random permutation sampler to sample from the unconstrained posterior. We demonstrate that a lot of important information, such as estimates of the subject-specific regression coefficients, is available from such an unidentified model. The MCMC output of the random permutation sampler is explored in order to find suitable identifiability constraints. In a second run we use the permutation sampler to sample from the constrained posterior by imposing identifiability constraints. The unknown number of classes is determined by formal Bayesian model comparison through exact model likelihoods. We apply a new method of computing model likelihoods for latent class models which is based on the method of bridge sampling. The approach is applied to simulated data and to data from a metric conjoint analysis in the Austrian mineral water market. (author's abstract) / Series: Forschungsberichte / Institut für Statistik
54

Narrowing the gap between network models and real complex systems

Viamontes Esquivel, Alcides January 2014
Simple network models that focus only on graph topology or, at best, basic interactions are often insufficient to capture all the aspects of a dynamic complex system. In this thesis, I explore those limitations and some concrete methods of resolving them. I argue that, in order to succeed at interpreting and influencing complex systems, we need to take into account slightly more complex parts, interactions and information flows in our models. This thesis supports that claim with five examples of applied research. Each case study takes a closer look at the dynamics of the studied problem and complements the network model with techniques from information theory, machine learning, discrete mathematics and/or ergodic theory. By using these techniques to study the concrete dynamics of each system, we could obtain interesting new information. Concretely, we could get better models of network walks that are used in everyday applications like journal ranking. We could also uncover asymptotic characteristics of an agent-based information propagation model which we think is the basis for things like belief propagation or technology adoption in society. And finally, we could spot associations between antibiotic resistance genes in bacterial populations, a problem which is becoming more serious every day.
55

Model selection strategies in genome-wide association studies

Keildson, Sarah January 2011
Unravelling the genetic architecture of common diseases is a continuing challenge in human genetics. While genome-wide association studies (GWAS) have proven to be successful in identifying many new disease susceptibility loci, the extension of these studies beyond single-SNP methods of analysis has been limited. The incorporation of multi-locus methods of analysis may, however, increase the power of GWAS to detect genes of smaller effect size, as well as genes that interact with each other and the environment. This investigation carried out large-scale simulations of four multi-locus model selection techniques, namely forward selection, backward selection, Bayesian model averaging (BMA) and least angle regression with a lasso modification (lasso), in order to compare the type I error rates and power of each method. At a type I error rate of ~5%, lasso showed the highest power across varied effect sizes, disease frequencies and genetic models. Lasso penalized regression was then used to perform three different types of analysis on GWAS data. Firstly, lasso was applied to the Wellcome Trust Case Control Consortium (WTCCC) data and identified many of the WTCCC SNPs that had a moderate-to-strong association (p < 10⁻⁵) with type 2 diabetes (T2D), as well as some of the moderate WTCCC associations (p < 10⁻⁴) that have since been replicated in a large-scale meta-analysis. Secondly, lasso was used to fine-map the 17q21 childhood asthma risk locus and identified putative secondary signals in the 17q21 region that may further contribute to childhood asthma risk. Finally, lasso identified three potential interaction effects that may contribute to coronary artery disease (CAD) risk. While the validity of these findings hinges on their replication in follow-up studies, the results suggest that lasso may provide scientists with exciting new methods of dissecting, and ultimately understanding, the complex genetic framework underlying common human diseases.
56

Statistical Models and Analysis of Growth Processes in Biological Tissue

Xia, Jun 15 December 2016
The mechanisms that control growth processes in biological tissues have attracted continuous research interest despite their complexity. With the emergence of big data experimental approaches, there is an urgent need to develop statistical and computational models that fit the experimental data and can be used to make predictions to guide future research. In this work we apply statistical methods to the growth processes of different biological tissues, focusing on the development of neuron dendrites and tumor cells. We first examine the neuron cell growth process, which has implications for neural tissue regeneration, by using a computational model with uniform branching probability and a maximum overall length constraint. One crucial outcome is that we can relate the parameter fits from our model to real data from our experimental collaborators, in order to examine the usefulness of our model under different biological conditions. Our methods can now directly compare branching probabilities of different experimental conditions and provide confidence intervals for these population-level measures. In addition, we have obtained analytical results showing that the underlying probability distribution for this process follows a geometric progression increase at nearby distances and an approximately geometric series decrease for faraway regions, which can be used to estimate the spatial location of the maximum of the probability distribution. This result is important, since we would expect the maximum number of dendrites in this region; this estimate is related to the probability of success in finding a neural target at that distance during a blind search. We then examined tumor growth processes, which evolve similarly in the sense that they have an initial rapid growth that eventually becomes limited by resource constraints.
For tumor cell evolution, we found that an exponential growth model best describes the experimental data, based on the accuracy and robustness of the models. Furthermore, we incorporated this growth rate model into logistic regression models that predict the growth rate of each patient from biomarkers; this formulation can be very useful for clinical trials. Overall, this study aimed to assess the molecular and clinicopathological determinants of breast cancer (BC) growth rate in vivo.
57

Considerations for Screening Designs and Follow-Up Experimentation

Leonard, Robert D 01 January 2015
The success of screening experiments hinges on the effect sparsity assumption, which states that only a few of the factorial effects of interest actually have an impact on the system being investigated. The development of a screening methodology to harness this assumption requires careful consideration of the strengths and weaknesses of a proposed experimental design, in addition to the ability of an analysis procedure to properly detect the major influences on the response. However, for the most part, screening designs and their complementing analysis procedures have been proposed separately in the literature without clear consideration of their ability to perform as a single screening methodology. As a contribution to this growing area of research, this dissertation investigates the pairing of non-replicated and partially replicated two-level screening designs with model selection procedures that allow for the incorporation of a model-independent error estimate. Using simulation, we focus attention on the ability to screen out active effects from a first-order model with two-factor interactions and on the possible benefits of using partial replication as part of an overall screening methodology. We begin with a focus on single-criterion optimum designs and propose a new criterion to create partially replicated screening designs. We then extend the newly proposed criterion into a multi-criterion framework in which both estimation of the assumed model and protection against model misspecification are considered. This is an important extension of the work, since initial knowledge of the system under investigation is considered to be poor in the cases presented. A methodology to reduce a set of competing design choices is also investigated using visual inspection of plots meant to represent uncertainty in design criterion preferences.
Because screening methods typically involve sequential experimentation, we present a final investigation into the screening process by presenting simulation results which incorporate a single follow-up phase of experimentation. In this concluding work we extend the newly proposed criterion to create optimal partially replicated follow-up designs. Methodologies are compared which use different methods of incorporating knowledge gathered from the initial screening phase into the follow-up phase of experimentation.
58

Automated Support for Model Selection Using Analytic Hierarchy Process

Missakian, Mario Sarkis 01 January 2011
Providing automated support for model selection is a significant research challenge in model management. Organizations maintain vast, growing repositories of analytical models, typically in the form of spreadsheets. Effective reuse of these models could result in significant cost savings and improvements in productivity. However, in practice, model reuse is severely limited by two main challenges: (1) a lack of relevant information about the models maintained in the repository, and (2) a lack of end-user knowledge, which prevents users from selecting appropriate models for a given problem-solving task. This study built on the existing model management literature to address these research challenges. First, this research captured the relevant meta-information about the models. Next, it identified the features on which model selection is based. Finally, it used the Analytic Hierarchy Process (AHP) to select the most appropriate model for any specified problem. AHP is an established method for multi-criteria decision-making that is suitable for the model selection task. To evaluate the proposed method for automated model selection, this study developed a simulated prototype system that implemented this method and tested it in two realistic end-user model selection scenarios based on previously benchmarked test problems.
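To make the AHP step more concrete, here is a minimal sketch of how criterion weights can be derived from a pairwise comparison matrix and then used to rank candidate models. It uses the row geometric mean approximation to the AHP priority vector; the criteria, the comparison values, the model names and the scores are all hypothetical and are not taken from the study.

```python
import math

def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric mean method.

    pairwise[i][j] states how much more important criterion i is than
    criterion j, so pairwise[j][i] should equal 1 / pairwise[i][j].
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]

def rank_models(criteria_weights, scores):
    """Rank models by weighted sum of their per-criterion scores, best first."""
    totals = {
        name: sum(w * s for w, s in zip(criteria_weights, values))
        for name, values in scores.items()
    }
    return sorted(totals, key=totals.get, reverse=True)

# Hypothetical criteria: accuracy, ease of use, data availability.
comparisons = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(comparisons)

# Hypothetical per-criterion scores for two spreadsheet models.
candidates = {
    "model_a": [0.9, 0.2, 0.5],
    "model_b": [0.4, 0.9, 0.6],
}
ranking = rank_models(weights, candidates)
```

A full AHP implementation would usually also check the consistency of the comparison matrix before trusting the weights; that step is left out here for brevity.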
59

Modelos de regressão sobre dados composicionais / Regression model for Compositional data

Camargo, André Pierro de 09 December 2011
Compositional data consist of vectors whose components are the proportions of some whole, that is, vectors with positive entries that sum to 1. The problem of estimating the portions $y_1, y_2, \dots, y_D$ corresponding to the sectors $SE_1, SE_2, \dots, SE_D$ of some quantity $Q$ arises frequently in several domains of knowledge. The percentages $y_1, y_2, \dots, y_D$ of voting intentions corresponding to the candidates $Ca_1, Ca_2, \dots, Ca_D$ in governmental elections, or the market shares of competing industries, are typical examples. Naturally, it is of great interest to analyze how such proportions vary with certain contextual changes, for example geographic location or time. In any competitive environment, information about this behavior is very helpful to competitors in developing their strategies. In this work we present and discuss some approaches proposed in the literature for regression on compositional data, as well as some model selection methods based on Bayesian inference.
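The definition of compositional data given above can be sketched in a few lines. The example normalizes raw amounts to proportions and applies the additive log-ratio (alr) transform, one widely used way of mapping compositions to unconstrained coordinates before fitting a regression. Whether the thesis uses this particular transform is an assumption; the function names and the vote counts are hypothetical.

```python
import math

def closure(parts):
    """Rescale strictly positive amounts so that they sum to 1."""
    if any(p <= 0 for p in parts):
        raise ValueError("all parts must be strictly positive")
    total = sum(parts)
    return [p / total for p in parts]

def alr(composition):
    """Additive log-ratio transform, using the last part as the reference."""
    reference = composition[-1]
    return [math.log(p / reference) for p in composition[:-1]]

def alr_inverse(coords):
    """Map alr coordinates back to a composition that sums to 1."""
    expd = [math.exp(c) for c in coords] + [1.0]
    total = sum(expd)
    return [e / total for e in expd]

# Hypothetical voting intentions for three candidates (raw counts).
shares = closure([20.0, 30.0, 50.0])
coords = alr(shares)            # D - 1 = 2 unconstrained real coordinates
recovered = alr_inverse(coords)
```

Regression can then be carried out on the unconstrained coordinates, with predictions mapped back to proportions through the inverse transform.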
60

Seleção de modelos cópula-GARCH: uma abordagem bayesiana / Copula-GARCH model selection: a Bayesian approach

Rossi, João Luiz 04 June 2012
The aim of this work was to study models for bivariate time series in which the dependence structure between the series is modeled by copula functions. The advantage of this approach is that copulas provide a complete description of the dependence structure. For inference, a Bayesian approach was adopted using Markov chain Monte Carlo (MCMC) methods. First, a simulation study was performed to assess how the following factors affect the model selection rate under the EAIC, EBIC and DIC criteria: the length of the series and variations in the copula functions, the marginal distributions, the copula parameter values and the estimation methods. The models with static and time-varying dependence structures were then applied to real data.
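For readers unfamiliar with the three criteria, the following sketch shows how they are commonly computed from MCMC output, assuming the usual definitions: EAIC and EBIC add the AIC and BIC penalties to the posterior mean deviance, and DIC adds the effective number of parameters. The dissertation may define them slightly differently; the function name and the numbers in the example are hypothetical.

```python
import math

def mcmc_information_criteria(deviance_draws, deviance_at_posterior_mean,
                              n_params, n_obs):
    """Compute EAIC, EBIC and DIC from MCMC deviance draws.

    deviance_draws: deviance (-2 * log-likelihood) at each posterior draw.
    deviance_at_posterior_mean: deviance evaluated at the posterior mean
        of the parameters.
    Assumed definitions (smaller values indicate a preferred model):
        EAIC = mean deviance + 2 * n_params
        EBIC = mean deviance + n_params * log(n_obs)
        DIC  = mean deviance + p_D, with
        p_D  = mean deviance - deviance_at_posterior_mean
    """
    mean_dev = sum(deviance_draws) / len(deviance_draws)
    p_d = mean_dev - deviance_at_posterior_mean
    return {
        "EAIC": mean_dev + 2 * n_params,
        "EBIC": mean_dev + n_params * math.log(n_obs),
        "DIC": mean_dev + p_d,
    }

# Hypothetical deviance draws for one candidate copula model.
criteria = mcmc_information_criteria(
    deviance_draws=[10.0, 12.0, 14.0],
    deviance_at_posterior_mean=11.0,
    n_params=2,
    n_obs=100,
)
```

In a model selection study, these values would be computed for every candidate model and the model with the smallest value of a given criterion would be selected.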
