121 |
Some Recent Advances in Non- and Semiparametric Bayesian Modeling with Copulas, Mixtures, and Latent Variables. Murray, Jared. January 2013.
This thesis develops flexible non- and semiparametric Bayesian models for mixed continuous, ordered and unordered categorical data. These methods have a range of possible applications; the applications considered in this thesis are drawn primarily from the social sciences, where multivariate, heterogeneous datasets with complex dependence and missing observations are the norm.

The first contribution is an extension of the Gaussian factor model to Gaussian copula factor models, which accommodate continuous and ordinal data with unspecified marginal distributions. I describe how this model is the most natural extension of the Gaussian factor model, preserving its essential dependence structure and the interpretability of factor loadings and the latent variables. I adopt an approximate likelihood for posterior inference and prove that, if the Gaussian copula model is true, the approximate posterior distribution of the copula correlation matrix asymptotically converges to the correct parameter under nearly any marginal distributions. I demonstrate with simulations that this method is both robust and efficient, and illustrate its use in an application from political science.

The second contribution is a novel nonparametric hierarchical mixture model for continuous, ordered and unordered categorical data. The model includes a hierarchical prior used to couple the component indices of two separate models, which are also linked by local multivariate regressions. This structure effectively overcomes the limitations of existing mixture models for mixed data, namely their overly strong local independence assumptions. In the proposed model, local independence is replaced by local conditional independence, so that the induced model is able to more readily adapt to structure in the data. I demonstrate the utility of this model as a default engine for multiple imputation of mixed data in a large repeated-sampling study using data from the Survey of Income and Program Participation. I show that it improves substantially on its most popular competitor, multiple imputation by chained equations (MICE), while enjoying certain theoretical properties that MICE lacks.

The third contribution is a latent variable model for density regression. Most existing density regression models are quite flexible but somewhat cumbersome to specify and fit, particularly when the regressors are a combination of continuous and categorical variables. The majority of these methods rely on extensions of infinite discrete mixture models to incorporate covariate dependence in the mixture weights, atoms or both. I take a fundamentally different approach, introducing a continuous latent variable which depends on covariates through a parametric regression. In turn, the observed response depends on the latent variable through an unknown function. I demonstrate that a spline prior for the unknown function is quite effective relative to Dirichlet process mixture models in density estimation settings (i.e., without covariates), even though these Dirichlet process mixtures have better asymptotic theoretical properties. The spline formulation enjoys a number of computational advantages over more flexible priors on functions. Finally, I demonstrate the utility of this model in regression applications using a dataset on U.S. wages from the Census Bureau, where I estimate the return to schooling as a smooth function of the quantile index. / Dissertation
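To make the generative structure of the first contribution concrete, the following sketch (in Python, with illustrative parameter values and marginal distributions that are not taken from the thesis) simulates mixed data from a Gaussian copula factor model: latent normal scores follow a factor model with correlation matrix Lambda Lambda' + Psi, and each observed margin is obtained by pushing the corresponding uniform score through an arbitrary marginal distribution, so the dependence is governed entirely by the copula correlation while the margins stay unspecified.

# Sketch: simulate mixed continuous/ordinal data from a Gaussian copula factor model.
# All parameter values and marginal choices below are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p, k = 1000, 4, 1                                  # observations, variables, factors

Lambda = np.array([[0.9], [0.8], [0.7], [0.6]])       # factor loadings (p x k)
Psi = np.diag(1.0 - (Lambda ** 2).sum(axis=1))        # uniquenesses so diag(C) = 1
C = Lambda @ Lambda.T + Psi                           # copula correlation matrix

# Latent Gaussian scores z ~ N(0, C); u = Phi(z) are the copula (uniform) scores.
z = rng.multivariate_normal(np.zeros(p), C, size=n)
u = stats.norm.cdf(z)

# Push the uniform scores through arbitrary margins: two continuous, two ordinal.
x_cont1 = stats.expon(scale=2.0).ppf(u[:, 0])         # skewed continuous margin
x_cont2 = stats.t(df=3).ppf(u[:, 1])                  # heavy-tailed continuous margin
x_ord1 = np.digitize(u[:, 2], [0.3, 0.6, 0.9])        # 4-level ordinal margin
x_ord2 = (u[:, 3] > 0.5).astype(int)                  # binary margin

data = np.column_stack([x_cont1, x_cont2, x_ord1, x_ord2])
print(data[:5])

Because the margins enter only through their quantile functions, the dependence among the observed variables is carried entirely by C, which is the quantity the approximate likelihood mentioned in the abstract is used to estimate.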
|
122 |
Comparative approaches to handling missing data, with particular focus on multiple imputation for both cross-sectional and longitudinal models. Hassan, Ali Satty Ali. January 2012.
Much data-based research is characterized by the unavoidable problem of incompleteness as a result of missing or erroneous values. This thesis discusses some of the various strategies and basic issues in statistical data analysis for addressing the missing data problem, and deals with both missing covariates and missing outcomes. We restrict our attention to methodologies that address a specific missing data pattern, namely monotone missingness.
The thesis is divided into two parts. The first part places particular emphasis on the so-called missing at random (MAR) assumption, with the bulk of attention devoted to multiple imputation techniques. The main aim of this part is to investigate various modelling techniques through application studies, to identify the most appropriate techniques, and to gain insight into their suitability for the analysis of incomplete data. The thesis first deals with the problem of missing covariate values when estimating regression parameters under a monotone missing-covariate pattern. The study is devoted to a comparison of different imputation techniques, namely Markov chain Monte Carlo (MCMC), regression, propensity score (PS) and last observation carried forward (LOCF). The results from the application study indicated which methods deal best with missing covariates when the missing data pattern is monotone: of the methods explored, the MCMC and regression imputation methods were preferable to the PS and LOCF methods for estimating regression parameters under monotone missingness. The study is also concerned with a comparative analysis of techniques applied to incomplete Gaussian longitudinal outcome (response) data subject to random dropout. Three different methods are assessed and investigated, namely multiple imputation (MI), inverse probability weighting (IPW) and direct likelihood analysis. The findings in general favoured MI over IPW in the case of continuous outcomes, even when the MAR mechanism holds. The findings further suggest that MI and direct likelihood lead to accurate and equivalent results, as both techniques arrive at the same substantive conclusions. The study also compares and contrasts several statistical methods for analyzing incomplete non-Gaussian longitudinal outcomes when the underlying study is subject to ignorable dropout. The methods considered include weighted generalized estimating equations (WGEE), multiple imputation after generalized estimating equations (MI-GEE) and the generalized linear mixed model (GLMM). The study found that the MI-GEE method was considerably more robust, performing better than the other methods for both small and large sample sizes, regardless of the dropout rate.
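For illustration only, the Python sketch below contrasts two of the simpler strategies named above, LOCF and single regression imputation, for a covariate with a monotone missing pattern; the data, variable names and models are invented, and the MCMC and propensity-score approaches, as well as proper multiple imputation with between-imputation variability, are not shown.

# Sketch: LOCF vs. regression imputation for a covariate with monotone missingness.
# Data, variable names, and models are illustrative assumptions only.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({"x1": rng.normal(size=n)})                     # fully observed covariate
df["x2"] = 0.8 * df["x1"] + rng.normal(scale=0.5, size=n)         # partially observed covariate
df["y"] = 1.0 + 0.5 * df["x1"] + 0.7 * df["x2"] + rng.normal(size=n)

# Impose a monotone pattern: x2 is missing for the last 30% of (sorted) subjects.
df = df.sort_values("x1").reset_index(drop=True)
df.loc[int(0.7 * n):, "x2"] = np.nan

# LOCF: carry the last observed x2 value forward down the sorted data.
df["x2_locf"] = df["x2"].ffill()

# Regression imputation: predict x2 from x1 using the complete cases.
obs = df["x2"].notna()
reg = LinearRegression().fit(df.loc[obs, ["x1"]], df.loc[obs, "x2"])
df["x2_reg"] = df["x2"].where(obs, reg.predict(df[["x1"]]))

# Compare the regression of y on (x1, x2) under each single imputation.
for col in ["x2_locf", "x2_reg"]:
    fit = LinearRegression().fit(df[["x1", col]], df["y"])
    print(col, fit.coef_, fit.intercept_)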
The primary interest of the second part of the thesis lies in non-ignorable dropout (MNAR) modelling frameworks that rely on sensitivity analysis for modelling incomplete Gaussian longitudinal data. The aim of this part is to deal with non-random dropout by explicitly modelling the dropout process, incorporating this additional sub-model into the model for the measurement data, and assessing the sensitivity of the results to the modelling assumptions. The study focuses on the analysis of repeated Gaussian measures subject to potentially non-random dropout, in order to study the influence the dropout process might have on inference. We consider the construction of a particular type of selection model, namely the Diggle-Kenward model, as a tool for assessing the sensitivity of a selection model to its modelling assumptions. The major conclusions drawn were that there was evidence in favour of a MAR process rather than an MCAR process in the context of the assumed model, and that further insight into the data was needed through comparison of various sensitivity analysis frameworks. Lastly, two families of models were compared and contrasted to investigate the potential influence that dropout might exert on inference about the dependent measurement data, and to deal with incomplete sequences. The models were based on the selection and pattern-mixture frameworks used in sensitivity analysis to jointly model the distribution of the dropout process and the longitudinal measurement process. The results of the sensitivity analysis were in agreement and hence led to similar parameter estimates. Additional confidence in the findings was gained as both models led to similar results for significant effects such as marginal treatment effects. / Thesis (M.Sc.)-University of KwaZulu-Natal, Pietermaritzburg, 2012.
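For reference, the dropout component of a Diggle-Kenward selection model is commonly written as a logistic regression of the dropout indicator on the previous and the current (possibly unobserved) measurement; in the sketch below the notation is chosen here, not taken from the thesis.

Y_i \sim N(X_i\beta,\; V_i), \qquad
\operatorname{logit} P(D_i = j \mid D_i \ge j,\; h_{ij},\; y_{ij})
  = \psi_0 + \psi_1\, y_{i,j-1} + \psi_2\, y_{ij},
\qquad h_{ij} = (y_{i1}, \dots, y_{i,j-1}).

Here \psi_2 = 0 corresponds to MAR, \psi_1 = \psi_2 = 0 to MCAR, and \psi_2 \ne 0 to MNAR, which is what makes fitting this model under varying assumptions a vehicle for sensitivity analysis.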
|
123 |
Assessing the Impact of Genotype Imputation on Meta-analysis of Genetic Association Studies. Omondi, Emmanuel. 28 July 2014.
In this thesis, we study how a meta-analysis of genetic association studies is influenced by the degree of genotype imputation uncertainty in the studies combined and by the size of the meta-analysis. We consider the fixed effect meta-analysis model to evaluate the accuracy and efficiency of imputation-based meta-analysis results under different levels of imputation accuracy. We also examine the impact of genotype imputation on the between-study heterogeneity and the type I error in the random effects meta-analysis model. Simulation results reaffirm that meta-analysis boosts the power of detecting genetic associations compared to individual study results. However, the power deteriorates with increasing uncertainty in the imputed genotypes. Genotype imputation affects a random effects meta-analysis in a non-obvious way, as estimation of between-study heterogeneity and interpretation of association results depend heavily on the number of studies combined. We propose an adjusted fixed effect meta-analysis approach for adding imputation-based studies to a meta-analysis of existing directly typed (genotyped) studies in a controlled way to improve precision and reliability. The proposed method should help in designing an effective meta-analysis study.
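A minimal Python sketch of the inverse-variance fixed-effect combination underlying such a meta-analysis is given below; the per-study estimates, standard errors and imputation-quality scores are invented placeholders, and the quality-based down-weighting shown is only one simple way to control the contribution of imputation-based studies, not necessarily the adjustment proposed in the thesis.

# Sketch: inverse-variance fixed-effect meta-analysis of per-study SNP effects,
# optionally down-weighting imputed studies by an imputation-quality score r2.
# All numbers are invented placeholders for illustration.
import numpy as np

beta = np.array([0.12, 0.08, 0.15, 0.10])   # per-study effect estimates (e.g. log odds ratios)
se   = np.array([0.05, 0.04, 0.07, 0.06])   # per-study standard errors
r2   = np.array([1.00, 1.00, 0.70, 0.45])   # 1.0 = directly typed, <1 = imputation quality

def fixed_effect(beta, se, quality=None):
    """Inverse-variance weighted fixed-effect estimate; `quality` rescales the weights."""
    w = 1.0 / se ** 2
    if quality is not None:
        w = w * quality                      # shrink the weight of poorly imputed studies
    est = np.sum(w * beta) / np.sum(w)
    est_se = np.sqrt(1.0 / np.sum(w))
    return est, est_se

print("unadjusted: ", fixed_effect(beta, se))
print("r2-adjusted:", fixed_effect(beta, se, quality=r2))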
|
124 |
Survival analysis for breast cancer. Liu, Yongcai. 21 September 2010.
This research carries out a survival analysis for patients with breast cancer. The influence of clinical and pathologic features, as well as molecular markers, on survival time is investigated. Special attention focuses on whether the molecular markers can provide additional information to help predict clinical outcome and guide therapies for breast cancer patients. Three outcomes, breast cancer specific survival (BCSS), local relapse survival (LRS) and distant relapse survival (DRS), are examined using two datasets: a large dataset with missing values in the markers (n=1575) and a small (complete) dataset consisting of patient records without any missing values (n=910). Results show that some molecular markers, such as YB1, could join ER, PR and HER2 in being integrated into clinical practice for breast cancer. Further clinical research is needed to establish the importance of CK56. The 10-year survival probability at the mean of all the covariates (clinical variables and markers) is 77%, 91%, and 72% for BCSS, LRS, and DRS respectively. Because a large proportion of values in the dataset are missing, a sophisticated multiple imputation method is needed to estimate the missing values so that an unbiased and more reliable analysis can be achieved. In this study, three multiple imputation (MI) methods, data augmentation (DA), multivariate imputation by chained equations (MICE) and AREG, are employed and compared. Results show that AREG is the preferred MI approach. The reliability of the MI results is demonstrated using various techniques. This work will hopefully shed light on the choice of appropriate MI methods in similar research situations.
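Whichever engine (DA, MICE or AREG) produces the completed datasets, the per-imputation estimates are typically combined with Rubin's rules; the short Python sketch below shows that pooling step using made-up coefficient estimates and variances.

# Sketch: pooling a coefficient across M imputed datasets with Rubin's rules.
# The per-imputation estimates and variances below are made-up placeholders.
import numpy as np
from scipy import stats

est = np.array([0.52, 0.47, 0.55, 0.50, 0.49])        # estimate from each imputed dataset
var = np.array([0.010, 0.012, 0.011, 0.009, 0.010])   # squared SE from each dataset
m = len(est)

q_bar = est.mean()                     # pooled point estimate
u_bar = var.mean()                     # within-imputation variance
b = est.var(ddof=1)                    # between-imputation variance
t = u_bar + (1 + 1 / m) * b            # total variance
se = np.sqrt(t)

# Degrees of freedom and 95% CI (Rubin's classical formula).
r = (1 + 1 / m) * b / u_bar
df = (m - 1) * (1 + 1 / r) ** 2
ci = q_bar + np.array([-1, 1]) * stats.t.ppf(0.975, df) * se
print(q_bar, se, ci)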
|
125 |
Multiple imputation for marginal and mixed models in longitudinal data with informative missingness. Deng, Wei. January 2005.
Thesis (Ph. D.)--Ohio State University, 2005. / Title from first page of PDF file. Document formatted into pages; contains xiii, 108 p.; also includes graphics. Includes bibliographical references (p. 104-108). Available online via OhioLINK's ETD Center
|
126 |
A Monte Carlo study: the impact of missing data in cross-classification random effects models. Alemdar, Meltem. January 2008.
Thesis (Ph. D.)--Georgia State University, 2008. / Title from title page (Digital Archive@GSU, viewed July 20, 2010) Carolyn F. Furlow, committee chair; Philo A. Hutcheson, Phillip E. Gagne, Sheryl A. Gowen, committee members. Includes bibliographical references (p. 96-100).
|
127 |
Bayesian estimation of factor analysis models with incomplete data. Merkle, Edgar C. January 2005.
Thesis (Ph. D.)--Ohio State University, 2005. / Title from first page of PDF file. Document formatted into pages; contains xi, 106 p.; also includes graphics. Includes bibliographical references (p. 103-106). Available online via OhioLINK's ETD Center
|
128 |
Effects of Missing Values on Neural Network Survival Time Prediction. Raoufi-Danner, Torrin. January 2018.
Data sets with missing values are a pervasive problem within medical research. Building lifetime prediction models based solely upon complete-case data can bias the results, so imputation is preferred over listwise deletion. In this thesis, artificial neural networks (ANNs) are used as a prediction model on simulated data with which to compare various imputation approaches. The construction and optimization of ANNs is discussed in detail, and some guidelines are presented for activation functions, the number of hidden layers and other tunable parameters. For the simulated data, binary lifetime prediction at five years was examined. The ANNs here performed best with tanh activation, binary cross-entropy loss with softmax output and three hidden layers of between 15 and 25 nodes. The imputation methods examined are random, mean, missing forest, multivariate imputation by chained equations (MICE), pooled MICE with an imputed target and pooled MICE with a non-imputed target. Random and mean imputation performed poorly compared to the others and were used as a baseline comparison case. The other algorithms all performed well up to 50% missingness. There were no statistical differences between these methods below 30% missingness; however, missing forest had the best performance above this level. It is therefore the recommendation of this thesis that the missing forest algorithm be used to impute missing data when constructing ANNs to predict breast cancer patient survival at the five-year mark.
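A hedged sketch of a network matching the description above, written here with TensorFlow/Keras (the framework, layer sizes and training settings are assumptions for illustration, not taken from the thesis): three tanh hidden layers of 15-25 nodes and a two-unit softmax output trained with cross-entropy, which for two classes is equivalent to the binary cross-entropy described.

# Sketch of an ANN for binary five-year survival prediction: three tanh hidden
# layers (25, 20, 15 nodes) and a softmax output with cross-entropy loss.
# Architecture details and the toy data are illustrative assumptions.
import numpy as np
import tensorflow as tf

def build_model(n_features: int) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(25, activation="tanh"),
        tf.keras.layers.Dense(20, activation="tanh"),
        tf.keras.layers.Dense(15, activation="tanh"),
        tf.keras.layers.Dense(2, activation="softmax"),   # P(dead), P(alive) at 5 years
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Toy usage on random data; in practice X would be the (imputed) covariates.
X = np.random.default_rng(0).normal(size=(500, 10)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("int64")               # placeholder binary label
model = build_model(n_features=10)
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
print(model.evaluate(X, y, verbose=0))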
|
129 |
Comparação das águas dos rios Jaguari e Atibaia na região de lançamento de efluente de indústria petroquímica / Comparison of the water from the Jaguari and Atibaia rivers in the region of wastewater release by a petrochemical industry. Oliveira, Eduardo Schneider Bueno de [UNESP]. 03 February 2016.
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / Human action on nature has been a constant throughout history, but its negative effects are increasingly apparent. Examining these effects, their implications, and what can be done to avoid larger problems is of great importance for keeping our planet in good condition and, consequently, for human quality of life. This study analyzes the water quality of the Jaguari and Atibaia rivers, between which a petrochemical industry discharges its effluent, as well as the quality of the water after its use by the industry and before its return to the river. This makes it possible to assess the quality of the industry's wastewater treatment and to analyze possible effects on water quality after the effluent is discharged into the river. To this end, based on data on the physical, chemical and microbiological characteristics of the water, appropriate statistical techniques are used to carry out the required analysis. Because the observations are dependent on one another, methods that accommodate such dependence must be used, such as the nonparametric block bootstrap (Künsch, 1989; Politis & Romano, 1994). Multiple imputation is also carried out, using the distribution-free multiple imputation technique (Bergamo, 2007; Bergamo et al., 2008), since several months of the study have missing data.
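As an illustration of the dependence-preserving resampling mentioned above, the Python sketch below implements a simple moving block bootstrap in the spirit of Künsch (1989) on a toy autocorrelated series; the block length and statistic are arbitrary choices, and this is not the distribution-free imputation procedure of Bergamo et al.

# Sketch: moving block bootstrap for the mean of a dependent (monthly) series.
# The block length and the toy AR(1) series are illustrative choices only.
import numpy as np

rng = np.random.default_rng(42)

# Toy dependent series: AR(1) with coefficient 0.6, 120 "monthly" observations.
n = 120
e = rng.normal(size=n)
x = np.empty(n)
x[0] = e[0]
for t in range(1, n):
    x[t] = 0.6 * x[t - 1] + e[t]

def moving_block_bootstrap(series, block_len, n_boot, stat=np.mean, rng=rng):
    """Resample overlapping blocks of length `block_len` and recompute `stat`."""
    n = len(series)
    blocks = np.array([series[i:i + block_len] for i in range(n - block_len + 1)])
    n_blocks = int(np.ceil(n / block_len))
    out = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, len(blocks), size=n_blocks)
        resample = np.concatenate(blocks[idx])[:n]     # trim to the original length
        out[b] = stat(resample)
    return out

boot_means = moving_block_bootstrap(x, block_len=12, n_boot=2000)
print("bootstrap SE of the mean:", boot_means.std(ddof=1))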
|
130 |
Three-Level Multiple Imputation: A Fully Conditional Specification Approach. January 2015.
abstract: Currently, there is a clear gap in the missing data literature for three-level models.
To date, the literature has only focused on the theoretical and algorithmic work
required to implement three-level imputation using the joint model (JM) method of
imputation, leaving relatively no work done on the fully conditional specification (FCS)
method. Moreover, the literature lacks any methodological evaluation of three-level
imputation. Thus, this thesis serves two purposes: (1) to develop an algorithm in
order to implement FCS in the context of a three-level model and (2) to evaluate
both imputation methods. The simulation investigated a random intercept model
under both 20% and 40% missing data rates. The findings of this thesis suggest
that the estimates for both JM and FCS were largely unbiased, gave good coverage,
and produced similar results. The sole exception for both methods was the slope for
the level-3 variable, which was modestly biased. The bias exhibited by the methods
could be due to the small number of clusters used. This finding suggests that future
research ought to investigate and establish clear recommendations for the number of
clusters required by these imputation methods. To conclude, this thesis serves as a
preliminary start in tackling a much larger issue and gap in the current missing data
literature. / Dissertation/Thesis / Masters Thesis Psychology 2015
|