About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Improved iterative schemes for REML estimation of variance parameters in linear mixed models.

Knight, Emma January 2008 (has links)
Residual maximum likelihood (REML) estimation is a popular method of estimation for variance parameters in linear mixed models, which typically requires an iterative scheme. The aim of this thesis is to review several popular iterative schemes and to develop an improved iterative strategy that will work for a wide class of models. The average information (AI) algorithm is a computationally convenient and efficient algorithm to use when starting values are in the neighbourhood of the REML solution. However, when reasonable starting values are not available, the algorithm can fail to converge. The expectation-maximisation (EM) algorithm and the parameter expanded EM (PXEM) algorithm are good alternatives in these situations, but they can be very slow to converge. The formulation of these algorithms for a general linear mixed model is presented, along with their convergence properties. A series of hybrid algorithms is presented: EM or PXEM iterations are used initially to obtain variance parameter estimates that are in the neighbourhood of the REML solution, and then AI iterations are used to ensure rapid convergence. Composite local EM/AI and local PXEM/AI schemes are also developed; the local EM and local PXEM algorithms update only the random effect variance parameters, with the estimates of the residual error variance parameters held fixed. Techniques for determining when to use EM-type iterations and when to switch to AI iterations are investigated. Methods for obtaining starting values for the iterative schemes are also presented. The performance of these various schemes is investigated for several different linear mixed models. A number of data sets are used, including published data sets and simulated data. The performance of the basic algorithms is compared to that of the various hybrid algorithms, using both uninformed and informed starting values. The theoretical and empirical convergence rates are calculated and compared for the basic algorithms. The direct comparison of the AI and PXEM algorithms shows that the PXEM algorithm, although an improvement over the EM algorithm, still falls well short of the AI algorithm in terms of speed of convergence. However, when the starting values are too far from the REML solution, the AI algorithm can be unstable. Instability is most likely to arise in models with a more complex variance structure. The hybrid schemes use EM-type iterations to move close enough to the REML solution to enable the AI algorithm to successfully converge. They are shown to be robust to the choice of starting values like the EM and PXEM algorithms, while demonstrating fast convergence like the AI algorithm. / Thesis (Ph.D.) - University of Adelaide, School of Agriculture, Food and Wine, 2008
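
A minimal sketch of the hybrid strategy the abstract describes, assuming hypothetical callables em_update, ai_update, and loglik supplied for the model at hand (these names are illustrative, not from the thesis):

```python
import numpy as np

def hybrid_reml(theta0, em_update, ai_update, loglik,
                switch_tol=1e-2, tol=1e-8, max_iter=500):
    """Sketch of a hybrid REML scheme: robust EM-type iterations until the
    variance parameters are near the solution, then average information (AI)
    iterations for fast convergence."""
    theta = np.asarray(theta0, dtype=float)
    ll, use_ai = -np.inf, False
    for _ in range(max_iter):
        theta_new = ai_update(theta) if use_ai else em_update(theta)
        ll_new = loglik(theta_new)
        if abs(ll_new - ll) < tol:
            return theta_new                  # converged
        if not use_ai and abs(ll_new - ll) < switch_tol:
            use_ai = True                     # close enough: switch to AI steps
        theta, ll = theta_new, ll_new
    raise RuntimeError("hybrid scheme did not converge")
```

A local EM/AI variant, as described above, would have em_update modify only the random effect variance parameters while holding the residual error variances fixed.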
12

Estimation of Standardized Mortality Ratio in Geographic Epidemiology

Kettermann, Anna January 2004 (has links) (PDF)
No description available.
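
No abstract accompanies this record; for orientation only, the statistic named in the title is conventionally estimated as the ratio of observed to expected deaths in an area, with the expected count built from reference rates:

```latex
\mathrm{SMR}_i = \frac{O_i}{E_i}, \qquad E_i = \sum_{j} n_{ij}\, r_j,
```

where \(O_i\) is the observed number of deaths in area \(i\), \(n_{ij}\) the person-years at risk in age stratum \(j\), and \(r_j\) the stratum-specific death rate in the reference population.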
13

A comparison of support vector machines and traditional techniques for statistical regression and classification

Hechter, Trudie 2004 (has links)
Thesis (MComm)--Stellenbosch University, 2004. / ENGLISH ABSTRACT: Since its introduction in Boser et al. (1992), the support vector machine has become a popular tool in a variety of machine learning applications. More recently, the support vector machine has also been receiving increasing attention in the statistical community as a tool for classification and regression. In this thesis support vector machines are compared to more traditional techniques for statistical classification and regression. The techniques are applied to data from a life assurance environment for a binary classification problem and a regression problem. In the classification case the problem is the prediction of policy lapses using a variety of input variables, while in the regression case the goal is to estimate the income of clients from these variables. The performance of the support vector machine is compared to that of discriminant analysis and classification trees in the case of classification, and to that of multiple linear regression and regression trees in regression, and it is found that support vector machines generally perform well compared to the traditional techniques.
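
As a hedged illustration of the kind of comparison described (synthetic data, not the thesis's life assurance data or code), scikit-learn makes such a classification contest easy to reproduce:

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a binary lapse/no-lapse problem
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           random_state=0)

for name, clf in [("SVM", SVC(kernel="rbf")),
                  ("discriminant analysis", LinearDiscriminantAnalysis()),
                  ("classification tree", DecisionTreeClassifier(max_depth=5))]:
    acc = cross_val_score(clf, X, y, cv=5).mean()   # 5-fold CV accuracy
    print(f"{name:>22s}: {acc:.3f}")
```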
14

A study on model selection of binary and non-Gaussian factor analysis.

January 2005 (has links)
An, Yujia. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2005. / Includes bibliographical references (leaves 71-76). / Abstracts in English and Chinese. / Contents:
Chapter 1. Introduction: Background (review on BFA; review on NFA; typical model selection criteria; new model selection criterion and automatic model selection); Our contributions; Thesis outline.
Chapter 2. Combination of B and BI architectures for BFA with automatic model selection: Implementation of BFA using BYY harmony learning with automatic model selection (basic issues of BFA; B-architecture; BI-architecture); Local minima in B-architecture and BI-architecture (local minima in B-architecture; one unstable result in BI-architecture); Combination of B- and BI-architectures; Experiments (frequency of local minima in B-architecture; performance comparison of several methods; comparison of local minima in the two architectures; frequency of unstable cases in BI-architecture; comparison of three strategies; limitations of BI-architecture); Summary.
Chapter 3. A comparative investigation on model selection in binary factor analysis: Binary factor analysis and ML learning; Hidden factor number determination (typical model selection criteria; BYY harmony learning); Empirical comparative studies (effects of sample size, data dimension, noise variance, and hidden factor number; computing costs); Summary.
Chapter 4. A comparative investigation on model selection in non-Gaussian factor analysis: Non-Gaussian factor analysis and ML learning; Hidden factor determination (typical model selection criteria; BYY harmony learning); Empirical comparative studies (effects of sample size, data dimension, and noise variance on model selection criteria; computational cost); Summary.
Chapter 5. Conclusions.
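
The "typical model selection criteria" surveyed in Chapters 3 and 4 are, in this literature, usually penalized-likelihood criteria; the standard definitions (stated here for orientation, not quoted from the thesis) are

```latex
\mathrm{AIC} = -2\log L(\hat{\theta}) + 2k, \qquad
\mathrm{BIC} = -2\log L(\hat{\theta}) + k \log n,
```

where \(L(\hat{\theta})\) is the maximized likelihood, \(k\) the number of free parameters (here driven by the number of hidden factors), and \(n\) the sample size.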
15

Designing and analyzing test programs with censored data for civil engineering applications

Finley, Cynthia 28 August 2008 (has links)
Not available.
16

Flexible statistical modeling of deaths by diarrhoea in South Africa.

Mbona, Sizwe Vincent. 17 December 2013 (has links)
The purpose of this study is to investigate and understand data which are grouped into categories. Various statistical methods were studied for categorical binary responses to investigate the causes of death from diarrhoea in South Africa. Data collected included death type, sex, marital status, province of birth, province of death, place of death, province of residence, education status, smoking status and pregnancy status. The objective of this thesis is to investigate which of the above explanatory variables are most strongly associated with death from diarrhoea in South Africa. To achieve this objective, different sample survey data analysis techniques are investigated. These include sketching bar graphs and applying several statistical methods, namely logistic regression, surveylogistic, the generalised linear model, the generalised linear mixed model, and the generalised additive model. In the selection of the fixed effects, bar graphs of the response variable and individual profile graphs are examined. A logistic regression model is used to identify which of the explanatory variables are most strongly associated with death from diarrhoea. Statistical analyses are conducted in SAS (Statistical Analysis Software). Hosmer and Lemeshow (2000) propose a statistic that they show, through simulation, is distributed as chi-square when there is no replication in any of the subpopulations. Owing to its similarity to the Hosmer and Lemeshow test for logistic regression, Parzen and Lipsitz (1999) suggest using 10 risk score groups. Nevertheless, based on simulation results, May and Hosmer (2004) show that, for all samples or samples with a large percentage of censored observations, the test rejects the null hypothesis too often. They suggest that the number of groups be chosen such that G = integer of {maximum of 12 and minimum of 10}. Lemeshow et al. (2004) state that the observations are first sorted in increasing order of their estimated event probability. / Thesis (M.Sc.)-University of KwaZulu-Natal, Pietermaritzburg, 2013.
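
A sketch of the Hosmer-Lemeshow grouping idea referenced above, in illustrative Python rather than the thesis's SAS: sort observations by predicted probability, split them into risk-score groups, and compare observed with expected event counts.

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow goodness-of-fit statistic for a fitted binary model.
    y: 0/1 outcomes; p: predicted event probabilities; groups: G risk groups."""
    order = np.argsort(p)
    y, p = np.asarray(y)[order], np.asarray(p)[order]
    stat = 0.0
    for y_g, p_g in zip(np.array_split(y, groups), np.array_split(p, groups)):
        n_g, pbar = len(y_g), p_g.mean()       # group size, mean predicted risk
        o_g, e_g = y_g.sum(), p_g.sum()        # observed and expected events
        stat += (o_g - e_g) ** 2 / (n_g * pbar * (1 - pbar))
    return stat, chi2.sf(stat, groups - 2)     # compare to chi-square, df = G-2
```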
17

Pragmatic Statistical Approaches for Power Analysis, Causal Inference, and Biomarker Detection

Fan Wu (16536675) 26 July 2023 (has links)
Mediation analyses play a critical role in social and personality psychology research. However, current approaches for assessing power and sample size in mediation models have limitations, particularly when dealing with complex mediation models and multiple mediator sequential models. These limitations stem from limited software options and the substantial computational time required. In this part, we address these challenges by extending the joint significance test and product of coefficients test to incorporate the fourth-pathed mediated effect and generalized kth-pathed mediated effect. Additionally, we propose a model-based bootstrap method and provide convenient R tools for estimating power in complex mediation models. Through our research, we demonstrate that power decreases as the number of mediators increases and as the influence of coefficients varies. We summarize our results and discuss the implications of power analysis in relation to mediator complexity and coefficient influence. We provide insights for researchers seeking to optimize study designs and enhance the reliability of their findings in complex mediation models.

Matching is a crucial step in causal inference, as it allows for more robust and reasonable analyses by creating better-matched pairs. However, in real-world scenarios, data are often collected and stored by different local institutions or separate departments, posing challenges for effective matching due to data fragmentation. Additionally, the harmonization of such data needs to prioritize privacy preservation. In this part, we propose a new hierarchical framework that addresses these challenges by implementing differential privacy on raw data to protect sensitive information while maintaining data utility. We also design a data access control system with three different access levels for designers based on their roles, ensuring secure and controlled access to the matched datasets. Simulation studies and analyses of datasets from the 2017 Atlantic Causal Inference Conference Data Challenge are conducted to showcase the flexibility and utility of our framework. Through this research, we contribute to the advancement of statistical methodologies in matching and privacy-preserving data analysis, offering a practical solution for data integration and privacy protection in causal inference studies.

Biomarker discovery is a complex and resource-intensive process, encompassing discovery, qualification, verification, and validation stages prior to clinical evaluation. Streamlining this process by efficiently identifying relevant biomarkers in the discovery phase holds immense value. In this part, we present a likelihood ratio-based approach to accurately identify truly relevant protein markers in discovery studies. Leveraging the observation of unimodal underlying distributions of expression profiles for irrelevant markers, our method demonstrates promising performance when evaluated on real experimental data. Additionally, to address non-normal scenarios, we introduce a kernel ratio-based approach, which we evaluate using non-normal simulation settings. Through extensive simulations, we observe the high effectiveness of the kernel method in discovering the set of truly relevant markers, resulting in precise biomarker identifications with elevated sensitivity and a low empirical false discovery rate.
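
As a hedged sketch of the model-based simulation idea in the first part (the thesis provides R tools; this is illustrative Python with made-up effect sizes), power for the joint significance test of a single mediated effect can be estimated like so:

```python
import numpy as np
import statsmodels.api as sm

def mediation_power(a=0.3, b=0.3, n=200, reps=1000, alpha=0.05, seed=1):
    """Simulate X -> M -> Y data under assumed path coefficients a and b,
    and estimate power for the joint significance test of the a*b effect."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        x = rng.normal(size=n)
        m = a * x + rng.normal(size=n)
        y = b * m + rng.normal(size=n)
        p_a = sm.OLS(m, sm.add_constant(x)).fit().pvalues[1]   # a-path test
        p_b = sm.OLS(y, sm.add_constant(np.column_stack([m, x]))
                     ).fit().pvalues[1]                        # b-path, given X
        hits += (p_a < alpha) and (p_b < alpha)   # joint significance
    return hits / reps

print(mediation_power())  # power falls as more mediators chain the a*b path
```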
18

The robustness of LISREL estimates in structural equation models with categorical data

Ethington, Corinna A. January 1985 (has links)
This study was an examination of the effect of type of correlation matrix on the robustness of LISREL maximum likelihood and unweighted least squares structural parameter estimates for models with categorical manifest variables. Two types of correlation matrices were analyzed: one containing Pearson product-moment correlations and one containing tetrachoric, polyserial, and product-moment correlations as appropriate. Using continuous variables generated according to the equations defining the population model, three cases were considered by dichotomizing some of the variables with varying degrees of skewness. When Pearson product-moment correlations were used to estimate associations involving dichotomous variables, the structural parameter estimates were biased when skewness was present in the dichotomous variables. Moreover, the degree of bias was consistent for both the maximum likelihood and unweighted least squares estimates. The standard errors of the estimates were found to be inflated, making significance tests unreliable. The analysis of mixed matrices produced average estimates that more closely approximated the model parameters except in the case where the dichotomous variables were skewed in opposite directions. However, since goodness-of-fit statistics and standard errors are not available in LISREL when tetrachoric and polyserial correlations are used, the unbiased estimates are not of practical significance. Until alternative computer programs are available that employ distribution-free estimation procedures that consider the skewness and kurtosis of the variables, researchers are ill-advised to employ LISREL in the estimation of structural equation models containing skewed categorical manifest variables. / Ph. D.
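
A small simulation, under assumptions of my own choosing (correlation 0.5, skewed cut points at z = 1), illustrates the attenuation the study documents: the Pearson (phi) correlation of dichotomized variables understates the latent correlation, while a tetrachoric estimate (here Pearson's classical cosine-pi approximation) recovers it more closely.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.5
z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=100_000)

# Dichotomize at unequal splits to induce skewness in the binary variables
x, y = (z[:, 0] > 1.0).astype(int), (z[:, 1] > 1.0).astype(int)

phi = np.corrcoef(x, y)[0, 1]                 # Pearson (phi) on the dichotomies
a = np.mean((x == 1) & (y == 1)); d = np.mean((x == 0) & (y == 0))
b = np.mean((x == 1) & (y == 0)); c = np.mean((x == 0) & (y == 1))
r_tet = np.cos(np.pi / (1 + np.sqrt(a * d / (b * c))))   # cosine-pi approximation

print(f"true rho = {rho}, phi = {phi:.3f}, tetrachoric approx = {r_tet:.3f}")
```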
19

Statistics preserving spatial interpolation methods for missing precipitation data

Unknown Date (has links)
Deterministic and stochastic weighting methods are commonly used for estimating missing precipitation rain gauge data based on values recorded at neighboring gauges. However, these spatial interpolation methods are seldom checked for their ability to preserve site and regional statistics. Such statistics are primarily defined by spatial correlations and other site-to-site statistics in a region. Preservation of site and regional statistics represents a means of assessing the validity of missing precipitation estimates at a site. This study evaluates the efficacy of traditional interpolation methods for estimation of missing data in preserving site and regional statistics. New optimal spatial interpolation methods intended to preserve these statistics are also proposed and evaluated in this study. Rain gauge sites in the state of Kentucky are used as a case study, and several error and performance measures are used to evaluate the trade-offs in accuracy of estimation and preservation of site and regional statistics. / by Husayn El Sharif. / Thesis (M.S.C.S.)--Florida Atlantic University, 2012. / Includes bibliography.
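
A minimal sketch of the simplest deterministic weighting scheme alluded to above, inverse distance weighting; the coordinates and rainfall values are made up for illustration, and real applications add search radii or optimized weights:

```python
import numpy as np

def idw_estimate(target_xy, gauge_xy, gauge_vals, power=2.0):
    """Estimate a missing precipitation value at target_xy as an
    inverse-distance-weighted average of neighboring gauge records."""
    d = np.linalg.norm(np.asarray(gauge_xy, float) - np.asarray(target_xy, float),
                       axis=1)
    if np.any(d == 0):                       # coincident gauge: use its value
        return np.asarray(gauge_vals)[np.argmin(d)]
    w = 1.0 / d ** power                     # closer gauges weigh more
    return np.sum(w * np.asarray(gauge_vals)) / np.sum(w)

# Example: fill a missing value at (0, 0) from three neighboring gauges
print(idw_estimate((0, 0), [(1, 0), (0, 2), (3, 3)], [10.0, 14.0, 8.0]))
```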
20

Estimating posterior expectation of distributions belonging to exponential and non exponential families

Begum, Munni January 2001 (has links)
The Bayesian principle is conceptually simple and intuitively plausible to carry out, but its numerical implementation is not always straightforward. Most of the time, posterior distributions are given in terms of complicated analytical functions and may be known only up to a multiplicative constant. Hence it becomes computationally difficult to obtain the marginal densities and the moments of the posterior distributions in closed form. In the present study, the leading methods, both analytical and numerical, for implementing Bayesian inference have been explored. In particular, the non-iterative Monte Carlo method known as Importance Sampling has been applied to approximate the posterior expectations of the Lognormal and Cauchy distributions, belonging to the Exponential family and the non-Exponential family of distributions respectively. Sample values from these distributions have been simulated through computer programming. Calculations are done mostly in the C++ programming language and Mathematica. / Department of Mathematical Sciences
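
A self-contained sketch of self-normalized importance sampling for a posterior known only up to a multiplicative constant; the Cauchy prior, the observed value 1.5, and the normal proposal are assumptions chosen for illustration, not taken from the thesis:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def unnorm_post(theta):
    """Unnormalized posterior: Cauchy prior times normal likelihood
    for an assumed observation of 1.5 with known scale 0.5."""
    return stats.cauchy.pdf(theta) * stats.norm.pdf(1.5, loc=theta, scale=0.5)

proposal = stats.norm(loc=1.0, scale=2.0)          # importance distribution
theta = proposal.rvs(size=50_000, random_state=rng)
w = unnorm_post(theta) / proposal.pdf(theta)       # importance weights
post_mean = np.sum(w * theta) / np.sum(w)          # self-normalized estimate
print(f"estimated posterior mean: {post_mean:.3f}")
```

The normalizing constant cancels in the weight ratio, which is exactly why the method suits posteriors known only up to a constant.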
