61

The Prior Distribution in Bayesian Statistics

Chen, Kai-Tang 01 May 1979 (has links)
A major problem associated with Bayesian estimation is selecting the prior distribution. The more recent literature on the selection of the prior is reviewed. Very little of a general nature on the selection of the prior is found in the literature except for non-informative priors. This class of priors is seen to have limited usefulness. A method of selecting an informative prior is generalized in this thesis to include estimation of several parameters using a multivariate prior distribution. The concepts required for quantifying prior information are based on intuitive principles. In this way, they can be understood and controlled by the decision maker (i.e., those responsible for the consequences) rather than by analysts. The information required is: (1) prior point estimates of the parameters being estimated and (2) an expression of the desired influence of the prior relative to the present data in determining the parameter estimates (e.g., the prior having twice as much influence as the data). These concepts (point estimates and influence) may be used equally with subjective or quantitative prior information.
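A minimal sketch of the influence idea in a univariate conjugate normal setting: the prior point estimate is given a weight expressed relative to the data (e.g., twice the data's influence). The model, function names, and numbers below are illustrative assumptions; the thesis develops the multivariate case.

```python
import numpy as np

def posterior_mean_with_influence(data, prior_point_estimate, influence_ratio):
    """Conjugate normal sketch: the prior receives `influence_ratio` times
    the weight of the observed data when forming the posterior mean.
    (Illustrative only; the thesis treats the multivariate case.)"""
    n = len(data)
    data_mean = np.mean(data)
    # Weight the prior as if it were based on influence_ratio * n observations.
    prior_weight = influence_ratio * n
    return (prior_weight * prior_point_estimate + n * data_mean) / (prior_weight + n)

# Example: prior guess of 10.0 given twice the influence of the data.
rng = np.random.default_rng(0)
sample = rng.normal(loc=12.0, scale=2.0, size=25)
print(posterior_mean_with_influence(sample, prior_point_estimate=10.0, influence_ratio=2.0))
```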
62

Unbalanced Analysis of Variance Comparing Standard and Proposed Approximation Techniques for Estimating the Variance Components

Pugsley, James P. 01 May 1984 (has links)
This paper considers the estimation of the components of variation for a two-factor unbalanced nested design and compares standard techniques with proposed approximation procedures. Current procedures are complicated and assume the unbalanced sample size to be fixed. This paper tests some simpler techniques, assuming sample sizes are random variables. Monte Carlo techniques were used to generate data for testing of these new procedures.
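The simulation step described above can be illustrated with a short sketch that draws data from a two-factor nested random-effects model with random (unbalanced) cell sizes. The variance components, the Poisson sample-size model, and all settings are assumptions made for illustration; the thesis's proposed approximation estimators are not reproduced here.

```python
import numpy as np

def simulate_nested(rng, n_a=5, mean_b_per_a=4, mean_n=6,
                    var_a=2.0, var_b=1.0, var_e=0.5, mu=10.0):
    """Generate one dataset from a two-factor nested random-effects model
    y_ijk = mu + a_i + b_ij + e_ijk with random (unbalanced) sample sizes.
    Returns a list of (a_index, b_index, observations)."""
    data = []
    for i in range(n_a):
        a_i = rng.normal(0.0, np.sqrt(var_a))
        n_b = max(2, rng.poisson(mean_b_per_a))   # random number of nested levels
        for j in range(n_b):
            b_ij = rng.normal(0.0, np.sqrt(var_b))
            n_ij = max(2, rng.poisson(mean_n))     # random, unbalanced cell size
            y = mu + a_i + b_ij + rng.normal(0.0, np.sqrt(var_e), size=n_ij)
            data.append((i, j, y))
    return data

rng = np.random.default_rng(1)
dataset = simulate_nested(rng)
print(sum(len(y) for _, _, y in dataset), "total observations across",
      len(dataset), "unbalanced cells")
```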
63

Design Optimization Using Model Estimation Programming

Brimhall, Richard Kay 01 May 1967 (has links)
Model estimation programming provides a method for obtaining extreme solutions subject to constraints. Functions which are continuous with continuous first and second derivatives in the neighborhood of the solution are approximated using quadratic polynomials (termed estimating functions) derived from computed or experimental data points. Using the estimating functions, an approximation problem is solved by a numerical adaptation of the method of Lagrange. The method is not limited by the concavity of the objective function. Beginning with an initial array of data observations, an initial approximate solution is obtained. Using this approximate solution as a new datum point, the coefficients for the estimating function are recalculated with a constrained least squares fit which forces intersection of the functions and their estimating functions at the last three observations. The constraining of the least squares estimate provides a sequence of approximate solutions which converge to the desired extremal. A digital computer program employing the technique is used extensively by Thiokol Chemical Corporation's Wasatch Division, especially for vehicle design optimization where flight performance and hardware constraints must be satisfied simultaneously.
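As a rough illustration of the surrogate idea only, the sketch below fits a quadratic through the three most recent observations of a one-dimensional objective and steps to the surrogate's stationary point. The Lagrange treatment of constraints and the constrained least-squares refit described in the abstract are omitted, so this should be read as a simplified sketch rather than the Thiokol program's actual method.

```python
import numpy as np

def quadratic_surrogate_minimize(f, x0, x1, x2, n_iter=8):
    """Sketch of the estimating-function idea only: fit a quadratic through
    the three most recent observations of f and step to its stationary point.
    (Constraints and the constrained least-squares fit are omitted.)"""
    xs = [x0, x1, x2]
    for _ in range(n_iter):
        pts = xs[-3:]
        coeffs = np.polyfit(pts, [f(x) for x in pts], deg=2)   # a*x^2 + b*x + c
        a, b, _ = coeffs
        if abs(a) < 1e-12:
            break
        xs.append(-b / (2.0 * a))   # stationary point of the quadratic surrogate
    return xs[-1]

# Example: minimize a smooth, convex, non-quadratic function.
f = lambda x: np.cosh(x - 1.3) + 0.1 * x**2
print(quadratic_surrogate_minimize(f, 0.0, 1.0, 2.0))
```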
64

Multicollinearity and the Estimation of Regression Coefficients

Teed, John Charles 01 May 1978 (has links)
The precision of the estimates of the regression coefficients in a regression analysis is affected by multicollinearity. The effect of certain factors on multicollinearity and the estimates was studied. The response variables were the standard error of the regression coefficients and a standardized statistic that measures the deviation of the regression coefficient from the population parameter. The estimates are not influenced by any one factor in particular, but rather some combination of factors. The larger the sample size, the better the precision of the estimates no matter how "bad" the other factors may be. The standard error of the regression coefficients proved to be the best indication of estimation problems.
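A small simulation, under assumed settings, makes the described effect concrete: as the correlation between two predictors grows, the ordinary least squares standard errors of their coefficients inflate. The model, sample size, and correlation levels below are illustrative and are not taken from the thesis.

```python
import numpy as np

def coef_standard_errors(rho, n=100, sigma=1.0, seed=0):
    """Simulate y = 1 + 2*x1 + 3*x2 + e with corr(x1, x2) = rho and return
    the usual OLS standard errors of the fitted coefficients."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(0.0, sigma, size=n)
    Xd = np.column_stack([np.ones(n), X])              # add intercept column
    XtX_inv = np.linalg.inv(Xd.T @ Xd)
    beta = XtX_inv @ Xd.T @ y
    resid = y - Xd @ beta
    s2 = resid @ resid / (n - Xd.shape[1])
    return np.sqrt(s2 * np.diag(XtX_inv))

for rho in (0.0, 0.9, 0.99):
    print(rho, coef_standard_errors(rho).round(3))
```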
65

Parameter Estimation for Generalized Pareto Distribution

Lin, Der-Chen 01 May 1988 (has links)
The generalized Pareto distribution was introduced by Pickands (1975). Three methods of estimating the parameters of the generalized Pareto distribution were compared by Hosking and Wallis (1987): maximum likelihood, the method of moments, and probability-weighted moments. An alternate method of estimation for the generalized Pareto distribution, based on least squares regression of expected order statistics (REOS), is developed and evaluated in this thesis. A Monte Carlo comparison is made between this method and the estimating methods considered by Hosking and Wallis (1987). The REOS method is shown to be generally superior to maximum likelihood, the method of moments, and probability-weighted moments.
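Of the three estimators compared by Hosking and Wallis (1987), the method of moments has a simple closed form in the Pickands parameterization (shape ξ, scale σ), sketched below for illustration; the thesis's REOS estimator and the probability-weighted-moments formulas are not reproduced here.

```python
import numpy as np

def gpd_method_of_moments(x):
    """Method-of-moments estimates (shape xi, scale sigma) for the generalized
    Pareto distribution in the Pickands parameterization, valid when xi < 1/2
    so the variance exists. Uses mean = sigma/(1-xi) and
    var = sigma^2 / ((1-xi)^2 (1-2*xi))."""
    x = np.asarray(x, dtype=float)
    m, v = x.mean(), x.var(ddof=1)
    xi = 0.5 * (1.0 - m * m / v)
    sigma = m * (1.0 - xi)
    return xi, sigma

# Quick check against simulated data (xi = 0.2, sigma = 1.0) via inverse-CDF sampling.
rng = np.random.default_rng(2)
u = rng.uniform(size=5000)
xi_true, sigma_true = 0.2, 1.0
sample = sigma_true * ((1.0 - u) ** (-xi_true) - 1.0) / xi_true
print(gpd_method_of_moments(sample))
```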
66

Adaptive Stochastic Gradient Markov Chain Monte Carlo Methods for Dynamic Learning and Network Embedding

Tianning Dong (14559992) 06 February 2023 (has links)
Latent variable models are widely used in modern data science for both static and dynamic data. This thesis focuses on large-scale latent variable models formulated for time series data and static network data. The former refers to the state space model for dynamic systems, which models the evolution of latent state variables and the relationship between the latent state variables and observations. The latter refers to a network decoder model, which maps a large network into a low-dimensional space of latent embedding vectors. Both problems can be solved by adaptive stochastic gradient Markov chain Monte Carlo (MCMC), which allows us to simulate the latent variables and estimate the model parameters simultaneously, and thus facilitates downstream statistical inference from the data.

For the state space model, the challenge is inference for high-dimensional, large-scale, long-series data. The existing algorithms, such as the particle filter or sequential importance sampler, do not scale well to the dimension of the system and the sample size of the dataset, and often suffer from sample degeneracy for long series data. To address this issue, the thesis proposes the stochastic approximation Langevinized ensemble Kalman filter (SA-LEnKF) for jointly estimating the states and unknown parameters of the dynamic system, where the parameters are estimated on the fly based on the state variables simulated by the LEnKF under the framework of stochastic approximation MCMC. Under mild conditions, we prove its consistency in parameter estimation and ergodicity in state variable simulations. The proposed algorithm can be used in uncertainty quantification for long-series, large-scale, and high-dimensional dynamic systems. Numerical results on simulated datasets and large real-world datasets indicate its superiority over the existing algorithms and its great potential in the statistical analysis of complex dynamic systems encountered in modern data science.

For the network embedding problem, an appropriate embedding dimension is hard to determine under the theoretical framework of the existing methods, where the embedding dimension is often treated as a tunable hyperparameter or a matter of common practice. The thesis proposes a novel network embedding method with a built-in mechanism for embedding dimension selection. The basic idea is to treat the embedding vectors as the latent inputs of a deep neural network (DNN) model. Then, by an adaptive stochastic gradient MCMC algorithm, we can simulate the embedding vectors and estimate the parameters of the DNN model simultaneously. By the theory of sparse deep learning, the embedding dimension can be determined by imposing an appropriate sparsity penalty on the DNN model. Experiments on real-world networks show that our method can perform dimension selection in network embedding while preserving network structures.
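The general simulate-then-update pattern behind adaptive stochastic gradient MCMC can be illustrated on a toy latent-variable model: a Langevin step moves the latent variables given the current parameter, and a decreasing-gain stochastic approximation step moves the parameter. This is a generic sketch under an assumed toy model, not the SA-LEnKF or the sparse-DNN embedding method themselves.

```python
import numpy as np

def adaptive_sgmcmc_toy(y, n_iter=2000, step_z=0.05, seed=3):
    """Toy simulate-then-update loop: latent z_i ~ N(theta, 1), observations
    y_i ~ N(z_i, 1). A Langevin step targets p(z | y, theta); theta is updated
    by a decreasing Robbins-Monro step. Generic sketch only."""
    rng = np.random.default_rng(seed)
    n = len(y)
    z = np.array(y, dtype=float)     # initialize latent states at the observations
    theta = 0.0
    for t in range(1, n_iter + 1):
        # Langevin update of the latent states given the current theta.
        grad_z = (theta - z) + (y - z)          # d/dz of log p(z, y | theta)
        z = z + 0.5 * step_z * grad_z + np.sqrt(step_z) * rng.normal(size=n)
        # Stochastic approximation update of theta with decreasing gain.
        gain = 1.0 / t
        theta = theta + gain * (z.mean() - theta)
    return theta

rng = np.random.default_rng(4)
true_theta = 2.5
latent = rng.normal(true_theta, 1.0, size=500)
obs = rng.normal(latent, 1.0)
print(adaptive_sgmcmc_toy(obs))   # should land near true_theta
```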
67

Finding a Representative Distribution for the Tail Index Alpha, α, for Stock Return Data from the New York Stock Exchange

Burns, Jett 01 May 2022 (has links)
Statistical inference is a tool for creating models that accurately describe real-world events. Special importance is given to the financial methods that model risk and large price movements. A parameter that describes tail heaviness, and risk overall, is α. This research finds a representative distribution that models α. The absolute values of standardized stock returns from the Center for Research in Security Prices (CRSP) are used in this research. The inference is performed using R. Approximations for α are found using the ptsuite package. The GAMLSS package employs maximum likelihood estimation to estimate distribution parameters using the CRSP data. The distributions are selected using AIC and worm plots. The Skew t family is found to be representative for the parameter α based on subsets of the CRSP data. The Skew t type 2 distribution is robust across multiple subsets of values calculated from the CRSP stock return data.
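The thesis works in R with the ptsuite and GAMLSS packages; as a hedged illustration in Python of estimating a tail index, the sketch below applies the classical Hill estimator to simulated heavy-tailed "absolute returns". The choice of k and the Pareto simulation are assumptions made for the example, not the CRSP analysis.

```python
import numpy as np

def hill_tail_index(x, k):
    """Hill estimator of the tail index alpha from the k largest observations.
    (Illustrative stand-in for the R-based ptsuite estimates in the thesis.)"""
    x = np.sort(np.asarray(x, dtype=float))
    tail = x[-k:]
    threshold = x[-k - 1]
    gamma_hat = np.mean(np.log(tail) - np.log(threshold))   # estimate of 1/alpha
    return 1.0 / gamma_hat

# Example on heavy-tailed simulated "absolute returns" (Pareto with alpha = 3).
rng = np.random.default_rng(5)
abs_returns = rng.pareto(3.0, size=10_000) + 1.0
print(hill_tail_index(abs_returns, k=500))
```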
68

Software Profiling of Rogue Events in High-Volume Gauging

Bering, Thomas P.K. 10 1900 (has links)
Customers are placing ever-increasing demands on automotive part manufacturers for high-quality parts at low cost. Increasingly, the demand is for zero defects or defect rates of less than one part per billion. This creates a significant challenge for manufacturers: how to achieve these low defect levels economically while producing large volumes of parts. Importantly, the presence of infrequent process and measurement (gauge) events can adversely affect product quality. This thesis uses a statistical mixture model that allows one to assume a main production process that occurs most of the time and secondary rogue events that occur infrequently. Often the rogue events correspond to necessary operator activity, like equipment repairs and tooling replacement. The mixture model predicts that some gauge observations will be influenced by combinations of these rogue events. Certain production applications, like those involving feedback or high-reliability gauging, are heavily influenced by rogue events and combinations of rogue events. A special runtime software profiler was created to collect information about rogue events, and statistical techniques (rogue event analysis) were used to estimate the waste generated by these rogue events. The value of these techniques was successfully demonstrated in three different industrial automotive part production applications. Two of these systems involve an automated feedback application with Computer Numerically Controlled (CNC) machining centers and Coordinate Measuring Machine (CMM) gauges. The third application involves a high-reliability inspection system that used optical, camera-based, machine-vision technology. The original system accepted reject parts at a rate of 98.7 parts per million (ppm), despite multiple levels of redundancy. The final system showed no outgoing defects on a 1-million-part factory data sample and a 100-million-part simulated data sample. It is expected that the final system reliability will meet the 0.001 ppm specification, which represents a huge improvement. / Doctor of Philosophy (PhD)
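A minimal sketch of the kind of two-component mixture described above: a dominant in-control gauge process contaminated by an infrequent rogue component, with a count of how many out-of-spec readings each component produces. The rogue probability, mean shift, and specification limit are invented for illustration and are not the thesis's fitted values.

```python
import numpy as np

def simulate_gauge_readings(n, rogue_prob=0.002, rogue_shift=5.0, seed=6):
    """Two-component mixture: most readings come from the main process N(0, 1);
    a small fraction are perturbed by a rogue event (shifted mean).
    Probabilities and shift are illustrative, not taken from the thesis."""
    rng = np.random.default_rng(seed)
    is_rogue = rng.random(n) < rogue_prob
    readings = rng.normal(0.0, 1.0, size=n) + np.where(is_rogue, rogue_shift, 0.0)
    return readings, is_rogue

readings, is_rogue = simulate_gauge_readings(1_000_000)
spec_limit = 4.0
main_driven = np.sum((readings > spec_limit) & ~is_rogue)
rogue_driven = np.sum((readings > spec_limit) & is_rogue)
print(f"out-of-spec readings: {rogue_driven} rogue-driven, {main_driven} from the main process")
```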
69

A statistical framework to detect gene-environment interactions influencing complex traits

Deng, Wei Q. 27 August 2014 (has links)
Advancements in human genomic technology have helped to improve our understanding of how genetic variation plays a central role in the mechanism of disease susceptibility. However, the very high dimensional nature of the data generated from large-scale genetic association studies has limited our ability to thoroughly examine genetic interactions. A prioritization scheme – Variance Prioritization (VP) – has been developed to select genetic variants based on differences in the quantitative trait variance between the possible genotypes using Levene's test (Pare et al., 2010). Genetic variants with Levene's test p-values lower than a pre-determined level of significance are selected to test for interactions using linear regression models. Under a variety of scenarios, VP has increased power to detect interactions over an exhaustive search as a result of the reduced search space. Nevertheless, the use of Levene's test does not take into account that the variance will either monotonically increase or decrease with the number of minor alleles when interactions are present. To address this issue, I propose a maximum likelihood approach to test for trends in variance between the genotypes, and derive a closed-form representation of the likelihood ratio test (LRT) statistic. Using simulations, I examine the performance of the LRT in assessing the inequality of quantitative trait variance stratified by genotype, and subsequently in identifying potentially interacting genetic variants. The LRT is also used in an empirical dataset of 2,161 individuals to prioritize genetic variants for gene-environment interactions. The interaction p-values of the prioritized genetic variants are consistently lower than expected by chance compared to the non-prioritized variants, suggesting improved statistical power to detect interactions in the set of prioritized genetic variants. This new statistical test is expected to complement the existing VP framework and accelerate the process of genetic interaction discovery in future genome-wide studies and meta-analyses. / Master of Health Sciences (MSc)
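The prioritization step can be sketched directly with Levene's test on a quantitative trait stratified by genotype (0, 1, or 2 copies of the minor allele), as below. The simulated data and significance level are assumptions, and the thesis's likelihood ratio test for a monotone variance trend is not implemented here.

```python
import numpy as np
from scipy.stats import levene

def variance_prioritization(trait, genotypes, alpha=0.05):
    """Return whether Levene's test detects unequal trait variance across
    genotype groups (0, 1, 2 copies of the minor allele) at level alpha.
    A sketch of the prioritization step only."""
    groups = [trait[genotypes == g] for g in (0, 1, 2)]
    stat, p_value = levene(*groups)
    return p_value < alpha, p_value

# Simulated variant whose genotype inflates the trait variance with each
# minor allele, mimicking an unobserved gene-environment interaction.
rng = np.random.default_rng(7)
genotypes = rng.choice([0, 1, 2], size=2000, p=[0.49, 0.42, 0.09])
trait = rng.normal(0.0, 1.0 + 0.4 * genotypes)
print(variance_prioritization(trait, genotypes))
```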
70

Sparse Principal Component Analysis for High-Dimensional Data: A Comparative Study

Bonner, Ashley J. 10 1900 (has links)
Background: Through unprecedented advances in technology, high-dimensional datasets have exploded into many fields of observational research. For example, it is now common to expect thousands or millions of genetic variables (p) with only a limited number of study participants (n). Determining the important features proves statistically difficult, as multivariate analysis techniques become overwhelmed and mathematically ill-posed when n < p. Principal Component Analysis (PCA) is a commonly used multivariate method for dimension reduction and data visualization but suffers from these issues. A collection of Sparse PCA methods have been proposed to counter these flaws but have not been tested in comparative detail. Methods: The performance of three Sparse PCA methods was evaluated through simulations. Data were generated for 56 different data structures, varying p, the number of underlying groups, and the variance structure within them. Estimation and interpretability of the principal components (PCs) were rigorously tested. The Sparse PCA methods were also applied to a real gene expression dataset. Results: All Sparse PCA methods showed improvements upon classical PCA. Some methods were best at obtaining an accurate leading PC only, whereas others were better for subsequent PCs. The optimal choice of Sparse PCA method differed with the within-group correlation and across-group variances; however, one method repeatedly worked well under the most difficult scenarios. When the methods were applied to real data, concise groups of gene expressions were detected with the sparsest methods. Conclusions: Sparse PCA methods provide a new, insightful way to detect important features amidst complex high-dimensional data. / Master of Science (MSc)
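As a small illustration of the kind of comparison described, the sketch below contrasts classical PCA with one off-the-shelf sparse PCA implementation (scikit-learn's SparsePCA) on simulated n < p data containing a single informative block of variables. The simulation design and penalty setting are illustrative and are not one of the 56 data structures studied in the thesis.

```python
import numpy as np
from sklearn.decomposition import PCA, SparsePCA

# Simulated n < p data: 50 samples, 200 variables, with only the first 10
# variables driven by a common latent factor (settings are illustrative).
rng = np.random.default_rng(8)
n, p, k = 50, 200, 10
factor = rng.normal(size=(n, 1))
X = rng.normal(scale=0.5, size=(n, p))
X[:, :k] += factor                      # informative block of variables

pca = PCA(n_components=1).fit(X)
spca = SparsePCA(n_components=1, alpha=2.0, random_state=0).fit(X)

# Sparse PCA should concentrate its loadings on the informative block.
print("nonzero PCA loadings:      ", np.sum(np.abs(pca.components_[0]) > 1e-8))
print("nonzero SparsePCA loadings:", np.sum(np.abs(spca.components_[0]) > 1e-8))
```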
