61 |
Automated Circulation Control for the Utah State University Library
Montgomery, Richard M. 01 May 1967 (has links)
This package of programs resulted from the U.S.U. Library's adoption of automated control over the circulation of its books, intended to provide the library with a daily record of all books in circulation or otherwise unavailable for circulation and to send notices when books were overdue.
Because of the long-range program of the University's Data Processing Department, it was decided to develop the software for this project rather than purchase hardware.
The then-existing hardware included the IBM 1401 computer (4K), a 1402 card reader, a 1403 on-line printer, and a card sorter. The only additional hardware required by the Data Processing Department was the "read punch feed" feature on the card reader.
This report includes information for operating the programs involved in processing the data. Any information required to set up the data collection system may be obtained from the U.S.U. Library.
These programs were developed to be compatible with the previously mentioned hardware and were used until the University's data processing facilities were updated. All programs were written in the SPS II symbolic language.
|
62 |
The Prior Distribution in Bayesian Statistics
Chen, Kai-Tang 01 May 1979 (has links)
A major problem associated with Bayesian estimation is selecting the prior distribution. The more recent literature on the selection of the prior is reviewed. Very little of a general nature on the selection of the prior is found in the literature except for non-informative priors, a class of priors seen to have limited usefulness. A method of selecting an informative prior is generalized in this thesis to include estimation of several parameters using a multivariate prior distribution. The concepts required for quantifying prior information are based on intuitive principles. In this way, they can be understood and controlled by the decision maker (i.e., those responsible for the consequences) rather than by analysts. The information required is: (1) prior point estimates of the parameters being estimated, and (2) an expression of the desired influence of the prior relative to the present data in determining the parameter estimates (e.g., the prior might be given twice as much influence as the data). These concepts (point estimates and influence) may be used equally with subjective or quantitative prior information.
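A minimal sketch of the influence idea for a single parameter, assuming a normal likelihood with a conjugate normal prior; the function name, data, and numbers are hypothetical, and the thesis's multivariate generalization is not shown:

```python
import numpy as np

def posterior_mean_normal(data, prior_point_estimate, relative_influence):
    """Combine a prior point estimate with data, giving the prior
    relative_influence times the data's weight (e.g., 2.0 = twice the data).
    Assumes a normal likelihood and a conjugate normal prior."""
    data = np.asarray(data, dtype=float)
    n = data.size
    sigma2 = data.var(ddof=1)                       # plug-in data variance
    prior_var = sigma2 / (relative_influence * n)   # prior "worth" k*n observations
    post_precision = 1.0 / prior_var + n / sigma2
    post_mean = (prior_point_estimate / prior_var + data.sum() / sigma2) / post_precision
    return post_mean, 1.0 / post_precision

# Hypothetical example: a prior guess of 10.0 given half the influence of the data
rng = np.random.default_rng(1)
sample = rng.normal(12.0, 2.0, size=30)
print(posterior_mean_normal(sample, prior_point_estimate=10.0, relative_influence=0.5))
```

With relative_influence = 0.5 the posterior mean works out to (0.5 x prior guess + sample mean) / 1.5, so the prior carries exactly half the data's weight.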
|
63 |
Unbalanced Analysis of Variance Comparing Standard and Proposed Approximation Techniques for Estimating the Variance Components
Pugsley, James P. 01 May 1984 (has links)
This paper considers the estimation of the components of variation for a two-factor unbalanced nested design and compares standard techniques with proposed approximation procedures. Current procedures are complicated and assume the unbalanced sample sizes to be fixed. This paper tests some simpler techniques that treat the sample sizes as random variables. Monte Carlo techniques were used to generate data for testing these new procedures.
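A toy version of such a Monte Carlo experiment, simplified to a one-way unbalanced random-effects layout with hypothetical variance components and a standard ANOVA (method-of-moments) estimator rather than the thesis's two-factor procedures:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2_a, sigma2_e = 4.0, 1.0          # hypothetical between- and within-group components

def simulate_nested(n_groups, mean_size):
    """Simulate an unbalanced one-way random-effects layout where the group
    sizes are themselves random (Poisson), matching the random-sample-size view."""
    sizes = 1 + rng.poisson(mean_size - 1, size=n_groups)
    groups, y = [], []
    for i, n_i in enumerate(sizes):
        a_i = rng.normal(0.0, np.sqrt(sigma2_a))
        y.append(a_i + rng.normal(0.0, np.sqrt(sigma2_e), size=n_i))
        groups.append(np.full(n_i, i))
    return np.concatenate(groups), np.concatenate(y)

def anova_components(groups, y):
    """ANOVA (method-of-moments) estimates of the between and within components."""
    ids = np.unique(groups)
    n_i = np.array([np.sum(groups == g) for g in ids])
    means = np.array([y[groups == g].mean() for g in ids])
    n, k = y.size, ids.size
    ss_within = sum(((y[groups == g] - m) ** 2).sum() for g, m in zip(ids, means))
    ms_within = ss_within / (n - k)
    ms_between = (n_i * (means - y.mean()) ** 2).sum() / (k - 1)
    n0 = (n - (n_i ** 2).sum() / n) / (k - 1)   # effective group size for unbalanced data
    return (ms_between - ms_within) / n0, ms_within

est = [anova_components(*simulate_nested(10, 5)) for _ in range(200)]
print(np.mean(est, axis=0))   # Monte Carlo means; should land roughly near (4.0, 1.0)
```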
|
64 |
Design Optimization Using Model Estimation Programming
Brimhall, Richard Kay 01 May 1967 (has links)
Model estimation programming provides a method for obtaining extreme solutions subject to constraints. Functions which are continuous with continuous first and second derivatives in the neighborhood of the solution are approximated using quadratic polynomials (termed estimating functions) derived from computed or experimental data points. Using the estimating functions, an approximation problem is solved by a numerical adaptation of the method of Lagrange. The method is not limited by the concavity of the objective function.
Beginning with an initial array of data observations, an initial approximate solution is obtained. Using this approximate solution as a new datum point, the coefficients for the estimating function are recalculated with a constrained least squares fit which forces intersection of the functions and their estimating functions at the last three observations. The constraining of the least squares estimate provides a sequence of approximate solutions which converge to the desired extremal.
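A schematic sketch of the successive quadratic-approximation idea, with hypothetical objective and constraint functions; an off-the-shelf constrained solver (SciPy's SLSQP) stands in for the report's numerical Lagrange step, and the constrained least-squares refit that forces intersection at the last three observations is omitted:

```python
import numpy as np
from scipy.optimize import minimize

def true_objective(x):        # hypothetical "expensive" objective
    return (x[0] - 1.0) ** 4 + (x[1] + 0.5) ** 2

def true_constraint(x):       # hypothetical equality constraint g(x) = 0
    return x[0] + x[1] - 1.0

def quad_features(x):
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2])

def fit_quadratic(points, values):
    """Least-squares fit of a quadratic estimating function to observed points."""
    A = np.array([quad_features(p) for p in points])
    coef, *_ = np.linalg.lstsq(A, np.array(values), rcond=None)
    return lambda x: quad_features(x) @ coef

# Initial array of data observations
points = [np.array(p, dtype=float) for p in
          [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
           (0.5, 0.5), (2.0, -1.0), (-1.0, 2.0)]]
f_vals = [true_objective(p) for p in points]
g_vals = [true_constraint(p) for p in points]

x = points[-1]
for _ in range(10):
    f_hat = fit_quadratic(points, f_vals)          # estimating function for the objective
    g_hat = fit_quadratic(points, g_vals)          # estimating function for the constraint
    # Solve the approximate problem (SLSQP stands in for the numerical Lagrange step)
    res = minimize(f_hat, x, bounds=[(-5, 5), (-5, 5)],
                   constraints=[{"type": "eq", "fun": g_hat}])
    x = res.x
    points.append(x)                               # use the approximate solution as a new datum
    f_vals.append(true_objective(x))
    g_vals.append(true_constraint(x))

print("approximate constrained solution:", np.round(x, 3))
```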
A digital computer program employing the technique is used extensively by Thiokol Chemical Corporation's Wasatch Division, especially for vehicle design optimization where flight performance and hardware constraints must be satisfied simultaneously.
|
65 |
Multicollinearity and the Estimation of Regression Coefficients
Teed, John Charles 01 May 1978 (has links)
The precision of the estimates of the regression coefficients in a regression analysis is affected by multicollinearity. The effect of certain factors on multicollinearity and the estimates was studied. The response variables were the standard error of the regression coefficients and a standardized statistic that measures the deviation of the regression coefficient from the population parameter.
The estimates are not influenced by any one factor in particular, but rather some combination of factors. The larger the sample size, the better the precision of the estimates no matter how "bad" the other factors may be.
The standard error of the regression coefficients proved to be the best indication of estimation problems.
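A small simulation in that spirit, with hypothetical predictors whose correlation rho is varied: as rho approaches one the standard errors of the coefficients inflate, and a larger sample size restores precision:

```python
import numpy as np

rng = np.random.default_rng(7)

def coef_standard_errors(n, rho):
    """Simulate y = x1 + x2 + noise with corr(x1, x2) = rho and
    return the OLS standard errors of the two slope estimates."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    y = X @ np.array([1.0, 1.0]) + rng.normal(0.0, 1.0, size=n)
    X1 = np.column_stack([np.ones(n), X])
    XtX_inv = np.linalg.inv(X1.T @ X1)
    beta = XtX_inv @ X1.T @ y
    resid = y - X1 @ beta
    sigma2 = resid @ resid / (n - X1.shape[1])
    return np.sqrt(np.diag(sigma2 * XtX_inv))[1:]

for rho in (0.0, 0.9, 0.99):
    for n in (30, 300):
        print(rho, n, np.round(coef_standard_errors(n, rho), 3))
# Standard errors inflate as rho -> 1 and shrink as the sample size grows.
```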
|
66 |
Parameter Estimation for Generalized Pareto Distribution
Lin, Der-Chen 01 May 1988 (has links)
The generalized Pareto distribution was introduced by Pickands (1975). Three methods of estimating the parameters of the generalized Pareto distribution were compared by Hosking and Wallis (1987): maximum likelihood, the method of moments, and probability-weighted moments.
An alternate method of estimation for the generalized Pareto distribution, based on least-squares regression of expected order statistics (REOS), is developed and evaluated in this thesis. A Monte Carlo comparison is made between this method and the estimating methods considered by Hosking and Wallis (1987). The REOS method is shown to be generally superior to the maximum likelihood, method of moments, and probability-weighted moments estimators.
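The REOS method itself is not reproduced here, but the following is a brief sketch of the kind of Monte Carlo comparison described, using two of the baseline estimators (method of moments and maximum likelihood) under SciPy's genpareto parameterization with hypothetical parameter values:

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(42)
true_shape, true_scale = 0.2, 1.0     # hypothetical parameter values

def moments_estimate(x):
    """Method-of-moments estimates under scipy's genpareto parameterization
    (valid when the shape parameter is below 1/2)."""
    m, v = x.mean(), x.var(ddof=1)
    shape = 0.5 * (1.0 - m * m / v)
    scale = m * (1.0 - shape)
    return shape, scale

mom, mle = [], []
for _ in range(500):
    x = genpareto.rvs(true_shape, scale=true_scale, size=100, random_state=rng)
    mom.append(moments_estimate(x))
    c, loc, s = genpareto.fit(x, floc=0.0)   # maximum likelihood, location fixed at 0
    mle.append((c, s))

print("MoM bias:", np.mean(mom, axis=0) - (true_shape, true_scale))
print("MLE bias:", np.mean(mle, axis=0) - (true_shape, true_scale))
```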
|
67 |
Adaptive Stochastic Gradient Markov Chain Monte Carlo Methods for Dynamic Learning and Network Embedding
Tianning Dong (14559992) 06 February 2023 (has links)
<p>Latent variable models are widely used in modern data science for both static and dynamic data. This thesis focuses on large-scale latent variable models formulated for time series data and static network data. The former refers to the state space model for dynamic systems, which models the evolution of latent state variables and the relationship between the latent state variables and observations. The latter refers to a network decoder model, which maps a large network into a low-dimensional space of latent embedding vectors. Both problems can be solved by adaptive stochastic gradient Markov chain Monte Carlo (MCMC), which allows us to simulate the latent variables and estimate the model parameters simultaneously, and thus facilitates downstream statistical inference from the data. </p>
<p><br></p>
<p>For the state space model, the challenge is inference for high-dimensional, large-scale, long-series data. The existing algorithms, such as the particle filter or sequential importance sampler, do not scale well to the dimension of the system and the sample size of the dataset, and often suffer from sample degeneracy for long series data. To address this issue, the thesis proposes the stochastic approximation Langevinized ensemble Kalman filter (SA-LEnKF) for jointly estimating the states and unknown parameters of the dynamic system, where the parameters are estimated on the fly based on the state variables simulated by the LEnKF under the framework of stochastic approximation MCMC. Under mild conditions, we prove its consistency in parameter estimation and ergodicity in state variable simulation. The proposed algorithm can be used for uncertainty quantification in long-series, large-scale, and high-dimensional dynamic systems. Numerical results on simulated datasets and large real-world datasets indicate its superiority over the existing algorithms and its great potential in the statistical analysis of complex dynamic systems encountered in modern data science. </p>
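A toy sketch of the general pattern (an ensemble Kalman filter state update interleaved with a stochastic-approximation parameter update), for a one-dimensional linear-Gaussian system with a made-up unknown observation variance; this is not the SA-LEnKF algorithm itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear-Gaussian system: x_t = a*x_{t-1} + w_t,  y_t = x_t + v_t
a, q, r_true, T, N = 0.9, 0.5, 1.0, 500, 200      # N = ensemble size

x, ys = 0.0, []
for _ in range(T):
    x = a * x + rng.normal(0, np.sqrt(q))
    ys.append(x + rng.normal(0, np.sqrt(r_true)))

ensemble = rng.normal(0.0, 1.0, size=N)
r_hat = 3.0                                        # unknown observation variance, learned on the fly

for t, y in enumerate(ys, start=1):
    # Forecast step
    forecast = a * ensemble + rng.normal(0, np.sqrt(q), size=N)
    P = forecast.var(ddof=1)
    # Stochastic-approximation update: the innovation variance should equal P + r
    innovation = y - forecast.mean()
    r_hat = max(1e-3, r_hat + (2.0 / (t + 10)) * (innovation**2 - (P + r_hat)))
    # Ensemble Kalman analysis step with perturbed observations, using the current r_hat
    K = P / (P + r_hat)
    ensemble = forecast + K * (y + rng.normal(0, np.sqrt(r_hat), size=N) - forecast)

print("estimated observation variance:", round(r_hat, 3), "(true value", r_true, ")")
```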
<p><br></p>
<p>For the network embedding problem, an appropriate embedding dimension is hard to determine under the theoretical framework of the existing methods, where the embedding dimension is often treated as a tunable hyperparameter or a choice of common practice. The thesis proposes a novel network embedding method with a built-in mechanism for embedding dimension selection. The basic idea is to treat the embedding vectors as the latent inputs for a deep neural network (DNN) model. Then, by an adaptive stochastic gradient MCMC algorithm, we can simulate the embedding vectors and estimate the parameters of the DNN model in a simultaneous manner. By the theory of sparse deep learning, the embedding dimension can be determined by imposing an appropriate sparsity penalty on the DNN model. Experiments on real-world networks show that our method can perform dimension selection in network embedding while preserving network structure. </p>
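A toy sketch of the dimension-selection idea on a made-up two-community graph: embedding vectors act as latent parameters of a simple inner-product decoder (a stand-in for the DNN), plain gradient descent stands in for adaptive stochastic gradient MCMC, and a group soft-thresholding step lets redundant embedding dimensions shrink to zero:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical toy graph: two communities, denser within than between
n, d_max = 30, 8                        # d_max deliberately larger than needed
labels = np.repeat([0, 1], n // 2)
prob = np.where(labels[:, None] == labels[None, :], 0.6, 0.05)
A = (rng.random((n, n)) < prob).astype(float)
A = np.triu(A, 1)
A = A + A.T                             # symmetric adjacency, no self-loops

Z = 0.3 * rng.normal(size=(n, d_max))   # embedding vectors as latent parameters
b, lr, tau = 0.0, 0.05, 0.002           # bias, step size, per-step group shrinkage

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

for _ in range(3000):
    resid = sigmoid(Z @ Z.T + b) - A    # gradient of the Bernoulli edge log-likelihood
    np.fill_diagonal(resid, 0.0)
    Z -= lr * (2.0 * resid @ Z) / n
    b -= lr * resid.mean()
    # Group soft-thresholding: whole embedding dimensions can shrink to zero
    norms = np.maximum(np.linalg.norm(Z, axis=0), 1e-12)
    Z *= np.maximum(0.0, 1.0 - tau / norms)

print("embedding-dimension norms:", np.round(np.linalg.norm(Z, axis=0), 2))
# Dimensions whose norms collapse toward zero are deemed unnecessary; the rest
# suggest the embedding dimension to keep.
```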
<p><br></p>
|
68 |
<b>Sample Size Determination for Subsampling in the Analysis of Big Data, Multiplicative Models for Confidence Intervals, and Free-Knot Changepoint Models</b>
Sheng Zhang (18468615) 11 June 2024 (has links)
<p dir="ltr">We studied the relationship between subsample size and the accuracy of resulted estimation under big data setup.</p><p dir="ltr">We also proposed a novel approach to the construction of confidence intervals based on improved concentration inequalities.</p><p dir="ltr">Lastly, we studied irregular change-point models using free-knot splines.</p>
|
69 |
Qwixx Strategies Using Simulation and MCMC Methods
Blank, Joshua W 01 June 2024 (has links) (PDF)
This study explores optimal strategies for maximizing scores and winning in the popular dice game Qwixx, analyzing both single-player and multiplayer gameplay scenarios. Through extensive simulations, various strategies were tested and compared, including a score-based approach that uses a formula tuned by MCMC random walks, and race-to-lock approaches that use absorbing Markov chain properties of individual score sheet rows to find ways to lock rows as quickly as possible. Results indicate that employing a score-based strategy, considering gap, count, position, skip, and likelihood scores, significantly improves performance in single-player games, while move restrictions based on specific dice roll sums in the race-to-lock strategy were found to enhance winning and scoring in multiplayer games. While the results do not achieve the optimal scores attained by prior informal work, the study provides valuable insights into decision-making processes and gameplay optimization for Qwixx enthusiasts, offering practical guidance for players seeking to enhance their performance and strategic prowess in the game. It also serves as a lesson in how to approach similar optimization problems in the future.
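A sketch of the absorbing-chain calculation behind the race-to-lock analysis, using a deliberately simplified single-row model (a mark is added with a fixed probability p each turn) rather than Qwixx's actual row dynamics:

```python
import numpy as np

# Hypothetical single-row model: state = number of marks (0..5); the row locks at 5.
# Each turn the player adds a mark with probability p and otherwise stays put.
p = 0.35
n_transient = 5
Q = np.zeros((n_transient, n_transient))
for i in range(n_transient):
    Q[i, i] = 1.0 - p                    # no usable roll this turn
    if i + 1 < n_transient:
        Q[i, i + 1] = p                  # add a mark
# (From state 4, probability p leads to the absorbing "locked" state, so it leaves Q.)

N = np.linalg.inv(np.eye(n_transient) - Q)   # fundamental matrix of the absorbing chain
expected_turns = N.sum(axis=1)               # expected turns to lock, by starting state
print(expected_turns)                        # from 0 marks this equals 5 / p
```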
|
70 |
Using Plankton eDNA to Estimate Whale Abundances off the California Coast: Data Integration and Statistical Modeling
Chan, Katherine 01 June 2024 (has links) (PDF)
Understanding marine mammal populations and how they are affected by human activity and ocean conditions is vital, especially for tracking population declines and monitoring endangered species. However, tracking marine mammal populations and their distribution is challenging due to observational difficulties and costs. Surrounding plankton environmental DNA (eDNA) has the potential to provide an indirect means of monitoring cetacean abundances based on ecological associations. This project aims to apply statistical methods to assess the relationship of visual abundances of common species of baleen whales with amplicon sequence variants (ASVs) of plankton eDNA samples from the NOAA-CalCOFI Ocean Genomics (NCOG) project. Modeling this relationship of eDNA with marine mammal sightings may greatly aid the ability to predict the abundance of whales in the ocean.
There are several key challenges associated with the analysis of the NCOG data. Plankton eDNA samples are an example of compositional data, where the proportions of each ASV must sum to one; this constraint complicates statistical analysis and interpretation. High dimensionality (the number of parameters exceeds the number of observations) and sparsity (many observed zeros) of the genetic sequencing data also pose challenges in estimating parameters. Finally, the model associations should be adjusted for related factors, including seasonality and oceanographic factors, the latter of which goes beyond this project's scope.
This thesis develops and fits models to estimate cetacean abundance from plankton eDNA by leveraging methods of compositional data analysis and high-dimensional regression. This project applies log-ratio data transformations and corresponding log-contrast models to address the compositional aspect of eDNA reads. Regression methods involving high-dimensional data typically rely on dimensionality reduction or regularization. This project implements both reduction and regularization through sparse partial least squares (sPLS) regression. In addition to the data modeling objective of using plankton eDNA to predict baleen whale abundances, this project also identifies ecological correlations between whale abundance and plankton eDNA.
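A compact sketch of the modeling pipeline on simulated data: pseudocounts for zeros, a centered log-ratio (CLR) transform for the compositional constraint, and ordinary partial least squares as a stand-in for the sparse PLS (sPLS) used in the thesis; all data and dimensions are hypothetical.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(11)

# Hypothetical data: 60 samples, 500 plankton ASVs (high-dimensional, sparse counts),
# and a whale-abundance response driven by a handful of taxa.
n, p = 60, 500
counts = rng.negative_binomial(1, 0.3, size=(n, p)) * (rng.random((n, p)) < 0.2)
signal = counts[:, :5].sum(axis=1)
whales = rng.poisson(1.0 + 0.3 * signal)

# Compositional handling: pseudocount for zeros, closure to proportions, CLR transform
comp = (counts + 0.5) / (counts + 0.5).sum(axis=1, keepdims=True)
clr = np.log(comp) - np.log(comp).mean(axis=1, keepdims=True)

# Ordinary PLS stands in here for the sparse PLS used in the thesis
pls = PLSRegression(n_components=3)
pls.fit(clr, whales)
pred = pls.predict(clr).ravel()
print("in-sample correlation:", np.corrcoef(pred, whales)[0, 1])
```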
|