
Application of Machine Learning and Statistical Learning Methods for Prediction in a Large-Scale Vegetation Map

Brookey, Carla M. 01 December 2017
Original analyses of a large vegetation cover dataset from Roosevelt National Forest in northern Colorado were carried out by Blackard (1998) and Blackard and Dean (1998; 2000). They compared the classification accuracies of linear and quadratic discriminant analysis (LDA and QDA) with artificial neural networks (ANN) and obtained an overall classification accuracy of 70.58% for a tuned ANN, compared to 58.38% for LDA and 52.76% for QDA. Because there has been tremendous development of machine learning classification methods over the last 35 years in both computer science and statistics, as well as substantial improvements in the speed of computer hardware, I applied five modern machine learning algorithms to the data to determine whether significant improvements in classification accuracy were possible using one or more of these methods. I found that only a tuned gradient boosting machine had a higher accuracy (71.62%) than the ANN of Blackard and Dean (1998), and the difference in accuracies was only about 1%. Of the other four methods, Random Forests (RF), Support Vector Machines (SVM), Classification Trees (CT), and AdaBoosted trees (ADA), a tuned SVM and RF had accuracies of 67.17% and 67.57%, respectively. The partition of the data by Blackard and Dean (1998) was unusual in that the training and validation datasets had equal representation of the seven vegetation classes, even though 85% of the data fell into classes 1 and 2. For the second part of my analyses I randomly selected 60% of the data for training and 20% each for validation and testing. On this partition of the data a single classification tree achieved an accuracy of 92.63% on the test data, and RF achieved 83.98%. Unsurprisingly, most of the gains in accuracy were in classes 1 and 2, the largest classes, which also had the highest misclassification rates under the original partition of the data. By decreasing the size of the training data while maintaining the same relative occurrences of the vegetation classes as in the full dataset, I found that even for a training dataset of the same size as that of Blackard and Dean (1998), a single classification tree was more accurate (73.80%) than their ANN (70.58%). The final part of my thesis explored whether combining the predictions of several machine learning classifiers could yield higher predictive accuracies. In the analyses I carried out, the answer appears to be that increased accuracies do not result from a simple vote of five machine learning classifiers.
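As a rough illustration of the workflow this abstract describes (a 60/20/20 split, several classifiers, and a simple vote of their predictions), the following Python sketch uses scikit-learn; it is not the author's code, and the subsample size, tuning defaults, and random seeds are placeholders. The scikit-learn Covertype data appear to be the same Roosevelt National Forest cover-type dataset, but that mapping is an assumption here.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_covtype
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed to be the same cover-type data; a random subsample keeps the sketch quick to run.
X, y = fetch_covtype(return_X_y=True)
rng = np.random.default_rng(1)
idx = rng.choice(len(y), size=20000, replace=False)
X, y = X[idx], y[idx]

# 60% training, 20% validation, 20% test, as in the second partition described above
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, train_size=0.6, random_state=1)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=1)

models = {
    "CT": DecisionTreeClassifier(random_state=1),
    "RF": RandomForestClassifier(n_estimators=200, random_state=1),
    "GBM": GradientBoostingClassifier(random_state=1),
}
preds = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    preds[name] = model.predict(X_test)
    print(name, "test accuracy:", accuracy_score(y_test, preds[name]))

# A simple, unweighted vote of the individual classifiers' predictions
vote = pd.DataFrame(preds).mode(axis=1)[0]
print("majority-vote accuracy:", accuracy_score(y_test, vote))
```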

Separation of Points and Interval Estimation in Mixed Dose-Response Curves with Selective Component Labeling

Flake, Darl D., II 01 May 2016
This dissertation develops, applies, and investigates new methods to improve the analysis of logistic regression mixture models. An interesting dose-response experiment was previously carried out on a mixed population, in which the class membership of only a subset of subjects (survivors) was subsequently labeled. In early analyses of the dataset, challenges with separation of points and asymmetric confidence intervals were encountered. This dissertation extends the previous analyses by characterizing the model in terms of a mixture of penalized (Firth) logistic regressions and by developing methods for constructing profile likelihood-based confidence intervals, inverse intervals, and confidence bands in the context of such a model. The proposed methods are applied to the motivating dataset and another related dataset, resulting in improved inference on model parameters. Additionally, a simulation experiment is carried out to further illustrate the benefits of the proposed methods and to begin to explore better designs for future studies. The penalized model is shown to be less biased than the traditional model, and profile likelihood-based intervals are shown to have better coverage probability than Wald-type intervals. Some limitations, extensions, and alternatives to the proposed methods are discussed.
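The dissertation's own implementation is not shown here. As a hedged sketch of two core ingredients, the Firth (Jeffreys-prior) penalty added to the logistic log-likelihood and a profile likelihood interval obtained by scanning one coefficient, the following Python code works on simulated placeholder data with a single dose covariate.

```python
import numpy as np
from scipy import optimize
from scipy.stats import chi2

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one dose covariate
beta_true = np.array([-1.0, 2.0])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

def neg_penalized_loglik(beta, X=X, y=y):
    """Negative Firth-penalized log-likelihood: -(loglik + 0.5*log|X'WX|)."""
    p = 1 / (1 + np.exp(-(X @ beta)))
    p = np.clip(p, 1e-12, 1 - 1e-12)                     # guard against log(0)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log1p(-p))
    W = p * (1 - p)
    fisher = X.T @ (X * W[:, None])                      # X' W X
    penalty = 0.5 * np.linalg.slogdet(fisher)[1]         # Jeffreys-prior penalty
    return -(loglik + penalty)

fit = optimize.minimize(neg_penalized_loglik, x0=np.zeros(2), method="BFGS")
print("Firth estimates:", fit.x)

# Profile likelihood interval for the slope: fix the slope on a grid, maximize the
# penalized likelihood over the intercept, and keep slopes within the chi-square(1) cutoff.
best = -fit.fun
grid = np.linspace(fit.x[1] - 2, fit.x[1] + 2, 201)
keep = []
for b1 in grid:
    prof = optimize.minimize_scalar(lambda b0: neg_penalized_loglik(np.array([b0, b1])))
    if 2 * (best - (-prof.fun)) <= chi2.ppf(0.95, df=1):
        keep.append(b1)
print("95% profile likelihood interval for the slope:", (min(keep), max(keep)))
```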

Extracting and Visualizing Data from Mobile and Static Eye Trackers in R and Matlab

Li, Chunyang 01 December 2017
Eye tracking is the process of measuring where people are looking with an eye tracker device. Eye tracking has been used in many scientific fields, such as education, usability research, sports, psychology, and marketing. Eye tracking data are often obtained from a static eye tracker or are manually extracted from a mobile eye tracker. Visualization usually plays an important role in the analysis of eye tracking data. Until now, no software package has offered a complete collection of eye tracking data processing and visualization tools. In this dissertation, we review eye tracking technology and techniques, the existing software related to eye tracking, and the research on eye tracking for posters and related media. We then discuss the three main goals achieved in this dissertation: (i) development of a Matlab toolbox for automatically extracting mobile eye tracking data; (ii) development of the linked microposter plots family as a new means of visualizing eye tracking data; (iii) development of an R package for automatically extracting and visualizing data from mobile and static eye trackers.
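As a small, hypothetical illustration of the kind of visualization discussed (not the dissertation's R or Matlab code), the following Python sketch plots simulated fixation points and a scanpath, with marker size proportional to fixation duration; the screen resolution and durations are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 1920, 100)                        # fixation x coordinates (pixels)
y = rng.uniform(0, 1080, 100)                        # fixation y coordinates (pixels)
duration = rng.gamma(shape=2, scale=120, size=100)   # fixation durations (ms)

fig, ax = plt.subplots(figsize=(8, 4.5))
ax.scatter(x, y, s=duration / 4, alpha=0.5)          # marker size ~ fixation duration
ax.plot(x, y, linewidth=0.5, alpha=0.4)              # scanpath connecting successive fixations
ax.set_xlim(0, 1920)
ax.set_ylim(1080, 0)                                 # flip y so the origin is the screen's top-left
ax.set_xlabel("x (pixels)")
ax.set_ylabel("y (pixels)")
ax.set_title("Fixation scatter and scanpath (simulated)")
plt.show()
```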

Linear Regression of the Poisson Mean

Brown, Duane Steven 01 May 1982
The purpose of this thesis was to compare two estimation procedures, the method of least squares and the method of maximum likelihood, on sample data obtained from a Poisson distribution. Point estimates of the slope and intercept of the regression line and point estimates of the mean squared error for both the slope and intercept were obtained. It is shown that least squares, the preferred method due to its simplicity, yields results as good as maximum likelihood. Confidence intervals were also computed by Monte Carlo techniques and then tested for accuracy. For the method of least squares, confidence bands for the regression line were computed under two different assumptions concerning the variance. It is shown that the assumption of constant variance produces false confidence bands, whereas the assumption that the variance equals the mean yields accurate results.
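A hedged sketch of the comparison described above: simulated data whose Poisson mean is linear in x, fit once by least squares (closed form) and once by maximum likelihood (numerically). The true coefficients and sample size are illustrative, not the thesis's settings.

```python
import numpy as np
from scipy import optimize

rng = np.random.default_rng(0)
x = np.linspace(1, 10, 200)
y = rng.poisson(lam=2.0 + 1.5 * x)        # true intercept 2.0, true slope 1.5

# Least squares: closed-form slope and intercept
b1_ls = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0_ls = y.mean() - b1_ls * x.mean()

# Maximum likelihood for a Poisson mean that is linear in x
def neg_loglik(beta):
    mu = beta[0] + beta[1] * x
    if np.any(mu <= 0):                   # the linear mean must stay positive
        return np.inf
    return -np.sum(y * np.log(mu) - mu)   # Poisson log-likelihood up to a constant

ml = optimize.minimize(neg_loglik, x0=[b0_ls, b1_ls], method="Nelder-Mead")
print("Least squares:      intercept %.3f, slope %.3f" % (b0_ls, b1_ls))
print("Maximum likelihood: intercept %.3f, slope %.3f" % tuple(ml.x))
```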

Explanation of the Fast Fourier Transform and Some Applications

Endo, Alan Kazuo 01 May 1981
This report describes the Fast Fourier Transform and some of its applications. It describes the continuous Fourier transform and some of its properties. Finally, it describes the Fast Fourier Transform and its applications to hurricane risk analysis, ocean wave analysis, and hydrology.
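As a brief illustration of the basic computation the report builds on (not taken from the report itself), the following sketch uses NumPy's FFT to recover the dominant frequencies of a sampled two-component signal; the signal and sampling rate are assumed values.

```python
import numpy as np

fs = 100.0                                # sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)              # 10 seconds of samples
signal = 2.0 * np.sin(2 * np.pi * 0.5 * t) + 0.7 * np.sin(2 * np.pi * 3.0 * t)

spectrum = np.fft.rfft(signal)            # FFT for a real-valued signal
freqs = np.fft.rfftfreq(signal.size, d=1 / fs)
amplitude = np.abs(spectrum) * 2 / signal.size

top = np.argsort(amplitude)[-2:]          # indices of the two largest spectral peaks
print("Dominant frequencies (Hz):", freqs[top])
```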

A Tournament Approach to Price Discovery in the US Cattle Market

Wright, Jeffrey 01 May 2017
Cattle price discovery is the process of determining the market price through the interactions of cattle buyers (packers) and sellers (ranchers). Locating the price discovery center, and estimating price interactions among the regional fed cattle markets and among the feeder cattle markets, can help define a relevant fed cattle procurement market. This research finds that price discovery in the U.S. cattle markets occurs in the futures markets, specifically the feeder cattle and fed cattle futures.
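The abstract does not state the estimation method used. As one hedged illustration of how lead-lag relations between futures and cash prices are commonly examined, the sketch below runs a Granger causality test on simulated price series with statsmodels; the series, lag length, and variable names are placeholders, not the thesis's model.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
n = 300
futures = np.cumsum(rng.normal(size=n))                      # simulated futures price level
cash = np.roll(futures, 2) + rng.normal(scale=0.5, size=n)   # cash lags futures by two periods
returns = pd.DataFrame({
    "d_cash": np.diff(cash[2:]),          # drop the values wrapped around by np.roll
    "d_futures": np.diff(futures[2:]),
})

# Tests whether lagged futures returns help predict cash returns (futures "lead" cash)
results = grangercausalitytests(returns[["d_cash", "d_futures"]], maxlag=4)
```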

An Evaluation of Truncated Sequential Test

Chang, Ryh-Thinn 01 May 1975
The development of sequential analysis has led to the proposal of tests that are more economical in that the Average Sample Number (A.S.N.) of the sequential test is smaller than the sample size of the fixed sample test. Although these tests usually have a smaller A.S.N. than the equivalent fixed sample procedure, there remains the possibility that an extremely large sample size will be necessary to reach a decision. To remedy this, truncated sequential tests have been developed. A method of truncation for testing a composite hypothesis is studied. This method is formed by mixing a fixed sample test and a sequential test, and it is applied to the exponential and normal distributions to establish its usefulness. It is proved that our truncation method can give an Operating Characteristic (O.C.) curve similar to that of the corresponding fixed sample test if the test parameters are properly chosen. The average sample size required by our truncation method compares satisfactorily with that of other existing truncation methods. Though the truncation method suggested in this study is not an optimum truncation, it is still worthwhile, especially when testing a composite hypothesis.
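As a hedged sketch of the general idea, a sequential probability ratio test that stops early when a boundary is crossed and otherwise falls back to a fixed-sample style decision at a truncation point, the following code tests a normal mean with illustrative boundaries; it is not the specific truncation rule studied in the thesis.

```python
import numpy as np

def truncated_sprt(data, mu0=0.0, mu1=1.0, sigma=1.0, alpha=0.05, beta=0.05, n_max=50):
    """SPRT for H0: mu = mu0 vs H1: mu = mu1 with known sigma, truncated at n_max."""
    a = np.log(beta / (1 - alpha))        # lower (accept H0) boundary
    b = np.log((1 - beta) / alpha)        # upper (reject H0) boundary
    llr = 0.0
    for n, x in enumerate(data[:n_max], start=1):
        # log-likelihood ratio increment for one normal observation
        llr += ((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2)
        if llr >= b:
            return "reject H0", n
        if llr <= a:
            return "accept H0", n
    # Truncation: fall back to a fixed-sample style decision at n_max
    return ("reject H0" if llr > 0 else "accept H0"), n_max

rng = np.random.default_rng(0)
print(truncated_sprt(rng.normal(loc=1.0, size=200)))   # data generated under H1
print(truncated_sprt(rng.normal(loc=0.0, size=200)))   # data generated under H0
```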

The Prior Distribution in Bayesian Statistics

Chen, Kai-Tang 01 May 1979
A major problem associated with Bayesian estimation is selecting the prior distribution. The more recent literature on the selection of the prior is reviewed. Very little of a general nature on the selection of the prior is found in the literature except for non-informative priors, a class of priors seen to have limited usefulness. A method of selecting an informative prior is generalized in this thesis to include estimation of several parameters using a multivariate prior distribution. The concepts required for quantifying prior information are based on intuitive principles, so they can be understood and controlled by the decision maker (i.e., those responsible for the consequences) rather than by analysts. The information required is: (1) prior point estimates of the parameters being estimated, and (2) an expression of the desired influence of the prior, relative to the present data, in determining the parameter estimates (e.g., the prior having twice as much influence as the data). These concepts (point estimates and influence) may be used equally with subjective or quantitative prior information.
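A minimal sketch of the "relative influence" idea for a single normal mean with known variance, under the assumption that the prior's weight is expressed as a multiple c of the data's weight; the conjugate setup and numbers are illustrative, not the thesis's multivariate formulation.

```python
import numpy as np

def posterior_mean(xbar, n, sigma2, prior_mean, c):
    """Posterior mean when the prior is given c times the data's influence."""
    data_precision = n / sigma2            # information contributed by the sample
    prior_precision = c * data_precision   # prior chosen to carry c times that influence
    return (prior_precision * prior_mean + data_precision * xbar) / (prior_precision + data_precision)

rng = np.random.default_rng(0)
sample = rng.normal(loc=10.0, scale=2.0, size=25)
# With c = 2 the prior guess of 8.0 gets twice the weight of the sample mean.
print(posterior_mean(sample.mean(), sample.size, sigma2=4.0, prior_mean=8.0, c=2.0))
```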

Fortran Programs for the Calculation of Most of the Commonly Used Experimental Design Models

Greenhalgh, H. Wain 01 May 1967
Two computer programs were developed using a CDC 3100. They were written in FORTRAN IV. One program uses four tape drives, one card reader, and one printer. It will calculate factorial analysis of variance with or without covariance and/or multivariate analysis for one to eight factors and up to twenty-five variables. The other program is used for completely randomized designs, randomized block designs, and latin square designs. It will handle twenty-five treatments, rows (blocks), and columns. The program can handle fifteen variables using any number of these variables for covariates.
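The programs described are FORTRAN IV code for a CDC 3100 and are not reproduced here. As a hedged, modern analogue of the same kind of computation, the sketch below fits a two-factor analysis of covariance to simulated data with statsmodels; the factor names and effect sizes are placeholders.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
n = 120
df = pd.DataFrame({
    "A": rng.choice(["a1", "a2", "a3"], n),     # first factor
    "B": rng.choice(["b1", "b2"], n),           # second factor
    "x": rng.normal(size=n),                    # covariate
})
df["y"] = (df["A"] == "a2") * 1.5 + (df["B"] == "b2") * 0.8 + 0.5 * df["x"] + rng.normal(size=n)

# Factorial analysis of covariance: main effects, interaction, and the covariate
model = smf.ols("y ~ C(A) * C(B) + x", data=df).fit()
print(anova_lm(model, typ=2))
```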

Design Optimization Using Model Estimation Programming

Brimhall, Richard Kay 01 May 1967
Model estimation programming provides a method for obtaining extreme solutions subject to constraints. Functions which are continuous with continuous first and second derivatives in the neighborhood of the solution are approximated using quadratic polynomials (termed estimating functions) derived from computed or experimental data points. Using the estimating functions, an approximation problem is solved by a numerical adaptation of the method of Lagrange. The method is not limited by the concavity of the objective function. Beginning with an initial array of data observations, an initial approximate solution is obtained. Using this approximate solution as a new datum point, the coefficients for the estimating function are recalculated with a constrained least squares fit which forces intersection of the functions and their estimating functions at the last three observations. The constraining of the least squares estimate provides a sequence of approximate solutions which converge to the desired extremal. A digital computer program employing the technique is used extensively by Thiokol Chemical Corporation's Wasatch Division, especially for vehicle design optimization where flight performance and hardware constraints must be satisfied simultaneously.
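As a hedged sketch of the surrogate idea described above, fitting a quadratic "estimating function" to observed points by least squares, solving the constrained approximate problem, and appending the solution as a new datum, the code below uses a toy objective and a single linear constraint. It simplifies the fit (no forced intersection at the last three observations) and is not Thiokol's actual program.

```python
import numpy as np
from scipy.optimize import minimize

def expensive_objective(x):                 # stands in for a costly simulation or experiment
    return (x[0] - 2.0) ** 2 + (x[1] - 1.0) ** 2 + 0.3 * x[0] * x[1]

def quad_features(x):
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2])

constraint = {"type": "ineq", "fun": lambda x: 3.0 - x[0] - x[1]}   # x1 + x2 <= 3

rng = np.random.default_rng(0)
pts = list(rng.uniform(-1, 4, size=(8, 2)))            # initial array of observations
vals = [expensive_objective(p) for p in pts]

x_current = np.array([0.0, 0.0])
for _ in range(10):
    # Least-squares fit of the quadratic estimating function to all observations so far
    A = np.array([quad_features(p) for p in pts])
    coef, *_ = np.linalg.lstsq(A, np.array(vals), rcond=None)
    surrogate = lambda x, c=coef: quad_features(x) @ c

    # Solve the approximate constrained problem on the surrogate
    sol = minimize(surrogate, x_current, method="SLSQP", constraints=[constraint])
    x_new = sol.x

    # Evaluate the true objective at the new point and append it as a datum
    pts.append(x_new)
    vals.append(expensive_objective(x_new))
    if np.linalg.norm(x_new - x_current) < 1e-6:
        break
    x_current = x_new

print("approximate constrained optimum:", x_current)
```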
