111

A New Screening Methodology for Mixture Experiments

Weese, Maria 01 May 2010 (has links)
Many materials we use in daily life are mixtures: plastics, gasoline, food, medicine, etc. Mixture experiments, in which the factors are proportions of components and the response depends only on the relative proportions of the components, are an integral part of product development and improvement. However, when the number of components is large and there are complex constraints, experimentation can be a daunting task. We study screening methods in a mixture setting using the framework of the Cox mixture model [1]. We exploit the easy interpretation of the parameters in the Cox mixture model and develop methods for screening in a mixture setting. We present specific methods for adding a component, removing a component, and a general method for screening a subset of components in mixtures with complex constraints. The variances of our parameter estimates are comparable with those of the typically used Scheffé model, and our methods reduce the run size for screening experiments with mixtures containing a large number of components. We then extend the new screening methods using Evolutionary Operation (EVOP), developed by Box and Draper [2]. EVOP methods use small movements in a subset of process parameters, together with replication, to reveal effects above the process noise. Mixture experiments inherently involve small movements (since the proportions can only range from zero to unity), and the effects have large variances. We update the EVOP methods by using sequential testing of effects, as opposed to the confidence interval method originally proposed by Box and Draper. We show that, with all other testing parameters held constant, the sequential testing approach reduces the required sample size by as much as 50 percent compared with a fixed-sample-size test. We present two methods for adding a component and a general screening method using a graphical sequential t-test, and provide R code to reproduce the limits for the test.
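The thesis's graphical sequential t-test and its R limits are not reproduced here, but the flavor of sequential testing versus a fixed sample size can be sketched with a Wald-style sequential test of a mean. This is an illustrative stand-in only: the known-sigma assumption and the standard Wald boundaries are simplifications, not the thesis's method.

```python
import numpy as np

def sprt_mean(xs, mu0, mu1, sigma, alpha=0.05, beta=0.05):
    """Wald-style sequential test of H0: mean = mu0 vs H1: mean = mu1,
    assuming known sigma. Stops as soon as the log-likelihood ratio
    crosses a boundary, rather than waiting for a fixed sample size."""
    lower = np.log(beta / (1 - alpha))   # cross below -> accept H0
    upper = np.log((1 - beta) / alpha)   # cross above -> accept H1
    llr = 0.0
    for n, x in enumerate(xs, start=1):
        # Closed-form normal log-likelihood-ratio increment
        llr += (mu1 - mu0) * (x - 0.5 * (mu0 + mu1)) / sigma ** 2
        if llr <= lower:
            return "accept H0", n
        if llr >= upper:
            return "accept H1", n
    return "continue", len(xs)
```

When the true effect is well separated from the null, the test typically stops after a handful of observations, which is the sample-size saving the abstract describes.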
112

An Analysis of Boosted Regression Trees to Predict the Strength Properties of Wood Composites

Carty, Dillon Matthew 01 August 2011 (has links)
The forest products industry is a significant contributor to the U.S. economy, accounting for six percent of the total U.S. manufacturing gross domestic product (GDP) and placing it on par with the U.S. automotive and plastics industries. Sustaining business competitiveness by reducing costs and maintaining product quality will be essential for this industry in the long term. Improved production efficiency and business competitiveness are the primary rationale for this work. A challenge facing the industry is to develop better knowledge of the complex nature of process variables and their relationship with final product quality attributes. The goals of this study are to better quantify the relationships between process variables (e.g., press temperature) and final product quality attributes, and to predict the strength properties of final products. Destructive lab tests are taken at one- to two-hour intervals to estimate internal bond (IB) tensile strength and modulus of rupture (MOR). Significant amounts of production occur between destructive test samples. In the absence of a real-time model that predicts strength properties, operators may run higher-than-necessary feedstock input targets (e.g., weight, resin, etc.). Improved prediction of strength properties using boosted regression tree (BRT) models may reduce the costs associated with rework (i.e., panels remanufactured due to poor strength properties), reduce feedstock costs (e.g., resin and wood), reduce energy usage, and improve utilization of wood from the valuable forest resource. Real-time, temporal process data sets were obtained from a U.S. particleboard manufacturer. In this thesis, BRT models were developed to predict the continuous response variables MOR and IB from a pool of possible continuous predictor variables.
BRT model comparisons were done using the root mean squared error for prediction (RMSEP) and the RMSEP relative to the mean of the response variable as a percent (RMSEP%) for the validation data set(s). Overall, for MOR, RMSEP values ranged from 0.99 to 1.443 MPa, and RMSEP% values ranged from 7.9% to 11.6%. Overall, for IB, RMSEP values ranged from 0.074 to 0.108 MPa, and RMSEP% values ranged from 12.7% to 18.6%.
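The two comparison metrics used above can be stated directly. A small illustrative sketch (not the thesis's code) of RMSEP and RMSEP% on a validation set:

```python
import numpy as np

def rmsep(y_true, y_pred):
    """Root mean squared error of prediction on a validation set."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def rmsep_pct(y_true, y_pred):
    """RMSEP relative to the mean of the response variable, as a percent."""
    return 100.0 * rmsep(y_true, y_pred) / float(np.mean(np.asarray(y_true, float)))
```

For example, predictions that miss four true values of 10 MPa by exactly 1 MPa each give an RMSEP of 1 MPa and an RMSEP% of 10%, the same scale as the MOR results reported above.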
113

Applying Localized Realized Volatility Modeling to Futures Indices

Fu, Luella 01 January 2011 (has links)
This thesis extends the application of the localized realized volatility model created by Ying Chen, Wolfgang Karl Härdle, and Uta Pigorsch to other futures markets, particularly the CAC 40 and the NI 225. The research attempted to replicate their results, though ultimately those results were invalidated by procedural difficulties.
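As background, the realized volatility that the localized model targets is simply the square root of the summed squared intraday log-returns. A minimal sketch of that building block (the localized model's adaptive selection of an interval of homogeneity is not shown):

```python
import numpy as np

def realized_volatility(prices):
    """Daily realized volatility: the square root of the sum of squared
    intraday log-returns computed from one day's price path."""
    r = np.diff(np.log(np.asarray(prices, float)))
    return float(np.sqrt(np.sum(r ** 2)))
```

The localized approach then models a recent, adaptively chosen window of such daily values rather than a fixed long history.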
114

Assessing Changes in the Abundance of the Continental Population of Scaup Using a Hierarchical Spatio-Temporal Model

Ross, Beth E. 01 January 2012 (has links)
In ecological studies, the goal is often to describe and gain insight into the ecological processes underlying the data collected in observational studies. Because of the nature of observational data, it can be difficult to separate the variation in the data from the underlying process, or "state dynamics." To better address this issue, it is becoming increasingly common for researchers to use hierarchical models. Hierarchical spatial, temporal, and spatio-temporal models allow the simultaneous modeling of both first- and second-order processes, accounting for underlying autocorrelation in the system while still providing insight into overall spatial and temporal pattern. In this study, I use two species of interest, the lesser and greater scaup (Aythya affinis and Aythya marila), as an example of how hierarchical models can be utilized in wildlife management studies. Scaup are the most abundant and widespread diving ducks in North America and are important game species. Since 1978, the continental population of scaup has declined to levels that are 16% below the 1955-2010 average and 34% below the North American Waterfowl Management Plan goal. The greatest decline in scaup abundance appears to be occurring in the western boreal forest, where populations may have depressed rates of reproductive success, survival, or both. To better understand the causes of the decline, and the biology of scaup in general, high importance has been placed on retrospective analyses that determine the spatial and temporal changes in population abundance. To implement Bayesian hierarchical models, I used Integrated Nested Laplace Approximation (INLA) to approximate the posterior marginal distributions of the parameters of interest, rather than the more common Markov chain Monte Carlo (MCMC) approach.
Based on preliminary analysis, the data appeared to be overdispersed, containing a disproportionately high number of zeros along with a high variance relative to the mean. Thus, I considered two potential data models: the negative binomial and the zero-inflated negative binomial. Of these, the zero-inflated negative binomial had the lowest DIC, so inference was based on this model. Results indicated that a large proportion of the strata were not decreasing (i.e., the estimated slope parameter was not significantly different from zero). However, there were important exceptions for strata in the northwest boreal forest and southern prairie parkland habitats. Several strata in the boreal forest habitat had negative slope estimates, indicating a decrease in breeding pairs, while some strata in the prairie parkland habitat had positive slope estimates, indicating an increase in this region. Additionally, plots of individual strata suggest that the strata gaining breeding pairs are increasing dramatically. Overall, my results support previous work indicating a decline in population abundance in the northern boreal forest of Canada, and further indicate that the population of scaup has increased rapidly in the prairie pothole region since 1957. Yet, once spatial and temporal autocorrelation in the data are accounted for, declines in abundance appear not to be as widespread as previously reported.
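The zero-inflated negative binomial data model described above combines a point mass at zero with an ordinary negative binomial count. A minimal sketch of its probability mass function, under one common parameterization (the thesis's exact parameterization may differ):

```python
from math import lgamma, exp, log

def nb_pmf(k, n, p):
    """Negative binomial pmf: probability of k failures before the n-th
    success, with per-trial success probability p."""
    return exp(lgamma(k + n) - lgamma(n) - lgamma(k + 1)
               + n * log(p) + k * log(1.0 - p))

def zinb_pmf(k, pi, n, p):
    """Zero-inflated negative binomial: with probability pi the count is a
    'structural' zero; otherwise it is drawn from NB(n, p). This inflates
    the mass at zero relative to the plain negative binomial."""
    base = nb_pmf(k, n, p)
    return pi + (1.0 - pi) * base if k == 0 else (1.0 - pi) * base
```

The extra mass at zero is exactly what lets the model absorb the disproportionate number of zero counts that drove the overdispersion noted above.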
115

New Results in ℓ1 Penalized Regression

Roualdes, Edward A. 01 January 2015 (has links)
Here we consider penalized regression methods and extend the results surrounding the ℓ1-norm penalty. We address a more recent development that generalizes previous methods by penalizing a linear transformation of the coefficients of interest instead of penalizing just the coefficients themselves. We introduce an approximate algorithm to fit this generalization and a fully Bayesian hierarchical model that is a direct analogue of the frequentist version. A number of benefits derive from the Bayesian perspective, most notably a principled choice of the tuning parameter and a natural means to estimate the variation of the estimates, a notoriously difficult task in the frequentist formulation. We then introduce Bayesian trend filtering, which exemplifies the benefits of our Bayesian version. Bayesian trend filtering is shown to be an empirically strong technique for fitting univariate nonparametric regression. Through a simulation study, we show that Bayesian trend filtering reduces prediction error and attains more accurate coverage probabilities than the frequentist method. We then apply Bayesian trend filtering to real data sets, where our method is quite competitive with a number of other popular nonparametric methods.
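The generalization described above, an ℓ1 penalty on a linear transformation of the coefficients, can be written down concretely. A sketch of the objective under an identity design with a second-difference transform, which is exactly the (second-order) trend filtering case (illustrative only; no solver is shown):

```python
import numpy as np

def second_diff_matrix(n):
    """(n-2) x n second-order difference matrix D with rows (1, -2, 1)."""
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D

def genlasso_objective(beta, y, D, lam):
    """Generalized-lasso objective: squared-error loss plus an l1 penalty
    on the linear transform D of the coefficients. With the
    second-difference D this is trend filtering."""
    return 0.5 * float(np.sum((y - beta) ** 2)) + lam * float(np.sum(np.abs(D @ beta)))
```

Because the second differences of any linear sequence are zero, straight-line fits are unpenalized, which is what makes trend filtering favor piecewise-linear estimates.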
116

A complex survey data analysis of TB and HIV mortality in South Africa.

Murorunkwere, Joie Lea. January 2012 (has links)
Many countries record annual summary statistics such as economic indicators, for example Gross Domestic Product (GDP), and vital statistics such as the numbers of births and deaths. In this thesis we focus on mortality data from various causes, including tuberculosis (TB) and HIV. TB is an infectious disease caused by the bacterium Mycobacterium tuberculosis. It is the leading cause of death in the world among all infectious diseases. An additional complexity is that HIV/AIDS acts as a catalyst for the occurrence of TB. Vaidyanathan and Singh showed that people infected with Mycobacterium tuberculosis alone have approximately a 10% lifetime risk of developing active TB, compared to 60% or more in persons co-infected with HIV and Mycobacterium tuberculosis. South Africa was ranked seventh highest by the World Health Organization among the 22 high-burden TB countries in the world and fourth highest in Africa. The research in this thesis uses the 2007 Statistics South Africa (STATSSA) data on TB and HIV as the primary cause of death to build statistical models that can be used to investigate factors associated with death due to TB. Logistic regression, survey logistic regression, and generalized linear models (GLMs) are used to assess the effect of risk factors, or predictors, on the probability of death associated with TB and HIV. The study is guided by a theoretical approach to understanding factors associated with TB and HIV deaths. Bayesian modeling using WinBUGS is used for spatial modeling of relative risk, with spatial prior distributions for disease-mapping models. Of the 615,312 deceased, 546,917 (89%) died from natural causes, 14,179 (2%) were stillborn, and 54,216 (9%) died from non-natural causes such as accidents, murder, or suicide. Among those who died from natural causes and disease, 65,052 (12%) died of TB and 13,718 (2%) died of HIV. The results of the analysis revealed risk factors associated with TB and HIV mortality.
Thesis (M.Sc.), University of KwaZulu-Natal, Pietermaritzburg, 2012.
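The weighted point estimates underlying survey logistic regression can be sketched with a small Newton's-method fit. This is illustrative only: design-based variance estimation, the genuinely hard part of complex survey analysis, is not shown, and the weights here are hypothetical.

```python
import numpy as np

def weighted_logistic(X, y, w, iters=25):
    """Survey-weighted logistic regression coefficients via Newton's method.
    Each observation's likelihood contribution is scaled by its survey
    weight w; only the weighted point estimates are produced."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))          # fitted probabilities
        grad = X.T @ (w * (y - p))                   # weighted score
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X  # weighted information
        beta = beta + np.linalg.solve(hess, grad)
    return beta
```

With an intercept-only design, the fitted coefficient is just the logit of the weighted outcome proportion, which is a quick sanity check on the weighting.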
117

Applications of Monte Carlo Methods in Statistical Inference Using Regression Analysis

Huh, Ji Young 01 January 2015 (has links)
This paper studies the use of Monte Carlo simulation techniques in econometrics, specifically in statistical inference. First, I examine several estimators by deriving their properties explicitly and generating their distributions through simulation; here, simulations illustrate and support the analytical results. Then, I look at test statistics whose derivations are costly because their critical values are sensitive to the data-generating process; here, simulations are essential for drawing statistical inference. Overall, the paper examines when and how simulations are needed in studying econometric theory.
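The first use of simulation described above, generating an estimator's sampling distribution, can be illustrated with a minimal OLS example (the parameter values are hypothetical, not the paper's own designs):

```python
import numpy as np

def ols_slope(x, y):
    """Closed-form OLS slope for a simple regression with an intercept."""
    xm, ym = x.mean(), y.mean()
    return float(np.sum((x - xm) * (y - ym)) / np.sum((x - xm) ** 2))

def simulate_slopes(n=50, beta=2.0, sigma=1.0, reps=2000, seed=0):
    """Monte Carlo draws of the OLS slope under y = 1 + beta*x + N(0, sigma^2).
    The empirical distribution of the draws approximates the estimator's
    sampling distribution, against which analytical results can be checked."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n)
    return np.array([ols_slope(x, 1.0 + beta * x + rng.normal(0.0, sigma, n))
                     for _ in range(reps)])
```

The simulated draws center on the true slope, confirming unbiasedness, and their spread approximates the analytical standard error.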
118

Bayesian Semiparametric Generalizations of Linear Models Using Polya Trees

Schoergendorfer, Angela 01 January 2011 (has links)
In a Bayesian framework, prior distributions on a space of nonparametric continuous distributions may be defined using Polya trees. This dissertation addresses statistical problems for which the Polya tree idea can be utilized to provide efficient and practical methodological solutions. One problem considered is the estimation of risks, odds ratios, or other similar measures derived by specifying a threshold for an observed continuous variable. It has previously been shown that fitting a linear model to the continuous outcome under the assumption of a logistic error distribution leads to more efficient odds ratio estimates. We show that deviations from the assumption of logistic error can result in substantial bias in odds ratio estimates. A one-step approximation to the Savage-Dickey ratio is presented as a Bayesian test for distributional assumptions in the traditional logistic regression model. The approximation uses least-squares estimates in place of a full Bayesian Markov chain simulation, and the equivalence of inferences based on the two implementations is shown. A framework is proposed for flexible, semiparametric estimation of risks when the assumption of logistic error is rejected. A second application deals with regression scenarios in which residuals are correlated and their distribution evolves over an ordinal covariate such as time. In the context of prediction, such complex error distributions need to be modeled carefully and flexibly. The proposed model introduces dependent but separate Polya tree priors for each time point, pooling information across time points to model gradual changes in distributional shapes. Theoretical properties of the proposed model are outlined, and its potential predictive advantages are demonstrated in simulated scenarios and real data.
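A finite Polya tree prior can be sampled by recursively splitting probability mass over dyadic intervals. A minimal sketch, assuming the common Beta(c·m², c·m²) splitting parameters at level m (the dissertation's exact specification may differ):

```python
import numpy as np

def polya_tree_probs(depth, c=1.0, seed=0):
    """One draw of dyadic-interval probabilities from a finite Polya tree
    prior on [0, 1): at level m, each interval splits its mass between its
    two children in a Beta(c*m**2, c*m**2) proportion. Returns the
    2**depth leaf-interval probabilities."""
    rng = np.random.default_rng(seed)
    probs = np.array([1.0])
    for m in range(1, depth + 1):
        splits = rng.beta(c * m ** 2, c * m ** 2, size=probs.size)
        # Interleave left-child and right-child masses at the next level
        probs = np.column_stack([probs * splits, probs * (1.0 - splits)]).ravel()
    return probs
```

The quadratically growing Beta parameters make the splits increasingly even at deeper levels, which is what centers the prior on continuous distributions.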
119

Measurement of body posture using multivariate statistical techniques

Petkov, John January 2005 (has links)
The aim of this thesis is to develop a quantitative measure of postural defects known as lordosis and kyphosis. The measurement of these is an important part of their identification and treatment.
120

Statistical models for earthquakes incorporating ancillary data : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Statistics at Massey University, Palmerston North, New Zealand

Wang, Ting January 2010 (has links)
This thesis consists of two parts. The first part proposes a new model – the Markov-modulated Hawkes process with stepwise decay (MMHPSD) to investigate the seismicity rate. The MMHPSD is a self-exciting process which switches among different states, in each of which the process has distinguishable background seismicity and decay rates. Parameter estimation is developed via the expectation maximization algorithm. The model is applied to data from the Landers earthquake sequence, demonstrating that it is useful for modelling changes in the temporal patterns of seismicity. The states in the model can capture the behavior of main shocks, large aftershocks, secondary aftershocks and a period of quiescence with different background rates and decay rates. The state transitions can then explain the seismicity rate changes and help indicate if there is any seismicity shadow or relative quiescence. The second part of this thesis develops statistical methods to examine earthquake sequences possessing ancillary data, in this case groundwater level data or GPS measurements of deformation. For the former, signals from groundwater level data at Tangshan Well, China, are extracted for the period from 2002 to 2005 using a moving window method. A number of different statistical techniques are used to detect and quantify coseismic responses to P, S, Love and Rayleigh wave arrivals. The P phase arrivals appear to trigger identifiable oscillations in groundwater level, whereas the Rayleigh waves amplify the water level movement. Identifiable coseismic responses are found for approximately 40 percent of magnitude 6+ earthquakes worldwide. A threshold in the relationship between earthquake magnitude and well–epicenter distance is also found, satisfied by 97% of the identified coseismic responses, above which coseismic changes in groundwater level at Tangshan Well are most likely. 
A non-linear filter measuring short-term deformation rate changes is introduced to extract signals from GPS data. For two case studies of a) deep earthquakes in central North Island, New Zealand, and b) shallow earthquakes in Southern California, a hidden Markov model (HMM) is fitted to the output from the filter. Mutual information analysis indicates that the state having the largest variation of deformation rate contains precursory information that indicates an elevated probability for earthquake occurrence.
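The self-exciting intensity underlying the Hawkes-type model in the first part can be sketched in its simplest single-state form with exponential decay (the MMHPSD's hidden-state switching and stepwise decay are not shown; this is only the textbook building block):

```python
import numpy as np

def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity of a one-state Hawkes process with exponential
    decay: lambda(t) = mu + sum over past events t_i < t of
    alpha * exp(-beta * (t - t_i)). Each past earthquake temporarily
    raises the rate above the background level mu."""
    events = np.asarray(events, dtype=float)
    past = events[events < t]
    return float(mu + np.sum(alpha * np.exp(-beta * (t - past))))
```

In the MMHPSD, a hidden Markov state would additionally select which background rate and decay regime apply at time t, letting the model separate main shocks, aftershock sequences, and quiescence.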
