21

Penalized likelihood estimation of a fixed-effect and a mixed-effect transfer function model

Hansen, Elizabeth Ann 01 January 2006 (has links)
Motivated by the need to estimate the main spawning period of North Sea cod, we develop a common transfer function model for a panel of contemporaneously correlated time series data. This model incorporates (i) smoothness of the parameters, by assuming that their second differences are small, and (ii) contemporaneous correlation, by assuming that the errors have a general variance-covariance matrix. Penalized likelihood estimation of this model requires an iterative procedure, which is developed in this work. We develop three methods for determining confidence bands: frequentist, Bayesian, and bootstrap (both nonparametric and parametric). A simulation study of the frequentist and Bayesian confidence bands, motivated by the cod spawning data, is conducted and the results are compared. The model is then applied to the cod spawning data, with all confidence bands computed, and the results of this analysis are discussed. We then delve further into the theory behind the model. We prove a theorem showing that the estimated regression parameter vector is a consistent estimate of the true regression parameter, and we further prove that it has an asymptotic normal distribution; both theorems are proved under mild conditions. We further develop the model by incorporating between-series variation in the transfer function, with the random effect assumed to have a normal distribution with a "smooth" mean vector. We implement the EM algorithm to carry out the penalized likelihood estimation. We consider five specifications of the variance-covariance matrix of the random transfer function, namely a general variance-covariance matrix, a diagonal matrix, a multiple of the identity matrix, an autoregressive matrix of order one, and a multiplicative error specification. Since direct computation of confidence bands would lead to numerical problems, we introduce a bootstrap approach for estimating them, considering both nonparametric and parametric bootstrap approaches. We then apply this model to estimate the cod spawning period, while also examining the different specifications of the variance-covariance matrix of the random effect, the two types of bootstrapped confidence bands, and model checking.
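A minimal sketch of the smoothness idea in (i): a ridge-type penalty on the second differences of the coefficient vector, fit by penalized least squares. It ignores the contemporaneous error correlation and the iterative procedure developed in the thesis, and all data and parameter values below are hypothetical.

```python
import numpy as np

def second_difference_matrix(p):
    """(p-2) x p matrix D with (D @ beta)[j] = beta[j] - 2*beta[j+1] + beta[j+2]."""
    D = np.zeros((p - 2, p))
    for j in range(p - 2):
        D[j, j:j + 3] = [1.0, -2.0, 1.0]
    return D

def penalized_ls(X, y, lam):
    """Minimize ||y - X b||^2 + lam * ||D2 b||^2, a roughness penalty on second differences."""
    p = X.shape[1]
    D = second_difference_matrix(p)
    A = X.T @ X + lam * D.T @ D
    return np.linalg.solve(A, X.T @ y)

# Toy illustration with a smooth, bell-shaped transfer function (hypothetical data).
rng = np.random.default_rng(0)
p, n = 30, 200
beta_true = np.exp(-0.5 * ((np.arange(p) - 12) / 4.0) ** 2)
X = rng.normal(size=(n, p))
y = X @ beta_true + rng.normal(scale=0.5, size=n)
beta_hat = penalized_ls(X, y, lam=10.0)
```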
22

Statistical analysis of non-linear diffusion process

Su, Fei 01 December 2011 (has links)
In this paper, we study the problem of statistical inference for continuous-time diffusion processes and their higher-order analogues, and develop methods for modeling threshold diffusion processes in particular. The limiting properties of the resulting estimators are discussed, and we also propose likelihood ratio test statistics for testing a threshold diffusion process against its linear alternative. We begin in Chapter 1 with an introduction to continuous-time non-linear diffusion processes and a summary of the literature on model estimation. The most natural extension from an affine to a non-linear model is the piecewise linear diffusion process with piecewise constant variance function. It can also be viewed as a continuous-time threshold autoregressive (CTAR) model, the continuous-time analogue of the AR model for discrete-time series data. The order-one CTAR model is discussed in detail, with the discussion directed more toward estimation techniques than mathematical details. Existing inferential methods (estimation and testing) generally assume a known functional form of the (instantaneous) variance function. In practice, the functional form of the variance function is hardly ever known, so it is important to develop methods for estimating a diffusion model that do not rely on knowledge of the functional form of the variance function. In Chapter 2, we propose a quasi-likelihood method to estimate the parameters indexing the mean function of a threshold diffusion model without prior knowledge of its instantaneous variance structure (an approach that may also apply to other nonlinear diffusion models, to be investigated in later work). We also explore the limiting properties of the quasi-likelihood estimators. We focus on estimating the mean function, after which the functional form of the instantaneous variance function can be explored and subsequently estimated from quadratic variation considerations. We show that, under mild regularity conditions, the quasi-likelihood estimators of the parameters in the linear mean function of each regime are consistent and asymptotically normal, whereas the threshold parameter is super-consistent and converges weakly to a non-Gaussian continuous distribution. A notable feature is that the limiting distribution of the threshold parameter admits a closed-form probability density function, which enables the construction of its confidence interval; in contrast, for discrete-time TAR models, the construction of a confidence interval for the threshold parameter has, so far, not been practically solved. A simulation study is provided to illustrate the asymptotic results. We also use the threshold model to estimate the term structure of a long time series of US interest rates. It is also of theoretical and practical interest whether an observed process indeed satisfies the threshold model. In Chapter 3, we propose a likelihood ratio test scheme to test for the existence of thresholds, i.e., a test for non-linearity. It can be shown that, under the null hypothesis of no threshold, the test statistic converges asymptotically to a central Gaussian process; the test is asymptotically powerful, and under the alternative hypothesis the test statistic converges to a non-central Gaussian distribution. Further, the limiting distribution is the same as that of its discrete-time analogue for testing a TAR(1) model against an autoregressive model, so the upper percentage points of the asymptotic statistic for the discrete case are immediately applicable to our tests. Simulation studies are conducted to show the empirical size and power of the tests. The application of our method leads to future work, briefly discussed in Chapter 4: for example, we would like to extend our estimation methods to higher-order and higher-dimensional cases, use more general underlying mean processes, and, most importantly, study how to price and predict value processes with nonlinear diffusion processes.
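As a rough illustration of the order-one CTAR model discussed above, the sketch below simulates a two-regime piecewise-linear diffusion with piecewise-constant variance via Euler-Maruyama. The regime parameters and threshold are hypothetical, and this is a simulation aid rather than the thesis's estimation or testing procedure.

```python
import numpy as np

def simulate_ctar1(n_steps, dt, r, lower, upper, x0=0.0, seed=1):
    """Euler-Maruyama simulation of a two-regime threshold (piecewise linear) diffusion:
    dX = (a_i + b_i * X) dt + sigma_i dW, with regime i chosen by X <= r or X > r."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for t in range(n_steps):
        a, b, sigma = lower if x[t] <= r else upper
        drift = a + b * x[t]
        x[t + 1] = x[t] + drift * dt + sigma * np.sqrt(dt) * rng.normal()
    return x

# Hypothetical parameters: mean-reverting in both regimes, higher volatility above the threshold.
path = simulate_ctar1(n_steps=5000, dt=0.01, r=0.0,
                      lower=(0.5, -1.0, 0.2), upper=(-0.5, -1.0, 0.5))
```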
23

Approximations to Continuous Dynamical Processes in Hierarchical Models

Cangelosi, Amanda 01 December 2008 (has links)
Models for natural nonlinear processes, such as population dynamics, have received much attention in applied mathematics; species competition, for example, has been extensively modeled by differential equations. Often, the scientist prefers to model the underlying dynamical processes (i.e., theoretical mechanisms) in continuous time. It is of both scientific and mathematical interest to implement such models in a statistical framework in order to quantify the uncertainty associated with them in the presence of observations. That is, given discrete observations arising from an underlying continuous process, the unobserved process can be formally described while accounting for multiple sources of uncertainty (e.g., measurement error, model choice, and inherent stochasticity of process parameters). In addition to continuity, natural processes are often bounded; specifically, they tend to have non-negative support. Various techniques have been implemented to accommodate non-negative processes, but such techniques are often limited or overly compromising. This study offers an alternative to common differential modeling practices by using a bias-corrected truncated normal distribution to model both the observations and the latent process, each having bounded support. Parameters of the underlying continuous process are characterized in a Bayesian hierarchical context, utilizing a fourth-order Runge-Kutta approximation.
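A minimal sketch of the fourth-order Runge-Kutta building block referenced above, applied to a hypothetical Lotka-Volterra competition system. In the hierarchical model, the resulting deterministic trajectories would enter the (bias-corrected) truncated normal likelihood for the latent process, which is not shown here.

```python
import numpy as np

def rk4_step(f, x, t, dt):
    """One fourth-order Runge-Kutta step for dx/dt = f(t, x)."""
    k1 = f(t, x)
    k2 = f(t + dt / 2, x + dt * k1 / 2)
    k3 = f(t + dt / 2, x + dt * k2 / 2)
    k4 = f(t + dt, x + dt * k3)
    return x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def competition(t, x, r=np.array([1.0, 0.8]),
                a=np.array([[1.0, 0.5], [0.6, 1.0]]), K=10.0):
    """Lotka-Volterra competition dynamics (hypothetical parameter values)."""
    return r * x * (1.0 - (a @ x) / K)

# Propagate the latent two-species process forward between observation times.
x = np.array([2.0, 3.0])
for _ in range(100):
    x = rk4_step(competition, x, t=0.0, dt=0.1)
```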
24

Reliability Analysis of Oriented Strand Board’s Strength with a Simulation Study of the Median Censored Method for Estimating Lower Percentile Strength

Wang, Yang 01 August 2007 (has links)
Oriented Strand Board (OSB), an engineered wood product, has gained increased market acceptance as a construction material. Because of its growing market, the product’s manufacturing and performance have become the focus of much research. Internal Bond (IB) and the Parallel and Perpendicular Elasticity Indices (EI) are important strength metrics of OSB and are analyzed in this thesis using statistical reliability methods. The data for this thesis consist of 529 destructive tests of OSB panels tested from July 2005 to January 2006. These OSB panels came from a modern OSB manufacturer in the Southeastern United States, with the wood furnish being primarily Southern Pine (Pinus spp.). The 529 records are for 7/16” thickness OSB strength, which is rated for roof sheathing (i.e., 7/16” RS). Descriptive statistics of IB and EI are summarized, including the mean, median, standard deviation, interquartile range, and skewness. Visual tools such as histograms and box plots are utilized to identify outliers and improve the understanding of the data. Survival plots, or Kaplan-Meier curves, are important methods for conducting nonparametric analyses of life (or strength) reliability data and are used in this thesis to estimate the strength survival functions of the IB and EI of OSB. Probability plots and information criteria are used to determine the best underlying distribution or probability density function. The OSB data used in this thesis fit the lognormal distribution best for both IB and EI. One outlier is excluded from the IB data and six outliers are excluded from the EI data. Estimation of lower percentiles is very important for quality assurance. In many reliability studies, there is great interest in estimating the lower percentiles of life or strength. In OSB, the lower percentiles of strength may correspond to catastrophic failures during installation of OSB panels. Catastrophic failure of 7/16” RS OSB, which is used primarily for residential roof construction, may result in severe injury or death of construction workers. The resulting liability and risk to OSB manufacturers can lead to extreme loss of market share and significant financial losses. In reliability data, multiple failure modes are common. Simulated data from a mixture of two two-parameter Weibull distributions are produced to mimic multiple failure modes in reliability data, and a forced median censored method is adopted to estimate lower percentiles of the simulated data. Results of the simulation indicate that lower percentiles estimated with the median censored method are relatively close to the true parametric percentiles, compared to not using the median censored method. I conclude that the median censoring method is a useful tool for improving estimation of the lower percentiles for OSB panel failure.
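A small sketch of the simulation idea described above, under hypothetical parameter values: strengths are drawn from a mixture of two two-parameter Weibulls to mimic two failure modes, observations above the sample median are treated as right-censored, and a single Weibull is fit by censored maximum likelihood to estimate a lower percentile. This illustrates the mechanics only and is not the thesis's exact procedure.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

rng = np.random.default_rng(42)

# Mixture of two two-parameter Weibulls to mimic two failure modes (hypothetical parameters).
n = 500
mode = rng.random(n) < 0.2
data = np.where(mode,
                weibull_min.rvs(c=1.2, scale=40.0, size=n, random_state=rng),   # weak-mode failures
                weibull_min.rvs(c=4.0, scale=100.0, size=n, random_state=rng))  # main population

# Forced median censoring: observations above the sample median are right-censored at the median.
med = np.median(data)
obs = np.minimum(data, med)
event = data <= med          # True = exact failure, False = censored at the median

def neg_loglik(params):
    """Censored Weibull log-likelihood: density for exact failures, survival for censored."""
    c, scale = np.exp(params)   # optimize on the log scale to keep parameters positive
    ll = np.sum(weibull_min.logpdf(obs[event], c=c, scale=scale))
    ll += np.sum(weibull_min.logsf(obs[~event], c=c, scale=scale))
    return -ll

fit = minimize(neg_loglik, x0=np.log([1.5, 80.0]), method="Nelder-Mead")
c_hat, scale_hat = np.exp(fit.x)
p05_hat = weibull_min.ppf(0.05, c=c_hat, scale=scale_hat)   # estimated 5th-percentile strength
```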
25

An Applied Statistical Reliability Analysis of the Modulus of Elasticity and the Modulus of Rupture for Wood-Plastic Composites

Perhac, Diane Goodman 01 August 2007 (has links)
Wood-plastic composites (WPC) are materials comprised of wood fiber within a thermoplastic matrix and are a growing and important source of alternative wood products in the forest products industry. WPC is gaining market share in the building industry because of durability/maintenance advantages of WPC over traditional wood products and because of the removal of chromated copper arsenate (CCA) pressure-treated wood from the market. The reliability methods outlined in this thesis can be used to improve the quality of WPC and lower manufacturing costs by reducing raw material inputs and minimizing WPC waste. Statistical methods are described for analyzing both tensile strength and bending measures of WPC. These key measures include stiffness (tangent modulus of elasticity: MOE) and flexural strength (modulus of rupture: MOR) results from both tensile strength and bending tests. As with any real data analysis, the possibility of outliers is assessed and addressed. With these data, different WPC subsets are evaluated with and without the presence of a coupling agent. Separate subsets without outliers are also reviewed. Descriptive statistics, histograms, probability plots, and survival curves from these test data are presented and interpreted. To provide a more objective assessment of appropriate parametric modeling, Akaike’s Information Criterion is used in conjunction with probability plotting. Selection of the best underlying distribution for the data is an important result that may be used to further explore and analyze the given data. In this thesis, these underlying distributional assumptions are utilized to better understand the product’s lower percentiles. These lower percentiles provide practitioners with an evaluation of the product’s early failures along with providing information for specification limits, warranty, and cost analysis. Estimation of lower percentiles is sometimes difficult, since substantive data are often sparse in the lower tails. Bootstrap techniques provide important solutions for confidence interval assessments of these percentiles. Bootstrapping is a computer-intensive resampling method that may be used for both parametric and nonparametric models. This thesis briefly describes several bootstrapping methods and applies these methods to appraise MOE and MOR test results on sampled WPC. The reliability and bootstrapping methods outlined in this thesis may directly benefit WPC manufacturers through a better evaluation of strength and stiffness measures, which can lead to process improvements with enhanced reliability, thereby creating greater manufacturer and customer satisfaction.
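A minimal sketch of a nonparametric bootstrap percentile interval for a lower percentile, the kind of assessment described above; the MOE-like data are simulated stand-ins, not the thesis's WPC measurements.

```python
import numpy as np

def bootstrap_percentile_ci(x, q=5.0, n_boot=5000, alpha=0.05, seed=0):
    """Nonparametric bootstrap percentile confidence interval for the q-th percentile of x."""
    rng = np.random.default_rng(seed)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        resample = rng.choice(x, size=x.size, replace=True)
        stats[b] = np.percentile(resample, q)
    lower, upper = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return np.percentile(x, q), (lower, upper)

# Hypothetical lognormal-ish strength measurements; real MOE/MOR data would replace this.
rng = np.random.default_rng(7)
moe = rng.lognormal(mean=7.0, sigma=0.15, size=120)
point_estimate, ci = bootstrap_percentile_ci(moe, q=5.0)
```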
26

Examining Regression Analysis Beyond the Mean of the Distribution using Quantile Regression: A Case Study of Modeling the Internal Bond of Medium Density Fiberboard using Multiple Linear Regression and Quantile Regression with an Example of Reliability Methods using R Software

Shaffer, Leslie Brooke 01 August 2007 (has links)
The thesis examines the causality of the central tendency of the Internal Bond (IB) of Medium Density Fiberboard (MDF) with predictor variables from the MDF manufacturing process. Multiple linear regression (MLR) models are developed using a best-model criterion for all possible subsets of IB for four MDF thickness products reported in inches, i.e., 0.750”, 0.625”, 0.6875”, and 0.500”. Quantile Regression (QR) models of the median IB are also developed. The adjusted coefficients of determination (R²adj) of the MLR models range from 72% with 53 degrees of freedom to 81% with 42 degrees of freedom. The Root Mean Square Errors (RMSE) range from 6.05 pounds per square inch (p.s.i.) to 6.23 p.s.i. A common independent variable for the 0.750” and 0.625” products is “Refiner Resin Scavenger %”. QR models for 0.750” and 0.625” have similar slopes for the median and average but different slopes for the 5th and 95th percentiles. “Face Humidity” is a common predictor for the 0.6875” and 0.500” products. QR models for 0.6875” and 0.500” indicate different slopes for the median and average, with different slopes for the outer 5th and 95th percentiles. The MLR and QR validation models for the 0.750”, 0.625”, and 0.6875” product types have coefficients of determination for the validation data set (R²validation) ranging from 40% to 60% and RMSEP ranging from 26.5 p.s.i. to 27.85 p.s.i. The MLR validation model for the 0.500” product has an R²validation of 64% and RMSEP of 23.63 p.s.i., while the QR validation model has an R²validation of 66% and RMSEP of 19.18 p.s.i. The IB for 0.500” departs from normality, which is reflected in the results of the validation models. The thesis results provide further evidence that QR is a more defensible method for modeling the central tendency of a response variable when the response variable departs from normality. The use of QR provides MDF manufacturers with an opportunity to examine causality beyond the mean of the distribution. Examining the lower and upper percentiles of a distribution may provide significant insight for identifying process variables that influence IB failure or extreme IB strength.
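A brief sketch contrasting a conditional-mean fit with quantile regression fits at the median and outer percentiles, using statsmodels. The predictor columns are hypothetical stand-ins for process variables such as “Refiner Resin Scavenger %” and “Face Humidity”, and the simulated response is not MDF data.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical stand-ins for MDF process variables and an IB-like response (p.s.i. scale).
rng = np.random.default_rng(3)
n = 60
scavenger = rng.normal(5.0, 1.0, size=n)
humidity = rng.normal(40.0, 5.0, size=n)
ib = 120 + 4.0 * scavenger - 0.8 * humidity + rng.normal(0, 6.0, size=n)

X = sm.add_constant(np.column_stack([scavenger, humidity]))

ols_fit = sm.OLS(ib, X).fit()                 # models the conditional mean
qr_median = sm.QuantReg(ib, X).fit(q=0.50)    # models the conditional median
qr_lower = sm.QuantReg(ib, X).fit(q=0.05)     # lower tail: weak-board behavior
qr_upper = sm.QuantReg(ib, X).fit(q=0.95)     # upper tail: extreme strength
```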
28

Behavioral Modeling of Botnet Populations Viewed through Internet Protocol Address Space

Weaver, Rhiannon 01 May 2012 (has links)
A botnet is a collection of computers infected by a shared set of malicious software that maintains communications with a single human administrator or a small organized group. Botnets are indirectly observable populations; cyber-analysts often measure a botnet’s threat in terms of its size, but size is derived from a count of the observable network touchpoints through which infected machines communicate. Activity is often a count of packets or connection attempts, representing logins to command-and-control servers, spam messages sent, peer-to-peer communications, or other discrete network behavior. Front-line analysts use sandbox testing of a botnet’s malicious software to discover signatures for detecting an infected computer and shutting it down, but there is less focus on modeling the botnet population as a collection of machines obscured by the kaleidoscope view of Internet Protocol (IP) address space. This research presents a Bayesian framework for generic modeling of a botnet based on its observable activity across a network. A generation-allocation model is proposed that separates observable network activity at time t into the counts y_t generated by the malicious software and the network’s allocation of these counts among available IP addresses. As a first step, the framework outlines how to develop a directly observable behavioral model informed by sandbox tests and day-to-day user activity, and then how to use this model as a basis for population estimation in settings using proxies or Network Address Translation (NAT), in which only the aggregate sum of all machine activity is observed. The model is explored via a case study using the Conficker-C botnet that emerged in March of 2009.
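A toy sketch of the generation-allocation decomposition: per-machine counts are generated from hypothetical activity rates, allocated among a smaller pool of IP addresses, and only the aggregate is observed under NAT. The Bayesian inference layer (recovering the population from the aggregate) is not shown.

```python
import numpy as np

rng = np.random.default_rng(11)

# Generation: each of N infected machines produces a count of connection attempts per hour.
# (Hypothetical rates; a behavioral model from sandbox tests would supply these.)
N = 200
rates = rng.gamma(shape=2.0, scale=5.0, size=N)   # per-machine activity rates
y_t = rng.poisson(rates)                          # counts generated at time t

# Allocation: machines map onto a smaller pool of observable IP addresses (proxy / NAT),
# so several machines may share one address.
n_ips = 50
ip_of_machine = rng.integers(0, n_ips, size=N)
per_ip_counts = np.bincount(ip_of_machine, weights=y_t, minlength=n_ips)

# Under NAT only the aggregate is observed; inferring N from this sum is the estimation problem.
observed_aggregate = per_ip_counts.sum()
```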
29

High-Dimensional Adaptive Basis Density Estimation

Buchman, Susan 01 May 2011 (has links)
In the realm of high-dimensional statistics, regression and classification have received much attention, while density estimation has lagged behind. Yet there are compelling scientific questions which can only be addressed via density estimation using high-dimensional data, such as the paths of North Atlantic tropical cyclones. If we cast each track as a single high-dimensional data point, density estimation allows us to answer such questions via integration or Monte Carlo methods. In this dissertation, I present three new methods for estimating densities and intensities for high-dimensional data, all of which rely on a technique called diffusion maps. This technique constructs a mapping for high-dimensional, complex data into a low-dimensional space, providing a new basis that can be used in conjunction with traditional density estimation methods. Furthermore, I propose a reordering of importance sampling in the high-dimensional setting. Traditional importance sampling estimates high-dimensional integrals with the aid of an instrumental distribution chosen specifically to minimize the variance of the estimator. In many applications, the integral of interest is with respect to an estimated density. I argue that in the high-dimensional realm, performance can be improved by reversing the procedure: instead of estimating a density and then selecting an appropriate instrumental distribution, begin with the instrumental distribution and estimate the density with respect to it directly. The variance reduction follows from the improved density estimate. Lastly, I present some initial results in using climatic predictors such as sea surface temperature as spatial covariates in point process estimation.
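For context, a minimal sketch of the traditional ordering that the dissertation proposes to reverse: a density f (here a known standard normal standing in for an estimated density) and an instrumental distribution q concentrated where the integrand matters, combined through self-normalized importance sampling. All distributions and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

def f(x):
    """Density of interest; in the dissertation this would be an estimated density."""
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

def g(x):
    """Quantity whose expectation under f we want, here an upper-tail indicator."""
    return (x > 2.0).astype(float)

# Instrumental distribution q: a normal centered near the region where g * f is large.
mu_q, sd_q = 2.5, 1.0
x = rng.normal(mu_q, sd_q, size=100_000)
q = np.exp(-0.5 * ((x - mu_q) / sd_q) ** 2) / (sd_q * np.sqrt(2 * np.pi))

weights = f(x) / q
estimate = np.sum(weights * g(x)) / np.sum(weights)   # self-normalized importance sampling
```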
30

The Short Time Fourier Transform and Local Signals

Okamura, Shuhei 01 June 2011 (has links)
In this thesis, I examine the theoretical properties of the short time discrete Fourier transform (STFT). The STFT is obtained by applying the Fourier transform to a fixed-size moving window of the input series. We move the window by one time point at a time, so the windows overlap. I present several theoretical properties of the STFT, applied to various types of complex-valued, univariate time series inputs, and give their outputs in closed form. In particular, just like the discrete Fourier transform, the STFT’s modulus time series takes large positive values when the input is a periodic signal. One main point is that a white noise input results in the STFT output being a complex-valued stationary time series, and we can derive its time and time-frequency dependency structure, such as the cross-covariance functions. Our primary focus is the detection of local periodic signals. I present a method to detect local signals by computing the probability that the squared-modulus STFT time series has a run of consecutive values exceeding some threshold, immediately after one exceeding observation that follows an observation below the threshold. We discuss a method to reduce the computation of such probabilities using the Box-Cox transformation and the delta method, and show that it works well in comparison with the Monte Carlo simulation method.
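A small sketch of the maximally overlapping STFT and the threshold-exceedance idea, using a hypothetical white-noise series with a short local sinusoid embedded in it; the probability approximations via the Box-Cox transformation and delta method are not shown.

```python
import numpy as np

def stft_sliding(x, window_size):
    """STFT with a rectangular window moved one time point at a time (overlapping windows).
    Returns an array of shape (n_windows, window_size) of complex DFT coefficients."""
    n = len(x) - window_size + 1
    return np.array([np.fft.fft(x[t:t + window_size]) for t in range(n)])

# Hypothetical input: white noise with a short local sinusoid at DFT bin 8 of a 64-point window.
rng = np.random.default_rng(2)
n, m = 1000, 64
x = rng.normal(size=n)
t = np.arange(300, 380)
x[t] += 2.0 * np.sin(2 * np.pi * 8 * (t - 300) / m)

S = stft_sliding(x, m)
power = np.abs(S[:, 8]) ** 2             # squared-modulus series at the signal frequency
threshold = np.quantile(power, 0.99)
exceeds = power > threshold              # runs of consecutive exceedances flag a local signal
```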
