221

Computational approaches for maximum likelihood estimation for nonlinear mixed models.

Hartford, Alan Hughes 19 July 2000 (has links)
The nonlinear mixed model is an important tool for analyzing pharmacokinetic and other repeated-measures data. In particular, these models are used when the measured response for an individual has a nonlinear relationship with unknown, random, individual-specific parameters. Ideally, the method of maximum likelihood is used to find estimates for the parameters of the model after integrating out the random effects in the conditional likelihood. However, closed form solutions to the integral are generally not available. As a result, methods have been previously developed to find approximate maximum likelihood estimates for the parameters in the nonlinear mixed model. These approximate methods include First Order linearization, Laplace's approximation, importance sampling, and Gaussian quadrature. The methods are available today in several software packages for models of limited sophistication; constant conditional error variance is required for proper utilization of most software. In addition, distributional assumptions are needed. This work investigates how robust two of these methods, First Order linearization and Laplace's approximation, are to these assumptions. The finding is that Laplace's approximation performs well, resulting in better estimation than First Order linearization when both models converge to a solution.

A method must provide good estimates of the likelihood at points in the parameter space near the solution. This work compares this ability among the numerical integration techniques: Gaussian quadrature, importance sampling, and Laplace's approximation. A new "scaled" and "centered" version of Gaussian quadrature is found to be the most accurate technique. In addition, the technique requires evaluation of the integrand at only a few abscissas. Laplace's method also performs well; it is more accurate than importance sampling with even 100 importance samples over two dimensions. Even so, Laplace's method still does not perform as well as Gaussian quadrature. Overall, Laplace's approximation performs better than expected, and is shown to be a reliable method while still computationally less demanding.

This work also introduces a new method to maximize the likelihood. This method can be sharpened to any desired level of accuracy. Stochastic approximation is incorporated to continue sampling until enough information is gathered to result in accurate estimation. This new method is shown to work well for linear mixed models, but is not yet successful for the nonlinear mixed model.
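The "scaled" and "centered" Gaussian quadrature mentioned above corresponds to the general idea of adaptive Gauss-Hermite quadrature: center the nodes at the mode of the integrand and scale them by its curvature. Below is a minimal sketch of that idea for a toy one-random-effect model; the exponential-decay mean function, parameter values, and data are illustrative assumptions, not the thesis's models or code.

```python
# Sketch: "centered and scaled" (adaptive) Gauss-Hermite quadrature for the
# marginal likelihood of one subject in a toy nonlinear mixed model.
import numpy as np
from scipy.optimize import minimize_scalar

def log_joint(b, y, t, theta, sigma, sigma_b):
    """log of p(y | b) * p(b): the integrand whose integral over b gives the
    subject's marginal likelihood contribution (additive constants dropped)."""
    mean = np.exp(-(theta + b) * t)                   # nonlinear in the random effect b
    ll_y = -0.5 * np.sum(((y - mean) / sigma) ** 2) - y.size * np.log(sigma)
    ll_b = -0.5 * (b / sigma_b) ** 2 - np.log(sigma_b)
    return ll_y + ll_b

def marginal_likelihood_aghq(y, t, theta, sigma, sigma_b, n_nodes=9):
    """Approximate int exp(log_joint(b)) db with quadrature centered at the
    mode of the integrand and scaled by its curvature."""
    neg = lambda b: -log_joint(b, y, t, theta, sigma, sigma_b)
    b_hat = minimize_scalar(neg).x                    # mode of the integrand
    h = 1e-4                                          # numerical second derivative
    curv = (neg(b_hat + h) - 2 * neg(b_hat) + neg(b_hat - h)) / h ** 2
    scale = 1.0 / np.sqrt(curv)
    x, w = np.polynomial.hermite.hermgauss(n_nodes)   # nodes/weights for weight e^{-x^2}
    b_nodes = b_hat + np.sqrt(2.0) * scale * x
    vals = np.array([np.exp(log_joint(b, y, t, theta, sigma, sigma_b) + xi ** 2)
                     for b, xi in zip(b_nodes, x)])
    return np.sqrt(2.0) * scale * np.sum(w * vals)

# Toy data for a single subject.
rng = np.random.default_rng(0)
t = np.linspace(0.5, 8.0, 6)
y = np.exp(-(0.3 + 0.1) * t) + rng.normal(0.0, 0.05, t.size)
print(marginal_likelihood_aghq(y, t, theta=0.3, sigma=0.05, sigma_b=0.2))
```

Because the nodes follow the mass of the integrand, only a handful of abscissas are needed, which is the practical advantage the abstract highlights.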
222

Evaluation of Frequentist and Bayesian Inferences by Relevant Simulation

Kim, Yuntae 03 August 2000 (has links)
Statistical inference procedures in real situations often assume not only the basic assumptions on which the justification of the inference is based, but also some additional assumptions which could affect that justification. In many cases, the inference procedure is too complicated to be evaluated analytically. Simulation-based evaluation of such an inference could be an alternative, but simulation results are generally valid only under the specific simulated circumstances. For simulation in a parametric model set-up, the simulation result may depend on the chosen parameter value. In this study, we suggest an evaluation methodology relying on an observation-based simulation for frequentist and Bayesian inferences on parametric models. We describe our methodology with suggestions for three aspects: the factors to be measured for the evaluation, the measurement of those factors, and the evaluation of the inference based on the factors measured. Additionally, we provide an adjustment method for inferences found to be invalid. The suggested methodology can be applied to any inference procedure, no matter how complicated, as long as it can be codified for repetition on the computer. Unlike general simulation in the parametric model, the suggested methodology provides results which do not depend on specific parameter values.

The argument about the new EPA standards for particulate matter (PM) has led to statistical analyses of the effect of PM on mortality. Typically, a regression model for mortality of the elderly is constructed. The covariates of the regression model include a suite of highly correlated particulate-matter-related variables in addition to trend and meteorology variables as confounding nuisance variables. The classical strategy for a regression model with a suite of highly correlated covariates is to select a model based on a subset of the covariates and then make inference assuming the subset model. However, one might be concerned about the validity of inferences under this classical strategy, since it ignores the uncertainty in the choice of a subset model. The suggested evaluation methodology was applied to evaluate inference procedures for the PM-mortality regression model with data from Phoenix, Arizona taken from 1995 to early 1998. The suggested methodology provided valid evaluation results for various inference procedures and also provided adjusted inferences superior to Bonferroni adjustment. Based on our results, we are able to make conclusions about the short-term effect of PM that do not suffer from validity concerns.
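To convey the general idea of evaluating a codified inference procedure by simulation, here is a hedged sketch: generate replicate data sets, rerun the complete procedure (including a crude subset-selection step that ignores selection uncertainty), and check empirical coverage of the nominal intervals. The linear model, selection rule, and sample sizes are assumptions for illustration; this is a plain parametric simulation, not the observation-based design described above.

```python
# Sketch: checking whether a selection-then-inference procedure actually
# delivers its nominal 95% coverage, by repeated simulation.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, beta_pm = 200, 0.5                      # effect used to generate replicates

def inference_procedure(y, X):
    """The procedure under evaluation: naive subset selection followed by OLS
    inference on the retained covariates (ignores selection uncertainty)."""
    fit = sm.OLS(y, X).fit()
    keep = np.abs(fit.tvalues) > 1.0       # crude screening step
    keep[0] = True                         # always keep the covariate of interest
    refit = sm.OLS(y, X[:, keep]).fit()
    lo, hi = refit.conf_int(alpha=0.05)[0] # interval for the covariate of interest
    return lo, hi

covered, reps = 0, 500
for _ in range(reps):
    X = rng.normal(size=(n, 4))
    X[:, 1] = 0.8 * X[:, 0] + 0.2 * rng.normal(size=n)   # correlated confounder
    y = beta_pm * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=n)
    lo, hi = inference_procedure(y, X)
    covered += (lo <= beta_pm <= hi)
print(f"empirical coverage of nominal 95% interval: {covered / reps:.3f}")
```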
223

Nonparametric Spatial Analysis in Spectral and Space Domains

Kim, Hyon-Jung 23 August 2000 (has links)
KIM, HYON-JUNG. Variance Estimation in Spatial Regression Using a Nonparametric Semivariogram Based on Residuals. (Under the direction of Professor Dennis D. Boos.)

The empirical semivariogram of residuals from a regression model with stationary errors may be used to estimate the covariance structure of the underlying process. For prediction (Kriging), the bias of the semivariogram estimate induced by using residuals instead of errors has only a minor effect because the bias is small for small lags. However, for estimating the variance of estimated regression coefficients and of predictions, the bias due to using residuals can be quite substantial. Thus we propose a method for reducing the bias in empirical semivariogram estimates based on residuals. The adjusted empirical semivariogram is then isotonized, made positive definite, and used to estimate the variance of estimated regression coefficients in a general estimating equations setup. Simulation results for least squares and robust regression show that the proposed method works well in linear models with stationary correlated errors.

Spectral Analysis with Spatial Periodogram and Data Tapers. (Under the direction of Professor Montserrat Fuentes.)

The spatial periodogram is a nonparametric estimate of the spectral density, which is the Fourier transform of the covariance function. The periodogram is a useful tool to explain the dependence structure of a spatial process. Tapering (data filtering) is an effective technique to remove edge effects even in high-dimensional problems and can be applied to spatial data in order to reduce the bias of the periodogram. However, the variance of the periodogram increases as the bias is reduced. We present a method to choose an appropriate smoothing parameter for data tapers and obtain better estimates of the spectral density by improving the properties of the periodogram. The smoothing parameter is selected taking into account the trade-off between bias and variance of the tapered periodogram. We introduce a new asymptotic approach for spatial data called `shrinking asymptotics', which combines the increasing-domain and the fixed-domain asymptotics. With this approach, the tapered spatial periodogram can be used to determine uniquely the spectral density of the stationary process, avoiding the aliasing problem.
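The first part of the abstract starts from the empirical semivariogram of regression residuals. The sketch below computes that basic quantity on a one-dimensional transect; the data, trend model, and unit-lag bins are illustrative assumptions, and no bias adjustment or isotonization is attempted here.

```python
# Sketch: empirical (Matheron) semivariogram of OLS residuals on a transect.
import numpy as np

rng = np.random.default_rng(2)
s = np.arange(100.0)                        # spatial locations on a transect
X = np.column_stack([np.ones_like(s), s])   # linear trend surface
e = np.zeros_like(s)                        # AR(1)-like stand-in for stationary errors
for i in range(1, s.size):
    e[i] = 0.6 * e[i - 1] + rng.normal(scale=0.8)
y = X @ np.array([2.0, 0.05]) + e
resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]   # OLS residuals

def empirical_semivariogram(z, max_lag=20):
    """gamma(h) = mean of (z_i - z_j)^2 / 2 over all pairs exactly h apart."""
    gammas = []
    for h in range(1, max_lag + 1):
        d = z[h:] - z[:-h]
        gammas.append(0.5 * np.mean(d ** 2))
    return np.arange(1, max_lag + 1), np.array(gammas)

lags, gamma_hat = empirical_semivariogram(resid)
print(np.round(gamma_hat[:5], 3))
```

At short lags this residual-based estimate is close to what the true-error semivariogram would give, which is why the bias matters little for Kriging but more for variance estimation, as the abstract notes.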
224

PARAMETRIC MODELING IN THE PRESENCE OF MEASUREMENT ERROR: MONTE CARLO CORRECTED SCORES

Novick, Steven Jon 23 October 2000 (has links)
Parametric estimation is complicated when data are measured with error. The problem of regression modeling when one or more covariates are measured with error is considered in this paper. It is often the case that, evaluated at the observed error-prone data, the unbiased true-data estimating equations yield an inconsistent estimator. The proposed method is a variant of Nakamura's (1990) method of corrected scores and is closely related to the simulation-based algorithm introduced by Cook and Stefanski (1994). The corrected-score method depends critically on finding a function of the observed data having the property that its conditional expectation given the true data equals a true-data, unbiased score function. Nakamura (1990) gives corrected score functions for special cases, but offers no general solution. It is shown that for a certain class of smooth true-data score functions, a corrected score can be determined by Monte Carlo methods, if not analytically. The relationship between the corrected-score method and Cook and Stefanski's (1994) simulation method is studied in detail. The properties of the Monte Carlo generated corrected score functions, and of the estimators they define, are also given considerable attention. Special cases are examined in detail, comparing the proposed method with established methods.
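A minimal numerical illustration of the kind of construction involved, assuming additive normal measurement error: if W = x + sigma*U with U ~ N(0,1), and Z ~ N(0,1) is drawn independently, then for a smooth (entire) function g one has E[Re g(W + i*sigma*Z)] = g(x), so Monte Carlo averaging over Z produces an unbiased "corrected" version of g evaluated from error-prone data. The choice g(x) = exp(x) and the parameter values below are assumptions for the demonstration, not the thesis's estimators.

```python
# Sketch: the complex-variable Monte Carlo device for correcting a smooth
# function of error-prone data.
import numpy as np

rng = np.random.default_rng(3)
x_true, sigma = 1.2, 0.5
n_obs, n_mc = 20_000, 50                             # observations W, MC draws Z per W

W = x_true + sigma * rng.normal(size=n_obs)          # error-prone measurements
Z = rng.normal(size=(n_obs, n_mc))                   # complex pseudo-noise

g = np.exp
naive = g(W).mean()                                  # biased: E exp(W) = exp(x + sigma^2/2)
corrected = np.real(g(W[:, None] + 1j * sigma * Z)).mean()

print(f"target g(x)   : {g(x_true):.4f}")
print(f"naive plug-in : {naive:.4f}")
print(f"MC corrected  : {corrected:.4f}")
```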
225

Data Reduction and Model Selection with Wavelet Transforms

Martell, Leah A. 07 November 2000 (has links)
Martell, Leah. Data Reduction and Model Selection with Wavelet Transforms. (Under the direction of Dr. J. C. Lu.)

With modern technology, massive quantities of data are being collected continuously. The purpose of our research has been to develop a method for data reduction and model selection applicable to large data sets and replicated data. We propose a novel wavelet shrinkage method by introducing a new model selection criterion. The proposed shrinkage rule has at least two advantages over current shrinkage methods. First, it is adaptive to the smoothness of the signal regardless of whether it has a sparse wavelet representation, since we consider both the deterministic and the stochastic cases. The wavelet decomposition not only catches the signal components for a pure signal, but de-noises and extracts these signal components for a signal contaminated by external influences. Second, the proposed method allows for fine "tuning" based on the particular data at hand. Our simulation study shows that the methods based on the model selection criterion have better mean square error (MSE) than the methods currently known.

Two aspects make wavelet analysis the analytical tool of choice. First, the largest-in-magnitude wavelet coefficients in the discrete wavelet transform (DWT) of the data extract the relevant information, while discarding the rest eliminates the noise component. Second, the DWT allows for a fast algorithm of computational complexity O(n).

For the deterministic case we derive a bound on the approximation error of the nonlinear wavelet estimate determined by the largest-in-magnitude discrete wavelet coefficients. Upper bounds for the approximation error and for the rate of increase of the number of wavelet coefficients in the model are obtained for the new wavelet shrinkage estimate. When the signal comes from a stochastic process, a bound for the MSE is found, and for the bias of its estimate. A corrected version of the model selection criterion is introduced and some of its properties are studied. The new wavelet shrinkage is employed in the case of replicated data. An algorithm for model selection is proposed, based on which a manufacturing process can be automatically supervised for quality and efficiency. We apply it to two real-life examples.
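The keep-the-largest-coefficients idea described above can be sketched with PyWavelets. The thesis proposes its own model selection criterion for deciding how many coefficients to retain; here the cutoff (the largest 5% of coefficient magnitudes) and the db4 wavelet are illustrative assumptions standing in for that criterion.

```python
# Sketch: wavelet shrinkage by hard-thresholding detail coefficients at the
# 95th percentile of all coefficient magnitudes.
import numpy as np
import pywt

rng = np.random.default_rng(4)
n = 1024
t = np.linspace(0, 1, n)
signal = np.sin(6 * np.pi * t) + (t > 0.5)            # smooth part plus a jump
noisy = signal + rng.normal(scale=0.3, size=n)

coeffs = pywt.wavedec(noisy, 'db4', level=5)          # fast DWT, O(n) work
flat = np.concatenate(coeffs)
cutoff = np.quantile(np.abs(flat), 0.95)              # crude stand-in for the criterion

# Keep the approximation coefficients, hard-threshold the detail coefficients.
shrunk = [coeffs[0]] + [pywt.threshold(c, cutoff, mode='hard') for c in coeffs[1:]]
denoised = pywt.waverec(shrunk, 'db4')[:n]

print("MSE noisy    :", np.mean((noisy - signal) ** 2).round(4))
print("MSE denoised :", np.mean((denoised - signal) ** 2).round(4))
```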
226

Statistical Inference for Gap Data

Yang, Liqiang 13 November 2000 (has links)
This thesis research is motivated by a special type of missing data, gap data, which was first encountered in a cardiology study conducted at Duke Medical School. This type of data includes multiple observations of a certain event time (in this medical study the event is the reopening of a certain artery), some of which may have one or more missing periods, called "gaps", before the observed "first" event. Therefore, for those observations the observed first event may not be the true first event, because the true first event might have happened in one of the missing gaps. Due to this kind of missing information, estimating the survival function of the true first event becomes very difficult. To date, no research or discussion has addressed this type of data. In this thesis, the author introduces a new nonparametric estimating method to solve this problem, currently called the Imputed Empirical Estimating (IEE) method. According to the simulation studies, the IEE method provides a very good estimate of the survival function of the true first event. It significantly outperforms all the existing estimating approaches in our simulation studies.

Besides the new IEE method, this thesis also explores the maximum likelihood estimate in the gap data case. Gap data are introduced as a special type of interval-censored data for the first time. The dependence between the censoring interval (in the gap data case, the observed first event time point) and the event (in the gap data case, the true first event) makes gap data different from the well-studied regular interval-censored data. This thesis points out the only difference between gap data and regular interval-censored data, and provides an MLE for gap data under certain assumptions.

The third estimating method discussed in this thesis is the Weighted Estimating Equation (WEE) method. The WEE estimate is a very popular nonparametric approach currently used in many survival analysis studies. In this thesis the consistency and asymptotic properties of the WEE estimate used with gap data are discussed. Finally, in the gap data case, the WEE estimate is shown to be equivalent to the Kaplan-Meier estimate. Numerical examples are provided in this thesis to illustrate the algorithms of the IEE and MLE approaches. The author also provides an IEE estimate of the survival function based on the real-life data from Duke Medical School. A series of simulation studies are conducted to assess the goodness of fit of the new IEE estimate. Plots and tables of the results of the simulation studies are presented in the second chapter of this thesis.
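A small simulation makes the core difficulty concrete: when the true first event can fall inside an unobserved gap, the observed "first" event is systematically late, so the naive empirical survival curve sits above the truth. The exponential recurrence times and the single fixed gap below are illustrative assumptions; this is only the problem setup, not the IEE estimator.

```python
# Sketch: bias of the naive survival estimate when first events can hide in a gap.
import numpy as np

rng = np.random.default_rng(5)
n, rate, gap = 5000, 1.0, (0.5, 1.5)        # events in [0.5, 1.5) are not observed

true_first, observed_first = [], []
for _ in range(n):
    t, events = 0.0, []
    while t < 6.0:                           # recurrent events on [0, 6)
        t += rng.exponential(1.0 / rate)
        events.append(t)
    true_first.append(events[0])
    visible = [e for e in events if not (gap[0] <= e < gap[1])]
    observed_first.append(visible[0])        # first event seen outside the gap

for x in (0.25, 1.0, 2.0):
    s_true = np.mean(np.array(true_first) > x)
    s_obs = np.mean(np.array(observed_first) > x)
    print(f"S({x}): true {s_true:.3f}   naive-from-observed {s_obs:.3f}")
```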
227

SAMPLE SIZE DETERMINATION AND STATIONARITY TESTING IN THE PRESENCE OF TREND BREAKS

Huh, Seungho 23 February 2001 (has links)
Traditionally it is believed that most macroeconomic time series represent stationary fluctuations around a deterministic trend. However, simple applications of the Dickey-Fuller test have, in many cases, been unable to show that major macroeconomic variables have a stationary univariate time series structure. One possible reason for non-rejection of unit roots is that the simple mean or linear trend function used by the tests is not sufficient to describe the deterministic part of the series. To address this possibility, unit root tests in the presence of trend breaks have been studied by several researchers.

In our work, we deal with some issues associated with unit root testing in time series with a trend break. The performance of various unit root test statistics is compared with respect to the break-induced size distortion problem. We examine the effectiveness of tests based on symmetric estimators as compared to those based on the least squares estimator. In particular, we show that tests based on the weighted symmetric estimator not only eliminate the spurious rejection problem but also have reasonably good power properties when modified to allow for a break. We suggest alternative test statistics for testing the unit root null hypothesis in the presence of a trend break. Our new test procedure, which we call the "bisection" method, is based on the idea of subgrouping. This is simpler than other methods since the necessity of searching for the break is avoided.

Using stream flow data from the US Geological Survey, we perform a temporal analysis of some hydrologic variables. We first show that the time series for the target variables are stationary, then focus on finding the sample size necessary to detect a mean change if one occurs. Three different approaches are used to solve this problem: OLS, GLS, and a frequency domain method. A cluster analysis of stations is also performed using these sample sizes as data. We investigate whether available geographic variables can be used to predict cluster membership.
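The interplay between breaks and unit root testing can be seen with the augmented Dickey-Fuller test from statsmodels. The simulated series, break size, and break point below are illustrative assumptions; the sketch only shows that a stationary series with an unmodeled level shift can look nonstationary to a test that ignores the break, which is the distortion problem the abstract addresses (it does not implement the weighted symmetric or bisection procedures).

```python
# Sketch: ADF test on a stationary AR(1) series with and without an unmodeled
# level shift.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(6)
n, break_at = 300, 150
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.7 * e[t - 1] + rng.normal()     # stationary AR(1) fluctuations
shift = np.where(np.arange(n) >= break_at, 5.0, 0.0)
y = e + shift                                 # level shift at t = 150

stat, pvalue, *_ = adfuller(y, regression="c")
print(f"ADF on the broken series : stat={stat:.2f}, p={pvalue:.3f}")

# Removing the (known) break before testing restores the expected behaviour.
stat2, pvalue2, *_ = adfuller(y - shift, regression="c")
print(f"ADF after removing break : stat={stat2:.2f}, p={pvalue2:.3f}")
```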
228

STATISTICAL ANALYSIS AND MODELING OF PHARMACOKINETIC DATA FROM PERCUTANEOUS ABSORPTION

Budsaba, Kamon 26 March 2001 (has links)
Statistical analysis applied to percutaneous absorption and cutaneous disposition of different types of jet fuels is presented. A slightly different question is addressed with methyl salicylate absorption, namely when one compound can be used as a simulant for another compound. A new graphical statistics method, called the "Compass Plot", is introduced for displaying results in the design of experiments, especially for balanced factorial experiments. An example of compass plots for visualizing significant interactions in complex toxicology studies is provided. This is followed by a simulation study on an approximate F-test for determining whether a random effects model is needed for the exponential difference model. A new multivariate coefficient of variation, used as an index to determine which effects have a random component, is also introduced and investigated by simulations and two real datasets.
229

Comparing Bayesian, Maximum Likelihood and Classical Estimates for the Jolly-Seber Model

Brown, George Gordon Jr. 30 May 2001 (has links)
BROWN Jr., GEORGE GORDON. Comparing Bayesian, Maximum Likelihood and Classical Estimates for the Jolly-Seber Model. (Under the direction of John F. Monahan and Kenneth H. Pollock.)

In 1965 Jolly and Seber proposed a model to analyze data for open population capture-recapture studies. Despite frequent use of the Jolly-Seber model, likelihood-based inference is complicated by the presence of a number of unobservable variables that cannot be easily integrated from the likelihood. In order to avoid integration, various statistical methods have been employed to obtain meaningful parameter estimates. Conditional maximum likelihood, suggested by both Jolly and Seber, has become the standard method. Two new parameter estimation methods, applied to the Jolly-Seber Model D, are presented in this thesis.

The first new method attempts to obtain maximum likelihood estimates after integrating all of the unobservable variables from the Jolly-Seber Model D likelihood. Most of the unobservable variables can be analytically integrated from the likelihood. However, the variables dealing with the abundance of uncaptured individuals must be numerically integrated. A FORTRAN program was constructed to perform the numerical integration and search for MLEs using a combination of fixed quadrature and Newton's method. Since numerical integration tends to be very time consuming, MLEs could only be obtained from capture-recapture studies with a small number of sampling periods. In order to test the validity of the MLEs, a simulation experiment was conducted that obtained MLEs from simulated data for a wide variety of parameter values. Variance estimates for these MLEs were obtained using the Chapman-Robbins lower bound. These variance estimates were used to construct 90% confidence intervals with approximately correct coverage. However, in cases with few recaptures the MLEs performed poorly. In general, the MLEs tended to perform well on a wide variety of the simulated data sets and appear to be a valid tool for estimating population characteristics for open populations.

The second new method employs the Gibbs sampler on an unintegrated and an integrated version of the Jolly-Seber Model D likelihood. For both versions, full conditional distributions are easily obtained for all parameters of interest. However, sampling from these distributions is non-trivial. Two FORTRAN programs were developed to run the Gibbs sampler for the unintegrated and the integrated likelihoods, respectively. Means, medians, modes and variances were constructed from the resulting empirical posterior distributions and used for inference. Spectral density was used to construct a variance estimate for the posterior mean. Equal-tailed posterior density regions were calculated directly from the posterior distributions. A simulation experiment was conducted to test the validity of these density regions; they also have approximately the proper coverage provided that the capture probability is not too small. Convergence to a stationary distribution is explored for both versions of the likelihood. Often, convergence was difficult to detect, so a test of convergence was constructed by comparing two independent chains from both versions of the Gibbs sampler.

Finally, an experiment was constructed to compare these two new methods and the traditional conditional maximum likelihood estimates using data simulated from a capture-recapture experiment with 4 sampling periods. This experiment showed that there is little difference between the conditional maximum likelihood estimates and the 'true' maximum likelihood estimates when the population size is large. A second simulation experiment was conducted to determine which of the 3 estimation methods provided the 'best' estimators. This experiment was largely inconclusive, as no single method routinely outperformed the others.
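The convergence check by comparison of independent chains can be illustrated with a Gelman-Rubin style potential scale reduction factor. The toy "chains" below are simulated draws, not output from the Jolly-Seber samplers, and the thesis's own convergence test may differ; the sketch only shows the comparison-of-chains idea.

```python
# Sketch: comparing two independent MCMC chains via the Gelman-Rubin R-hat.
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor from an (m, n) array of m chains."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)          # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    var_hat = (n - 1) / n * W + B / n
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(7)
# Two well-mixed chains targeting the same posterior ...
good = rng.normal(loc=100.0, scale=5.0, size=(2, 2000))
# ... versus two chains stuck in different regions (poor convergence).
bad = np.stack([rng.normal(95.0, 5.0, 2000), rng.normal(105.0, 5.0, 2000)])

print("R-hat, mixed chains :", round(gelman_rubin(good), 3))   # near 1
print("R-hat, stuck chains :", round(gelman_rubin(bad), 3))    # well above 1
```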
230

ROBUST METHODS FOR ESTIMATING ALLELE FREQUENCIES

Huang, Shu-Pang 21 June 2001 (has links)
HUANG, SHU-PANG. Robust Methods for Estimating Allele Frequencies. (Advisor: Bruce S. Weir.)

The distribution of allele frequencies has been a major focus in population genetics. Classical approaches using stochastic arguments depend highly on the choice of mutation model. Unfortunately, it is hard to justify which mutation model is suitable for a particular sample. We propose two methods to estimate allele frequencies, especially for rare alleles, without assuming a mutation model. The first method achieves its goal through two steps: it first estimates the number of alleles in a population using a sample coverage method and then models the ranked frequencies for these alleles using the stretched exponential/Weibull distribution. Simulation studies have shown that both steps are robust to different mutation models. The second method uses a Bayesian approach to estimate both the number of alleles and their frequencies simultaneously by assuming a non-informative prior distribution. The Bayesian approach is also robust to mutation models. Questions concerning the probability of finding a new allele, and the possible highest (or lowest) probability for a new-found allele, can be answered by both methods. The advantages of our approaches include robustness to the mutation model and the ability to be easily extended to genotypic, haploid and protein structure data.
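Two ingredients of the first method can be sketched with simple stand-ins: a Chao1-type lower bound for the number of alleles based on singleton and doubleton counts (one common sample-coverage-flavored estimator, not necessarily the one used in the thesis), and a stretched-exponential fit to the ranked allele frequencies. The simulated allele counts are illustrative assumptions.

```python
# Sketch: estimating the number of alleles and fitting a stretched-exponential
# curve to ranked sample allele frequencies.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(8)
# Simulate a sample of gene copies from a population with many rare alleles.
true_freqs = rng.dirichlet(np.full(60, 0.3))
sample = rng.multinomial(400, true_freqs)
counts = sample[sample > 0]                       # counts of observed alleles

s_obs = counts.size
f1 = np.sum(counts == 1)                          # singletons
f2 = np.sum(counts == 2)                          # doubletons
chao1 = s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))    # bias-corrected Chao1-type estimate
print(f"alleles observed: {s_obs}, Chao1-type estimate: {chao1:.1f}, truth: 60")

# Stretched-exponential (Weibull-shaped) model for the ranked frequencies.
ranked = np.sort(counts / counts.sum())[::-1]
ranks = np.arange(1, ranked.size + 1, dtype=float)
model = lambda r, a, b, c: a * np.exp(-(r / b) ** c)
params, _ = curve_fit(model, ranks, ranked, p0=(ranked[0], 5.0, 1.0), maxfev=10000)
print("fitted (a, b, c):", np.round(params, 3))
```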
