31 |
Misclassification of the dependent variable in binary choice models / Gu, Yuanyuan, Economics, Australian School of Business, UNSW January 2006
Survey data are often subject to a number of measurement errors. The measurement error associated with a multinomial variable is called a misclassification error. In this dissertation we study such errors when the outcome is binary. It is known that ignoring such misclassification errors may affect the parameter estimates; see, for example, Hausman, Abrevaya and Scott-Morton (1998). However, previous studies showed that robust estimation of the parameters is achievable if we take misclassification into account. There are many attempts to do so in the literature, and the major problem in implementing them is avoiding poor or fragile identifiability of the misclassification probabilities. Generally we restrict these parameters by imposing prior information on them. Such prior constraints on the parameters are simple to impose within a Bayesian framework. Hence we consider a Bayesian logistic regression model that takes into account the misclassification of the dependent variable. A very convenient way to implement such a Bayesian analysis is to estimate the hierarchical model using the WinBUGS software package developed by the MRC biostatistics group, Institute of Public Health, at Cambridge University. WinBUGS allows us to estimate the posterior distributions of all the parameters using relatively little programming, and once the program is written it is trivial to change the link function, for example from logit to probit. If we wish to have more control over the sampling scheme or to deal with more complex models, then we propose a data augmentation approach using the Metropolis-Hastings algorithm within a Gibbs sampling framework. The sampling scheme can be made more efficient by using a one-step Newton-Raphson algorithm to form the Metropolis-Hastings proposal.
Results from empirically analyzing real data and from the simulation studies suggest that if suitable priors are specified for the misclassification parameters and the regression parameters, then logistic regression allowing for misclassification results in better estimators than the estimators that do not take misclassification into account.
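As a rough illustration of how misclassification enters a binary choice likelihood, the sketch below uses the formulation associated with Hausman, Abrevaya and Scott-Morton (1998), in which the observed outcome equals 1 with probability alpha0 + (1 - alpha0 - alpha1) F(x'beta). The abstract does not state the thesis's own parameterization, so the names `alpha0`, `alpha1`, `prob_observed_one` and `log_likelihood` are assumptions made for this example, and no priors or sampling scheme are shown.

```python
import math


def logistic(z):
    """Logit link: probability that the true outcome is 1."""
    return 1.0 / (1.0 + math.exp(-z))


def prob_observed_one(x, beta, alpha0, alpha1):
    """P(observed y = 1 | x) under misclassification.

    alpha0 (assumed name): P(observed 1 | true 0)
    alpha1 (assumed name): P(observed 0 | true 1)
    """
    p_true = logistic(sum(b * xi for b, xi in zip(beta, x)))
    return alpha0 * (1.0 - p_true) + (1.0 - alpha1) * p_true


def log_likelihood(X, y_obs, beta, alpha0, alpha1):
    """Log-likelihood of the observed (possibly misclassified) outcomes."""
    total = 0.0
    for x, y in zip(X, y_obs):
        p = prob_observed_one(x, beta, alpha0, alpha1)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


# With beta = 0 the true probability is 0.5, so the observed probability
# is 0.1 * 0.5 + 0.8 * 0.5, approximately 0.45.
example = prob_observed_one([1.0, 0.0], [0.0, 0.0], alpha0=0.1, alpha1=0.2)
```

Setting both misclassification probabilities to zero recovers the ordinary logit likelihood, which is one way to see why the parameters are hard to identify without prior information when the misclassification rates are small.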
|
32 |
Statistical aspects of two measurement problems : defining taxonomic richness and testing with unanchored responses / Ritter, Kerry 03 April 2001
Statisticians often focus on sampling or experimental design and data analysis while
paying less attention to how the response is measured. However, the ideas of statistics may be
applied to measurement problems with fruitful results. By examining the errors of measured
responses, we may gain insight into the limitations of current measures and develop a better
understanding of how to interpret and qualify the results. The first chapter considers the
problem of measuring taxonomic richness as an index of habitat quality and stream health. In
particular, we investigate numerical taxa richness (NTR), or the number of observed taxa in a
fixed-count, as a means to assess differences in taxonomic composition and reduce cost.
Because the number of observed taxa increases with the number of individuals counted, rare
taxa are often excluded from NTR with smaller counts. NTR measures based on different
counts effectively assess different levels of rarity, and hence target different parameters.
Determining the target parameter that NTR is "really" estimating is an important step toward
facilitating fair comparisons based on different sized samples. Our first study approximates the
parameter unbiasedly estimated by NTR and explores alternatives for estimation based on
smaller and larger counts.
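The dependence of NTR on the count size can be sketched with a simple calculation. The example assumes individuals are drawn independently from a community with fixed relative abundances (a multinomial model); this is an assumption for illustration, not necessarily the model used in the thesis, and the function name and abundance values are hypothetical.

```python
def expected_observed_taxa(proportions, count):
    """Expected number of distinct taxa seen in a fixed count.

    Each taxon with relative abundance p is missed with probability
    (1 - p) ** count, so it is observed with probability 1 - (1 - p) ** count.
    """
    return sum(1.0 - (1.0 - p) ** count for p in proportions)


# Hypothetical community: two common taxa, one moderate, two rare.
abundances = [0.5, 0.3, 0.15, 0.04, 0.01]

small_count = expected_observed_taxa(abundances, 10)
large_count = expected_observed_taxa(abundances, 100)
```

Under this model the rare taxa contribute little to the expectation at small counts, so NTR values based on different counts target different quantities, which is the comparison problem the abstract describes.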
The second investigation considers response error resulting from panel evaluations.
Because people function as the measurement instrument, responses are particularly susceptible
to variation not directly related to the experimental unit. As a result, observed differences may
not accurately reflect real differences in the products being measured. Chapter Two offers
several linear models to describe measurement error resulting from unanchored responses
across successive evaluations over time, which we call u-errors. We examine changes to Type I
and Type II error probabilities for standard F-tests in balanced factorial models where u-errors
are confounded with an effect under investigation. We offer a relatively simple method for
determining whether or not distributions of mean square ratios for testing fixed effects change
in the presence of u-error. In addition, the validity of the test is shown to depend both on the
level of confounding and whether or not u-errors vary about a nonzero mean. / Graduation date: 2002
|
33 |
Logistic regression with misclassified covariates using auxiliary data / Dong, Nathan Nguyen. January 2009
Thesis (Ph.D.) -- University of Texas at Arlington, 2009.
|
34 |
Production log analysis and statistical error minimization / Li, Huitang, January 2000
Thesis (Ph. D.)--University of Texas at Austin, 2000. / Vita. Includes bibliographical references (leaves 182-185). Available also in a digital version from Dissertation Abstracts.
|
35 |
Variance reduction and variable selection methods for Alho's logistic capture recapture model with applications to census data / Caples, Jerry Joseph, January 2000
Thesis (Ph. D.)--University of Texas at Austin, 2000. / Vita. Includes bibliographical references (leaves 224-226). Available also in a digital version from Dissertation Abstracts.
|
36 |
The accuracy of parameter estimates and coverage probability of population values in regression models upon different treatments of systematically missing data / Othuon, Lucas Onyango A. 11 1900
Several methods are available for the treatment of missing data. Most of the methods are
based on the assumption that data are missing completely at random (MCAR). However, data
sets that are MCAR are rare in psycho-educational research. This gives rise to the need for
investigating the performance of missing data treatments (MDTs) with non-randomly or
systematically missing data, an area that has not received much attention by researchers in the
past.
In the current simulation study, the performance of four MDTs, namely, mean
substitution (MS), pairwise deletion (PW), expectation-maximization method (EM), and
regression imputation (RS), was investigated in a linear multiple regression context. Four
investigations were conducted involving four predictors under low and high multiple R², and nine
predictors under low and high multiple R². In addition, each investigation was conducted under
three different sample size conditions (94, 153, and 265). The design factors were missing
pattern (2 levels), percent missing (3 levels) and non-normality (4 levels). This design gave rise
to 72 treatment conditions. The sampling was replicated one thousand times in each condition.
MDTs were evaluated based on accuracy of parameter estimates. In addition, the bias in
parameter estimates, and coverage probability of regression coefficients, were computed.
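The count of 72 treatment conditions is consistent with crossing the three design factors with the three sample sizes (2 x 3 x 4 x 3). The sketch below enumerates that grid under this reading. The level labels for percent missing and non-normality are placeholders, because the abstract does not list all of them.

```python
from itertools import product

# Factor levels; labels marked "placeholder" are not given in the abstract.
missing_pattern = ["monotonic", "non-monotonic"]
percent_missing = ["level 1", "level 2", "level 3"]           # placeholder labels
non_normality = ["level 1", "level 2", "level 3", "level 4"]  # placeholder labels
sample_size = [94, 153, 265]

# Full crossing of the design factors with the sample sizes.
conditions = list(
    product(missing_pattern, percent_missing, non_normality, sample_size)
)

REPLICATIONS = 1000  # replications per condition, as stated in the abstract
total_samples = len(conditions) * REPLICATIONS
```

On this reading, each of the four investigations involves 72 conditions and 72,000 simulated samples.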
The effect of missing pattern, percent missing, and non-normality on absolute error for
R² estimate was of practical significance. In the estimation of R², EM was the most accurate under
the low R² condition, and PW was the most accurate under the high R² condition. No MDT was
consistently least biased under low R² condition. However, with nine predictors under the high
R² condition, PW was generally the least biased, with a tendency to overestimate population R².
The mean absolute error (MAE) tended to increase with increasing non-normality and increasing
percent missing. Also, the MAE in R² estimate tended to be smaller under monotonic pattern than
under non-monotonic pattern. MDTs were most differentiated at the highest level of percent
missing (20%), and under non-monotonic missing pattern.
In the estimation of regression coefficients, RS generally outperformed the other MDTs
with respect to accuracy of regression coefficients as measured by MAE. However, EM was
competitive under the four predictors, low R² condition. MDTs were most differentiated only in
the estimation of β₁, the coefficient of the variable with no missing values. MDTs were
undifferentiated in their performance in the estimation for b₂,...,bp, p = 4 or 9, although the MAE
remained fairly the same across all the regression coefficients. The MAE increased with
increasing non-normality and percent missing, but decreased with increasing sample size. The
MAE was generally greater under non-monotonic pattern than under monotonic pattern. With
four predictors, the least bias was under RS regardless of the magnitude of population R². Under
nine predictors, the least bias was under PW regardless of population R².
The results for coverage probabilities were generally similar to those under estimation of
regression coefficients, with coverage probabilities closest to nominal alpha under RS. As
expected, coverage probabilities decreased with increasing non-normality for each MDT, with
values being closest to nominal value for normal data. MDTs were more differentiated with
respect to coverage probabilities under non-monotonic pattern than under monotonic pattern.
Important implications of the results to researchers are numerous. First, the choice of
MDT was found to depend on the magnitude of population R², number of predictors, as well as
on the parameter estimate of interest. With the estimation of R² as the goal of analysis, use of EM
is recommended if the anticipated R² is low (about .2). However, if the anticipated R² is high
(about .6), use of PW is recommended. With the estimation of regression coefficients as the goal
of analysis, the choice of MDT was found to be most crucial for the variable with no missing
data. The RS method is most recommended with respect to estimation accuracy of regression
coefficients, although greater bias was recorded under RS than under PW or MS when the
number of predictors was large (i.e., nine predictors). Second, the choice of MDT seems to be of
little concern if the proportion of missing data is 10 percent, and also if the missing pattern is
monotonic rather than non-monotonic. Third, the proportion of missing data seems to have less
impact on the accuracy of parameter estimates under monotonic missing pattern than under non-monotonic
missing pattern. Fourth, it is recommended for researchers that in the control of Type
I error rates under low R² condition, the EM method should be used as it produced coverage
probability of regression coefficients closest to nominal value at .05 level. However, in the
control of Type I error rates under high R² condition, the RS method is recommended.
Considering that simulated data were used in the present study, it is suggested that future research
should attempt to validate the findings of the present study using real field data. Also, a future
investigator could modify the number of predictors as well as the confidence interval in the
calculation of coverage probabilities to extend generalization of results.
|
37 |
The performance of three fitting criteria for multidimensional scaling / McGlynn, Marion January 1990
A Monte Carlo study was performed to investigate the ability of MSCAL to recover, by Euclidean metric multidimensional scaling (MDS), the true structure for dissimilarity data with different underlying error distributions. Error models for three typical error distributions (normal, lognormal, and squared normal) are implemented in MSCAL through data transformations incorporated into the criterion function. Recovery of the true configuration and true distances for (i) single replication data with low error levels and (ii) matrix conditional data with high error levels was studied as a function of the type of error distribution, fitting criterion, and dimensionality. Results indicated that if the data conform to the error distribution hypotheses, then the corresponding fitting criteria provide improved recovery, but only for data with low error levels when the true dimensionality is known.
|
38 |
Truncated multiplications and divisions for the negative two's complement number system / Park, Hyuk, January 1900
Thesis (Ph. D.)--University of Texas at Austin, 2007. / Vita. Includes bibliographical references.
|
39 |
Assessment of record linkage and measurement error in cohort mortality studies / Mallick, Ranjeeta, January 1900
Thesis (Ph.D.) - Carleton University, 2005. / Includes bibliographical references (p. 128-139). Also available in electronic format on the Internet.
|
40 |
A rigorous approach to comprehensive performance analysis of state-of-the-art airborne mobile mapping systems / May, Nora Csanyi, January 2008
Thesis (Ph. D.)--Ohio State University, 2008. / Title from first page of PDF file. Includes bibliographical references (p. 172-180).
|