1 |
Determining the Size of a Galaxy's Globular Cluster Population through Imputation of Incomplete Data with Measurement Uncertainty
Richard, Michael R. 11 1900
A globular cluster is a collection of stars that orbits the center of its galaxy as a
single satellite. Understanding what influences the formation of these clusters sheds light on galaxy structure and on the early development of galaxies. We
continue the work of Harris et al. (2013), who identified a set of predictors that accurately determined the number of clusters, Ngc, through analysis of an incomplete dataset.
We aimed to improve upon these results through imputation of the missing data. A
small amount of precision was gained for the slope of Ngc ~ R_e*sigma_e, while the intercept
suffered a small loss of precision. Estimates of intrinsic variance also increased with
the addition of imputed data.
We also found galaxy morphological type to be a significant predictor of Ngc in
a model with R_e*sigma_e. Although it increased precision of the slope and reduced the
residual variance, its overall contribution was negligible. / Thesis / Master of Science (MSc)
|
2 |
The Single Imputation Technique in the Gaussian Mixture Model Framework
Aisyah, Binti M.J. January 2018
Missing data is a common issue in data analysis, and numerous techniques have
been proposed to deal with it. Imputation, the process of replacing missing
values with plausible values, is the most popular strategy. The two imputation
techniques most frequently cited in the literature are single imputation and
multiple imputation.
Multiple imputation, also known as the golden imputation technique, was
proposed by Rubin in 1987 to address missing data. However, inconsistency is
the major problem with the multiple imputation technique. Single imputation is
less popular in missing data research because of bias and reduced variability.
One way to improve single imputation in the basic regression model is to add a
residual to each regression prediction in order to improve bias and
variability. The residual is drawn from a normal distribution with a mean of 0
and a variance equal to the residual variance. Although newer single imputation
methods, such as the stochastic regression model and hot deck imputation, might
improve the variability and bias issues, single imputation techniques still do
not account for imputation uncertainty, which may lead to an underestimated
R-square or standard error in the analysis results.
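As a rough illustration of the residual-augmented regression imputation described in this abstract, the sketch below fits a simple linear regression on the observed cases and adds a normal draw with mean 0 and variance equal to the residual variance. The function name, the single-predictor setup, and the use of `None` to mark missing values are assumptions made for the example; they are not taken from the thesis.

```python
import random
import statistics


def stochastic_regression_impute(x, y, seed=0):
    """Fill missing y values (None) with regression prediction + normal residual.

    Hypothetical helper for illustration: one predictor, ordinary least squares.
    """
    rng = random.Random(seed)
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    xs = [xi for xi, _ in observed]
    ys = [yi for _, yi in observed]

    # Ordinary least squares for y = intercept + slope * x.
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in observed) / sxx
    intercept = mean_y - slope * mean_x

    # Residual standard deviation with n - 2 degrees of freedom.
    residuals = [yi - (intercept + slope * xi) for xi, yi in observed]
    sigma = (sum(r * r for r in residuals) / (len(observed) - 2)) ** 0.5

    # Observed values are kept; missing ones get prediction + N(0, sigma^2) noise.
    return [
        yi if yi is not None else intercept + slope * xi + rng.gauss(0.0, sigma)
        for xi, yi in zip(x, y)
    ]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, None, 8.2, None, 12.1]
completed = stochastic_regression_impute(x, y, seed=1)
```

Because each missing value receives a single draw, the completed data set still treats imputed values as if they were observed, which is the uncertainty limitation the abstract points out.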
The research reported in this thesis provides two imputation solutions for the
single imputation technique. In the first, the wild bootstrap is proposed to
better capture the uncertainty of the residual variance in the regression
model. In the second, predictive mean matching (PMM) is enhanced: the
regression model generates the recipient values, while the donors are taken
from the observed values. Each missing value is then imputed by randomly
drawing one of the observations in the donor pool. The size of the donor pool
is important in determining the quality of the imputed values. Many existing
studies that use PMM employ a fixed donor pool size, which might not be
appropriate in certain circumstances, such as when the data distribution has a
high-density region. Instead of a fixed donor pool size, the proposed method
applies a radius-based solution to determine the size of the donor pool. Both
proposed imputation procedures are combined with the Gaussian mixture model
framework to preserve the original data distribution.
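To make the contrast between a fixed donor pool and a radius-based donor pool concrete, here is a minimal PMM sketch. It is a simplification under stated assumptions: one predictor, a plain least-squares fit with no parameter draw, a fixed radius measured in predicted-value space, and a fallback to the single nearest donor when no donor lies within the radius. The abstract does not specify the thesis's actual radius rule, and the Gaussian mixture model step is omitted, so the names `pmm_impute`, `k`, and `radius` are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((a - mean_x) ** 2 for a in xs)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope


def pmm_impute(x, y, k=None, radius=None, seed=0):
    """Impute missing y (None) by drawing an observed value from a donor pool.

    Pass exactly one of k (fixed pool size) or radius (assumed radius rule).
    """
    if (k is None) == (radius is None):
        raise ValueError("specify exactly one of k or radius")
    rng = random.Random(seed)
    observed = [(a, b) for a, b in zip(x, y) if b is not None]
    intercept, slope = fit_line([a for a, _ in observed], [b for _, b in observed])

    # Each potential donor carries its predicted value and its observed value.
    donors = [(intercept + slope * a, b) for a, b in observed]

    completed = []
    for a, b in zip(x, y):
        if b is not None:
            completed.append(b)
            continue
        predicted = intercept + slope * a  # recipient value from the model
        ranked = sorted(donors, key=lambda d: abs(d[0] - predicted))
        if k is not None:
            pool = ranked[:k]  # fixed-size pool: the k closest donors
        else:
            pool = [d for d in ranked if abs(d[0] - predicted) <= radius]
            if not pool:  # assumed fallback: nearest donor if the radius is empty
                pool = ranked[:1]
        completed.append(rng.choice(pool)[1])  # impute an actually observed value
    return completed


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, None, 8.2, None, 12.1]
fixed_pool = pmm_impute(x, y, k=3)
radius_pool = pmm_impute(x, y, radius=2.5)
```

With a radius rule, the pool grows in dense regions of the predicted values and shrinks in sparse ones, whereas a fixed `k` always uses the same number of donors regardless of how far away they are.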
Results from experiments on benchmark and artificial data sets confirm an
improvement for further data analysis. The proposed approaches are therefore
worth considering for further investigation and experiments.
|
3 |
Performance Comparison of Imputation Algorithms on Missing at Random Data
Addo, Evans Dapaa 01 May 2018
Missing data continues to be an issue not only in the field of statistics but in any field that deals with data. This is because almost all widely accepted, standard statistical software and methods assume complete data for all the variables included in the analysis. As a result, in most studies, statistical power is weakened and parameter estimates are biased, leading to weak conclusions and generalizations.
Many studies have established that multiple imputation methods are effective ways of handling missing data. This paper examines three imputation methods (predictive mean matching, Bayesian linear regression, and non-Bayesian linear regression) in the MICE package for the statistical software R, to ascertain which of the three yields parameter estimates closest to those of the complete data, given different percentages of missingness. To compare the complete data with the imputed data, the parameter estimates in each model were evaluated and compared. The paper extends the analysis by generating a pseudo dataset from the original data to establish how the imputation methods perform under varying conditions.
|
4 |
Data analysis and multiple imputation for two-level nested designs
Bailey, Brittney E. 25 October 2018
No description available.
|