1 |
Praleistų duomenų įrašymo metodai baigtinių populiacijų statistikoje / Missing data imputation in finite population statistics / Utovkaitė, Jurgita 04 March 2009 (has links)
Even in the most carefully planned survey, errors of various kinds arise that can render the results unreliable or insufficiently precise, so it is very important to reduce as far as possible the influence of these errors on the survey results, that is, on the estimators of totals, means, and ratios. One possible type of survey error is nonresponse error, which arises when a respondent fails to answer one or more questions of the questionnaire. Nonresponse occurs for a variety of reasons. It causes standard estimators, which do not account for nonresponse, to deviate from the true values of interest, and it also inflates the variance of those estimators. In current practice, nonresponse is treated from two angles: first, one tries to prevent it or to reduce the nonresponse rate, and there is a substantial literature and methodological material investigating the causes of nonresponse and recommending ways to reduce its rate; however, once a survey already contains nonresponse, the estimators of interest must be constructed so that the results are as accurate as possible. Various techniques are used to reduce the bias that nonresponse induces in survey results. One such method is the imputation of missing values. Imputation is a way of filling in missing data that is very useful when analysing incomplete data sequences, since it resolves the missing-data problem at the start of the analysis. The methodology of missing-value imputation is currently developing rapidly... [see full text] / Nonresponse has been a matter of concern for several decades in survey theory and practice. The problem can be viewed from two different angles: the prevention or avoidance of nonresponse before it occurs, and special estimation techniques for when nonresponse has occurred. The objective of this work is to describe the main methods of estimation when nonresponse occurs. Special attention is drawn to one nonresponse estimation method – imputation.
Imputation is the procedure whereby missing values of one or more study variables are “filled in” with substitutes constructed according to some rules, or with observed values from elements other than the nonrespondents. In this work, imputation methods based on some of the more commonly used statistical rules are considered. Some of them are tested on a data set having the same distribution as the data of a real survey conducted by Statistics Lithuania. The imputation methods are compared with each other and the best imputation method for this data set is selected. Special attention is paid to regression imputation.
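As a rough illustration of regression imputation (the method given special attention above), the following Python sketch fits a simple linear regression of the study variable on one auxiliary variable and fills each missing value with its prediction. The function name and toy data are assumptions for illustration; a production imputer would handle several auxiliary variables and might add residual noise to the predictions.

```python
def regression_impute(x, y):
    """Impute missing y values (None) via simple linear regression on x.

    A minimal sketch: fit y = a + b*x on the complete pairs, then fill
    each missing y with its predicted value.
    """
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in pairs)
    sxx = sum((xi - mx) ** 2 for xi, _ in pairs)
    b = sxy / sxx
    a = my - b * mx
    return [yi if yi is not None else a + b * xi for xi, yi in zip(x, y)]

# Toy example: y is roughly 2*x, so the gap at x=3 is filled near 6.
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, None, 8.0, 9.9]
print(regression_impute(x, y))
```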
|
2 |
Missing Data Treatments at the Second Level of Hierarchical Linear Models / St. Clair, Suzanne W. 08 1900 (has links)
The current study evaluated the performance of traditional versus modern missing data treatments (MDTs) in the estimation of fixed effects and variance components for data missing at the second level of a hierarchical linear model (HLM) across 24 different study conditions. Variables manipulated in the analysis included (a) the number of Level-2 variables with missing data, (b) the percentage of missing data, and (c) the Level-2 sample size. Listwise deletion outperformed all other methods across all study conditions in the estimation of both fixed effects and variance components. The model-based procedures evaluated, EM and MI, outperformed the other traditional MDTs, mean and group mean substitution, in the estimation of the variance components, and outperformed mean substitution in the estimation of the fixed effects as well. Group mean substitution performed well in the estimation of the fixed effects but poorly in the estimation of the variance components. Data in the current study were modeled as missing completely at random (MCAR). Further research is suggested to compare the performance of model-based versus traditional MDTs, specifically listwise deletion, when data are missing at random (MAR), a condition that is more likely to occur in practical research settings.
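Two of the traditional MDTs compared above, listwise deletion and mean substitution, can be sketched in plain Python as follows. The function names and toy data are illustrative only; the study itself concerned Level-2 variables in an HLM, which these flat, list-based helpers do not model.

```python
def listwise_delete(rows):
    """Keep only rows with no missing (None) entries."""
    return [r for r in rows if None not in r]

def mean_substitute(rows):
    """Replace each None with the mean of its column's observed values."""
    cols = list(zip(*rows))
    means = [sum(v for v in c if v is not None) / sum(v is not None for v in c)
             for c in cols]
    return [[v if v is not None else means[j] for j, v in enumerate(r)]
            for r in rows]

data = [[1.0, 10.0], [2.0, None], [3.0, 30.0], [None, 40.0]]
print(listwise_delete(data))   # keeps the two fully observed rows
print(mean_substitute(data))   # each None replaced by its column mean
```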
|
3 |
Influence of Correlation and Missing Data on Sample Size Determination in Mixed Models / Chen, Yanran 26 July 2013 (has links)
No description available.
|
4 |
Bayesian nonparametric analysis of longitudinal data with non-ignorable non-monotone missingness / Cao, Yu 01 January 2019 (has links)
In longitudinal studies, outcomes are measured repeatedly over time, but in practice clinical studies are rife with missing data points of both monotone and non-monotone nature. Often this missingness is related to the unobserved data, so that it is non-ignorable. In this context, the pattern-mixture model (PMM) is a popular tool for analyzing the joint distribution of outcomes and missingness patterns: the unobserved outcomes are imputed using the distribution of observed outcomes, conditioned on missingness pattern. However, the existing methods suffer from model identification issues if data are sparse in specific missing patterns, which is likely to happen with a small sample size or a large number of repetitions. We extend the existing methods using latent class analysis (LCA) and a shared-parameter PMM. The LCA groups patterns of missingness with similar features, and the shared-parameter PMM allows a subset of parameters to differ among latent classes when fitting a model, thus restoring model identifiability. A novel imputation method is also developed using the distribution of observed data conditioned on latent classes. We develop this model for continuous response data and extend it to handle ordinal rating-scale data. Our model performs better than existing methods for data with small sample sizes. The method is applied to two datasets: one from a phase II clinical trial studying the quality of life of patients with prostate cancer receiving radiation therapy, and another studying the relationship between perceived neighborhood condition in adolescence and drinking habits in adulthood.
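The pattern-mixture idea, averaging pattern-specific distributions weighted by pattern probabilities, with unidentified parameters pinned down by an identifying restriction, can be illustrated with a deliberately simple two-pattern Python sketch. The function name and the shift-by-delta restriction are assumptions for illustration; the shared-parameter latent-class PMM developed in the thesis is far more general.

```python
def pattern_mixture_mean(y_obs, n_mis, delta):
    """Marginal mean under a simple two-pattern pattern-mixture model.

    The mean for the missing pattern is not identified from the data, so
    it is set by an identifying restriction: the observed-pattern mean
    shifted by a sensitivity parameter delta (delta = 0 gives MAR-like
    behaviour).  The marginal mean is the pattern-probability-weighted
    average of the two pattern means.
    """
    n_obs = len(y_obs)
    p_obs = n_obs / (n_obs + n_mis)
    mu_obs = sum(y_obs) / n_obs
    mu_mis = mu_obs + delta          # identifying restriction
    return p_obs * mu_obs + (1 - p_obs) * mu_mis

# 8 observed values with mean 5.0, 2 missing, assumed 1.0 lower on average.
print(pattern_mixture_mean([5.0] * 8, n_mis=2, delta=-1.0))
```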
|
5 |
Analysis of routinely collected repeated patient outcomes / Holm Hansen, Christian January 2014 (has links)
Clinical practice should be based on the best available evidence. Ideally such evidence is obtained through rigorously conducted, purpose-designed clinical studies such as randomised controlled trials and prospective cohort studies. However, gathering information in this way requires a massive effort, can be prohibitively expensive, is time consuming, and may not always be ethical or practicable. When answers are needed urgently and purpose-designed prospective studies are not feasible, retrospective healthcare data may offer the best evidence there is. But can we rely on analysis of such data to give us meaningful answers? This thesis studies that question through analysis of repeated psychological symptom screening data routinely collected from over 20,000 outpatients who attended selected oncology clinics in Scotland. Linked to patients’ oncology records, these data offer a unique opportunity to study the progress of distress symptoms on an unprecedented scale in this population. However, the limitations of such routinely collected observational healthcare data are many. We approach the analysis within a missing-data framework and develop a Bayesian model in WinBUGS to estimate the posterior predictive distribution of the incomplete longitudinal response and covariate data under both Missing At Random and Missing Not At Random mechanisms, and use this model to generate multiply imputed datasets for further frequentist analysis. In addition to the routinely collected screening data, we also present a purpose-designed, prospective cohort study of distress symptoms in the same cancer outpatient population. This study collected distress outcome scores from enrolled patients at regular intervals and with very little missing data. Consequently, it contained many of the features that were lacking in the routinely collected screening data and provided a useful contrast, offering insight into how the screening data might have looked were it not for their limitations. We evaluate the extent to which the clinical study results could be reproduced by analysis of the observational screening data. Lastly, using the modelling strategy developed earlier, we analyse the abundant screening data to estimate the prevalence of depression in a cancer outpatient population and its associations with demographic and clinical characteristics, thereby addressing important clinical research questions that have not been adequately studied elsewhere. The thesis concludes that analysis of observational healthcare data can be advanced considerably by flexible and innovative modelling techniques now made practicable by modern computing power.
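The imputation step described above, drawing missing values from a predictive distribution to create several completed datasets for subsequent frequentist analysis, can be caricatured in a few lines of Python. This is a crude stand-in under an assumed normal model and MAR, not the Bayesian WinBUGS model of the thesis; the function name and toy data are illustrative only.

```python
import random
import statistics

def multiply_impute(y, m, seed=0):
    """Create m completed copies of y, drawing each missing (None) value
    from a normal model fitted to the observed values, as a crude
    stand-in for draws from a posterior predictive distribution under
    MAR.  Observed values are carried over unchanged; imputed values
    vary across the m copies, reflecting imputation uncertainty.
    """
    rng = random.Random(seed)
    obs = [v for v in y if v is not None]
    mu, sd = statistics.mean(obs), statistics.stdev(obs)
    return [[v if v is not None else rng.gauss(mu, sd) for v in y]
            for _ in range(m)]

completed = multiply_impute([4.0, None, 5.0, 6.0, None], m=5)
print(len(completed))   # 5 completed datasets, ready for analysis
```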
|
6 |
Handling missing data in RCTs; a review of the top medical journals / Bell, Melanie; Fiero, Mallorie; Horton, Nicholas J; Hsu, Chiu-Hsieh January 2014 (has links)
UA Open Access Publishing Fund / Background
Missing outcome data is a threat to the validity of treatment effect estimates in randomized controlled trials. We aimed to evaluate the extent, handling, and sensitivity analysis of missing data and intention-to-treat (ITT) analysis of randomized controlled trials (RCTs) in top tier medical journals, and compare our findings with previous reviews related to missing data and ITT in RCTs.
Methods
Review of RCTs published between July and December 2013 in the BMJ, JAMA, Lancet, and New England Journal of Medicine, excluding cluster randomized trials and trials whose primary outcome was survival.
Results
Of the 77 identified eligible articles, 73 (95%) reported some missing outcome data. The median percentage of participants with a missing outcome was 9% (range 0–70%). The most commonly used method for handling missing data in the primary analysis was complete case analysis (33, 45%), while 20 (27%) performed simple imputation, 15 (19%) used model-based methods, and 6 (8%) used multiple imputation. Twenty-seven (35%) trials with missing data reported a sensitivity analysis; however, most did not alter the missing-data assumptions of the primary analysis. Reports of ITT or modified ITT were found in 52 (85%) trials, with 21 (40%) of them including all randomized participants. A comparison with a review of trials reported in 2001 showed that missing data rates and approaches are similar, but use of the term ITT has increased, as has the reporting of sensitivity analyses.
Conclusions
Missing outcome data continues to be a common problem in RCTs. Definitions of the ITT approach remain inconsistent across trials. A large gap is apparent between statistical methods research related to missing data and use of these methods in application settings, including RCTs in top medical journals.
|
7 |
Missing Data in the Relational Model / Morrissett, Marion 25 April 2013 (has links)
This research provides improved support for missing data in the relational model and relational database systems. There is a need for a systematic method to represent and interpret missing data values in the relational model. A system that processes missing data needs to enable making reasonable decisions when some data values are unknown. The user must be able to understand query results with respect to these decisions. While a number of approaches have been suggested, none have been completely implemented in a relational database system. This research describes a missing data model that works within the relational model, is implemented in MySQL, and was validated by a user feasibility study.
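A systematic treatment of missing values in the relational model typically rests on SQL-style three-valued logic, in which comparisons involving a missing value evaluate to UNKNOWN rather than TRUE or FALSE. The Python sketch below illustrates that standard logic and a WHERE-style filter; the function names are illustrative, and this is the conventional 3VL rather than necessarily the model proposed in the thesis.

```python
# Truth values: True, False, and None standing for UNKNOWN (SQL-style 3VL).
def and3(a, b):
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def or3(a, b):
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def not3(a):
    return None if a is None else not a

# A comparison against a missing value is UNKNOWN, and a WHERE-style
# filter keeps a row only when the predicate is definitely True.
def gt(v, c):
    return None if v is None else v > c

rows = [{"salary": 50000}, {"salary": None}, {"salary": 70000}]
kept = [r for r in rows if gt(r["salary"], 60000) is True]
print(kept)   # only the 70000 row survives the filter
```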
|
8 |
An Investigation of Methods for Missing Data in Hierarchical Models for Discrete Data / Ahmed, Muhamad Rashid January 2011 (has links)
Hierarchical models are applicable to modeling data from complex surveys or longitudinal data when a clustered or multistage sample design is employed. The focus of this thesis is to investigate inference for discrete hierarchical models in the presence of missing data. The thesis is divided into two parts. In the first part, methods are developed to analyze discrete and ordinal response data from hierarchical longitudinal studies. Several approximation methods have been developed to estimate the parameters for the fixed and random effects in the context of generalized linear models; the thesis focuses on two likelihood-based estimation procedures, the pseudo likelihood (PL) method and the adaptive Gaussian quadrature (AGQ) method. The simulation results suggest that AGQ is preferable to PL when the goal is to estimate the variance of the random intercept in a complex hierarchical model: AGQ provides smaller biases for the estimate of the variance of the random intercept, and it permits greater flexibility in accommodating user-defined likelihood functions.

In the second part, simulated data are used to develop a method for modeling longitudinal binary data when non-response depends on unobserved responses. The simulation study modeled three-level discrete hierarchical data with 30% and 40% missing data under a missing not at random (MNAR) missing-data mechanism, focusing on a monotone missing-data pattern. The imputation methods used in this thesis are: complete case analysis (CCA), last observation carried forward (LOCF), available case missing value (ACMVPM) restriction, complete case missing value (CCMVPM) restriction, neighboring case missing value (NCMVPM) restriction, the selection model with predictive mean matching (SMPM), and a Bayesian pattern-mixture model. All three restriction methods and the selection model used predictive mean matching to impute missing data. Multiple imputation is used to impute the missing values: the m imputed values for each missing datum produce m complete datasets; each dataset is analyzed and the parameters are estimated; the results of the m analyses are then combined using the method of Rubin (1987), and inferences are made from the combined results. Our results suggest that the restriction methods are superior to the other methods. The selection model provides smaller biases than LOCF, but as the proportion of missing data increases it is no longer better than LOCF. Among the three restriction methods, the ACMVPM method performs best. The proposed method provides an alternative to standard selection and pattern-mixture modeling frameworks when data are not missing at random. It is applied to data from the third Waterloo Smoking Project, a seven-year smoking prevention study with substantial non-response due to loss to follow-up.
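The multiple-imputation combining step described above follows Rubin's (1987) rules, which can be sketched in a few lines of Python. The function name and the example numbers are illustrative only.

```python
def rubin_combine(estimates, variances):
    """Combine m completed-data estimates with Rubin's (1987) rules.

    Returns the pooled estimate (the mean of the m estimates) and its
    total variance T = W + (1 + 1/m) * B, where W is the mean
    within-imputation variance and B the between-imputation variance
    of the m estimates.
    """
    m = len(estimates)
    qbar = sum(estimates) / m
    w = sum(variances) / m
    b = sum((q - qbar) ** 2 for q in estimates) / (m - 1)
    t = w + (1 + 1 / m) * b
    return qbar, t

# Three completed-data analyses of the same parameter.
qbar, t = rubin_combine([2.1, 1.9, 2.0], [0.04, 0.05, 0.045])
print(qbar, t)
```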
|
9 |
The Use of Kalman Filter in Handling Imprecise and Missing Data for Mobile Group Mining / Hung, Tzu-yen 01 August 2006 (has links)
With the advances in communication techniques, services built on location information have emerged one after another. One such application is finding mobile groups that exhibit spatial and temporal proximity, called mobile group mining. Although positioning devices exist that achieve high accuracy with low measurement error, inexpensive consumer-grade positioning devices, which incur various degrees of higher measurement error, are much more popular. In addition, natural factors such as temperature, humidity, and pressure may influence the precision of position measurement. Worse, moving objects may sometimes become untraceable, voluntarily or involuntarily. In this thesis, we extend the previous work on mobile group mining and adopt the Kalman filter to correct noisy data and predict missing data. Several methods based on the Kalman filter are proposed that correct or predict either the position data or the pair-wise distance data. These methods have been evaluated using synthetic data generated with the IBM City Simulator, and we identify the operating regions in which each method has the best performance.
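The correct/predict idea can be illustrated with a one-dimensional random-walk Kalman filter in Python. This is a minimal sketch, not the thesis's actual formulation: the function name, the noise settings q and r, and the treatment of a dropout as a skipped update step are all assumptions for illustration.

```python
def kalman_1d(zs, q=1e-3, r=0.5, x0=0.0, p0=100.0):
    """Scalar random-walk Kalman filter over a measurement sequence.

    Each step predicts (the state estimate x is carried forward and its
    uncertainty p grows by the process noise q) and, when a measurement
    is present, updates with gain k = p / (p + r).  A missing
    measurement (None) simply skips the update, so the prediction
    itself serves as the estimate, smoothing noisy fixes and filling
    gaps.  The large initial p0 acts as a diffuse prior on the start.
    """
    x, p, out = x0, p0, []
    for z in zs:
        p += q                       # predict: uncertainty grows
        if z is not None:            # update only when a fix arrived
            k = p / (p + r)
            x += k * (z - x)
            p *= 1 - k
        out.append(x)
    return out

# Noisy position fixes around 10 with a dropout in the middle.
print(kalman_1d([9.8, 10.2, None, 10.1, 9.9]))
```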
|
10 |
A comparison of procedures for handling missing school identifiers with the MMREM and HLM / Smith, Lindsey Janae 10 July 2012 (has links)
This simulation study was designed to assess the impact of three ad hoc procedures for handling missing level two (here, school) identifiers in multilevel modeling. A multiple membership data structure was generated and both conventional hierarchical linear modeling (HLM) and multiple membership random effects modeling (MMREM) were employed. HLM models purely hierarchical data structures while MMREM appropriately models multiple membership data structures. Two of the ad hoc procedures investigated involved removing different subsamples of students from the analysis (HLM-Delete and MMREM-Delete) while the other procedure retained all subjects and involved creating a pseudo-identifier for the missing level two identifier (MMREM-Unique). Relative parameter and standard error (SE) bias were calculated for each parameter estimated to assess parameter recovery. Across the conditions and parameters investigated, each procedure had some level of substantial bias. MMREM-Unique and MMREM-Delete resulted in the least amount of relative parameter bias while HLM-Delete resulted in the least amount of relative SE bias. Results and implications for applied researchers are discussed.
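The MMREM-Unique idea, retaining every student by giving each missing school identifier its own pseudo-school, can be sketched in Python. The function name and the label format are assumptions for illustration; the point is that each affected student forms a singleton cluster rather than being dropped.

```python
import itertools

def fill_pseudo_ids(school_ids):
    """Give each student with a missing (None) school identifier a
    unique pseudo-school, so an MMREM-style analysis can retain the
    student while attributing no shared school effect to them.
    """
    counter = itertools.count(1)
    return [sid if sid is not None else f"pseudo_{next(counter)}"
            for sid in school_ids]

print(fill_pseudo_ids(["s1", None, "s2", None, "s1"]))
# each missing entry becomes its own singleton school
```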
|