Global ETD Search

1	Praleistų duomenų įrašymo metodai baigtinių populiacijų statistikoje / Missing data imputation in finite population statistics Utovkaitė, Jurgita 04 March 2009 Even in the most carefully planned survey, errors of various kinds arise that can make the results unreliable or insufficiently precise, so it is very important to reduce as far as possible the influence of those errors on the survey results, that is, on estimators of totals, means, and ratios. One possible type of statistical survey error is error due to nonresponse. It arises when a respondent does not answer one or more questions in the questionnaire. Nonresponse occurs in surveys for various reasons. It biases standard estimators that do not account for nonresponse away from the true values of interest, and it also increases the variance of these estimators. In current practice, survey nonresponse is examined from two perspectives: first, an attempt is made to avoid nonresponse or to reduce its level. There is a considerable body of literature and methodological material examining the causes of nonresponse and offering recommendations for reducing its level; however, once nonresponse is present in a survey, the estimators of interest must be constructed so that the survey results are as accurate as possible. Various approaches are used to reduce the bias in survey results caused by nonresponse. One such method is the imputation of missing values. Imputation is a way of filling in missing data that is very useful when analysing incomplete data sets. It resolves the missing data problem at the start of the data analysis. The methodology of missing value imputation is currently developing rapidly, one can find... [see full text for more] / Nonresponse has been a matter of concern for several decades in survey theory and practice. The problem can be viewed from two different angles: the prevention or avoidance of nonresponse before it occurs, and special estimation techniques for use once nonresponse has occurred.
The objective of this work is to describe the main methods of estimation when nonresponse occurs. Special attention is drawn to one nonresponse estimation method: imputation. Imputation is the procedure in which missing values for one or more study variables are “filled in” with substitutes constructed according to some rules, or with observed values for elements other than the nonrespondents. In this work, imputation methods based on some of the more commonly used statistical rules are considered. Some of them are tested on a data set having the same distribution as the data of a real survey conducted by Statistics Lithuania. The imputation methods are compared with each other and the best imputation method for this data set is selected. Special attention is paid to regression imputation. Įrašymas Imputation Missing data
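As a rough illustration of the regression imputation mentioned in this abstract, the following minimal sketch fills missing study values with predictions from a simple linear regression fitted to the respondents, and contrasts that with plain mean imputation. The function names, the single auxiliary variable `x`, and the toy data are assumptions made for illustration; they are not taken from the thesis, which may use a different model and survey weights.

```python
from statistics import mean


def mean_impute(y):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in y if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in y]


def regression_impute(x, y):
    """Replace missing y values (None) with predictions from a least-squares
    line of y on x, fitted to the respondents only (hypothetical helper)."""
    obs_x = [xi for xi, yi in zip(x, y) if yi is not None]
    obs_y = [yi for yi in y if yi is not None]
    if len(obs_y) < 2:
        raise ValueError("need at least two respondents to fit a line")
    x_bar, y_bar = mean(obs_x), mean(obs_y)
    sxx = sum((xi - x_bar) ** 2 for xi in obs_x)
    if sxx == 0:
        raise ValueError("auxiliary variable is constant among respondents")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(obs_x, obs_y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [intercept + slope * xi if yi is None else yi for xi, yi in zip(x, y)]


# Toy data: y is exactly 2 * x for the respondents; the last unit did not respond.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [2.0, 4.0, 6.0, 8.0, None]

by_mean = mean_impute(y)
by_regression = regression_impute(x, y)
print(by_mean[-1], by_regression[-1])
```

In this toy case the nonrespondent has an unusually large auxiliary value, so the regression fill follows the fitted line while the mean fill ignores the auxiliary information entirely.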
2	Missing Data Treatments at the Second Level of Hierarchical Linear Models St. Clair, Suzanne W. 08 1900 The current study evaluated the performance of traditional versus modern missing data treatments (MDTs) in the estimation of fixed effects and variance components for data missing at the second level of a hierarchical linear model (HLM) across 24 different study conditions. Variables manipulated in the analysis included (a) the number of Level-2 variables with missing data, (b) the percentage of missing data, and (c) the Level-2 sample size. Listwise deletion outperformed all other methods across all study conditions in the estimation of both fixed effects and variance components. The model-based procedures evaluated, EM and MI, outperformed the remaining traditional MDTs, mean and group mean substitution, in the estimation of the variance components, and also outperformed mean substitution in the estimation of the fixed effects. Group mean substitution performed well in the estimation of the fixed effects, but poorly in the estimation of the variance components. Data in the current study were modeled as missing completely at random (MCAR). Further research is suggested to compare the performance of model-based versus traditional MDTs, specifically listwise deletion, when data are missing at random (MAR), a condition that is more likely to occur in practical research settings. hierarchical linear models missing data treatments missing data
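One reason mean substitution can do poorly on variance components may be seen in a much simpler setting than the study's two-level model. The sketch below is an assumption-laden illustration rather than the study's simulation design: it deletes values completely at random from a single simulated variable, then compares listwise deletion with mean substitution. The sample size, missingness rate, and seed are arbitrary choices.

```python
import random
from statistics import mean, variance


def make_mcar(values, rate, rng):
    """Delete each value independently with probability `rate` (MCAR)."""
    return [None if rng.random() < rate else v for v in values]


def listwise_delete(values):
    """Keep only the observed values."""
    return [v for v in values if v is not None]


def mean_substitute(values):
    """Fill every gap with the mean of the observed values."""
    observed_mean = mean(listwise_delete(values))
    return [observed_mean if v is None else v for v in values]


rng = random.Random(2024)  # fixed seed so the run is repeatable
complete = [rng.gauss(50.0, 10.0) for _ in range(1000)]
with_gaps = make_mcar(complete, rate=0.3, rng=rng)

deleted = listwise_delete(with_gaps)
substituted = mean_substitute(with_gaps)

print("mean     :", mean(deleted), mean(substituted))
print("variance :", variance(deleted), variance(substituted))
```

Both treatments give the same mean, but mean substitution adds points with zero deviation from that mean, so the sample variance of the filled-in data is smaller than that of the complete cases.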
3	Influence of Correlation and Missing Data on Sample Size Determination in Mixed Models Chen, Yanran 26 July 2013 No description available. Statistics Missing Data Sample Size
4	Bayesian nonparametric analysis of longitudinal data with non-ignorable non-monotone missingness Cao, Yu 01 January 2019 In longitudinal studies, outcomes are measured repeatedly over time, but in reality clinical studies are full of missing data points of both monotone and non-monotone nature. Often this missingness is related to the unobserved data, so that it is non-ignorable. In such contexts, the pattern-mixture model (PMM) is a popular tool for analyzing the joint distribution of outcome and missingness patterns. The unobserved outcomes are then imputed using the distribution of observed outcomes, conditioned on the missingness patterns. However, the existing methods suffer from model identification issues if data are sparse in specific missingness patterns, which is very likely to happen with a small sample size or a large number of repetitions. We extend the existing methods using latent class analysis (LCA) and a shared-parameter PMM. The LCA groups patterns of missingness with similar features, and the shared-parameter PMM allows a subset of parameters to differ among latent classes when fitting a model, thus restoring model identifiability. A novel imputation method is also developed using the distribution of observed data conditioned on latent classes. We develop this model for continuous response data and extend it to handle ordinal rating scale data. Our model performs better than existing methods for data with small sample size. The method is applied to two datasets from a phase II clinical trial that studies the quality of life for patients with prostate cancer receiving radiation therapy, and another to study the relationship between the perceived neighborhood condition in adolescence and the drinking habit in adulthood. Bayesian nonparametric analysis longitudinal data missing data non-ignorable missing data non-monotone missing data Biostatistics
5	Analysis of routinely collected repeated patient outcomes Holm Hansen, Christian January 2014 Clinical practice should be based on the best available evidence. Ideally such evidence is obtained through rigorously conducted, purpose-designed clinical studies such as randomised controlled trials and prospective cohort studies. However, gathering information in this way requires a massive effort, can be prohibitively expensive, is time consuming, and may not always be ethical or practicable. When answers are needed urgently and purpose-designed prospective studies are not feasible, retrospective healthcare data may offer the best evidence there is. But can we rely on analysis with such data to give us meaningful answers? The current thesis studies this question through analysis with repeated psychological symptom screening data that were routinely collected from over 20,000 outpatients who attended selected oncology clinics in Scotland. Linked to patients’ oncology records, these data offer a unique opportunity to study the progress of distress symptoms on an unprecedented scale in this population. However, the limitations of such routinely collected observational healthcare data are many. We approach the analysis within a missing data context and develop a Bayesian model in WinBUGS to estimate the posterior predictive distribution for the incomplete longitudinal response and covariate data under both Missing At Random and Missing Not At Random mechanisms, and use this model to generate multiply imputed datasets for further frequentist analysis. In addition to the routinely collected screening data, we also present a purpose-designed, prospective cohort study of distress symptoms in the same cancer outpatient population. This study collected distress outcome scores from enrolled patients at regular intervals and with very little missing data. 
Consequently it contained many of the features that were lacking in the routinely collected screening data and provided a useful contrast, offering an insight into how the screening data might have been were it not for the limitations. We evaluate the extent to which it was possible to reproduce the clinical study results with the analysis of the observational screening data. Lastly, using the modelling strategy previously developed we analyse the abundant screening data to estimate the prevalence of depression in a cancer outpatient population and the associations with demographic and clinical characteristics, thereby addressing important clinical research questions that have not been adequately studied elsewhere. The thesis concludes that analysis with observational healthcare data can potentially be advanced considerably with the use of flexible and innovative modelling techniques now made practicable with modern computing power. 610.72
6	Missing Data in the Relational Model Morrissett, Marion 25 April 2013 This research provides improved support for missing data in the relational model and relational database systems. There is a need for a systematic method to represent and interpret missing data values in the relational model. A system that processes missing data must support reasonable decisions when some data values are unknown. The user must be able to understand query results with respect to these decisions. While a number of approaches have been suggested, none have been completely implemented in a relational database system. This research describes a missing data model that works within the relational model, is implemented in MySQL, and was validated by a user feasibility study. relational database missing data incomplete information Engineering
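To make concrete why query results over unknown values can be hard to interpret, the sketch below shows the standard SQL treatment of NULL, where a comparison with an unknown value is neither true nor false. It uses Python's built-in `sqlite3` module with an in-memory database instead of MySQL, and the `person` table and its rows are invented for illustration; it does not reproduce the missing data model described in the thesis.

```python
import sqlite3


def count_rows(conn, condition):
    """Count rows of the illustrative `person` table matching a WHERE clause."""
    query = f"SELECT COUNT(*) FROM person WHERE {condition}"
    return conn.execute(query).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [("Ann", 42), ("Bob", 25), ("Cy", None)],  # Cy's age is unknown (NULL)
)

older = count_rows(conn, "age > 30")            # rows where the test is true
not_older = count_rows(conn, "NOT (age > 30)")  # rows where the test is false
unknown = count_rows(conn, "age IS NULL")       # rows where the age is missing
total = count_rows(conn, "1 = 1")

print(older, not_older, unknown, total)
```

The row with the missing age satisfies neither the condition nor its negation, so the two counts do not add up to the table size unless the unknown rows are counted separately.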
7	A comparison of procedures for handling missing school identifiers with the MMREM and HLM Smith, Lindsey Janae 10 July 2012 This simulation study was designed to assess the impact of three ad hoc procedures for handling missing level two (here, school) identifiers in multilevel modeling. A multiple membership data structure was generated and both conventional hierarchical linear modeling (HLM) and multiple membership random effects modeling (MMREM) were employed. HLM models purely hierarchical data structures while MMREM appropriately models multiple membership data structures. Two of the ad hoc procedures investigated involved removing different subsamples of students from the analysis (HLM-Delete and MMREM-Delete) while the other procedure retained all subjects and involved creating a pseudo-identifier for the missing level two identifier (MMREM-Unique). Relative parameter and standard error (SE) bias were calculated for each parameter estimated to assess parameter recovery. Across the conditions and parameters investigated, each procedure had some level of substantial bias. MMREM-Unique and MMREM-Delete resulted in the least amount of relative parameter bias while HLM-Delete resulted in the least amount of relative SE bias. Results and implications for applied researchers are discussed. Multilevel modeling Multiple membership Missing data
8	The handling, analysis and reporting of missing data in patient reported outcome measures for randomised controlled trials Rombach, Ines January 2016 Missing data is a potential source of bias in the results of randomised controlled trials (RCTs), which can have a negative impact on guidance derived from them, and ultimately patient care. This thesis aims to improve the understanding, handling, analysis and reporting of missing data in patient reported outcome measures (PROMs) for RCTs. A review of the literature provided evidence of discrepancies between recommended methodology and current practice in the handling and reporting of missing data. In particular, missed opportunities to minimise missing data, the use of inappropriate analytical methods and a lack of sensitivity analyses were noted. Missing data patterns were examined and found to vary between PROMs as well as across RCTs. Separate analyses illustrated difficulties in predicting missing data, resulting in uncertainty about assumed underlying missing data mechanisms. Simulation work was used to assess the comparative performance of statistical approaches for handling missing data that are available in standard statistical software. Multiple imputation (MI) at either the item, subscale or composite score level was considered for missing PROMs data at a single follow-up time point. The choice of an MI approach depended on a multitude of factors, with MI at the item level being more beneficial than its alternatives for high proportions of item missingness. The approaches performed similarly for high proportions of unit-nonresponse; however, convergence issues were observed for MI at the item level. Maximum likelihood (ML), MI and inverse probability weighting (IPW) were evaluated for handling missing longitudinal PROMs data. MI was less biased than ML when additional post-randomisation data were available, while IPW introduced more bias compared to both ML and MI. 
A case study was used to explore approaches to sensitivity analyses to assess the impact of missing data. It was found that trial results could be susceptible to varying assumptions about missing data, and the importance of interpreting the results in this context was reiterated. This thesis provides researchers with guidance for the handling and reporting of missing PROMs data in order to decrease bias arising from missing data in RCTs.
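For readers unfamiliar with how results from multiply imputed datasets are combined, the following minimal sketch applies Rubin's rules to a set of per-imputation estimates and their variances. The function name and the three toy estimates are assumptions for illustration; the thesis itself relies on standard statistical software and may apply further adjustments, such as degrees-of-freedom corrections, that are omitted here.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m per-imputation estimates using Rubin's rules.

    Returns the pooled estimate and its total variance, where
    total = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists with at least two imputations")
    pooled = mean(estimates)
    within = mean(variances)      # average sampling variance
    between = variance(estimates)  # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total


# Toy input: three imputed datasets, each giving an estimate and its variance.
pooled_estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled_estimate, total_variance)
```

The between-imputation term is what separates multiple imputation from single imputation: it carries the extra uncertainty due to the missing values into the final standard error.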
9	Applying missing data methods to routine data using the example of a population-based register of patients with diabetes Read, Stephanie Helen January 2015 Background: Routinely-collected data offer great potential for epidemiological research and could be used to make randomised controlled trials (RCTs) more efficient. The use of routine data for research has been limited by concerns surrounding data quality, particularly data completeness. To fully exploit these information-rich data sources it is necessary to identify approaches capable of overcoming high proportions of missing data. Using a 2008 extract of the Scottish Care Information – Diabetes Collaboration (SCIDC) database, a population-based register of people with a diagnosis of diabetes in Scotland, I compared the findings of several methods for handling missing data in a retrospective cohort study investigating the association between body mass index (BMI) and all-cause mortality in patients with type 2 diabetes. Methods: Discussions with clinicians and logistic regression analyses were used to determine the likely mechanisms of missingness and the relative appropriateness of a selection of missing data methods, such as multiple imputation. Sequentially more complicated imputation approaches were used to handle missing data. Cox proportional hazards model coefficients for the association between BMI and all-cause mortality were compared for each missing data method. Age-standardised mortality rates by categories of BMI at around the time of diagnosis were also presented. Results: There were 66,472 patients diagnosed with type 2 DM between 2004 and 2008. Of these, 21% did not have a recording of BMI at the time of diagnosis. Amongst patients with complete BMI data, there were 5,491 deaths during 296,584 person-years of follow-up. Amongst patients with incomplete data, there were 2,090 deaths during 79,067 person-years of follow-up. 
Analyses indicated that the primary mechanism of missingness was missing at random, conditional on patient year of diagnosis and vital status. In particular, patients with missing data had considerably worse survival than patients without missing data. Regardless of the method for handling the missing data, a U-shaped relationship between BMI and mortality was observed. Compared to complete case analysis, the association between BMI and all-cause mortality was weaker using multiple imputation approaches, with estimates moving towards the null. Closest observation imputation had the smallest effect on estimates compared to complete case analysis. Risk of mortality was consistently highest in the less than 25kg/m² BMI group. For example, estimates obtained using multiple imputation using chained equations indicated that patients with a BMI below 25kg/m² had a 38% higher risk of mortality than patients in the 25 to less than 30kg/m² BMI category. Conclusions: Alternative methods to complete case analysis can be computationally intensive with many important practical considerations. However, it remains valuable to explore the robustness of estimates to departures from the assumptions made by complete case analysis. The use of these methods can preserve the sample size and therefore may be useful in developing risk prediction scores. Mortality was lowest amongst overweight or obese patients relative to normal weight. Further work is required to identify optimal approaches to weight management amongst patients with diabetes. 616.4
10	Study on a Hierarchy Model Che, Suisui 23 March 2012 Statistical inference about the parameters of the Binomial-Poisson hierarchy model is discussed. Building on the estimators based on paired observations, we consider two further cases with extra observations on the first and on the second layer of the model. The MLEs of lambda and p are derived, and it is also proved that the MLE of lambda is the UMVUE of lambda. Using multivariate central limit theory and large sample theory, the estimators based on extra observations on the first and on the second layer are obtained respectively. The performance of the estimators is compared numerically through extensive Monte Carlo simulation. Simulation studies indicate that these estimators are more efficient than those based only on paired observations. Inference about the confidence interval for p is presented for both cases. The efficiency of the estimators is compared under the condition that the same number of extra observations is provided. Binomial-Poisson distribution unpaired data missing data
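The abstract does not spell out the model, so the sketch below assumes the common form of a Binomial-Poisson hierarchy: a first-layer count N that is Poisson with mean lambda, and a second-layer count X that, given N, is Binomial(N, p). Under that assumption the maximum likelihood estimates from paired observations are the sample mean of N for lambda and the ratio of total X to total N for p. The function names and toy data are hypothetical, and the second function covers only the case of extra first-layer observations; it is not claimed to be the estimator derived in the thesis.

```python
def mle_paired(pairs):
    """MLEs of (lambda, p) from paired observations (n_i, x_i), assuming
    N ~ Poisson(lambda) and X | N ~ Binomial(N, p)."""
    if not pairs:
        raise ValueError("need at least one paired observation")
    total_n = sum(n for n, _ in pairs)
    total_x = sum(x for _, x in pairs)
    if total_n == 0:
        raise ValueError("p is not estimable when every n_i is zero")
    lambda_hat = total_n / len(pairs)  # sample mean of the first layer
    p_hat = total_x / total_n          # pooled success proportion
    return lambda_hat, p_hat


def mle_with_extra_first_layer(pairs, extra_n):
    """Same assumed model, with additional unpaired first-layer counts.
    The extra counts carry information about lambda only."""
    all_n = [n for n, _ in pairs] + list(extra_n)
    lambda_hat = sum(all_n) / len(all_n)
    _, p_hat = mle_paired(pairs)
    return lambda_hat, p_hat


pairs = [(4, 2), (6, 3), (5, 1)]  # toy (n_i, x_i) pairs
lambda_paired, p_paired = mle_paired(pairs)
lambda_extra, p_extra = mle_with_extra_first_layer(pairs, extra_n=[8, 9])
print(lambda_paired, p_paired, lambda_extra, p_extra)
```

Under the assumed model, unpaired first-layer counts change only the estimate of lambda, which fits the abstract's point that extra observations on a layer can be used alongside the paired ones.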
