Global ETD Search

81	Métodos de imputação de dados aplicados na área da saúde [Data imputation methods applied in the health field] Nunes, Luciana Neves January 2007 Missing data are a very common problem in health research. Researchers often respond by excluding subjects with nonresponse on one or more variables, because many statistical techniques were developed for complete data sets. This exclusion can yield invalid inferences, however, particularly when the subjects retained in the analysis differ from those excluded. Over the last two decades, imputation methods have been developed to address this problem; their underlying idea is to fill in the missing data with plausible values. The most complex of these methods is multiple imputation. This thesis aims to disseminate the multiple imputation method through two papers. The first describes two multiple imputation techniques and applies them to a real data set. The second compares multiple imputation with two single imputation techniques in an application to a risk model for surgical mortality. The applications use secondary data previously used by Klück (2004).
Statistical interpretation of data; Epidemiology; Health statistics; Epidemiology: Statistics; Multiple imputation; Imputation methods; Missing data; Nonresponse
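The abstract above does not say how results from the several completed data sets are combined. As a minimal sketch, assuming the standard pooling step known as Rubin's rules (the function name `pool_rubin` and the sample numbers are illustrative, not taken from the thesis), the combination can be written as:

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m completed-data results with Rubin's rules (requires m >= 2).

    estimates: one point estimate per imputed data set
    variances: the squared standard error of each estimate
    """
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Illustrative numbers only: three imputations of one regression coefficient.
estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what single imputation leaves out: it carries the extra uncertainty that comes from not knowing the missing values.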
84	多重插補法在線上使用者評分之應用 / Managing online user-generated product reviews using multiple imputation methods 李岑志, Li, Cen Jhih Unknown Date Online user-generated product reviews have become a rich source of product quality information for both producers and customers: customers can judge a product from others' purchase experience, and manufacturers can use the feedback to improve quality. Many e-commerce websites therefore let customers rate products with a score, and some also accept text comments. However, customers usually comment only on the features they care about, and later buyers in particular may omit points that earlier reviewers have already made. When the text comments are quantified as numeric feature scores, the unmentioned features become missing values. Customers may also comment on features that influence neither their satisfaction nor sales volume, so identifying the features that matter lets manufacturers address the most important defects. This research models how the features mentioned in reviews influence the overall rating, and asks whether, after the missing values are filled in, the critical features can be identified and the imputed feature ratings reflect customers' true opinions. Many earlier methods impute the whole data set at once and do not consider that a review may be influenced by earlier reviews. We propose a multiple imputation method, verify it by simulation, and apply it to Amazon customer reviews of three generations of Canon digital cameras (SX210, SX230, SX260). The results indicate that the method accurately estimates the influence of each feature on the overall rating.
Opinion mining; Missing data; Multiple imputation
85	A cox proportional hazard model for mid-point imputed interval censored data Gwaze, Arnold Rumosa January 2011 There has been increasing interest in survival analysis with interval-censored data, where the event of interest (such as infection with a disease) is not observed exactly but is only known to have occurred between two examination times. Because most research has focused on right-censored data, many statistical tests and techniques are available for right censoring, whereas methods for interval censoring are less abundant. In this study, right-censoring methods are used to fit a proportional hazards model to interval-censored data. The interval-censored observations were transformed using mid-point imputation, which assumes that an event occurs at the midpoint of its recorded interval. The results gave conservative regression estimates, but a comparison with conventional methods showed that the estimates were not significantly different. Nevertheless, the censoring mechanism and interval lengths should be given serious consideration before mid-point imputation is applied to interval-censored data.
Statistics -- Econometric models; Survival analysis (Biometry); Nonparametric statistics; Sampling (Statistics); Multiple imputation (Statistics)
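The mid-point transformation described in this abstract can be sketched in a few lines. This is a hedged illustration, not the author's code: the function name `midpoint_impute` and the convention that `right=None` marks a subject never observed to have the event are assumptions made for the example.

```python
def midpoint_impute(left, right):
    """Convert one interval-censored observation to (time, event).

    left:  last examination time at which the event had not occurred
    right: first examination time at which it had occurred, or None if the
           event was never observed (treated as right-censored at `left`)
    """
    if right is None:
        return left, 0             # right-censored: event indicator 0
    return (left + right) / 2, 1   # event placed at the interval midpoint


# Illustrative observations: (left, right) examination times.
observations = [(2, 6), (5, None), (0, 3)]
imputed = [midpoint_impute(l, r) for l, r in observations]
```

The resulting (time, event) pairs have the shape expected by ordinary right-censored survival routines, which is what lets a standard proportional hazards fit be applied afterwards.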
86	Three Studies of Transitions of Young People in Public Care: A Focus on Educational Outcomes Tessier, Nicholas January 2015 The educational outcomes of children in care, as they prepare for and eventually complete the transition out of care, have been the subject of a growing body of research. Despite the progress made, no unified theory of risk and protective factors associated with educational outcomes has yet arisen from the longitudinal, cohort, and cross-sectional studies conducted with youth in care. This dissertation presents three papers that examine the effects of risk and protective factors on a range of educational outcome variables. The studies follow the timeline of a young person preparing for transition, moving into supported transitional living, and then eventually exiting care altogether. Study 1 presents cross-sectional and longitudinal tests of the generalizability of many of the risk and protective factors identified by O’Higgins, Sebba, & Gardner (2014) in their systematic review of predictors of educational achievement among young people living in foster or kinship care. The cross-sectional sample consisted of 3,662 young people aged 12 to 17 years who were residing in out-of-home care in Ontario, Canada. The longitudinal sample was a subsample of 962 young people from the cross-sectional sample who had also been assessed 36 months later with the AAR-C2-2010 during year 13 (2013-2014) of the OnLAC project. The cross-sectional study found supporting evidence for twelve of the twenty factors identified by O’Higgins et al., and four factors were found to predict change in academic success over the longitudinal timeframe, which suggests that the approach is on the right track. Study 2 uses a lag-as-moderator approach to test whether the time between assessments influences the capacity of variables assessed while the young person was in care to predict educational variables evaluated once the youth had completed the transition to supported independent living. Results from this methodological study of gap length over six years of OnLAC data are encouraging: 87.5% of the predictors tested for moderation by the length of time between assessments were stable predictors across all gaps (i.e., no moderation by gap length). Study 3 presents a pilot 12-month follow-up study conducted with young people at the point of a major transition within or from child welfare services, comparing their characteristics with those of samples from the general population. Taken together, the three studies provide a foundation for formalizing a list of risk and protective predictors of educational outcomes (namely, academic success, educational attainment, educational aspirations, and NEET status), originally selected from a systematic review that identified a range of factors associated with the educational outcomes of youth in care (O’Higgins, Sebba, & Gardner, 2014). Additionally, this dissertation presents a series of recommendations regarding the management and multiple imputation of missing data and the use of lag-as-moderator statistical methods in child welfare research.
Child welfare; Children; Youth; In-care; Foster care; Kinship care; Public care; Multiple imputation; Multiple regression; Lag as moderator
87	A Comparison of Techniques for Handling Missing Data in Longitudinal Studies Bogdan, Alexander R 07 November 2016 Missing data are a common problem in virtually all epidemiological research, especially when conducting longitudinal studies. In these settings, clinicians may collect biological samples to analyze changes in biomarkers, which often do not conform to parametric distributions and may be censored due to limits of detection. Using complete data from the BioCycle Study (2005-2007), which followed 259 premenopausal women over two menstrual cycles, we compared four techniques for handling missing biomarker data with non-Normal distributions. We imposed increasing degrees of missing data on two non-Normally distributed biomarkers under conditions of missing completely at random, missing at random, and missing not at random. Generalized estimating equations were used to obtain estimates from complete case analysis, multiple imputation using joint modeling, multiple imputation using chained equations, and multiple imputation using chained equations and predictive mean matching on Day 2, Day 13 and Day 14 of a standardized 28-day menstrual cycle. Estimates were compared against those obtained from analysis of the completely observed biomarker data. All techniques performed comparably when applied to a Normally distributed biomarker. Multiple imputation using joint modeling and multiple imputation using chained equations produced similar estimates across all types and degrees of missingness for each biomarker. Multiple imputation using chained equations and predictive mean matching consistently deviated from both the complete data estimates and the other missing data techniques when applied to a biomarker with a bimodal distribution. When addressing missing biomarker data in longitudinal studies, special attention should be given to the underlying distribution of the missing variable. As biomarkers become increasingly Normal, the amount of missing data tolerable while still obtaining accurate estimates may also increase when data are missing at random. Future studies are necessary to assess these techniques under more elaborate missingness mechanisms and to explore interactions between biomarkers for improved imputation models.
Missing Data; Longitudinal Study; Multiple Imputation; Biostatistics; Epidemiology; Other Statistics and Probability; Women's Health
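The three missingness conditions named in this abstract differ only in what the probability of a value being missing depends on. The sketch below illustrates that distinction on toy data; it is not the study's procedure, and the function name `impose_missing`, the median thresholds, and the rates are all assumptions chosen for the example.

```python
import random


def impose_missing(rows, mechanism, rate=0.3, seed=0):
    """Return a copy of rows (x, y) with some y values replaced by None.

    MCAR: missingness ignores both x and y.
    MAR:  missingness depends only on the always-observed x.
    MNAR: missingness depends on the value of y itself.
    Keep rate <= 0.5 so that 2 * rate is a valid probability.
    """
    rng = random.Random(seed)
    x_cut = sorted(x for x, _ in rows)[len(rows) // 2]  # upper median of x
    y_cut = sorted(y for _, y in rows)[len(rows) // 2]  # upper median of y
    result = []
    for x, y in rows:
        if mechanism == "MCAR":
            p = rate
        elif mechanism == "MAR":
            p = 2 * rate if x > x_cut else 0.0
        elif mechanism == "MNAR":
            p = 2 * rate if y > y_cut else 0.0
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        result.append((x, None if rng.random() < p else y))
    return result


# Toy data: x is a fully observed covariate, y a biomarker-like outcome.
rows = [(i, 100 - i) for i in range(20)]
mar_rows = impose_missing(rows, "MAR")
mnar_rows = impose_missing(rows, "MNAR")
```

Under MAR the observed covariate fully explains who is missing, which is the condition the usual multiple imputation methods rely on; under MNAR no observed variable does.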
88	Predicting Marital Dissolution Using Data from Both Spouses Lu, Chao-Chin 16 December 2010 The present research studies marital dissolution using data from both spouses from the National Survey of Families and Households (NSFH) and uses multiple imputation to handle missing data. Role theory and four other approaches (social exchange theory, stake theory, the gender perspective, and the heterogeneity perspective) are used to make a methodological argument for why data from both spouses are necessary to study marital stability. Five data sets are imputed, with 3,777 observations in each. The main findings are as follows. First, models of marital dissolution that use data from both spouses fit significantly better than models that use data from one spouse only; gathering perceptual data from both spouses is therefore necessary to understand marital dissolution. Second, overall, the effects of most spousal discrepancies do not support the heterogeneity perspective. Third, the wife-only model fits significantly better than the husband-only model across different periods of marital duration, and the predictive power of wives' variables is more stable than that of husbands' variables. Therefore, if only individual-level data are available, researchers are encouraged to use wives' data rather than husbands' data. Fourth, in the models with data from both spouses, the predictive power of factors varies with marital duration and gender.
marital dissolution; divorce; separation; multiple imputation; data from both spouses; dyadic couple data; role theory; stake theory; Sociology
89	Bayesian Cluster Analysis : Some Extensions to Non-standard Situations Franzén, Jessica January 2008 The Bayesian approach to cluster analysis is presented. We assume that all data stem from a finite mixture model, where each component corresponds to one cluster and is given by a multivariate normal distribution with unknown mean and variance. The method produces posterior distributions of all cluster parameters and proportions as well as associated cluster probabilities for all objects. We extend this method in several directions to some common but non-standard situations. The first extension covers the case of a few deviant observations that do not belong to any of the normal clusters. An extra component/cluster is created for them, which has a larger variance or a different distribution, e.g. uniform over the whole range. The second extension is clustering of longitudinal data. All units are clustered at all time points separately, and the movements between time points are modeled by Markov transition matrices. This means that the clustering at one time point is affected by what happens at the neighbouring time points. The third extension handles datasets with missing data, e.g. item non-response. We impute the missing values iteratively in an extra step of the Gibbs sampler estimation algorithm. Bayesian inference for mixture models has many advantages over the classical approach, but it is not without computational difficulties. A software package written in Matlab for Bayesian inference of mixture models is introduced. The programs in the package handle the basic case of clustering data that are assumed to arise from mixtures of multivariate normal distributions, as well as the non-standard situations.
Cluster analysis; Clustering; Classification; Mixture model; Gaussian; Bayesian inference; MCMC; Gibbs sampler; Deviant group; Longitudinal; Missing data; Multiple imputation; Statistics
90	Imputação múltipla: comparação e eficiência em experimentos multiambientais / Multiple Imputations: comparison and efficiency of multi-environmental trials Silva, Maria Joseane Cruz da 19 July 2012 Missing values are common in genotype-by-environment trials, owing to an insufficient quantity of genotypes for planting. They hinder, for example, the recommendation of the most productive genotypes, because most multivariate statistical techniques require a complete data matrix. Methods that estimate the missing values from the available data, known as data imputation (single and multiple), are therefore applied, taking into account the pattern and mechanism of the missing data. The aim of this study is to evaluate the efficiency of distribution-free multiple imputation (IMLD) (BERGAMO et al., 2008; BERGAMO, 2007) by comparing it with multiple imputation via Markov chain Monte Carlo (IMMCMC) for imputing missing units in a genotype (25) × environment (7) interaction trial. The data come from a randomized block experiment with Eucalyptus grandis (LAVORANTI, 2003), from which percentages of values (10%, 20%, 30%) were removed at random and then imputed by the methods considered. The results for each method showed that the relative efficiency remained above 90% at the removal percentages considered, being lowest for environment 4 when imputed with IMLD. As the proportion of missing data increased, the overall accuracy measure was larger when missing values were imputed with IMMCMC, whereas for IMLD it varied and was lowest at 20% random removal. In interpreting these results, it is essential to note that IMMCMC assumes normality, whereas IMLD has the advantage of imposing no restriction on the distribution of the data or on the missing-data mechanisms and patterns.
Singular value decomposition; Multivariate distributions; Genotype-environment interaction; Multiple imputation; Decomposition methods; MCMC methods; Markov chain Monte Carlo
