  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
151

Métodos de imputação de dados aplicados na área da saúde

Nunes, Luciana Neves January 2007 (has links)
Missing data are a very common problem in health research. Faced with them, researchers often simply exclude subjects with non-response on one or more variables, because most traditional statistical techniques were developed for complete data sets. This exclusion, however, can produce invalid inferences, particularly when the subjects who remain in the analysis differ from those who were excluded. Over the last two decades, imputation methods have been developed to address this problem; their underlying idea is to fill in the missing data with plausible values. The most complex of these methods is multiple imputation. This dissertation aims to disseminate the multiple imputation method through two papers. The first describes two multiple imputation techniques and applies them to a real data set. The second compares multiple imputation with two single-imputation techniques in an application to a risk model for surgical mortality. Both applications use secondary data previously analyzed by Klück (2004).
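The core of multiple imputation described in this abstract — complete the data several times, analyze each completed set, then pool with Rubin's rules — can be illustrated with a minimal Python sketch. The imputation model here (resampling observed values) is a deliberately simple stand-in, not the technique used in the dissertation; the variable, sample sizes, and missingness rate are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: one variable with ~20% of values missing completely at random.
x = rng.normal(loc=50.0, scale=10.0, size=200)
mask = rng.random(x.size) < 0.2
x_obs = x[~mask]

m = 20                                    # number of imputed data sets
estimates, variances = [], []
for _ in range(m):
    # Simple stochastic imputation: draw missing values from the empirical
    # distribution of the observed ones (a stand-in for a proper imputation
    # model such as regression with added noise).
    completed = x.copy()
    completed[mask] = rng.choice(x_obs, size=mask.sum(), replace=True)
    estimates.append(completed.mean())                      # per-set estimate
    variances.append(completed.var(ddof=1) / completed.size)  # its variance

# Rubin's rules: pool the m estimates and combine the variance components.
q_bar = np.mean(estimates)                # pooled point estimate
u_bar = np.mean(variances)                # within-imputation variance
b = np.var(estimates, ddof=1)             # between-imputation variance
total_var = u_bar + (1 + 1 / m) * b
print(q_bar, np.sqrt(total_var))
```

The between-imputation term `b` is what single imputation discards: it carries the extra uncertainty due to not knowing the missing values.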
154

Distance estimation for mixed continuous and categorical data with missing values

Azevedo, Glauco Gomes de 04 June 2018 (has links)
In this work we propose a methodology for estimating pairwise distances between mixed continuous and categorical data points with missing values. Distance estimation underlies many regression and classification methods, such as nearest neighbors and discriminant analysis, and clustering techniques such as k-means and k-medoids. Classical approaches to missing data rely on mean imputation, which can underestimate the variance, or on regression-based imputation. Unfortunately, when the goal is to estimate the distance between observations, data imputation may perform poorly and bias the results toward the imputation model. Here we estimate the pairwise distances directly, treating the missing data as random. The joint distribution of the data is approximated by a multivariate mixture model for mixed continuous and categorical data. We present an EM-type algorithm for estimating the mixture and a general methodology for estimating the distance between observations. Simulations show that the proposed method performs well on both simulated and real data.
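The idea of treating missing coordinates as random rather than imputing them can be sketched for the simplest case: an expected squared Euclidean distance under an independent per-feature Gaussian. This is a deliberate simplification of the dissertation's mixture model for mixed data — the function name and the plug-in estimates below are illustrative assumptions, not its actual algorithm.

```python
import numpy as np

def expected_sq_dist(xi, xj, mu, var):
    """Expected squared Euclidean distance between two observations with
    NaN-coded missing coordinates, treating each missing coordinate as a
    draw from an independent Gaussian N(mu[k], var[k])."""
    d = 0.0
    for k in range(len(mu)):
        mi, mj = np.isnan(xi[k]), np.isnan(xj[k])
        if not mi and not mj:
            d += (xi[k] - xj[k]) ** 2          # both observed: plain term
        elif mi and mj:
            d += 2.0 * var[k]                  # E[(X - Y)^2] for X, Y iid
        else:
            obs = xj[k] if mi else xi[k]
            d += (mu[k] - obs) ** 2 + var[k]   # E[(X - obs)^2]
    return d

X = np.array([[1.0, np.nan, 3.0],
              [2.0, 0.5,    np.nan]])
mu = np.nanmean(X, axis=0)           # crude plug-in feature means
var = np.array([1.0, 1.0, 1.0])      # assumed known here for simplicity
print(expected_sq_dist(X[0], X[1], mu, var))   # → 3.0
```

Note the `+ var[k]` term: it is exactly the variance contribution that mean imputation would drop, which is why imputation tends to understate distances.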
155

Parametric LCA approaches for efficient design / Approches d'ACV paramétriques pour une conception performante

Kozderka, Michal 13 December 2016 (has links)
Ces travaux de recherche portent sur la problématique de la mise en pratique de l'analyse de cycle de vie (ACV). La question principale est : comment faire une ACV plus rapide et plus facilement accessible pour la conception des produits ? Nous nous concentrons sur deux problématiques qui prolongent l'inventaire de Cycle de Vie (ICV) : • recherche des données manquantes : Comment ranger les données manquantes selon leur importance? Comment traiter l'intersection des aspects qualitatifs et les aspects quantitatifs des données manquantes? • Modélisation du cycle de vie : Comment réutiliser le cycle de vie existant pour un nouveau produit? Comment développer un modèle de référence? Pour la recherche des solutions nous avons utilisé l'approche "Case study" selon Robert Yin. Nos contributions font résultat de trois études de cas, dont la plus importante est l'ACV du High Impact Polypropylene (HIPP) recyclé. Nous avons publié les résultats de celle-ci dans la revue scientifique Journal of Cleaner Production. Suite aux études de cas nous proposons deux approches d'amélioration d'efficacité en ICV : nous proposons l'analyse de sensibilité préalable pour classifier les données manquantes selon leur impact sur les résultats d'ACV. L'approche combine les aspects quantitatifs avec les aspects quantitatifs en protégeant le respect des objectives d'étude. Nous appelons cette protection "LCA Poka-Yoké". La modélisation du cycle de vie peut être assistée grace à la méthode basée sur l'algorithme de King. Pour la continuation de la recherche nous proposons huit perspectives, dont six font l'objet d'intégration des nouvelles approches d'amélioration dans les concepts d'ACV basés sur la norme ISO 14025 ou dans le projet de la Commission Européenne PEF. 
/ This work addresses the different issues that put a brake to using Lifecycle assessment (LCA) in product design by answering the main question of the research: How to make Lifecycle assessment faster and easier accessible for manufactured product design? In the LCA methodology we have identified two issues to deal with and their consecutive scientific locks : • Research of missing data : How to organize missing data? How to respect quantitative and qualitative dimensions? • Modeling of the lifecycle scenario : How to translate methodological choices into the lifecycle scenario model? How to transform the reference scenario into a new one? We have dealt with these issues using the scientific approach Case study according toRobert Yin. Our contributions are based on three case studies, between which the most important is study of High Impact Polypropylene recycling in the automotive industry. We have published it in the Journal of Cleaner Production. As result of our research we present two methods to improve efficiency of the LifecycleInventory Analysis (LCI) : To organize the missing data: Preliminary sensitivity analysis with LCA Poka-Yoke ; To help with scenario modeling: Method of workflows factorization, based on Reverse engineering. For further research we propose eight perspectives, mostly based on integration of our methods into Product Category Rules (PCR)-based platforms like EPD International or the European PEF.
156

Methods for handling missing data due to a limit of detection in longitudinal lognormal data

Dick, Nicole Marie January 1900 (has links)
Master of Science / Department of Statistics / Suzanne Dubnicka / In animal science, challenge model studies often produce longitudinal data. Many times the lognormal distribution is useful in modeling the data at each time point. Escherichia coli O157 (E. coli O157) studies measure and record the concentration of colonies of the bacteria. There are times when the concentration of colonies present is too low, falling below a limit of detection, and a zero is recorded for the concentration. Researchers employ a method of enrichment to determine whether E. coli O157 was truly absent: this enrichment process searches for bacteria colony concentrations a second time to confirm or refute the previous measurement. If enrichment comes back without evidence of any bacteria colonies, a zero remains as the observed concentration; if it comes back with bacteria colonies present, a minimum value is imputed for the concentration. At the conclusion of the study the data are log10-transformed. One problem with the transformation is that the log of zero is mathematically undefined, so any observed concentrations still recorded as a zero after enrichment cannot be log-transformed. Current practice carries the zero value from the lognormal data to the normal data. The purpose of this report is to evaluate methods for handling missing data due to a limit of detection and to provide results for various analyses of the longitudinal data. Multiple methods of imputing a value for the missing data are compared, and each method is analyzed by fitting three different models using SAS. To determine which method most accurately explains the data, a simulation study was conducted.
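The limit-of-detection problem the abstract describes — zeros that cannot be log10-transformed — is commonly handled by substituting a fraction of the detection limit before transforming. A minimal Python sketch follows; the detection limit, the data values, and the two substitution rules (LOD/2 and LOD/√2, both standard in the LOD literature) are illustrative assumptions, not necessarily the methods compared in the report, which uses SAS.

```python
import numpy as np

LOD = 100.0   # detection limit, illustrative value

# Observed concentrations; zeros mark samples still below the detection
# limit after enrichment.
conc = np.array([0.0, 350.0, 1200.0, 0.0, 80000.0])

def log10_with_lod(conc, lod, method="half"):
    """Impute non-detects before the log10 transform.
    'half' substitutes LOD/2; 'sqrt2' substitutes LOD/sqrt(2)."""
    fill = lod / 2.0 if method == "half" else lod / np.sqrt(2.0)
    out = np.where(conc < lod, fill, conc)   # replace non-detects
    return np.log10(out)                     # now defined everywhere

print(log10_with_lod(conc, LOD))
```

Because every zero is replaced by a strictly positive value, the log10 transform is defined for the whole series and the longitudinal models can use all time points.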
157

多重插補法在線上使用者評分之應用 / Managing online user-generated product reviews using multiple imputation methods

李岑志, Li, Cen Jhih Unknown Date (has links)
Online user-generated product reviews have become a rich source of product quality information for both producers and customers. Many e-commerce websites therefore let customers rate products with an overall score, and some also collect text comments. However, customers usually comment only on the features they care about, and later customers may omit features that earlier reviews have already mentioned; when the text comments are quantified into feature scores, the resulting data contain many missing values. Customers may also mention features that influence neither their satisfaction nor sales volume, so it is important to identify the features that matter most, allowing manufacturers to fix the most important defects. This research models customer reviews and their influence on the overall rating, and asks whether, after the missing values are filled in, the critical features can be identified and the imputed feature ratings authentically reflect customer opinion. Many previous approaches impute the whole data set at once, ignoring the possibility that a review is influenced by earlier reviews. We propose a multiple imputation method that accounts for this temporal structure, verify its effectiveness through simulation, and apply it to Amazon customer reviews of three generations of Canon digital cameras (SX210, SX230, SX260). The results indicate that the method accurately estimates the effect of each feature on the overall product rating.
158

Linear Discriminant Analysis with Repeated Measurements

Skinner, Evelina January 2019 (has links)
The classification of observations based on repeated measurements taken on the same subject over a given period of time or under different conditions is a common procedure in many disciplines, such as medicine, psychology and environmental studies. In this thesis, repeated measurements follow the Growth Curve model and are classified using linear discriminant analysis. The aim is to examine both the effect of missing data on classification accuracy and the effect of additional data on classification robustness. The results indicate that an increasing amount of missing data leads to a progressive decline in classification accuracy. The effect of additional data on robustness is less predictable and can only be characterised as a general tendency towards improved robustness.
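The classical linear discriminant rule underlying this thesis can be sketched in a few lines of Python: each subject's repeated measurements form a vector, two groups share a covariance matrix, and Fisher's rule projects onto the pooled-covariance-weighted mean difference. The Growth Curve mean structure that the thesis adds is omitted here; the group means, covariance, and sample sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Each subject contributes p repeated measurements; equicorrelated
# covariance mimics within-subject dependence over time.
p, n = 4, 100
mu1, mu2 = np.zeros(p), np.full(p, 2.0)
cov = 0.5 * np.eye(p) + 0.5            # 1.0 on diagonal, 0.5 off-diagonal
g1 = rng.multivariate_normal(mu1, cov, size=n)
g2 = rng.multivariate_normal(mu2, cov, size=n)

# Fisher's linear discriminant with a pooled covariance estimate.
xbar1, xbar2 = g1.mean(axis=0), g2.mean(axis=0)
s_pooled = 0.5 * (np.cov(g1.T) + np.cov(g2.T))
w = np.linalg.solve(s_pooled, xbar1 - xbar2)     # discriminant direction
threshold = w @ (xbar1 + xbar2) / 2.0            # midpoint cut-off

def classify(x):
    return 1 if w @ x > threshold else 2

acc = np.mean([classify(x) == 1 for x in g1] + [classify(x) == 2 for x in g2])
print(acc)   # accuracy on the training sample
```

Dropping measurement occasions (missing data) shrinks `p` and the separation between groups, which is the mechanism behind the accuracy decline the thesis reports.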
159

Recommendations Regarding Q-Matrix Design and Missing Data Treatment in the Main Effect Log-Linear Cognitive Diagnosis Model

Ma, Rui 11 December 2019 (has links)
Diagnostic classification models used in conjunction with diagnostic assessments classify individual respondents as masters or nonmasters of individual attributes. Previous researchers (Madison & Bradshaw, 2015) recommended that items on the assessment measure all patterns of attribute combinations to ensure classification accuracy, but in practice certain attributes may not be measurable by themselves. Moreover, model estimation requires a large sample size, yet in reality the data may contain unanswered items. The current study therefore sought to provide suggestions for choosing between two alternative Q-matrix designs when an attribute cannot be measured in isolation, and for using maximum likelihood estimation in the presence of missing responses. The factorial ANOVA results of this simulation study indicate that adding items measuring some attributes rather than all attributes is more optimal, and that other missing data treatments should be sought if the percentage of missing responses exceeds 5%.
160

Development and Evaluation of Infilling Methods for Missing Hydrologic and Chemical Watershed Monitoring Data

Johnston, Carey Andrew 30 September 1999 (has links)
Watershed monitoring programs generally do not have perfect data collection success rates, due to a variety of field and laboratory factors. A major source of error in many stream-gaging records is lost or missing data caused by malfunctioning stream-side equipment; studies estimate that between 5 and 20 percent of stream-gaging data may be marked as missing for one reason or another. Methods that reconstruct or infill missing data produce larger data sets, which generally yield better estimates of the sampled parameter and permit practical application of the data in hydrologic or water quality calculations. This study utilizes data from a watershed monitoring program operating in the Northern Virginia area to: (1) identify and summarize the major reasons for the occurrence of missing data; (2) provide recommendations for reducing the occurrence of missing data; (3) describe methods for infilling missing chemical data; (4) develop and evaluate methods for infilling values to replace missing chemical data; and (5) recommend different infilling methods for various conditions. An evaluation of different infilling methods for chemical data over a variety of factors (e.g., amount of annual rainfall, whether the missing chemical parameter is strongly correlated with flow, amount of missing data) is performed using Monte Carlo modeling. Using the results of the Monte Carlo modeling, a Decision Support System (DSS) is developed for easy application of the most appropriate infilling method. / Master of Science
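For a chemical parameter that is strongly correlated with flow — one of the conditions the study evaluates — a simple infilling approach is to regress the parameter on flow over the observed days and predict the missing ones. The Python sketch below illustrates this under assumed names and an assumed log-log relationship; it is not the study's DSS or its actual infilling methods.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic daily stream flow (complete record) and a chemical parameter
# related to flow through a noisy power law, with ~15% of sample days missing.
flow = rng.lognormal(mean=3.0, sigma=0.5, size=365)
true_conc = 2.0 * flow ** 0.7 * rng.lognormal(0.0, 0.1, size=365)
conc = true_conc.copy()
missing = rng.random(365) < 0.15
conc[missing] = np.nan

# Infill by regressing log(conc) on log(flow) over the observed days,
# then back-transforming the predictions for the missing days.
obs = ~missing
slope, intercept = np.polyfit(np.log(flow[obs]), np.log(conc[obs]), 1)
conc[missing] = np.exp(intercept + slope * np.log(flow[missing]))

# Since the data are synthetic, we can check the infilled values directly.
rel_err = np.abs(conc[missing] - true_conc[missing]) / true_conc[missing]
print(rel_err.mean())
```

This is the kind of candidate method a Monte Carlo evaluation can score against alternatives (e.g., mean substitution or interpolation) under varying rainfall, correlation strength, and missing-data fractions.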
