11 |
Temporal Graph Record Linkage and k-Safe Approximate MatchJupin, Joseph January 2016 (has links)
Since the advent of electronic data processing, organizations have accrued vast amounts of data contained in multiple databases with no reliable global unique identifier. These databases were developed by different departments for different purposes at different times. Organizing and analyzing these data for human services requires linking records from all sources. RL (Record Linkage) is a process that connects records that are related to the identical or a sufficiently similar entity from multiple heterogeneous databases. RL is a data and compute intensive, mission critical process. The process must be efficient enough to process big data and effective enough to provide accurate matches. We have evaluated an RL system that is currently in use by a local health and human services department. We found that they were using the typical approach that was offered by Fellegi and Sunter with tuple-by-tuple processing, using the Soundex as the primary approximate string matching method. The Soundex has been found to be unreliable both as a phonetic and as an approximate string matching method. We found that their data, in many cases, has more than one value per field, suggesting that the data were queried from a 5NF data base. Consider that if a woman has been married 3 times, she may have up to 4 last names on record. This query process produced more than one tuple per database/entity apparently generating a Cartesian product of this data. In many cases, more than a dozen tuples were observed for a single database/entity. This approach is both ineffective and inefficient. An effective RL method should handle this multi-data without redundancy and use edit-distance for approximate string matching. However, due to high computational complexity, edit-distance will not scale well with big data problems. We developed two methodologies for resolving the aforementioned issues: PSH and ALIM. PSH – The Probabilistic Signature Hash is a composite method that increases the speed of Damerau-Levenshtein edit-distance. It combines signature filtering, probabilistic hashing, length filtering and prefix pruning to increase the speed of edit-distance. It is also lossless because it does not lose any true positive matches. ALIM – Aggregate Link and Iterative Match is a graph-based record linkage methodology that uses a multi-graph to store demographic data about people. ALIM performs string matching as records are inserted into the graph. ALIM eliminates data redundancy and stores the relationships between data. We tested PSH for string comparison and found it to be approximately 6,000 times faster than DL. We tested it against the trie-join methods and found that they are up to 6.26 times faster but lose between 10 and 20 percent of true positives. We tested ALIM against a method currently in use by a local health and human services department and found ALIM to produce significantly more matches (even with more restrictive match criteria) and that ALIM ran more than twice as fast. ALIM handles the multi-data problem and PSH allows the use of edit-distance comparison in this RL model. ALIM is more efficient and effective than a currently implemented RL system. This model can also be expanded to perform social network analysis and temporal data modeling. For human services, temporal modeling can reveal how policy changes and treatments affect clients over time and social network analysis can determine the effects of these on whole families by facilitating family linkage. / Computer and Information Science
|
12 |
O uso de método de relacionamento de dados (record linkage) para integração de informação em sistemas heterogêneos de saúde: estudo de aplicabilidade entre níveis primário e terciário / The use of record linkage method for integration heterogeneous information systems in health: a study of applicability between primary and tertiarySuzuki, Katia Mitiko Firmino 21 September 2012 (has links)
O relacionamento de dados record linkage, originou-se na área da saúde pública e atualmente é aplicado em várias outras áreas como: epidemiologia, pesquisa médica, criação de ensaios clínicos, na área de marketing, gestão de relacionamento com o cliente, detecção de fraude, aplicação da lei e na administração do governo. A técnica consiste no processo de comparação entre dois ou mais registros em diferentes bases de dados e as principais estratégias de record linkage são: manual, deterministic record linkage (DRL) e probabilistic record linkage (PRL). Este estudoteve como objetivo aplicar o record linkage em bases de dados heterogêneas, utilizadas pela rede de atenção à saúde do município de Ribeirão Preto e identificar entre elas a melhor estratégia a ser adotada para a integração de bases de dados na área da saúde. As bases de dados da secretaria Municipal de Saúde de Ribeirão Preto (SMS-RP) e do Hospital das Clínicas da Faculdade de Medicina de Ribeirão Preto (HCFMRP/USP) foram objeto deste estudo, tendo como critério de inclusão apenas os registros de pacientes em que o município de residência informado correspondia ao município de Ribeirão Preto e o atendimento tivesse ocorrido na Unidade Básica Distrital e de Saúde (UDBS) - Centro Saúde Escola Joel Domingos Machado\" (CSE-Sumarezinho) nos anos de janeiro de 2006 a agosto de 2008 e no HCFMRP/USP. Foi selecionada uma amostra aleatória simples resultando em um conjunto de 1.100 registros de pacientes na base de dados do CSE-Sumarezinho e de 370.375 registros na base de dados do HCFMRP/USP. Foram, então, selecionadas quatro variáveis de relacionamento (nome, nome da mãe, sexo e data de nascimento). As estratégias adotadas foram: DRL exato, DRL com discordância em uma variável de relacionamento, e baseada em funções de similaridades (Dice, Levenshtein, Jaro e Jaro-Winkler) e, por fim, PRL. A estratégia DRL exato resultou em 334 registros pareados e na abordagem com discordância de uma variável foram 335, 343, 383 e 495, sendo as variáveis discordantes sexo, data de nascimento, nome e nome da mãe respectivamente. Quanto ao uso das funções de similaridades, as que mais se destacaram foram Jaro-Winkler e Jaro. Quanto à acurácia dos métodos aplicados, o PRL (sensibilidade = 97,75% (CI 95% 96,298,8) e especificidade = 98,55% (CI 95% 97,0-99,4)) obteve melhor sensibilidade e especificidade, seguido do DRL com as funções de similaridade Jaro-Winkler sensibilidade = 91,3% (CI 95% 88,793,4) e especificidade = 99% (CI 95% 97,6-99,7)) e Jaro (sensibilidade = 73,1% (CI 95% 69,476,6) e especificidade = 99,6% (CI 95% 98,5-99,9)). Quanto à avaliação da área sob a curva ROC do PRL, observou-se que há diferença estatisticamente significativa (p = 0,0001) quando comparada com os métodos DRL com discordância da variável nome da mãe, Jaro-Winkler e Jaro. Os resultados obtidos permitem concluir que o método PRL é mais preciso dentre as técnicas avaliadas. Mas as técnicas com a função de similaridade de Jaro-Winkler e Jaro também são alternativas viáveis interessantes devido à facilidade de utilização apesar de apresentarem o valor de sensibilidade ligeiramente menor que o PRL. / The record linkage originated in the area of public health and is currently applied in several other areas such as epidemiology, medical research, establishment of clinical trials, in the area of marketing, manager customer relationships, fraud detection, law enforcement and government administration. The technique consists on the comparison between two or more records in different databases and their key strategies are: manual comparison, Deterministic Record Linkage (DRL), and Probabilistic Record Linkage (PRL).This study aimed to apply the record linkage in heterogeneous databases, used by the network of health care in Ribeirão Preto and identify the best strategy to be adopted for the integration of databases in health care. The databases that were evaluated in this study were of the Municipal Health Department of Ribeirão Preto (SMS-RP) and of the Clinical Hospital of the School of Medicine of Ribeirao Preto (HCFMRP/USP) having as inclusion criterion only the records of patients in the county of residence reported corresponded to the city of Ribeirão Preto and care had taken place in the Basic District Health Unit (UDBS) - School Health Center \"Joel Domingos Machado\" (CSE-Sumarezinho) included in the years from January 2006 to August 2008 and in the HCFMRP/USP. Held to select a simple random sample resulted in a set of 1,100 patient records in the database of the CSE-Sumarezinho and 370,375 records in the database of HCFMRP/USP. Then there was the selection of four linking variables (name, mother\'s name, gender and birth date). The strategies adopted were: the exact DRL, DRL with one variable where the linking is disagreement, applied with similarity functions (Dice, Levenshtein, Jaro, and Jaro-Winkler), and, finally, PRL. The strategy of the exact DRL resulted in 334 matched records and strategy in dealing with disagreement of one variable were 335, 343, 383 and 495, to the following variables discordant gender, birth date, name and mother\'s name, respectively. Regarding the use of similarity functions which most stood out were Jaro and Jaro-Winkler. Regarding the accuracy of the methods applied, the PRL obtained better sensitivity and specificity (sensitivity = 97,75% (CI 95% 96,298,8) and specificity = 98.55% (95% CI 97.0 to 99.4)), followed by the DRL with the similarity functions Jaro-Winkler (sensitivity = 91.3% (95% CI 88.7 to 93.4) and specificity = 99% (95% CI 97.6 to 99, 7)) and then by Jaro (sensitivity = 73.1% (95% CI 69.4 to 76.6) = 99.6% and specificity (95% CI 98.5 to 99.9)). The evaluation of the area under the ROC curve in the PRL, was observed that there is statistically significant difference (p = 0.0001) if it is compared with the DRL methods when there is disagreement in the variable mother\'s name, as well as for Jaro and for Jaro-Winkler. The results indicate that the PRL method is most accurate among the techniques evaluated. Although the techniques with the similarity function of Jaro-Winkler and Jaro were also interesting viable options due to the ease of use, although having the sensitivity value slightly smaller than the PRL.
|
13 |
O uso de método de relacionamento de dados (record linkage) para integração de informação em sistemas heterogêneos de saúde: estudo de aplicabilidade entre níveis primário e terciário / The use of record linkage method for integration heterogeneous information systems in health: a study of applicability between primary and tertiaryKatia Mitiko Firmino Suzuki 21 September 2012 (has links)
O relacionamento de dados record linkage, originou-se na área da saúde pública e atualmente é aplicado em várias outras áreas como: epidemiologia, pesquisa médica, criação de ensaios clínicos, na área de marketing, gestão de relacionamento com o cliente, detecção de fraude, aplicação da lei e na administração do governo. A técnica consiste no processo de comparação entre dois ou mais registros em diferentes bases de dados e as principais estratégias de record linkage são: manual, deterministic record linkage (DRL) e probabilistic record linkage (PRL). Este estudoteve como objetivo aplicar o record linkage em bases de dados heterogêneas, utilizadas pela rede de atenção à saúde do município de Ribeirão Preto e identificar entre elas a melhor estratégia a ser adotada para a integração de bases de dados na área da saúde. As bases de dados da secretaria Municipal de Saúde de Ribeirão Preto (SMS-RP) e do Hospital das Clínicas da Faculdade de Medicina de Ribeirão Preto (HCFMRP/USP) foram objeto deste estudo, tendo como critério de inclusão apenas os registros de pacientes em que o município de residência informado correspondia ao município de Ribeirão Preto e o atendimento tivesse ocorrido na Unidade Básica Distrital e de Saúde (UDBS) - Centro Saúde Escola Joel Domingos Machado\" (CSE-Sumarezinho) nos anos de janeiro de 2006 a agosto de 2008 e no HCFMRP/USP. Foi selecionada uma amostra aleatória simples resultando em um conjunto de 1.100 registros de pacientes na base de dados do CSE-Sumarezinho e de 370.375 registros na base de dados do HCFMRP/USP. Foram, então, selecionadas quatro variáveis de relacionamento (nome, nome da mãe, sexo e data de nascimento). As estratégias adotadas foram: DRL exato, DRL com discordância em uma variável de relacionamento, e baseada em funções de similaridades (Dice, Levenshtein, Jaro e Jaro-Winkler) e, por fim, PRL. A estratégia DRL exato resultou em 334 registros pareados e na abordagem com discordância de uma variável foram 335, 343, 383 e 495, sendo as variáveis discordantes sexo, data de nascimento, nome e nome da mãe respectivamente. Quanto ao uso das funções de similaridades, as que mais se destacaram foram Jaro-Winkler e Jaro. Quanto à acurácia dos métodos aplicados, o PRL (sensibilidade = 97,75% (CI 95% 96,298,8) e especificidade = 98,55% (CI 95% 97,0-99,4)) obteve melhor sensibilidade e especificidade, seguido do DRL com as funções de similaridade Jaro-Winkler sensibilidade = 91,3% (CI 95% 88,793,4) e especificidade = 99% (CI 95% 97,6-99,7)) e Jaro (sensibilidade = 73,1% (CI 95% 69,476,6) e especificidade = 99,6% (CI 95% 98,5-99,9)). Quanto à avaliação da área sob a curva ROC do PRL, observou-se que há diferença estatisticamente significativa (p = 0,0001) quando comparada com os métodos DRL com discordância da variável nome da mãe, Jaro-Winkler e Jaro. Os resultados obtidos permitem concluir que o método PRL é mais preciso dentre as técnicas avaliadas. Mas as técnicas com a função de similaridade de Jaro-Winkler e Jaro também são alternativas viáveis interessantes devido à facilidade de utilização apesar de apresentarem o valor de sensibilidade ligeiramente menor que o PRL. / The record linkage originated in the area of public health and is currently applied in several other areas such as epidemiology, medical research, establishment of clinical trials, in the area of marketing, manager customer relationships, fraud detection, law enforcement and government administration. The technique consists on the comparison between two or more records in different databases and their key strategies are: manual comparison, Deterministic Record Linkage (DRL), and Probabilistic Record Linkage (PRL).This study aimed to apply the record linkage in heterogeneous databases, used by the network of health care in Ribeirão Preto and identify the best strategy to be adopted for the integration of databases in health care. The databases that were evaluated in this study were of the Municipal Health Department of Ribeirão Preto (SMS-RP) and of the Clinical Hospital of the School of Medicine of Ribeirao Preto (HCFMRP/USP) having as inclusion criterion only the records of patients in the county of residence reported corresponded to the city of Ribeirão Preto and care had taken place in the Basic District Health Unit (UDBS) - School Health Center \"Joel Domingos Machado\" (CSE-Sumarezinho) included in the years from January 2006 to August 2008 and in the HCFMRP/USP. Held to select a simple random sample resulted in a set of 1,100 patient records in the database of the CSE-Sumarezinho and 370,375 records in the database of HCFMRP/USP. Then there was the selection of four linking variables (name, mother\'s name, gender and birth date). The strategies adopted were: the exact DRL, DRL with one variable where the linking is disagreement, applied with similarity functions (Dice, Levenshtein, Jaro, and Jaro-Winkler), and, finally, PRL. The strategy of the exact DRL resulted in 334 matched records and strategy in dealing with disagreement of one variable were 335, 343, 383 and 495, to the following variables discordant gender, birth date, name and mother\'s name, respectively. Regarding the use of similarity functions which most stood out were Jaro and Jaro-Winkler. Regarding the accuracy of the methods applied, the PRL obtained better sensitivity and specificity (sensitivity = 97,75% (CI 95% 96,298,8) and specificity = 98.55% (95% CI 97.0 to 99.4)), followed by the DRL with the similarity functions Jaro-Winkler (sensitivity = 91.3% (95% CI 88.7 to 93.4) and specificity = 99% (95% CI 97.6 to 99, 7)) and then by Jaro (sensitivity = 73.1% (95% CI 69.4 to 76.6) = 99.6% and specificity (95% CI 98.5 to 99.9)). The evaluation of the area under the ROC curve in the PRL, was observed that there is statistically significant difference (p = 0.0001) if it is compared with the DRL methods when there is disagreement in the variable mother\'s name, as well as for Jaro and for Jaro-Winkler. The results indicate that the PRL method is most accurate among the techniques evaluated. Although the techniques with the similarity function of Jaro-Winkler and Jaro were also interesting viable options due to the ease of use, although having the sensitivity value slightly smaller than the PRL.
|
14 |
Arquitetura e métodos de integração de dados e interoperabilidade aplicados na saúde mental / Investigation of the effectiveness of data integration and interoperability methods applied to mental healthMiyoshi, Newton Shydeo Brandão 16 March 2018 (has links)
A disponibilidade e integração das informações em saúde relativas a um mesmo paciente entre diferentes níveis de atenção ou entre diferentes instituições de saúde é normalmente incompleta ou inexistente. Isso acontece principalmente porque os sistemas de informação que oferecem apoio aos profissionais da saúde não são interoperáveis, dificultando também a gestão dos serviços a nível municipal e regional. Essa fragmentação da informação também é desafiadora e preocupante na área da saúde mental, em que normalmente se exige um cuidado prolongado e que integra diferentes tipos de serviços de saúde. Problemas como a baixa qualidade e indisponibilidade de informações, assim como a duplicidade de registros, são importantes aspectos na gestão e no cuidado prolongado ao paciente portador de transtornos mentais. Apesar disso, ainda não existem estudos objetivos demonstrando o impacto efetivo da interoperabilidade e integração de dados na gestão e na qualidade de dados para a área de saúde mental. Objetivos: Neste contexto, o projeto tem como objetivo geral propor uma arquitetura de interoperabilidade para a assistência em saúde regionalizada e avaliar a efetividade de técnicas de integração de dados e interoperabilidade para a gestão dos atendimentos e internações em saúde mental na região de Ribeirão Preto, assim como o impacto na melhoria e disponibilidade dos dados por meio de métricas bem definidas. Métodos: O framework de interoperabilidade proposto tem como base a arquitetura cliente-servidor em camadas. O modelo de informação de interoperabilidade foi baseado em padrões de saúde internacionais e nacionais. Foi proposto um servidor de terminologias baseado em padrões de informação em saúde. Foram também utilizados algoritmos de Record Linkage para garantir a identificação unívoca do paciente. Para teste e validação da proposta foram utilizados dados de diferentes níveis de atenção à saúde provenientes de atendimentos na rede de atenção psicossocial na região de Ribeirão Preto. Os dados foram extraídos de cinco fontes diferentes: (i) a Unidade Básica de Saúde da Família - I, de Santa Cruz da Esperança; (ii) o Centro de Atenção Integrada à Saúde, de Santa Rita do Passa Quatro; (iii) o Hospital Santa Tereza; (iv) as informações de solicitações de internação contidas no SISAM (Sistema de Informação em Saúde Mental); e (v) dados demográficos do Barramento do Cartão Nacional de Saúde do Ministério da Saúde. As métricas de qualidade de dados utilizadas foram completude, consistência, duplicidade e acurácia. Resultados: Como resultado deste trabalho, foi projetado, desenvolvido e testado a plataforma de interoperabilidade em saúde, denominado eHealth-Interop. Foi adotada uma proposta de interoperabilidade por meio de serviços web com um modelo de integração de dados baseado em um banco de dados centralizador. Foi desenvolvido também um servidor de terminologias, denominado eHealth-Interop Terminology Server, que pode ser utilizado como um componente independente e em outros contextos médicos. No total foram obtidos dados de 31340 registros de pacientes pelo SISAM, e-SUS AB de Santa Cruz da Esperança, do CAIS de Santa Rita do Passa Quatro, do Hospital Santa Tereza e do Barramento do CNS do Ministério da Saúde. Desse total, 30,47% (9548) registros foram identificados como presente em mais de 1 fonte de informação, possuindo diferentes níveis de acurácia e completude. A análise de qualidade de dados, abrangendo todas os registros integrados, obteve uma melhoria na completude média de 18,40% (de 56,47% para 74,87%) e na acurácia sintática média de 1,08% (de 96,69% para 96,77%). Na análise de consistência houve melhoras em todas as fontes de informação, variando de uma melhoria mínima de 14.4% até o máximo de 51,5%. Com o módulo de Record Linkage foi possível quantificar, 1066 duplicidades e, dessas, 226 foram verificadas manualmente. Conclusões: A disponibilidade e a qualidade da informação são aspectos importantes para a continuidade do atendimento e gerenciamento de serviços de saúde. A solução proposta neste trabalho visa estabelecer um modelo computacional para preencher essa lacuna. O ambiente de interoperabilidade foi capaz de integrar a informação no caso de uso de saúde mental com o suporte de terminologias clínicas internacionais e nacionais sendo flexível para ser estendido a outros domínios de atenção à saúde. / The availability and integration of health information from the same patient between different care levels or between different health services is usually incomplete or non-existent. This happens especially because the information systems that support health professionals are not interoperable, making it difficult to manage services at the municipal and regional level. This fragmentation of information is also challenging and worrying in the area of mental health, where long-term care is often required and integrates different types of health services and professionals. Problems such as poor quality and unavailability of information, as well as duplicate records, are important aspects in the management and long-term care of patients with mental disorders. Despite this, there are still no objective studies that demonstrate the effective impact of interoperability and data integration on the management and quality of data for the mental health area. Objectives: In this context, this project proposes an interoperability architecture for regionalized health care management. It also proposes to evaluate the effectiveness of data integration and interoperability techniques for the management of mental health hospitalizations in the Ribeirão Preto region as well as the improvement in data availability through well-defined metrics. Methods: The proposed framework is based on client-service architecture to be deployed in the web. The interoperability information model was based on international and national health standards. It was proposed a terminology server based on health information standards. Record Linkage algorithms were implemented to guarantee the patient identification. In order to test and validate the proposal, we used data from different health care levels provided by the mental health care network in the Ribeirão Preto region. The data were extracted from five different sources: the Family Health Unit I of Santa Cruz da Esperança, the Center for Integrated Health Care of Santa Rita do Passa Quatro, Santa Tereza Hospital, the information on hospitalization requests system in SISAM (Mental Health Information System) and demographic data of the Brazilian Ministry of Health Bus. Results: As a result of this work, the health interoperability platform, called eHealth-Interop, was designed, developed and tested. A proposal was adopted for interoperability through web services with a data integration model based on a centralizing database. A terminology server, called eHealth-Interop Terminology Server, has been developed that can be used as an independent component and in other medical contexts. In total, 31340 patient records were obtained from SISAM, eSUS-AB from Santa Cruz da Esperança, from CAIS from Santa Rita do Passa Quatro, from Santa Tereza Hospital and from the CNS Service Bus from the Brazillian Ministry of Health. 47% (9548) records were identified as present in more than 1 information source, having different levels ofaccuracy and completeness. The data quality analysis, covering all integrated records, obtained an improvement in the average completeness of 18.40% (from 56.47% to 74.87%) and the mean syntactic accuracy of 1.08% (from 96,69% to 96.77%). In the consistency analysis there were improvements in all information sources, ranging from a minimum improvement of 14.4% to a maximum of 51.5%. With the Record Linkage module it was possible to quantify 1066 duplications, of which 226 were manually verified. Conclusions: The information\'s availability and quality are both important aspects for the continuity of care and health services management. The solution proposed in this work aims to establish a computational model to fill this gap. It has been successfully applied in the mental health care context and is flexible to be extendable to other medical domains.
|
15 |
Arquitetura e métodos de integração de dados e interoperabilidade aplicados na saúde mental / Investigation of the effectiveness of data integration and interoperability methods applied to mental healthNewton Shydeo Brandão Miyoshi 16 March 2018 (has links)
A disponibilidade e integração das informações em saúde relativas a um mesmo paciente entre diferentes níveis de atenção ou entre diferentes instituições de saúde é normalmente incompleta ou inexistente. Isso acontece principalmente porque os sistemas de informação que oferecem apoio aos profissionais da saúde não são interoperáveis, dificultando também a gestão dos serviços a nível municipal e regional. Essa fragmentação da informação também é desafiadora e preocupante na área da saúde mental, em que normalmente se exige um cuidado prolongado e que integra diferentes tipos de serviços de saúde. Problemas como a baixa qualidade e indisponibilidade de informações, assim como a duplicidade de registros, são importantes aspectos na gestão e no cuidado prolongado ao paciente portador de transtornos mentais. Apesar disso, ainda não existem estudos objetivos demonstrando o impacto efetivo da interoperabilidade e integração de dados na gestão e na qualidade de dados para a área de saúde mental. Objetivos: Neste contexto, o projeto tem como objetivo geral propor uma arquitetura de interoperabilidade para a assistência em saúde regionalizada e avaliar a efetividade de técnicas de integração de dados e interoperabilidade para a gestão dos atendimentos e internações em saúde mental na região de Ribeirão Preto, assim como o impacto na melhoria e disponibilidade dos dados por meio de métricas bem definidas. Métodos: O framework de interoperabilidade proposto tem como base a arquitetura cliente-servidor em camadas. O modelo de informação de interoperabilidade foi baseado em padrões de saúde internacionais e nacionais. Foi proposto um servidor de terminologias baseado em padrões de informação em saúde. Foram também utilizados algoritmos de Record Linkage para garantir a identificação unívoca do paciente. Para teste e validação da proposta foram utilizados dados de diferentes níveis de atenção à saúde provenientes de atendimentos na rede de atenção psicossocial na região de Ribeirão Preto. Os dados foram extraídos de cinco fontes diferentes: (i) a Unidade Básica de Saúde da Família - I, de Santa Cruz da Esperança; (ii) o Centro de Atenção Integrada à Saúde, de Santa Rita do Passa Quatro; (iii) o Hospital Santa Tereza; (iv) as informações de solicitações de internação contidas no SISAM (Sistema de Informação em Saúde Mental); e (v) dados demográficos do Barramento do Cartão Nacional de Saúde do Ministério da Saúde. As métricas de qualidade de dados utilizadas foram completude, consistência, duplicidade e acurácia. Resultados: Como resultado deste trabalho, foi projetado, desenvolvido e testado a plataforma de interoperabilidade em saúde, denominado eHealth-Interop. Foi adotada uma proposta de interoperabilidade por meio de serviços web com um modelo de integração de dados baseado em um banco de dados centralizador. Foi desenvolvido também um servidor de terminologias, denominado eHealth-Interop Terminology Server, que pode ser utilizado como um componente independente e em outros contextos médicos. No total foram obtidos dados de 31340 registros de pacientes pelo SISAM, e-SUS AB de Santa Cruz da Esperança, do CAIS de Santa Rita do Passa Quatro, do Hospital Santa Tereza e do Barramento do CNS do Ministério da Saúde. Desse total, 30,47% (9548) registros foram identificados como presente em mais de 1 fonte de informação, possuindo diferentes níveis de acurácia e completude. A análise de qualidade de dados, abrangendo todas os registros integrados, obteve uma melhoria na completude média de 18,40% (de 56,47% para 74,87%) e na acurácia sintática média de 1,08% (de 96,69% para 96,77%). Na análise de consistência houve melhoras em todas as fontes de informação, variando de uma melhoria mínima de 14.4% até o máximo de 51,5%. Com o módulo de Record Linkage foi possível quantificar, 1066 duplicidades e, dessas, 226 foram verificadas manualmente. Conclusões: A disponibilidade e a qualidade da informação são aspectos importantes para a continuidade do atendimento e gerenciamento de serviços de saúde. A solução proposta neste trabalho visa estabelecer um modelo computacional para preencher essa lacuna. O ambiente de interoperabilidade foi capaz de integrar a informação no caso de uso de saúde mental com o suporte de terminologias clínicas internacionais e nacionais sendo flexível para ser estendido a outros domínios de atenção à saúde. / The availability and integration of health information from the same patient between different care levels or between different health services is usually incomplete or non-existent. This happens especially because the information systems that support health professionals are not interoperable, making it difficult to manage services at the municipal and regional level. This fragmentation of information is also challenging and worrying in the area of mental health, where long-term care is often required and integrates different types of health services and professionals. Problems such as poor quality and unavailability of information, as well as duplicate records, are important aspects in the management and long-term care of patients with mental disorders. Despite this, there are still no objective studies that demonstrate the effective impact of interoperability and data integration on the management and quality of data for the mental health area. Objectives: In this context, this project proposes an interoperability architecture for regionalized health care management. It also proposes to evaluate the effectiveness of data integration and interoperability techniques for the management of mental health hospitalizations in the Ribeirão Preto region as well as the improvement in data availability through well-defined metrics. Methods: The proposed framework is based on client-service architecture to be deployed in the web. The interoperability information model was based on international and national health standards. It was proposed a terminology server based on health information standards. Record Linkage algorithms were implemented to guarantee the patient identification. In order to test and validate the proposal, we used data from different health care levels provided by the mental health care network in the Ribeirão Preto region. The data were extracted from five different sources: the Family Health Unit I of Santa Cruz da Esperança, the Center for Integrated Health Care of Santa Rita do Passa Quatro, Santa Tereza Hospital, the information on hospitalization requests system in SISAM (Mental Health Information System) and demographic data of the Brazilian Ministry of Health Bus. Results: As a result of this work, the health interoperability platform, called eHealth-Interop, was designed, developed and tested. A proposal was adopted for interoperability through web services with a data integration model based on a centralizing database. A terminology server, called eHealth-Interop Terminology Server, has been developed that can be used as an independent component and in other medical contexts. In total, 31340 patient records were obtained from SISAM, eSUS-AB from Santa Cruz da Esperança, from CAIS from Santa Rita do Passa Quatro, from Santa Tereza Hospital and from the CNS Service Bus from the Brazillian Ministry of Health. 47% (9548) records were identified as present in more than 1 information source, having different levels ofaccuracy and completeness. The data quality analysis, covering all integrated records, obtained an improvement in the average completeness of 18.40% (from 56.47% to 74.87%) and the mean syntactic accuracy of 1.08% (from 96,69% to 96.77%). In the consistency analysis there were improvements in all information sources, ranging from a minimum improvement of 14.4% to a maximum of 51.5%. With the Record Linkage module it was possible to quantify 1066 duplications, of which 226 were manually verified. Conclusions: The information\'s availability and quality are both important aspects for the continuity of care and health services management. The solution proposed in this work aims to establish a computational model to fill this gap. It has been successfully applied in the mental health care context and is flexible to be extendable to other medical domains.
|
16 |
A longitudinal patient record for patients receiving antiretroviral treatmentKotze, E., McDonald, T. January 2012 (has links)
Published Article / In response to the Human Immunodeficiency Virus (HIV) epidemic in the country, the South African Government started with the provisioning of Antiretroviral Therapy (ART) in the public health sector. Monitoring and evaluating the effectiveness of the ART programme is of the utmost importance. The current patient information system could not supply the required information to manage the rollout of the ART programme. A data warehouse, consisting of several data marts, was developed that integrated several disparate systems related to HIV/AIDS/ART into one system. It was, however, not possible to trace a patient across all the data marts in the data warehouse. No unique identifiers existed for the patient records in the different data marts and they also had different structures. Record linkage in conjunction with a mapping process was used to link all the data marts and in so doing identify the same patient in all the data marts. This resulted in a longitudinal patient record of an ART patient that displayed all the treatments received by the patient in all public health care facilities in the province.
|
17 |
Targeting Non-obvious Errors in Death CertificatesJohansson, Lars Age January 2008 (has links)
Mortality statistics are much used although their accuracy is often questioned. Producers of mortality statistics check for errors in death certification but current methods only capture obvious mistakes. This thesis investigates whether non-obvious errors can be found by linking death certificates to hospital discharge data. Data: 69,818 deaths in Sweden 1995. Paper I: Analysing differences between the underlying cause of death from the death certificate (UC) and the main discharge condition from the patient’s last hospitalization (MDC). Paper II: Testing whether differences can be explained by ICD definitions of UC and MDC. Paper III: Surveying methods in 44 current studies on the accuracy of death certificates. Paper IV: Checking death certificates against case summaries for: i) 573 deaths where UC and MDC were the same or the difference could be explained; ii) 562 deaths where the difference could not be explained. Results: In 54% of deaths the MDC differed from the UC. Almost two-thirds of the differences were medically compatible since the MDC might have developed as a complication of the UC. Of 44 recent evaluation studies, only 8 describe the methods in such detail that the study could be replicated. Incompatibility between MDC and UC indicates a four-fold risk that the death certificate is inaccurate. For some diagnostic groups, however, death certificates are often inaccurate even when the UC and MDC are compatible. Conclusion: Producers of official mortality statistics could reduce the number of non-obvious errors in the statistics by collecting additional information on incompatible deaths and on deaths in high-risk diagnostic groups. ICD conventions contribute to the quality problem since they presuppose that all deaths are due to a single underlying cause. However, in an ageing population an increasing number of deaths are due to an accumulation of etiologically unrelated conditions.
|
18 |
Privacy-Preserving Data Integration in Public Health SurveillanceHu, Jun 16 May 2011 (has links)
With widespread use of the Internet, data is often shared between organizations in B2B health care networks. Integrating data across all sources in a health care network would be useful to public health surveillance and provide a complete view of how the overall network is performing. Because of the lack of standardization for a common data model across organizations, matching identities between different locations in order to link and aggregate records is difficult. Moreover, privacy legislation controls the use of personal information, and health care data is very sensitive in nature so the protection of data privacy and prevention of personal health information leaks is more important than ever. Throughout the process of integrating data sets from different organizations, consent (explicitly or implicitly) and/or permission to use must be in place, data sets must be de-identified, and identity must be protected. Furthermore, one must ensure that combining data sets from different data sources into a single consolidated data set does not create data that may be potentially re-identified even when only summary data records are created.
In this thesis, we propose new privacy preserving data integration protocols for public health surveillance, identify a set of privacy preserving data integration patterns, and propose a supporting framework that combines a methodology and architecture with which to implement these protocols in practice. Our work is validated with two real world case studies that were developed in partnership with two different public health surveillance organizations.
|
19 |
Efficient Disk-Based Techniques for Manipulating Very Large String DatabasesAllam, Amin 18 May 2017 (has links)
Indexing and processing strings are very important topics in database management. Strings can be database records, DNA sequences, protein sequences, or plain text. Various string operations are required for several application categories, such as bioinformatics and entity resolution. When the string count or sizes become very large, several state-of-the-art techniques for indexing and processing such strings may fail or behave very inefficiently. Modifying an existing technique to overcome these issues is not usually straightforward or even possible.
A category of string operations can be facilitated by the suffix tree data structure, which basically indexes a long string to enable efficient finding of any substring of the indexed string, and can be used in other operations as well, such as approximate string matching. In this document, we introduce a novel efficient method to construct the suffix tree index for very long strings using parallel architectures, which is a major challenge in this category.
Another category of string operations require clustering similar strings in order to perform application-specific processing on the resulting possibly-overlapping clusters. In this document, based on clustering similar strings, we introduce a novel efficient technique for record linkage and entity resolution, and a novel method for correcting errors in a large number of small strings (read sequences) generated by the DNA sequencing machines.
|
20 |
Privacy-Enhancing Techniques for Data AnalyticsFang-Yu Rao (6565679) 10 June 2019 (has links)
<div>
<div>
<div>
<p>Organizations today collect and aggregate huge amounts of data from individuals
under various scenarios and for different purposes. Such aggregation of individuals’
data when combined with techniques of data analytics allows organizations to make
informed decisions and predictions. But in many situations, different portions of the
data associated with individuals are collected and curated by different organizations.
To derive more accurate conclusions and predictions, those organization may want to
conduct the analysis based on their joint data, which cannot be simply accomplished
by each organization exchanging its own data with other organizations due to the
sensitive nature of data. Developing approaches for collaborative privacy-preserving
data analytics, however, is a nontrivial task. At least two major challenges have to be
addressed. The first challenge is that the security of the data possessed by each organization should always be properly protected during and after the collaborative analysis
process, whereas the second challenge is the high computational complexity usually
accompanied by cryptographic primitives used to build such privacy-preserving protocols.
</p><p><br></p><p>
</p><div>
<div>
<div>
<p>In this dissertation, based on widely adopted primitives in cryptography, we address the aforementioned challenges by developing techniques for data analytics that
not only allow multiple mutually distrustful parties to perform data analysis on their
joint data in a privacy-preserving manner, but also reduce the time required to complete the analysis. More specifically, using three common data analytics tasks as
concrete examples, we show how to construct the respective privacy-preserving protocols under two different scenarios: (1) the protocols are executed by a collaborative process only involving the participating parties; (2) the protocols are outsourced to
some service providers in the cloud. Two types of optimization for improving the
efficiency of those protocols are also investigated. The first type allows each participating party access to a statistically controlled leakage so as to reduce the amount
of required computation, while the second type utilizes the parallelism that could
be incorporated into the task and pushes some computation to the offline phase to
reduce the time needed for each participating party without any additional leakage.
Extensive experiments are also conducted on real-world datasets to demonstrate the
effectiveness of our proposed techniques.<br></p>
<p> </p>
</div>
</div>
</div>
</div>
</div>
</div>
|
Page generated in 0.0324 seconds