21 |
Targeting Non-obvious Errors in Death Certificates / Johansson, Lars Age, January 2008
Mortality statistics are widely used, although their accuracy is often questioned. Producers of mortality statistics check for errors in death certification, but current methods capture only obvious mistakes. This thesis investigates whether non-obvious errors can be found by linking death certificates to hospital discharge data.
Data: 69,818 deaths in Sweden in 1995. Paper I: Analysing differences between the underlying cause of death from the death certificate (UC) and the main discharge condition from the patient’s last hospitalization (MDC). Paper II: Testing whether differences can be explained by ICD definitions of UC and MDC. Paper III: Surveying methods in 44 current studies on the accuracy of death certificates. Paper IV: Checking death certificates against case summaries for: i) 573 deaths where UC and MDC were the same or the difference could be explained; ii) 562 deaths where the difference could not be explained.
Results: In 54% of deaths the MDC differed from the UC. Almost two-thirds of the differences were medically compatible, since the MDC might have developed as a complication of the UC. Of 44 recent evaluation studies, only 8 describe their methods in enough detail for the study to be replicated. Incompatibility between MDC and UC indicates a four-fold risk that the death certificate is inaccurate. For some diagnostic groups, however, death certificates are often inaccurate even when the UC and MDC are compatible.
Conclusion: Producers of official mortality statistics could reduce the number of non-obvious errors in the statistics by collecting additional information on incompatible deaths and on deaths in high-risk diagnostic groups. ICD conventions contribute to the quality problem, since they presuppose that all deaths are due to a single underlying cause. In an ageing population, however, an increasing number of deaths are due to an accumulation of etiologically unrelated conditions.
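The linkage step behind Papers I and IV can be pictured with a minimal sketch. It is not the thesis's procedure: the record fields, the `person_id` linkage key and the ICD codes below are hypothetical, and the sketch only flags exact differences between UC and MDC, without the medical-compatibility judgement described in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeathCertificate:
    person_id: str  # hypothetical linkage key shared by both registers
    uc: str         # underlying cause of death (ICD code)


@dataclass(frozen=True)
class Discharge:
    person_id: str
    discharge_date: str  # ISO date, so string order equals date order
    mdc: str             # main discharge condition (ICD code)


def last_discharge(discharges):
    """Keep only each person's most recent hospital discharge."""
    latest = {}
    for d in discharges:
        current = latest.get(d.person_id)
        if current is None or d.discharge_date > current.discharge_date:
            latest[d.person_id] = d
    return latest


def flag_differences(certificates, discharges):
    """Return person_ids whose UC differs from the MDC of the last hospitalization."""
    latest = last_discharge(discharges)
    flagged = []
    for cert in certificates:
        d = latest.get(cert.person_id)
        if d is not None and d.mdc != cert.uc:
            flagged.append(cert.person_id)
    return flagged


# Made-up records for illustration only.
certificates = [DeathCertificate("p1", "I21"), DeathCertificate("p2", "C34")]
discharges = [
    Discharge("p1", "1995-01-10", "J18"),
    Discharge("p1", "1995-03-02", "I21"),
    Discharge("p2", "1995-05-20", "J18"),
]
print(flag_differences(certificates, discharges))
```

In this toy data only the second person is flagged, because the first person's most recent discharge carries the same code as the death certificate. Flagged cases would still need the compatibility review that the abstract describes.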
|
22 |
Privacy-Preserving Data Integration in Public Health Surveillance / Hu, Jun, 16 May 2011
With widespread use of the Internet, data is often shared between organizations in B2B health care networks. Integrating data across all sources in a health care network would be useful to public health surveillance and would provide a complete view of how the overall network is performing. Because organizations lack a standardized common data model, matching identities between different locations in order to link and aggregate records is difficult. Moreover, privacy legislation controls the use of personal information, and health care data is highly sensitive, so protecting data privacy and preventing leaks of personal health information is more important than ever. Throughout the process of integrating data sets from different organizations, consent (explicit or implicit) and/or permission to use must be in place, data sets must be de-identified, and identity must be protected. Furthermore, one must ensure that combining data sets from different data sources into a single consolidated data set does not create data that could be re-identified, even when only summary data records are created.
In this thesis, we propose new privacy-preserving data integration protocols for public health surveillance, identify a set of privacy-preserving data integration patterns, and propose a supporting framework that combines a methodology and architecture with which to implement these protocols in practice. Our work is validated with two real-world case studies that were developed in partnership with two different public health surveillance organizations.
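The re-identification risk in summary data mentioned above can be illustrated with small-cell suppression, a generic safeguard in which counts below a threshold are withheld. This is a sketch for illustration only and is not one of the thesis's protocols; the threshold of 5, the field names and the records are all assumptions.

```python
from collections import Counter

MIN_CELL_SIZE = 5  # assumed threshold; real disclosure policies vary


def summarize(records, keys, min_cell_size=MIN_CELL_SIZE):
    """Count records per combination of `keys`, suppressing small cells.

    Cells with fewer than `min_cell_size` records are reported as None
    so that a rare combination cannot single out an individual.
    """
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {
        cell: (n if n >= min_cell_size else None)
        for cell, n in counts.items()
    }


# Made-up consolidated records for illustration only.
records = (
    [{"region": "North", "diagnosis": "influenza"}] * 12
    + [{"region": "South", "diagnosis": "influenza"}] * 2
)
summary = summarize(records, keys=("region", "diagnosis"))
print(summary)
```

Here the cell with 12 records is reported and the cell with 2 records is suppressed. Suppression alone does not guarantee privacy (totals and overlapping tables can leak the hidden values), which is part of why dedicated protocols are needed.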
|
23 |
Users of a hospital emergency department: Diagnoses and mortality of those discharged home from the emergency department / Gunnarsdóttir, Oddný, January 2005
Objectives – To ascertain the annual number of users who were discharged home after visits to the emergency department, grouped by age, gender and number of visits during the calendar year, and to assess whether an increasing number of visits to the department predicted higher mortality.
Methods – This is a retrospective cohort study at the emergency department of Landspitali University Hospital in the Reykjavik capital area, Iceland. From 1995 to 2001, 19,259 users visited the emergency department and were discharged home; they were followed up for cause-specific mortality through a national registry. Standardised mortality ratios were calculated, with the expected number of deaths based on national mortality rates, and hazard ratios according to the number of visits per calendar year were computed using time-dependent multivariate regression analysis.
Results – The annual increase in visits to the emergency department among patients discharged home was seven to 14 per cent per age group during the period 1995 to 2001, with the highest increase among older men. The most common discharge diagnosis was the category Symptoms, signs and abnormal clinical and laboratory findings not elsewhere classified. When emergency department users were compared with the general population, the standardised mortality ratio was 1.81 for men and 1.93 for women. Among those attending the emergency department two times, or three or more times, in a calendar year, the mortality rate was higher than among those attending only once in a year. The causes of death that led to the highest mortality among frequent users of the emergency department were neoplasms, ischemic heart diseases, and the category external causes, particularly drug intoxication, suicides and probable suicides.
Conclusions – The mortality of users of the emergency department who had been discharged home was higher than that of the general population. Frequent users of the emergency department had a higher mortality than those visiting the department no more than once in a year. Since the emergency department serves general medicine and surgery patients rather than injury cases, the high mortality due to drug intoxication, suicide and probable suicide is notable. Further studies are needed into the discharge diagnoses of those frequently using emergency departments, in an attempt to understand and possibly prevent this mortality.
ISBN 91-7997-128-8
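The standardised mortality ratio used in this study is the number of observed deaths divided by the number expected if the cohort had experienced the reference population's rates. The sketch below shows that arithmetic only; the strata, person-years, rates and observed count are hypothetical and are not taken from the thesis.

```python
def expected_deaths(person_years, reference_rates):
    """Sum person-years times the reference mortality rate over all strata."""
    return sum(person_years[s] * reference_rates[s] for s in person_years)


def smr(observed, person_years, reference_rates):
    """Standardised mortality ratio: observed deaths / expected deaths."""
    return observed / expected_deaths(person_years, reference_rates)


# Hypothetical numbers for two age strata, for illustration only.
person_years = {"40-59": 1000.0, "60-79": 500.0}
reference_rates = {"40-59": 0.004, "60-79": 0.02}  # deaths per person-year
observed = 21

print(round(smr(observed, person_years, reference_rates), 2))
```

With these made-up inputs the expected number of deaths is 1000 × 0.004 + 500 × 0.02 = 14, so 21 observed deaths give a ratio of 1.5. A ratio above 1, such as the 1.81 and 1.93 reported in the abstract, means more deaths than expected from national rates.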
|
24 |
Record Linkage for Web Data / Hassanzadeh, Oktie, 15 August 2013
Record linkage refers to the task of finding and linking records (in a single database or in a set of data sources) that refer to the same entity. Automating the record linkage process is a challenging problem, and has been the topic of extensive research for many years. However, the changing nature of the linkage process and the growing size of data sources create new challenges for this task.
This thesis studies the record linkage problem for Web data sources. Our hypothesis is that a generic and extensible set of linkage algorithms combined within an easy-to-use framework that integrates and allows tailoring and combining of these algorithms can be used to effectively link large collections of Web data from different domains.
To this end, we first present a framework for record linkage over relational data, motivated by the fact that many Web data sources are powered by relational database engines. This framework is based on declarative specification of the linkage requirements by the user and allows linking records in many real-world scenarios. We present algorithms for translation of these requirements to queries that can run over a relational data source, potentially using a semantic knowledge base to enhance the accuracy of link discovery.
Effective specification of requirements for linking records across multiple data sources requires understanding the schema of each source, identifying attributes that can be used for linkage, and their corresponding attributes in other sources. Schema or attribute matching is often done with the goal of aligning schemas, so attributes are matched if they play semantically related roles in their schemas. In contrast, we seek to find attributes that can be used to link records between data sources, which we refer to as linkage points. In this thesis, we define the notion of linkage points and present the first linkage point discovery algorithms.
We then address the novel problem of how to publish Web data in a way that facilitates record linkage. We hypothesize that careful use of existing, curated Web sources (their data and structure) can guide the creation of conceptual models for semi-structured Web data that in turn facilitate record linkage with these curated sources. Our solution is an end-to-end framework for data transformation and publication, which includes novel algorithms for identification of entity types and their relationships out of semi-structured Web data. A highlight of this thesis is showcasing the application of the proposed algorithms and frameworks in real applications and publishing the results as high-quality data sources on the Web.
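To make the idea of linkage points concrete, the following sketch scores every pair of attributes across two sources by the overlap of their normalized values and keeps the pairs above a threshold. It is a simple stand-in for illustration, not the discovery algorithms of the thesis; the Jaccard measure, the threshold of 0.5 and the toy records are assumptions.

```python
def normalize(value):
    """Lowercase and collapse whitespace so trivially different values match."""
    return " ".join(str(value).lower().split())


def value_set(records, attribute):
    """Set of normalized values that `attribute` takes across `records`."""
    return {normalize(r[attribute]) for r in records if r.get(attribute) is not None}


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def candidate_linkage_points(source_a, source_b, threshold=0.5):
    """Rank attribute pairs whose value sets overlap by at least `threshold`."""
    attrs_a = sorted({k for r in source_a for k in r})
    attrs_b = sorted({k for r in source_b for k in r})
    scored = []
    for a in attrs_a:
        for b in attrs_b:
            score = jaccard(value_set(source_a, a), value_set(source_b, b))
            if score >= threshold:
                scored.append((a, b, score))
    return sorted(scored, key=lambda t: t[2], reverse=True)


# Toy records standing in for two Web data sources.
films = [
    {"title": "Blade Runner", "year": 1982},
    {"title": "Alien", "year": 1979},
]
movies = [
    {"name": "blade  runner", "director": "Ridley Scott"},
    {"name": "Alien", "director": "Ridley Scott"},
]
print(candidate_linkage_points(films, movies))
```

In this toy case the only pair that survives is `title` in the first source with `name` in the second, even though the attribute names differ. This is the distinction the abstract draws from schema matching: the pair is chosen because its values can link records, not because the attributes play the same role in their schemas.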
|
26 |
Targeting non-obvious errors in death certificates / Johansson, Lars Age, January 2008
Diss. (summary) Uppsala: Uppsala universitet, 2008. / With 4 accompanying papers.
|
27 |
Segmentação de nome e endereço por meio de modelos escondidos de Markov e sua aplicação em processos de vinculação de registros / Segmentation of names and addresses through hidden Markov models and its application in record linkage / Rita de Cássia Braga Gonçalves, 11 December 2013
The segmentation of names into their constituent parts is a fundamental step in the integration of databases by means of record linkage techniques. This segmentation can be accomplished in different ways. This study aimed to evaluate the use of Hidden Markov Models (HMM) in the segmentation of names and addresses of people, and the efficiency of this segmentation in the record linkage process. The databases of the Mortality Information System (SIM, from its Portuguese name) and of the Information Subsystem for High Complexity Procedures (APAC, from its Portuguese name) of the state of Rio de Janeiro between 1999 and 2004 were used. A method composed of eight stages was proposed for segmenting names and addresses, using routines implemented in PL/SQL and the JAHMM library, a Java implementation of HMM algorithms. A random sample of 100 records from each database was used to verify the correctness of the segmentation process using the hidden Markov model.
To verify the effect of segmenting names through the HMM, three record linkage processes were applied to a sample of the two databases, each using a different segmentation strategy, namely: 1) dividing the name into first name, last name, and middle initials; 2) dividing the name into five parts; 3) segmentation by HMM. The HMM segmentation mechanism showed good agreement with a human observer. The three linkage processes produced very similar results, with the first strategy performing slightly better than the others. This study suggests that the segmentation of Brazilian names by means of hidden Markov models is not more effective than traditional segmentation methods.
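Strategy 1 from the abstract (first name, last name, and middle initials) is simple enough to sketch directly. The function below is an illustrative guess at such a rule, not the study's PL/SQL routine; in particular, treating particles such as "da" or "de" as ordinary middle parts is an assumption.

```python
def segment_first_last_initials(full_name):
    """Split a name into first part, last part, and initials of the middle parts."""
    parts = full_name.split()
    if not parts:
        return {"first": "", "middle_initials": "", "last": ""}
    first = parts[0]
    last = parts[-1] if len(parts) > 1 else ""
    # Everything between the first and last token contributes one initial.
    middle_initials = "".join(p[0].upper() for p in parts[1:-1])
    return {"first": first, "middle_initials": middle_initials, "last": last}


# Made-up name for illustration only.
print(segment_first_last_initials("Maria da Silva Santos"))
```

For the made-up name above, the sketch yields "Maria" as the first name, "DS" as the middle initials and "Santos" as the last name. A rule like this needs no training data, which is consistent with the study's finding that the HMM did not outperform the traditional strategies.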
|
29 |
Privacy-Preserving Data Integration in Public Health Surveillance / Hu, Jun, January 2011
With widespread use of the Internet, data is often shared between organizations in B2B health care networks. Integrating data across all sources in a health care network would be useful to public health surveillance and would provide a complete view of how the overall network is performing. Because organizations lack a standardized common data model, matching identities between different locations in order to link and aggregate records is difficult. Moreover, privacy legislation controls the use of personal information, and health care data is highly sensitive, so protecting data privacy and preventing leaks of personal health information is more important than ever. Throughout the process of integrating data sets from different organizations, consent (explicit or implicit) and/or permission to use must be in place, data sets must be de-identified, and identity must be protected. Furthermore, one must ensure that combining data sets from different data sources into a single consolidated data set does not create data that could be re-identified, even when only summary data records are created.
In this thesis, we propose new privacy-preserving data integration protocols for public health surveillance, identify a set of privacy-preserving data integration patterns, and propose a supporting framework that combines a methodology and architecture with which to implement these protocols in practice. Our work is validated with two real-world case studies that were developed in partnership with two different public health surveillance organizations.
|
30 |
Privacy-Preserving Patient Tracking for Phase 1 Clinical Trials / Farah, Hanna Ibrahim, January 2015
Electronic data has become the standard method of storing information in our modern age. The move from paper-based to electronic data creates opportunities to share information between organizations at record speed, especially when handling large data sets. However, sharing sensitive information creates requirements for electronic data exchange: privacy requires that the original data not be revealed to unauthorized parties. In the healthcare sector in particular, there are two important use cases that require exchanging information in a privacy-preserving way.
1. Contract research organizations (CROs) need to verify the eligibility of a participant in a phase 1 clinical trial. One criterion is checking that an individual is not concurrently enrolled in a trial at another CRO. However, privacy laws and the maintenance of a private list of participants for competitive purposes prevent CROs from checking against that criterion.
2. A patient’s medical record is usually distributed amongst several healthcare organizations. To improve healthcare services, it is important to have a patient’s complete medical history, either to help diagnose an illness or to gather statistics for better disease control. However, patient medical files need to be confidential. Two healthcare organizations cannot link their large patient databases by disclosing identity-revealing details (e.g., names or health card numbers).
This thesis presents the development and evaluation of protocols capable of querying and linking datasets in a privacy-preserving manner: TRACK for checking concurrent enrolment in phase 1 clinical trials, and SHARE for linking two large datasets containing millions of (patient medical) records. These protocols are better than existing approaches in terms of the level of privacy protection they offer (e.g., against dictionary and frequency attacks), their reliance on trusted third parties, and their performance when blocking is used. These protocols were extensively validated in simulated scenarios similar to their real-world counterparts.
The thesis presents novel identity representation schemes that offer strong privacy measures while being efficient for very large databases. These schemes may be used by other researchers to represent identity in different use cases. CROs may implement the protocols (especially TRACK) in systems to check whether an individual exists in another CRO’s dataset without revealing the identity of that individual. Two healthcare organizations may use a system based on this research (especially the SHARE protocol) to discover their common patients while protecting the identities of the other patients.
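The general idea of finding common individuals without exchanging names can be sketched with a keyed hash. This is explicitly not TRACK or SHARE, whose identity representation schemes are not described in the abstract; the choice of HMAC-SHA256, the shared key, the normalization rule and the sample identities are all assumptions made for illustration.

```python
import hashlib
import hmac


def normalize_identity(name, birth_date):
    """Canonical form so both organizations tokenize the same person identically."""
    return f"{' '.join(name.lower().split())}|{birth_date}"


def token(secret_key, name, birth_date):
    """Keyed hash (HMAC-SHA256) of the normalized identity."""
    message = normalize_identity(name, birth_date).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def common_tokens(tokens_a, tokens_b):
    """Tokens present in both sets, i.e. individuals known to both parties."""
    return set(tokens_a) & set(tokens_b)


# Hypothetical key and made-up identities for illustration only.
shared_key = b"hypothetical-shared-secret"
cro_a = {
    token(shared_key, "Jane  Doe", "1980-02-14"),
    token(shared_key, "John Roe", "1975-07-01"),
}
cro_b = {token(shared_key, "jane doe", "1980-02-14")}

print(len(common_tokens(cro_a, cro_b)))
```

The two organizations in this toy example share one individual, and neither sees the other's remaining entries in clear text. The baseline has known limits: any party holding the key can hash a list of guessed identities and test them, which is the kind of dictionary attack the abstract says the thesis's protocols are designed to resist better than existing approaches.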
|