Global ETD Search

151	Recherche d'information clinomique dans le Dossier Patient Informatisé : modélisation, implantation et évaluation. / Clinomics Information Retrieval in Electronic Health Records : Modelling, Implantation and Evaluation Cabot, Chloé 21 December 2017 (has links) The aims of this thesis fall within the broad problem of information retrieval in data from Electronic Health Records (EHRs). Two aspects of this problem are addressed: on the one hand, clinomics information retrieval within EHRs and, on the other, information retrieval within unstructured EHR data. The first objective is to integrate into EHRs information beyond the scope of medicine, namely data, information and knowledge from molecular biology: omic data from genomics, proteomics or metabolomics. Integrating this type of data improves health information systems and their interoperability, as well as the processing and exploitation of data for clinical purposes. An important challenge is to ensure the integration of heterogeneous data, through research on conceptual data models, ontologies and terminology servers, and semantic data warehouses. Integrating these data and interpreting them according to a single conceptual data model is a major bottleneck. Finally, clinical research and fundamental research must be integrated to ensure continuity of knowledge between research and clinical practice and to address the challenges of personalized medicine. This thesis thus leads to the design and development of a generic model of omic data, exploited in a prototype application for information retrieval and visualization across the omic and clinical data of a sample of 2,000 patients.
The second objective of this thesis is the multi-terminological indexing of medical documents through the development of the Extracting Concepts with Multiple Terminologies tool (ECMT). ECMT uses the terminologies integrated into the Health Terminology/Ontology Portal (HeTOP) to identify concepts in unstructured documents. Because a document written by a human may contain typing, spelling or grammar errors, the challenge is to identify concepts despite these errors and thereby structure the information the document contains. For health information retrieval, indexing is essential for searching unstructured documents such as hospital stay and examination reports. This thesis proposes several methods and evaluates them along two axes: the indexing of medical texts using several terminologies, and natural language processing of narrative medical notes. Information retrieval Electronic Health Records Modelling Information Extraction Controlled Vocabularies Natural Language Processing 006.35
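The abstract does not describe how ECMT matches terms, so the following is only a generic sketch of dictionary-based concept extraction that tolerates small typing errors through Levenshtein edit distance. The mini-terminology, its concept identifiers and the function names are hypothetical placeholders, not HeTOP content or the ECMT interface.

```python
# Hypothetical mini-terminology: surface term -> concept identifier.
# These IDs are placeholders, not real HeTOP or ECMT codes.
TERMINOLOGY = {
    "myocardial infarction": "C-001",
    "hypertension": "C-002",
    "aspirin": "C-003",
}


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def extract_concepts(text, terminology, max_ngram=3, max_dist=1):
    """Greedy longest-match concept extraction that tolerates small typos.

    Returns a list of (matched text span, concept id) pairs in reading order.
    """
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    found = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest n-gram first so multiword terms win over single words.
        for n in range(min(max_ngram, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            for term, concept in terminology.items():
                if levenshtein(span, term) <= max_dist:
                    match = (span, concept, n)
                    break
            if match:
                break
        if match:
            found.append((match[0], match[1]))
            i += match[2]  # skip the tokens consumed by the match
        else:
            i += 1
    return found


print(extract_concepts("Patient with hypertention and prior myocardial infarction, on aspirin.",
                       TERMINOLOGY))
```

The misspelled "hypertention" is still mapped to the hypertension concept because it is one substitution away. A production indexer would also normalise accents and inflections and use an index rather than scanning every term; the linear scan only keeps the sketch short.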
152	Investigating the Risk of Adverse Cardiovascular Events Associated with Concomitant Treatment of Clopidogrel and Protein Pump Inhibitors Farhat, Nawal 06 March 2019 (has links) Proton pump inhibitors (PPIs) are commonly coadministered with clopidogrel, an antiplatelet agent, to patients with acute coronary syndrome (ACS). Mechanistic studies suggest that PPIs can competitively inhibit the bioactivation of clopidogrel and may thereby attenuate its antiplatelet action. The clinical implications of this drug-drug interaction have been studied extensively; however, reported findings are inconsistent. More recently, several studies have questioned whether PPIs are associated with adverse cardiovascular events independently of clopidogrel. Given how widely PPIs and clopidogrel are used, it is critical to better understand the clinical impact of treating patients with both drugs concomitantly. This thesis includes four studies that investigate the clinical effects of the drug-drug interaction between clopidogrel and PPIs. Chapter 2, a systematic review and meta-analysis, summarizes findings from 118 studies; they do not provide strong evidence of an association between adverse cardiovascular events and PPI use, whether alone, in combination with clopidogrel, or in combination with other antiplatelet agents. Chapters 3, 4, and 5 present analyses of real-world data consisting of electronic medical records. These analyses show (1) that the concomitant use of clopidogrel and PPIs among inpatients was consistent with the clinical guidelines recommended by the FDA (Chapter 3); (2) no association between PPI use vs nonuse and four adverse cardiovascular outcomes among clopidogrel users (Chapter 4); and (3) no association between PPI use vs nonuse and adverse cardiovascular outcomes among prasugrel or ticagrelor users (Chapter 5).
Collectively, our findings do not provide evidence of an elevated risk of adverse cardiovascular outcomes with the combined use of PPIs and clopidogrel. Although pharmacodynamic and pharmacokinetic studies have demonstrated an interaction between these two drugs, our findings support the view that this biological interaction does not translate into adverse clinical events among patients with acute coronary syndrome. Electronic health records Cardiovascular Drug-drug interactions Proton pump inhibitor Drug safety Epidemiology Systematic review Meta-analysis Acute coronary syndrome Real-world data
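To illustrate what pooling findings across studies involves, here is a minimal sketch of fixed-effect inverse-variance meta-analysis of odds ratios. The abstract does not state which model Chapter 2 used (random-effects models such as DerSimonian-Laird are common when studies are heterogeneous), and the example figures are invented, not results from the thesis.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def pooled_odds_ratio(studies):
    """Fixed-effect, inverse-variance pooling of per-study odds ratios.

    `studies` is a list of (odds_ratio, ci_low, ci_high) tuples with 95% CIs.
    Returns (pooled OR, pooled CI low, pooled CI high).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for odds_ratio, ci_low, ci_high in studies:
        log_or = math.log(odds_ratio)
        # Recover the standard error of log(OR) from the width of the 95% CI.
        se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
        weight = 1.0 / se ** 2  # more precise studies get more weight
        weighted_sum += weight * log_or
        total_weight += weight
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (math.exp(pooled_log),
            math.exp(pooled_log - Z_95 * pooled_se),
            math.exp(pooled_log + Z_95 * pooled_se))


# Illustrative numbers only; these are not results from the thesis.
example = [(1.10, 0.90, 1.35), (0.95, 0.80, 1.13), (1.02, 0.85, 1.22)]
print(pooled_odds_ratio(example))
```

A pooled interval that spans 1.0 is what "no strong evidence of an association" looks like numerically; the pooling happens on the log scale because log odds ratios are approximately normally distributed.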
153	Using Low-Code Platforms to Collect Patient-Generated Health Data : A Software Developer’s Perspective Hallberg, Agnes January 2021 (has links) Collecting one's own health data through smartphone health apps is becoming increasingly popular. Still, it is difficult for healthcare providers to use this patient-generated health data, since health apps cannot easily share their data with the providers' Electronic Health Records (EHRs). At the same time, low-code platforms are becoming increasingly popular for software development. This thesis explored using low-code platforms to create applications that collect patient-generated health data and send it to EHRs, by building a web application prototype with the low-code platforms Mendix and Better EHR Studio. During prototype development, the developer kept a diary of their impressions of Mendix to show how developing on a low-code platform compares with traditional programming. The results show that it is impractical to create applications for collecting patient-generated health data with the two chosen low-code platforms. Analysis of the diary showed that, for an experienced software developer, using a low-code platform is straightforward but also challenging. Low-code Patient-generated health data Electronic Health Records Health apps Citizen developer Software Engineering Computer Systems Information Systems
154	Data-driven personalized healthcare : Towards personalized interventions via reinforcement learning for Mobile Health Galozy, Alexander January 2021 (has links) Medical and technological advances in the last century have led to unprecedented increases in the population's quality of life and lifespan. As a result, an ever-increasing number of people live with chronic health conditions that require long-term treatment, increasing healthcare costs and the managerial burden on healthcare providers. This added complexity can lead to ineffective decision-making and reduced quality of care for the individual, while further increasing costs. One promising direction for tackling these issues is the active involvement of patients in managing their care. Particularly for chronic diseases, where ongoing support is often required, patients must understand their illness and be empowered to manage their care. With the advent of smart devices such as smartphones, it is easier than ever to provide personalised digital interventions that help patients manage their treatment in daily life and raise awareness of their illness. If such approaches are to succeed, they must scale, which requires solutions that can act autonomously without costly human intervention. Solutions should also adapt to changes in an individual patient's health, needs and goals. The ongoing digitisation of healthcare presents a unique opportunity to develop cost-effective and scalable solutions through Artificial Intelligence (AI). This thesis presents work conducted as part of the project improving Medication Adherence through Person-Centered Care and Adaptive Interventions (iMedA), which aims to provide personalised adaptive interventions to hypertensive patients, supporting them in managing their medication regimen.
The focus lies on inadequate medication adherence (MA), a pervasive issue in which patients do not take their medication as instructed by their physician. Selecting individuals for intervention through secondary database analysis of Electronic Health Records (EHRs) was a key challenge; it is addressed through in-depth analysis of common adherence measures, development of prediction models for MA, and discussion of the limitations of such approaches for analysing MA. Furthermore, providing personalised adaptive interventions is framed as a contextual bandit problem, addressing the challenge of delivering relevant interventions in environments where contextual information is significantly corrupted. The contributions of the thesis can be summarised as follows: (1) highlighting the issues encountered in measuring MA through secondary database analysis and providing recommendations to address them; (2) investigating machine learning models developed using EHRs for MA prediction, and extracting common refilling patterns from EHRs; and (3) a formal problem definition for a novel contextual bandit setting with the context uncertainty commonly encountered in Mobile Health, together with an algorithm designed for such environments. Information Driven Care Electronic Health Records Machine Learning Reinforcement Learning Signal Processing
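The contextual bandit framing can be illustrated with a deliberately simple sketch: a tabular epsilon-greedy agent that only sees a corrupted copy of the true context. This is not the algorithm developed in the thesis, which is designed specifically for context uncertainty; the toy environment, its reward probabilities and the class name are hypothetical.

```python
import random


class EpsilonGreedyContextualBandit:
    """Tabular epsilon-greedy policy: one running mean reward per (context, arm)."""

    def __init__(self, n_contexts, n_arms, epsilon=0.1, seed=0):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [[0] * n_arms for _ in range(n_contexts)]
        self.values = [[0.0] * n_arms for _ in range(n_contexts)]

    def select(self, context):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)  # explore
        row = self.values[context]
        return max(range(self.n_arms), key=row.__getitem__)  # exploit best estimate

    def update(self, context, arm, reward):
        self.counts[context][arm] += 1
        n = self.counts[context][arm]
        # Incremental mean: v <- v + (r - v) / n
        self.values[context][arm] += (reward - self.values[context][arm]) / n


def simulate(steps=5000, p_corrupt=0.2, seed=1):
    """Toy environment: the best arm (intervention) equals the true context, but the
    agent only sees a copy of the context that is flipped with probability p_corrupt.
    Returns the fraction of steps on which the agent chose the best arm."""
    env = random.Random(seed)
    agent = EpsilonGreedyContextualBandit(n_contexts=2, n_arms=2, epsilon=0.1, seed=seed)
    hits = 0
    for _ in range(steps):
        true_ctx = env.randrange(2)
        observed = true_ctx if env.random() >= p_corrupt else 1 - true_ctx
        arm = agent.select(observed)
        p_success = 0.8 if arm == true_ctx else 0.2
        reward = 1.0 if env.random() < p_success else 0.0
        agent.update(observed, arm, reward)  # the agent learns from what it observed
        hits += arm == true_ctx
    return hits / steps


print(simulate(p_corrupt=0.0), simulate(p_corrupt=0.2))
```

With a 20% corruption rate, no policy that relies only on the observed context can pick the best arm more than about 80% of the time in expectation, which shows why context uncertainty needs dedicated treatment.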
155	Semantic Spaces of Clinical Text : Leveraging Distributional Semantics for Natural Language Processing of Electronic Health Records Henriksson, Aron January 2013 (has links) The large amounts of clinical data generated by electronic health record systems are an underutilized resource, which, if tapped, has enormous potential to improve health care. Since the majority of this data is in the form of unstructured text, which is challenging to analyze computationally, there is a need for sophisticated clinical language processing methods. Unsupervised methods that exploit statistical properties of the data are particularly valuable due to the limited availability of annotated corpora in the clinical domain. Information extraction and natural language processing systems need to incorporate some knowledge of semantics. One approach exploits the distributional properties of language – more specifically, term co-occurrence information – to model the relative meaning of terms in high-dimensional vector space. Such methods have been used with success in a number of general language processing tasks; however, their application in the clinical domain has previously only been explored to a limited extent. By applying models of distributional semantics to clinical text, semantic spaces can be constructed in a completely unsupervised fashion. Semantic spaces of clinical text can then be utilized in a number of medically relevant applications. The application of distributional semantics in the clinical domain is here demonstrated in three use cases: (1) synonym extraction of medical terms, (2) assignment of diagnosis codes and (3) identification of adverse drug reactions. 
To apply distributional semantics effectively to a wide range of general and, in particular, clinical language processing tasks, certain limitations need to be addressed, such as how to model the meaning of multiword terms and how to account for the function of negation. A simple means of incorporating paraphrasing and negation in a distributional semantic framework is proposed and evaluated here. The notion of ensembles of semantic spaces is also introduced; ensembles are shown to outperform a single semantic space on the synonym extraction task. This idea allows different models of distributional semantics, with different parameter configurations and induced from different corpora, to be combined. This is particularly important in the clinical domain, as it allows potentially limited amounts of clinical data to be supplemented with data from other, more readily available sources. The importance of configuring the dimensionality of semantic spaces is also demonstrated, particularly when the vocabulary grows large, as is typically the case in the clinical domain. / High-Performance Data Mining for Drug Effect Detection (DADEL) distributional semantics random indexing semantic space electronic health records clinical text synonyms diagnosis codes adverse drug reactions
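As a rough illustration of random indexing, one of the keywords of this entry, the sketch below builds a semantic space from a toy corpus, ranks synonym candidates by cosine similarity, and combines two spaces into a simple ensemble by summing scores. The corpus, parameter values and score-summing strategy are assumptions for illustration; the ensemble combination strategies evaluated in the thesis may differ.

```python
import math
import random
from collections import defaultdict


def index_vector(rng, dim, nonzeros):
    """Sparse ternary random index vector: `nonzeros` random positions set to +1 or -1."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        vec[pos] = rng.choice((1, -1))
    return vec


def random_indexing(sentences, dim=100, nonzeros=4, window=2, seed=0):
    """Build a semantic space: each term's context vector is the sum of the
    index vectors of the terms occurring within `window` positions of it."""
    rng = random.Random(seed)
    index = {}
    space = defaultdict(lambda: [0] * dim)
    for sentence in sentences:
        tokens = sentence.split()
        for i, term in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j == i:
                    continue
                neighbour = tokens[j]
                if neighbour not in index:
                    index[neighbour] = index_vector(rng, dim, nonzeros)
                ctx = space[term]
                for k, v in enumerate(index[neighbour]):
                    ctx[k] += v
    return dict(space)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(term, space, top=3):
    """Rank other terms by cosine similarity to `term` (synonym candidates)."""
    scores = [(other, cosine(space[term], vec)) for other, vec in space.items() if other != term]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top]


def ensemble_similar(term, spaces, top=3):
    """Combine several semantic spaces by summing each candidate's cosine scores."""
    totals = defaultdict(float)
    for space in spaces:
        if term not in space:
            continue
        for other, vec in space.items():
            if other != term:
                totals[other] += cosine(space[term], vec)
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)[:top]


# Toy corpus (invented for illustration): "mi" and "heart_attack" share contexts.
corpus = [
    "patient admitted with mi chest pain",
    "patient admitted with heart_attack chest pain",
    "patient discharged after fracture surgery cast",
]
space_a = random_indexing(corpus, window=2, seed=0)
space_b = random_indexing(corpus, window=1, seed=1)
print(most_similar("mi", space_a))
print(ensemble_similar("mi", [space_a, space_b]))
```

The two spaces differ in window size and random seed, mirroring the idea of combining models with different parameter configurations; no annotated data is needed at any step.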
156	Strategies for Implementation of Electronic Health Records Vassell-Webb, Carlene 01 January 2019 (has links) Implementation of electronic health records (EHRs) is a driver of improved health care and reduced health care costs. Developing countries face substantial challenges in adopting EHRs. The complex adaptive system conceptual framework guided this single case study, which explored the strategies health care leaders used to implement an EHR system successfully. Data were collected from 6 health care leaders on an island in the Caribbean using semistructured interviews. Data were analyzed using Bengtsson's four-stage data analysis process: decontextualization, recontextualization, categorization, and compilation. The study yielded 5 main themes: training, increased staffing, monitoring, identifying organizational gaps, and time. The implications for positive social change include the potential to raise standards of care and improve patient outcomes by using the strategies identified in this study to implement EHR systems successfully. Complex Adaptive System Electronic Health Records Health Information Exchange Health Information Technology Meaningful Use Business Health and Medical Administration
157	Comparing Basic Computer Literacy Self-Assessment Test and Actual Skills Test in Hospital Employees Isaac, Jolly Peter 01 January 2015 (has links) A new hospital in the United Arab Emirates (UAE) plans to adopt health information technology (HIT) and become fully digitalized once operational. The hospital has identified a need to assess the basic computer literacy of new employees before training them on various HIT applications. A lack of research on accurate methods for assessing basic computer literacy among health care professionals led to this explanatory correlational study, which compared self-assessment scores with scores on a simulated actual computer skills test to find an appropriate tool for assessing computer literacy. The study's theoretical framework was based on constructivist learning theory and self-efficacy theory. Two sets of data from 182 hospital employees were collected and analyzed. A t test revealed that self-assessment scores were significantly higher than actual test scores, indicating that hospital employees tend to rate themselves higher on self-assessment than they perform on an actual skills test. A Pearson product-moment correlation revealed a weak correlation between the two scores, implying that self-assessment scores were not a reliable indicator of how an individual would perform on the actual test. The actual skills test was therefore found to be a more reliable tool than the self-assessment for assessing basic computer skills. The findings also identified areas in which employees at the local hospital lacked basic computer skills, which led to a project to fill these gaps by providing basic computer skills training before employees are trained on HIT applications.
The findings will be useful for hospitals in the UAE that are in the process of adopting HIT, and for health information educators designing training curricula based on assessments of basic computer literacy. Assessment Basic computer skills Computer Literacy Electronic Health Records Health Information Technology Hospital Employees Curriculum and Instruction Library and Information Science
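The two analyses named in this abstract can be sketched as follows, assuming a paired t test (the abstract says only "a t test") because both scores come from the same employees. The scores are hypothetical; a p-value would come from the t distribution with n - 1 degrees of freedom, for example via `scipy.stats.ttest_rel`, which is left out to keep the sketch dependency-free.

```python
import math
import statistics


def paired_t(self_scores, actual_scores):
    """Paired t statistic and degrees of freedom for the mean difference
    between two scores measured on the same individuals."""
    diffs = [s - a for s, a in zip(self_scores, actual_scores)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # sample SD of differences / sqrt(n)
    return statistics.mean(diffs) / se, n - 1


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical scores (percent) for five employees; not the study's data.
self_assessed = [80, 90, 70, 85, 95]
actual = [60, 75, 65, 55, 80]
t, df = paired_t(self_assessed, actual)
print(f"t = {t:.2f} with {df} df; r = {pearson_r(self_assessed, actual):.2f}")
```

The two statistics answer different questions: the t test asks whether self-assessments are systematically higher, while the correlation asks whether they rank people consistently with the actual test. Scores can be biased upward yet still correlate well, which is why the study reports both.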
158	THE PERCEIVED AND REAL VALUE OF HEALTH INFORMATION EXCHANGE IN PUBLIC HEALTH SURVEILLANCE Dixon, Brian Edward 22 August 2011 (has links) Indiana University-Purdue University Indianapolis (IUPUI) / Public health agencies protect the health and safety of populations. A key function of public health agencies is surveillance: the ongoing, systematic collection, analysis, interpretation, and dissemination of data about health-related events. Recent public health events, such as the H1N1 outbreak, have triggered increased funding for, and attention to, improving and sustaining public health agencies’ capacity for surveillance. For example, provisions in the final U.S. Centers for Medicare and Medicaid Services (CMS) “meaningful use” criteria ask physicians and hospitals to report surveillance data to public health agencies using the electronic laboratory reporting (ELR) and syndromic surveillance functionalities of electronic health record (EHR) systems. Health information exchange (HIE), the organized exchange of clinical and financial health data among a network of trusted entities, may be a path towards achieving meaningful use and enhancing the nation’s public health surveillance infrastructure. Yet evidence on the value of HIE, especially in the context of public health surveillance, is sparse. This research explores the value of HIE to the process of public health surveillance. Specifically, the study describes the real and perceived completeness and usefulness of HIE in public health surveillance activities. To explore the real value of HIE, the study examined ELR data from two states, comparing raw, unedited data sent from hospitals and laboratories with data enhanced by an HIE. To explore the perceived value of HIE, the study examined public health, infection control, and HIE professionals’ perceptions of public health surveillance data and information flows, comparing traditional flows with HIE-enabled ones.
Together, these methods and the existing literature triangulate the value that HIE provides, and can provide, to public health surveillance processes. The study further describes remaining gaps that future research and development projects should explore. The data collected show that public health surveillance activities vary dramatically, encompassing a wide range of paper and electronic methods for receiving and analyzing population health trends. Few public health agencies currently use HIE-enabled processes for surveillance, relying instead on direct reporting of information from hospitals, physicians, and laboratories. HIE is generally perceived favorably among public health and infection control professionals, many of whom feel that HIE can improve surveillance methods and population health. Human and financial resource constraints prevent additional public health agencies from participating in burgeoning HIE initiatives. For agencies that do participate, HIEs are adding real value: specifically, they are improving the completeness and semantic interoperability of ELR messages sent from clinical information systems. New investments, policies, and approaches will be necessary to increase public health use of HIEs while improving HIEs’ capacity to deliver greater value to public health surveillance processes. Public Health Surveillance Informatics Electronic Laboratory Reporting Syndromic Surveillance Infection Control Data Quality Health Information Exchange Electronic Health Records Public health surveillance Medical informatics
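One way to quantify the completeness comparison described in this abstract is per-field completeness, the share of messages in which a field is filled, computed for raw and for HIE-enhanced messages. The field names and records below are hypothetical, and the abstract does not give the study's exact completeness measure.

```python
def is_filled(value):
    """A field counts as complete if it is present and not blank."""
    return value is not None and str(value).strip() != ""


def completeness(messages, fields):
    """Per-field completeness: share of messages in which the field is filled."""
    if not messages:
        return {field: 0.0 for field in fields}
    return {field: sum(is_filled(m.get(field)) for m in messages) / len(messages)
            for field in fields}


def completeness_gain(raw, enhanced, fields):
    """Percentage-point change in completeness after HIE enhancement, per field."""
    before = completeness(raw, fields)
    after = completeness(enhanced, fields)
    return {field: round(100 * (after[field] - before[field]), 1) for field in fields}


# Hypothetical ELR-like records (field names invented for illustration).
FIELDS = ["patient_name", "patient_address", "ordering_provider"]
raw = [
    {"patient_name": "Doe, J", "patient_address": "", "ordering_provider": None},
    {"patient_name": "Roe, R", "ordering_provider": "Dr. A"},
]
enhanced = [
    {"patient_name": "Doe, J", "patient_address": "1 Main St", "ordering_provider": "Dr. B"},
    {"patient_name": "Roe, R", "patient_address": "2 Elm St", "ordering_provider": "Dr. A"},
]
print(completeness_gain(raw, enhanced, FIELDS))
```

Treating missing keys, `None` and blank strings alike as "not filled" matters for message data, where an absent segment and an empty one usually mean the same thing to the receiving agency.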
159	Patienters upplevelser av att ha tillgång till egen e-journal : en litteraturöversikt / Patients' experiences of having access to their electronic health records : a literature review Grapenson, Anna, Stenberg, Madelene January 2022 (has links) Background Since 2020, the use of electronic health records (the e-journal) has increased exceptionally, partly as a result of the Covid-19 pandemic. All licensed healthcare professionals are obliged by law to document healthcare visits. Electronic health records (EHRs) can be used by everyone involved in a patient's care and are a way to share care information between healthcare providers, which in turn contributes to better and safer care. Aim The aim was to describe patients' experiences of having access to their electronic health records. Method A non-systematic literature review based on 15 original scientific articles using quantitative, qualitative and mixed methods. PubMed and CINAHL were searched using index terms combined with free-text searches. The articles underwent a quality review based on Sophiahemmet University's assessment criteria for scientific classification and quality of studies with quantitative and qualitative approaches. The results were compiled and analyzed using integrated data analysis. Results The results were compiled into two main categories and five subcategories. The main categories were: Perceived benefits of access to EHRs and Perceived disadvantages of access to EHRs. Perceived benefits included improved communication between healthcare professionals and patients and a greater sense of participation for the patient.
Other perceived benefits were patients' better understanding of their illness and care, and improved opportunities for self-care. Perceived obstacles and disadvantages relate to patients' fear and anxiety about EHRs, as well as the linguistic and technical difficulties an internet-based tool can cause. Conclusions Access to EHRs is generally perceived as positive by patients. EHRs can be seen as a tool that helps patients become more involved in their care, gain better knowledge of their care and increase their capacity for self-care. However, negative aspects also emerged, such as anxiety and linguistic and technical difficulties, which should not be ignored. Healthcare professionals should be aware of these perceived benefits and obstacles in order to strengthen the positive experiences and prevent the negative outcomes of EHRs. Electronic health records Patient access to records Patients' experience Patient participation Nursing
160	Unsupervised machine learning to detect patient subgroups in electronic health records / Identifiering av patientgrupper genom oövervakad maskininlärning av digitala patientjournaler Lütz, Elin January 2019 (has links) Electronic Health Records (EHRs) have been widely adopted by healthcare providers for reporting patient data. These data can encompass many forms of medical information, such as disease symptoms, laboratory test results, ICD-10 classes and other patient information. Structured EHR data are often high-dimensional and contain many missing values, which complicates many computational problems. Detecting meaningful structure in EHR data could provide valuable insights for diagnosis detection and for the development of medical decision support systems. In this work, a subset of EHR data from patient questionnaires is explored using two well-known clustering algorithms: k-means and agglomerative hierarchical clustering. The algorithms were tested on different types of data, primarily raw data and two datasets in which missing values had been imputed using different imputation techniques. The primary evaluation metric for the clustering algorithms was the silhouette value, computed with Euclidean and cosine distance measures. The results showed that natural groupings most likely exist in the dataset. Hierarchical clustering produced higher-quality clusters than k-means, and the cosine measure gave a good interpretation of distance. Imputation had large effects on the data and, consequently, on the clustering results, so other, more sophisticated techniques are needed for handling missing values in this dataset. Machine learning unsupervised learning clustering EHR electronic health records ICD diagnosis codes Computer and Information Sciences
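The clustering pipeline in this abstract can be sketched in dependency-free Python: mean imputation, k-means, and the silhouette value under Euclidean and cosine distance. This is an illustration under stated assumptions, not the thesis code: it covers k-means only (not agglomerative clustering), uses a hypothetical deterministic farthest-first initialisation, uses column-mean imputation as one simple example because the abstract does not name the imputation techniques, and runs on toy points.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_impute(rows):
    """Replace None with the mean of the observed values in the same column.
    Assumes every column has at least one observed value."""
    cols = list(zip(*rows))
    means = [sum(x for x in col if x is not None) / sum(x is not None for x in col)
             for col in cols]
    return [[means[j] if x is None else x for j, x in enumerate(row)] for row in rows]


def kmeans(points, k, iterations=100):
    """Lloyd's k-means with a deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(euclidean(p, c) for c in centroids)))
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def silhouette(points, labels, dist):
    """Mean silhouette value, s(i) = (b - a) / max(a, b), averaged over all points."""
    total = 0.0
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        others = set(labels) - {labels[i]}
        if not own or not others:
            continue  # singleton cluster or single cluster: s(i) = 0 by convention
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(sum(dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
                for c in others)  # mean distance to the nearest other cluster
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


points = [[1, 0.1], [1, 0.2], [2, 0.1], [0.1, 1], [0.2, 1], [0.1, 2]]
labels = kmeans(points, k=2)
print(labels, silhouette(points, labels, euclidean), silhouette(points, labels, cosine_distance))
print(mean_impute([[1, None], [3, 4]]))
```

On these toy points, which separate by direction rather than by magnitude, the cosine silhouette is markedly higher than the Euclidean one for the same clustering, so the choice of distance measure can change how good a clustering appears.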
