Global ETD Search

1	Implementation of the Apriori algorithm for effective item set mining in VigiBase™: Project report in Teknisk Fysik 15 hp Olofsson, Niklas January 2010 No description available. Apriori datamining Vigibase effective item set mining algorithm
3	Automatic de-identification of case narratives from spontaneous reports in VigiBase Sahlström, Jakob January 2015 The use of patient data is essential in research, but such data is confidential and can only be used after approval from an ethical board and informed consent from the individual patient. A large amount of patient data is therefore difficult to obtain unless sensitive information, such as names, ID numbers and contact details, is removed from the data through so-called de-identification. Uppsala Monitoring Centre maintains the world's largest database of individual case reports of suspected adverse drug reactions. As of today, there is no method for efficiently de-identifying the narrative text included in these reports, which causes countries like the United States of America to exclude the narratives from their reports. The aim of this thesis is to develop and evaluate a method for automatic de-identification of case narratives in reports from the WHO Global Individual Case Safety Report Database System, VigiBase. This report compares three different models, namely Regular Expressions, used for text pattern matching, and the statistical models Support Vector Machine (SVM) and Conditional Random Fields (CRF). Performance, advantages and disadvantages are discussed, as well as how identified sensitive information is handled to maintain the readability of the narrative text. The models developed in this thesis are also compared to existing solutions to the de-identification problem. The 400 reports extracted from VigiBase were already well de-identified in terms of names, ID numbers and contact details, making it difficult to train statistical models on these categories. The reports did, however, contain plenty of dates and ages. For these categories, Regular Expressions would be sufficient to achieve good performance. To identify entities in other categories, more advanced methods such as the SVM and CRF are needed, and these will require more data. This was evident when applying the models to the more information-rich i2b2 de-identification challenge benchmark data set, where the statistical models developed in this thesis performed at a level competitive with existing models in the literature. de-identification svm crf regex VigiBase i2b2 Computer and Information Sciences Data- och informationsvetenskap
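The abstract above notes that regular expressions are sufficient for dates and ages, and that detected information is handled so the narrative stays readable. A minimal sketch of that idea follows. The two patterns, the `[DATE]` and `[AGE]` tags, and the function name `deidentify` are assumptions made for illustration; they are not the patterns or the replacement scheme used in the thesis.

```python
import re

# Hypothetical patterns for illustration only; real narratives contain many
# more date and age formats than these two expressions cover.
DATE_RE = re.compile(r"\b(?:\d{4}-\d{2}-\d{2}|\d{1,2}/\d{1,2}/\d{2,4})\b")
AGE_RE = re.compile(r"\b\d{1,3}(?:-| )years?(?:-| )old\b", re.IGNORECASE)


def deidentify(narrative: str) -> str:
    """Replace dates and ages with category tags so the text stays readable."""
    narrative = DATE_RE.sub("[DATE]", narrative)
    narrative = AGE_RE.sub("[AGE]", narrative)
    return narrative


example = "A 67-year-old woman was admitted on 2014-03-12 and discharged 3/20/2014."
print(deidentify(example))
# prints: A [AGE] woman was admitted on [DATE] and discharged [DATE].
```

Replacing each match with a category tag, rather than deleting it, is one way to keep the sentence grammatical after de-identification.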
4	Free-text Informed Duplicate Detection of COVID-19 Vaccine Adverse Event Reports Turesson, Erik January 2022 To increase medicine safety, researchers use adverse event reports to assess causal relationships between drugs and suspected adverse reactions. VigiBase, the world's largest database of such reports, collects data from numerous sources, introducing the risk of several records referring to the same case. These duplicates negatively affect the quality of data and its analysis. Thus, efforts should be made to detect and clean them automatically. Today, VigiBase holds more than 3.8 million COVID-19 vaccine adverse event reports, making deduplication a challenging problem for existing solutions employed in VigiBase. This thesis project explores methods for this task, explicitly focusing on records with a COVID-19 vaccine. We implement Jaccard similarity, TF-IDF, and BERT to leverage the abundance of information contained in the free-text narratives of the reports. Mean-pooling is applied to create sentence embeddings from word embeddings produced by a pre-trained SapBERT model fine-tuned to maximise the cosine similarity between narratives of duplicate reports. Narrative similarity is quantified by the cosine similarity between sentence embeddings. We apply a Gradient Boosted Decision Tree (GBDT) model for classifying report pairs as duplicates or non-duplicates. For a more calibrated model, logistic regression fine-tunes the leaf values of the GBDT. In addition, the model successfully implements a ruleset to find reports whose narratives mention a unique identifier of its duplicate. The best performing model achieves 73.3% recall and zero false positives on a controlled testing dataset for an F1-score of 84.6%, vastly outperforming VigiBase's previously implemented model's F1-score of 60.1%. Further, when manually annotated by three reviewers, it reached an average 87% precision when fully deduplicating 11756 reports amongst records relating to hearing disorders. Duplicate detection Deduplication Record linkage Adverse Event Reports COVID-19 Vaccines Uppsala Monitoring Centre VigiBase Machine Learning Gradient Boosted Decision Trees BERT Natural Language Processing Pharmacovigilance Individual Case Safety Reports Engineering and Technology Teknik och teknologier Computer and Information Sciences Data- och informationsvetenskap
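Two quantities in the abstract above can be illustrated with a short sketch: Jaccard similarity between narratives and the reported F1-score. The whitespace tokenisation, the function names, and the example narratives are assumptions for illustration and are not taken from the thesis. The F1 check relies only on the stated figures: zero false positives implies a precision of 1.0, and recall is 0.733.

```python
def jaccard_similarity(text_a: str, text_b: str) -> float:
    """Jaccard similarity between the sets of lowercased whitespace tokens."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0  # convention chosen here for two empty narratives
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical narratives that share four of six distinct tokens.
narrative_a = "patient developed fever after vaccination"
narrative_b = "patient developed headache after vaccination"
print(jaccard_similarity(narrative_a, narrative_b))

# Zero false positives gives precision 1.0; recall is 0.733 as reported.
print(round(f1_score(1.0, 0.733), 3))  # prints 0.846
```

With precision fixed at 1.0, the F1-score reduces to 2r / (1 + r), which gives roughly 0.846 for r = 0.733 and matches the 84.6% in the abstract.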
5	Creation of a Next-Generation Standardized Drug Grouping for QT Prolonging Reactions using Machine Learning Techniques Tiensuu, Jacob, Rådahl, Elsa January 2021 This project aims to support pharmacovigilance, the science and activities relating to drug safety and prevention of adverse drug reactions (ADRs). We focus on a specific ADR called QT prolongation, a serious reaction affecting the heartbeat. Our main goal is to group medicinal ingredients that might cause QT prolongation. This grouping can be used in safety analysis and for exclusion lists in clinical studies. It should preferably be ranked according to level of suspected correlation. We wished to create an automated and standardised process. Drug safety-related reports describing patients' experienced ADRs and the medicinal products they have taken are collected in a database called VigiBase, which we have used as the source for ingredient extraction. The ADRs are described in free text and coded using an international standardised terminology. This helps us to process the data and filter the ingredients included in reports that describe QT prolongation. To broaden our project scope to include uncoded data, we extended the process to use free-text verbatims describing the ADR as input. By processing and filtering the free-text data and training a classification model for natural language processing released by Google on VigiBase data, we were able to predict whether a free-text verbatim describes QT prolongation. The classification resulted in an F1-score of 98%. For the ingredients extracted from VigiBase, we wanted to validate whether there is a known connection to QT prolongation. The number of VigiBase occurrences is a parameter to consider, but it might be misleading since a report can include several drugs, and a drug can include several ingredients, making it hard to validate the cause. For validation, we used product labels connected to each ingredient of interest. We used a tool to download, scan and code product labels in order to see which ones mention QT prolongation. To rank our final list of ingredients according to level of suspected QT prolongation correlation, we used a multinomial logistic regression model. As training data, we used a data subset manually labeled by pharmacists. Used on unlabeled validation data, the model achieved an accuracy of 68%. Analyzing the training data showed that it was not easily linearly separable, which explains the limited classification performance. The final ranked list of ingredients suspected to cause QT prolongation consists of 1086 ingredients. Pharmacovigilance Adverse Drug Reactions MedDRA VigiBase WHODrug Global QT prolongation Torsades de Pointes Individual Case Safety Reports Text Recognition Standardised Drug Grouping Multinomial Logistic Regression BERT Other Engineering and Technologies Annan teknik
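The filtering step described in the abstract above, and the reason raw occurrence counts can mislead, can be sketched on toy data. The report structure, the reaction label, the ingredient names, and the function `count_ingredients` are all hypothetical and do not reflect the VigiBase schema or the coded terminology used in the project.

```python
from collections import Counter

# Hypothetical toy reports: each has a set of coded reactions and a list of
# drugs, where every drug is itself a list of ingredients.
TARGET_TERM = "QT prolongation"
reports = [
    {"reactions": {"QT prolongation"},
     "drugs": [["ingredient_a", "ingredient_b"], ["ingredient_c"]]},
    {"reactions": {"Headache"},
     "drugs": [["ingredient_a"]]},
    {"reactions": {"QT prolongation", "Nausea"},
     "drugs": [["ingredient_a"], ["ingredient_a", "ingredient_c"]]},
]


def count_ingredients(reports: list, target_term: str) -> Counter:
    """Count, per ingredient, the reports coded with the target reaction."""
    counts = Counter()
    for report in reports:
        if target_term not in report["reactions"]:
            continue
        # Flatten drugs into a set so each ingredient counts once per report.
        ingredients = {ing for drug in report["drugs"] for ing in drug}
        counts.update(ingredients)
    return counts


counts = count_ingredients(reports, TARGET_TERM)
print(dict(counts))
```

In this toy data, ingredient_c is counted as often as ingredient_a even though it never appears alone in a report, which illustrates why co-reported drugs and multi-ingredient products make the counts hard to interpret on their own.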
