91

Exploring Adverse Drug Effect Discovery from Data Mining of Clinical Notes

Smith, Joshua Carl 05 July 2012
Many medications have potentially serious adverse effects that are detected only after FDA approval. After 80 million people worldwide had received prescriptions for the drug rofecoxib (Vioxx), its manufacturer withdrew it from the marketplace in 2004, after epidemiological data showed that it increased the risk of heart attack and stroke. More recently, the FDA warned that the commonly prescribed statin drug class (e.g., Lipitor, Zocor, Crestor) may increase the risk of memory loss and Type 2 diabetes. These incidents illustrate the difficulty of identifying adverse effects of prescription medications during premarketing trials. Some types of adverse effects (e.g., those requiring years of exposure) can be detected only through post-marketing surveillance. We explored the use of data mining on clinical notes to detect novel adverse drug effects. We constructed a knowledge base from the UMLS and other data sources that could classify drug-finding pairs as currently known adverse effects (drug causes finding), known indications (drug treats/prevents finding), or unknown relationships. We used natural language processing (NLP) to extract current medications and clinical findings (including diseases) from 360,000 de-identified history and physical examination (H&P) notes. We identified 35,000 interesting medication-finding concept co-occurrences that exceeded threshold probabilities of appearance, involving ~600 drugs and ~2,000 findings. The identified pairs include several that the FDA recognized as harmful through post-marketing surveillance, including rofecoxib and heart attack, rofecoxib and stroke, statins and diabetes, and statins and memory loss. Our preliminary results illustrate both the problems and the potential of mining clinical notes for adverse drug effect discovery.
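To make the screening step concrete, here is a minimal sketch of a co-occurrence screen of the kind described above, using toy data and a simple independence baseline; the actual statistics and thresholds used in the dissertation are not specified here.

```python
# A toy co-occurrence screen: flag drug-finding pairs that co-occur more
# often than an independence baseline would predict. Data are illustrative.
from collections import Counter
from itertools import product

# Each note reduced to (drug concepts, finding concepts) found by NLP.
notes = [
    ({"rofecoxib"}, {"myocardial infarction"}),
    ({"rofecoxib", "atorvastatin"}, {"myocardial infarction", "memory loss"}),
    ({"atorvastatin"}, {"type 2 diabetes"}),
    ({"atorvastatin"}, set()),
]

n_notes = len(notes)
drug_counts, finding_counts, pair_counts = Counter(), Counter(), Counter()
for drugs, findings in notes:
    drug_counts.update(drugs)
    finding_counts.update(findings)
    pair_counts.update(product(drugs, findings))

# Flag pairs whose observed co-occurrence probability exceeds the value
# expected if drug and finding appeared independently.
for (drug, finding), n_pair in pair_counts.items():
    p_observed = n_pair / n_notes
    p_expected = (drug_counts[drug] / n_notes) * (finding_counts[finding] / n_notes)
    if p_observed > p_expected:
        print(f"{drug} ~ {finding}: observed {p_observed:.2f} > expected {p_expected:.2f}")
```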
92

EVALUATING THE PATIENT-CENTERED AUTOMATED SMS TAGGING ENGINE (PASTE): NATURAL LANGUAGE PROCESSING APPLIED TO PATIENT-GENERATED SMS TEXT MESSAGES

Stenner, Shane P. 27 July 2011
Pilot studies have demonstrated the feasibility of using mobile technologies as a platform for electronic patient-centered medication management. Such tools may be used to intercept drug interactions, stop unintentional medication overdoses, prevent improper scheduling of medications, and gather real-time data about symptoms, outcomes, and activities of daily living. Unprompted text-message communication with patients using natural language could engage patients in their healthcare, but it presents unique natural language processing (NLP) challenges. A major technical challenge is to process text messages into an unambiguous, computable format that a subsequent medication management system can use. NLP challenges unique to text-message communication include the common use of ad hoc abbreviations, acronyms, phonetic lingo, improper auto-spell correction, and a lack of formal punctuation. While models exist for text-message normalization, including dictionary-substitution and statistical machine translation approaches, we are not aware of any publications that describe an approach specific to patient text messages or to text messages in the domain of medicine. To allow two-way interaction with patients using mobile phone-based short message service (SMS) technology, we developed the Patient-centered Automated SMS Tagging Engine (PASTE). The PASTE web service uses NLP methods, custom lexicons, and existing knowledge sources to extract and tag medication concepts and action concepts from patient-generated text messages. A pilot evaluation of PASTE using 130 medication messages anonymously submitted by 16 volunteers established the feasibility of extracting medication information from patient-generated medication messages and suggested improvements. A subsequent evaluation using 700 patient-generated text messages from 14 teens and 5 adults demonstrated improved performance over the pilot version of PASTE, with F-measures over 90% for medication concepts and medication action concepts when compared to manually tagged messages. We report the recall and precision of PASTE for extracting and tagging medication information from patient messages.
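A minimal sketch of the kind of dictionary-substitution normalization and lexicon-based tagging described above; the abbreviation table and lexicons here are illustrative placeholders, not those PASTE actually uses.

```python
# Normalize SMS shorthand via dictionary substitution, then tag tokens
# against medication and action lexicons. All tables are toy examples.
import re

ABBREVIATIONS = {"2day": "today", "b4": "before", "tabs": "tablets"}
MEDICATION_LEXICON = {"lisinopril", "metformin", "ibuprofen"}
ACTION_LEXICON = {"took", "missed", "stopped"}

def normalize(message: str) -> list[str]:
    tokens = re.findall(r"[a-z0-9]+", message.lower())
    return [ABBREVIATIONS.get(tok, tok) for tok in tokens]

def tag(message: str) -> list[tuple[str, str]]:
    tagged = []
    for tok in normalize(message):
        if tok in MEDICATION_LEXICON:
            tagged.append((tok, "MEDICATION"))
        elif tok in ACTION_LEXICON:
            tagged.append((tok, "ACTION"))
    return tagged

print(tag("took 2 metformin tabs b4 dinner 2day"))
# [('took', 'ACTION'), ('metformin', 'MEDICATION')]
```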
93

A COMPARISON OF BAYESIAN NETWORK STRUCTURE LEARNING ALGORITHMS ON EMERGENCY DEPARTMENT AMBULANCE DIVERSION DATA

Leegon, Jeffrey Thomas 28 July 2009
Use of Bayesian networks (BNs) has increased in medicine. Traditionally, BNs have been developed by experts or from the current literature. Several applications implement "off the shelf" BN structure learning algorithms, but few implementations have been evaluated. We compared six "off the shelf" BN structure learning algorithms and an expert-developed BN using two years of data from a curated emergency department (ED) overcrowding database. We used ED ambulance diversion as the reference standard. Eighteen variables selected from a previous study were used for prediction. BN structures were learned from a data set for predicting ED diversion one hour in advance. The data set was discretized using equal-frequency and equal-width discretization, and each BN structure learning algorithm developed a structure from each discretized data set. We used area under the receiver operating characteristic curve (AUC), negative log likelihood, and the Akaike information criterion to compare the structures as they predicted ED diversion 1, 2, 4, 6, 8, and 12 hours in advance. Both the training and test data sets contained >100,000 data points. The ED was on ambulance diversion 22% of the time. The machine-learned networks were complex, with >3,000 conditional probabilities, compared to 365 for the expert-developed network. Both the best-performing machine-learned structure and the expert-developed network had an AUC of 0.95 predicting diversion one hour in advance and 0.94 predicting diversion two hours in advance. The machine-learned BN performed as well as the expert-developed BN. The expert-developed network was parsimonious but required extensive user involvement.
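For reference, the two discretization schemes mentioned above can be sketched with pandas as follows; the data and bin counts are illustrative.

```python
# Equal-width vs. equal-frequency discretization of a continuous variable,
# as applied to the ED data before structure learning. Toy values.
import pandas as pd

waiting_room_count = pd.Series([3, 7, 8, 12, 15, 21, 22, 40])

equal_width = pd.cut(waiting_room_count, bins=4)   # bins of equal range
equal_freq = pd.qcut(waiting_room_count, q=4)      # bins of equal population

print(equal_width.value_counts().sort_index())
print(equal_freq.value_counts().sort_index())
```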
94

MIASMA: A Medical Informatics Application for Systematic Microbiological Alerts

Carnevale, Randy Joseph 14 September 2011
This PhD dissertation project had four objectives: (1) to develop MIASMA, a potentially open-source Medical Informatics Application for Systematic Microbiological Alerts that uses recently developed methods (e.g., from syndromic surveillance and from heuristic observations) to detect single-hospital outbreaks of both commonly occurring and rare bacterial, viral, and fungal species; (2) to deploy MIASMA in the Vanderbilt University Hospital (VUH) for use by the Department of Infection Control and Prevention; (3) to compare the alerting timeliness, positive predictive value, and sensitivity of MIASMA to current VUH infection control practices; and (4) to evaluate the utility of MIASMA when used to supplement current VUH infection control practices.
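One simple syndromic-surveillance-style detector of the kind alluded to might look like the following sketch; the weekly binning and z-score threshold are assumptions for illustration, not MIASMA's actual detectors.

```python
# Flag an organism when its current weekly isolate count exceeds the
# historical mean by a chosen number of standard deviations.
from statistics import mean, stdev

def alert(weekly_counts: list[int], current_week: int, z_threshold: float = 2.0) -> bool:
    baseline_mean = mean(weekly_counts)
    baseline_sd = stdev(weekly_counts)
    # Guard against a near-zero baseline SD inflating the signal.
    return current_week > baseline_mean + z_threshold * max(baseline_sd, 1.0)

history = [2, 1, 3, 2, 2, 1, 3, 2]      # weekly isolate counts for one species
print(alert(history, current_week=8))   # True: a possible outbreak signal
```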
95

Efficient Development of Electronic Health Record Based Algorithms to Identify Rheumatoid Arthritis

Carroll, Robert James 21 October 2011
Electronic Health Records (EHRs) are valuable tools in clinical and genomic research, providing the ability to generate large patient cohorts for study. Traditionally, EHR-based research is carried out through manual review of patient charts, which is expensive and time consuming and limits the scalability of EHR-derived genetic or clinical research. The recent introduction of automated phenotype identification algorithms has sped cohort identification, but these algorithms also require significant investment to develop. In these studies, we evaluated three aspects of the process of phenotype algorithm implementation and application in the context of Rheumatoid Arthritis (RA), a chronic inflammatory arthritis with known genetic risk factors. The first aspect was whether a support vector machine (SVM) trained on a naïve feature set would perform similarly to models trained using an expert-defined feature set. The second was the effect of training set size on the predictive power of the algorithm for both the naïve and expert-defined sets. The third was the portability across institutions of a trained model using expert-derived features. We show that an SVM trained with all available attributes maintains strong performance compared to an SVM trained using an expert-defined set of features. Using an expert-defined feature set allowed for a much smaller training set than the naïve feature set did, although training set size requirements for both were much smaller than those often used for phenotype algorithm training. We also show the portability of a previously published logistic regression model trained at Partners HealthCare to Vanderbilt and Northwestern Universities. While the original model was portable, models retrained using local data can further improve performance. This research shows the potential for rapid development of new phenotype identification algorithms that may be portable to different EHR systems and institutions. With the application of clinical knowledge in the design, very few training records are required to create strongly predictive models, which could ease the development of models for new conditions. Fast, accurate development of portable phenotype algorithms offers the potential to engender a new era of EHR-based research.
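A minimal sketch of the naïve-versus-expert comparison using scikit-learn on synthetic data; the features, labels, and SVM settings are placeholders, not those used in the study.

```python
# Compare an SVM trained on all available attributes ("naive") against one
# trained on a small curated subset ("expert"). Data are synthetic.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 300
X_all = rng.normal(size=(n, 50))             # naive: every available attribute
y = (X_all[:, :3].sum(axis=1) > 0).astype(int)
X_expert = X_all[:, :3]                      # expert: a small curated subset

for name, X in [("naive", X_all), ("expert", X_expert)]:
    auc = cross_val_score(SVC(kernel="linear"), X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC {auc.mean():.3f}")
```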
96

Novel Methods for Variable Selection in Non-faithful Domains, Understanding Support Vector Machines, Learning Regions of Bayesian Networks, and Prediction Under Manipulation

Brown, Laura E 11 December 2009
The focus of my research was to develop several novel computational techniques for discovering informative patterns and complex relationships in biomedical data. First, an efficient heuristic method was developed to search for the features with the largest absolute weight in a polynomial Support Vector Machine (SVM) model. This algorithm provides a new ability to understand, conceptualize, visualize, and communicate polynomial SVM models. Second, a new variable selection algorithm, called Feature Space Markov Blanket (FSMB), was designed. FSMB combines the advantages of kernel methods and Markov Blanket-based techniques for variable selection. FSMB was evaluated on several simulated, "difficult" distributions, where it identified the Markov Blankets with high sensitivity and specificity. Additionally, it was run on several real-world data sets; the resulting classification models are parsimonious (for two data sets, the models consisted of only 2-3 features). On another data set, the Markov Blanket-based method performed poorly; FSMB's improved performance there suggests the existence of a complex, multivariate relationship in the underlying domain. Third, a well-cited algorithm for learning Bayesian networks (Max-Min Hill-Climbing, MMHC) was extended to locally learn a region of a Bayesian network. This local method was compared to MMHC in an empirical evaluation. The local method took, as expected, a fraction of the time to learn regions compared to MMHC; of particular interest, the local technique learned regions of equal or better quality. Finally, an approach using the formalism of causal Bayesian networks was designed to make predictions under manipulations; this approach was used in a submission to the Causality Challenge. The approach required the use and combination of the three methods from this research and many state-of-the-art techniques to build and evaluate models. The results of the competition (the submission performed best on one of the four tasks presented) illustrate some of the strengths and weaknesses of causal discovery methods and point to new directions in the field. The methods explored are introductory steps along research paths toward understanding SVM models, variable selection in non-faithful problems, identifying causal relations in large domains, and learning with manipulations.
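For context, the brute-force baseline that the polynomial-SVM heuristic above is designed to avoid can be sketched as follows: explicitly expand the polynomial feature space, fit a linear SVM, and rank the expanded features by absolute weight. This is tractable only for small degree and dimension, which is exactly why a heuristic search is valuable; the data and settings here are illustrative.

```python
# Explicit polynomial expansion + linear SVM: rank expanded features by |w|.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] * X[:, 1] > 0).astype(int)   # target driven by an interaction

poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)
model = LinearSVC(C=1.0, max_iter=10000).fit(X_poly, y)

weights = np.abs(model.coef_[0])
names = poly.get_feature_names_out()
for idx in np.argsort(weights)[::-1][:3]:
    print(f"{names[idx]}: |w| = {weights[idx]:.3f}")   # "x0 x1" should rank first
```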
97

Algorithms for shotgun proteomics spectral identification and quality assessment

Ma, Zeqiang 28 March 2012
Tandem mass spectrometry-based shotgun proteomics has become a widespread technology for analyzing complex protein mixtures. Assessing the full information content of shotgun proteomics experiments has motivated a series of powerful bioinformatics advances. Here I present three bioinformatics tools for shotgun proteomics spectral identification and quality assessment. IDBoost is a post-identification analysis tool that rescues spectral identifications and corrects identification errors by incorporating the relationships inferred through spectral clustering. ScanRanker recovers unidentified high-quality spectra for additional analysis by assessing tandem mass spectral quality. QuaMeter focuses on the quality assessment of shotgun proteomics experiments and provides objective criteria for evaluating analytical system variability. Each tool was developed to solve one aspect of the problem, but together they work in concert to provide an improved shotgun proteomics data analysis pipeline. The source code and binaries of these tools are available from http://fenchurch.mc.vanderbilt.edu/.
98

A framework for accurate, efficient private record linkage

Durham, Elizabeth Ashley 09 April 2012
Record linkage is the task of identifying records from multiple data sources that refer to the same individual. Private record linkage (PRL) is a variant of the task in which data holders wish to perform linkage without revealing identifiers associated with the records. PRL is desirable in various domains, including health care, where it may not be possible to reveal an individual's identity due to confidentiality requirements. In medicine, PRL can be applied when datasets from multiple care providers are aggregated for biomedical research, thus enriching data quality by reducing duplicate and fragmented information. Additionally, by bringing together all of a patient's information, PRL has the potential to improve patient care and minimize the costs associated with replicated services.

This dissertation is the first to address the entire life cycle of PRL and introduces a framework for its design and application in practice. Additionally, it addresses how PRL relates to policies that govern the use of medical data, such as the HIPAA Privacy Rule. To accomplish these goals, the framework addresses three crucial and competing aspects of PRL: 1) computational complexity, 2) accuracy, and 3) security. As such, this dissertation is divided into several parts. First, it begins with an evaluation of current approaches to encoding data for PRL and identifies a Bloom filter-based approach that provides a good balance of these competing aspects. Such encodings may reveal information when subject to cryptanalysis, however, so, second, the dissertation presents a refinement of the encoding strategy that mitigates this vulnerability without sacrificing linkage accuracy. Third, it introduces a method that significantly reduces the number of record pair comparisons required for PRL, and thus its computational complexity, via the application of locality-sensitive hash functions. Finally, the dissertation reports on an extensive evaluation of the combined application of these methods with real datasets, which illustrates that they outperform existing approaches.
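A minimal sketch of a Bloom-filter field encoding with a Dice-coefficient comparison, the general construction evaluated above; the bigram padding, HMAC keying, and parameter choices here are common conventions, not necessarily those of the dissertation.

```python
# Encode a name's character bigrams into Bloom filter bit positions with
# keyed hashes, then compare two encodings by Dice coefficient.
import hashlib, hmac

BITS, HASHES, SECRET = 128, 4, b"shared-secret"   # toy parameters

def bigrams(name: str) -> set[str]:
    padded = f"_{name.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}

def encode(name: str) -> set[int]:
    positions = set()
    for gram in bigrams(name):
        for k in range(HASHES):
            digest = hmac.new(SECRET, f"{k}{gram}".encode(), hashlib.sha256)
            positions.add(int(digest.hexdigest(), 16) % BITS)
    return positions

def dice(a: set[int], b: set[int]) -> float:
    return 2 * len(a & b) / (len(a) + len(b))

print(dice(encode("ashley"), encode("ashlee")))   # high: likely a match
print(dice(encode("ashley"), encode("robert")))   # low: likely a non-match
```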
99

REFINING COMPARATIVE PROTEOMICS BY SPECTRAL COUNTING TO ACCOUNT FOR SHARED PEPTIDES AND MULTIPLE SEARCH ENGINES

Chen, Yao-Yi 15 May 2012
Spectral counting has become a widely used approach for comparing protein abundance in label-free shotgun proteomics. However, when analyzing complex samples, the ambiguity of matching between peptides and proteins greatly affects the assessment of peptide and protein differentiation. Meanwhile, different configurations of the database search algorithms that assign peptides to MS/MS spectra may produce different results. Here, I present three strategies to improve comparative proteomics through spectral counting. I show that comparing spectral counts for peptide groups rather than for protein groups forestalls problems introduced by shared peptides. I present four models for combining four popular search engines, which lead to significant gains in spectral counting differentiation; among these, I demonstrate a powerful vote counting model that scales well to multiple search engines. I also show that semi-tryptic searching outperforms tryptic searching for comparative proteomics. Overall, these techniques considerably improve protein differentiation on the basis of spectral count tables.
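A minimal sketch of a vote-counting combination: a spectrum's identification is kept when a majority of engines agree on the peptide. The engine names, toy identifications, and the simple majority rule are illustrative; the dissertation's models are more elaborate.

```python
# Combine per-engine spectrum identifications by majority vote.
from collections import Counter

# engine -> {spectrum_id: peptide} (toy identifications)
engine_results = {
    "engine_a": {"s1": "PEPTIDEK", "s2": "LVNELTK"},
    "engine_b": {"s1": "PEPTIDEK", "s2": "LVNEVTK"},
    "engine_c": {"s1": "PEPTIDEK"},
}

def vote(results: dict, min_votes: int = 2) -> dict:
    ballots = {}
    for ids in results.values():
        for spectrum, peptide in ids.items():
            ballots.setdefault(spectrum, Counter())[peptide] += 1
    consensus = {}
    for spectrum, counts in ballots.items():
        peptide, votes = counts.most_common(1)[0]
        if votes >= min_votes:          # keep only well-supported assignments
            consensus[spectrum] = peptide
    return consensus

print(vote(engine_results))   # {'s1': 'PEPTIDEK'}; s2 lacks agreement
```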
100

Chemotherapy Plan Abstraction Method

Bhatia, Haresh 10 July 2012
Purpose: Chemotherapy plan abstraction is an important clinical and research task in medicine. Providers review the treatment plan history and the response to treatment to make decisions about continuing or changing the current treatment. Likewise, medical researchers want to know the treatment plan history for a cohort of patients under analysis. It is difficult for providers and researchers to efficiently abstract the sequence and nature of treatment plans from discrete drug events as recorded by clinical documentation procedures. I hypothesize that an automated plan abstraction method can accurately abstract medication plans from the temporal sequence of medication event records across multiple cancer domains. Methods: I have developed a data-driven plan abstraction method that takes pharmacy chemotherapy dispensing records as input and produces a sequence of chemotherapy plans. The framework consists of a pre-processing method, the plan abstraction method, and cohort analysis. The performance of the method was tested against a manually annotated gold-standard set of chemotherapy plans. The method was first trained and tested on a data set limited to breast cancer and lung cancer patients. Its generalizability was then tested on a separate data set that includes all solid tumor cancer diagnoses other than breast and lung cancer. The method's utility was then demonstrated for cohort plan analysis using a data set of medication events from a large breast cancer cohort. Across-plan and within-plan analyses were performed on the treatment plan history obtained by applying the method to this breast cancer cohort. Results: For performance evaluation, the plan abstraction method was tested on a sample of 341 breast cancer and lung cancer patients with 6,050 chemotherapy medication events, and a sample of 168 non-breast, non-lung cancer patients with solid tumors who had 3,366 chemotherapy medication events. For these two sets, recall was 0.913 and 0.899, and precision was 0.768 and 0.696, respectively. Treatment plan analysis was performed on a separate breast cancer cohort of 554 patients. This cohort comprised 11,789 chemotherapy medication events, in which 1,126 total plans and 107 unique plans were identified. Analysis of the 5 most frequently prescribed plans shows concordance with national guideline recommendations for plan sequencing, cycle frequency, and number of cycles. A separate analysis of the fulvestrant plan, a breast cancer chemotherapeutic regimen for metastatic cancer, showed concurrence with the median time to disease progression published for a randomized clinical trial. Conclusion: The plan abstraction method can accurately produce a well-structured sequence of chemotherapy plans from individual drug events across multiple cancer domains. Furthermore, I have demonstrated the utility of this method for cohort analysis of a large data set. I believe this plan abstraction method could become an important tool for clinicians and researchers to automatically extract chemotherapy treatment history from electronic health record data sources.
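One simple data-driven heuristic for plan abstraction can be sketched as follows: group dispensing events into plans by drug combination and inter-event gap. The gap cut-off and records here are illustrative; the dissertation's method is more sophisticated.

```python
# Group chemotherapy dispensing events into plans: consecutive events with
# the same drug combination and a short enough gap belong to one plan.
from datetime import date

# (dispense_date, drugs given that day), sorted by date -- toy records
events = [
    (date(2011, 1, 3),  frozenset({"doxorubicin", "cyclophosphamide"})),
    (date(2011, 1, 24), frozenset({"doxorubicin", "cyclophosphamide"})),
    (date(2011, 2, 14), frozenset({"doxorubicin", "cyclophosphamide"})),
    (date(2011, 4, 4),  frozenset({"paclitaxel"})),
    (date(2011, 4, 11), frozenset({"paclitaxel"})),
]

MAX_GAP_DAYS = 35  # assumed cut-off between cycles of the same plan

def abstract_plans(events):
    plans, current = [], [events[0]]
    for prev, curr in zip(events, events[1:]):
        same_drugs = prev[1] == curr[1]
        close = (curr[0] - prev[0]).days <= MAX_GAP_DAYS
        if same_drugs and close:
            current.append(curr)
        else:
            plans.append(current)
            current = [curr]
    plans.append(current)
    return plans

for plan in abstract_plans(events):
    drugs = ", ".join(sorted(plan[0][1]))
    print(f"{drugs}: {len(plan)} cycles, {plan[0][0]} to {plan[-1][0]}")
```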
