51

A COMPARISON OF BAYESIAN NETWORK STRUCTURE LEARNING ALGORITHMS ON EMERGENCY DEPARTMENT AMBULANCE DIVERSION DATA

Leegon, Jeffrey Thomas 28 July 2009 (has links)
Use of Bayesian networks (BN) has increased in medicine. Traditionally, BNs have been developed by experts or from the current literature. Several applications implement "off the shelf" BN structure learning algorithms, but few implementations have been evaluated. We compared six "off the shelf" BN structure learning algorithms and an expert-developed BN using two years of data from a curated emergency department (ED) overcrowding database. We used ED ambulance diversion as the reference standard. Eighteen variables selected from a previous study were used for prediction. BN structures were learned from a data set for predicting ED diversion one hour in advance. The data set was discretized using equal frequency and equal width discretization. Each BN structure learning algorithm developed a structure based on each data set. We used area under the receiver operating characteristic curve (AUC), negative log likelihood, and Akaike information criterion to compare the structures as they predicted ED diversion at 1, 2, 4, 6, 8, and 12 hours in advance. Both the training and test data sets contained >100,000 data points. The ED was on ambulance diversion 22% of the time. The machine-learned networks were complex, with >3,000 conditional probabilities, compared to the expert-developed network, with 365. Both the best-performing machine-learned structure and the expert-developed network had an AUC of 0.95 predicting diversion at one hour and 0.94 predicting diversion at two hours in advance. The machine-learned BN performed as well as the expert-developed BN. The expert-developed network was parsimonious but required extensive user involvement.
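Not from the dissertation itself — a minimal sketch of the preprocessing and evaluation steps the abstract describes (equal-frequency vs. equal-width discretization, then AUC comparison of learned structures across prediction horizons), assuming pandas/scikit-learn; column names, the model objects, and their interfaces are hypothetical.

```python
# Illustrative sketch only: discretization and AUC comparison as described in
# the abstract. This is not the dissertation's implementation.
import pandas as pd
from sklearn.metrics import roc_auc_score

def discretize(df, columns, n_bins=5, method="equal_frequency"):
    """Discretize continuous ED overcrowding variables into n_bins levels."""
    out = df.copy()
    for col in columns:
        if method == "equal_frequency":
            out[col] = pd.qcut(df[col], q=n_bins, labels=False, duplicates="drop")
        else:  # equal width
            out[col] = pd.cut(df[col], bins=n_bins, labels=False)
    return out

def compare_structures(models, test_sets):
    """Report AUC for each learned structure at each prediction horizon.

    `models` maps a structure name to a fitted object exposing predict_proba();
    `test_sets` maps a horizon in hours (1, 2, 4, ...) to an (X, y) pair,
    where y is 1 when the ED was on ambulance diversion.
    """
    for horizon, (X, y) in test_sets.items():
        for name, model in models.items():
            auc = roc_auc_score(y, model.predict_proba(X)[:, 1])
            print(f"{name}: AUC at {horizon}h = {auc:.3f}")
```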
52

MIASMA: A Medical Informatics Application for Systematic Microbiological Alerts

Carnevale, Randy Joseph 14 September 2011 (has links)
This PhD Dissertation project had as its objectives: (1) to develop MIASMA, a potentially open-source Medical Informatics Application for Systematic Microbiological Alerts that uses recently developed methods (e.g., from syndromic surveillance and from heuristic observations) to detect single-hospital outbreaks of both commonly occurring and rare bacterial, viral, and fungal species; (2) to deploy MIASMA in the Vanderbilt University Hospital (VUH) for use by the Department of Infection Control and Prevention; (3) to compare the alerting timeliness, positive predictive value, and sensitivity of MIASMA to current VUH infection control practices; and (4) to evaluate the utility of MIASMA when used to supplement current VUH infection control practices.
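The abstract does not specify MIASMA's detection algorithms, so the following is purely illustrative of the syndromic-surveillance style of alerting it references: a toy moving-baseline threshold on daily isolate counts, with arbitrary parameters.

```python
# Purely illustrative: a moving-baseline threshold alert on daily isolate
# counts, in the spirit of syndromic-surveillance methods mentioned above.
# It is not MIASMA's algorithm.
from statistics import mean, stdev

def flag_outbreak_days(daily_counts, baseline_window=28, z_threshold=3.0):
    """Return indices of days whose count exceeds baseline mean + z * sd."""
    alerts = []
    for i in range(baseline_window, len(daily_counts)):
        baseline = daily_counts[i - baseline_window:i]
        mu, sd = mean(baseline), stdev(baseline)
        if sd == 0:
            sd = 1.0  # avoid division by zero on flat baselines
        if (daily_counts[i] - mu) / sd >= z_threshold:
            alerts.append(i)
    return alerts

# Example: daily counts of a single organism on one unit, spiking on the last day
counts = [1, 0, 2, 1, 0, 1, 1, 0, 2, 1, 0, 1, 1, 2, 0, 1, 1, 0, 2, 1,
          1, 0, 1, 2, 1, 0, 1, 1, 9]
print(flag_outbreak_days(counts))  # -> [28]
```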
53

Efficient Development of Electronic Health Record Based Algorithms to Identify Rheumatoid Arthritis

Carroll, Robert James 21 October 2011 (has links)
Electronic Health Records (EHRs) are valuable tools in clinical and genomic research, providing the ability to generate large patient cohorts for study. Traditionally, EHR-based research is carried out through manual review of patient charts, which is expensive and time consuming, and limits the scalability of EHR-derived genetic or clinical research. The recent introduction of automated phenotype identification algorithms has sped cohort identification, but these algorithms also require significant investment to develop. In these studies, we evaluated three aspects of the process of phenotype algorithm implementation and application in the context of Rheumatoid Arthritis (RA), a chronic inflammatory arthritis with known genetic risk factors. The first aspect was whether using a naïve set of features to train a support vector machine (SVM) would yield performance similar to models trained using an expert-defined feature set. The second aspect was the effect of training set size on the predictive power of the algorithm for both the naïve and expert-defined sets. The third aspect was the evaluation of the portability across institutions of a trained model using expert-derived features. We show that training an SVM with all available attributes maintains strong performance compared to an SVM trained using an expert-defined set of features. Using an expert-defined feature set allowed for a much smaller training set compared to the naïve feature set, although in both cases the training set size requirements were much smaller than those often used for phenotype algorithm training. We also show the portability of a previously published logistic regression model trained at Partners HealthCare to Vanderbilt and Northwestern Universities. While the original model was portable, models retrained using local data further improved performance. This research shows the potential for rapid development of new phenotype identification algorithms that may be portable to different EHR systems and institutions. With the application of clinical knowledge in the design, very few training records are required to create strongly predictive models, which could ease the development of models for new conditions. Fast, accurate development of portable phenotype algorithms offers the potential to engender a new era of EHR-based research.
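A minimal sketch of the comparison the abstract describes: an SVM trained on all available features versus one trained on an expert-defined subset, evaluated across increasing training-set sizes. The data layout, feature indices, and sizes are hypothetical; this is not the dissertation's pipeline.

```python
# Illustrative sketch: naive (all features) vs. expert-defined feature set,
# with a learning curve over training-set size. Assumes X is a NumPy array of
# EHR-derived features and y is a binary RA case/control label.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

def learning_curve_auc(X, y, expert_columns, sizes, n_test=2000, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    test_idx, pool = idx[:n_test], idx[n_test:]  # held-out test set
    results = []
    for n in sizes:
        train = pool[:n]
        for label, cols in (("naive", slice(None)), ("expert", expert_columns)):
            clf = SVC(kernel="linear", probability=True)
            clf.fit(X[train][:, cols], y[train])
            scores = clf.predict_proba(X[test_idx][:, cols])[:, 1]
            results.append((label, n, roc_auc_score(y[test_idx], scores)))
    return results
```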
54

Novel Methods for Variable Selection in Non-faithful Domains, Understanding Support Vector Machines, Learning Regions of Bayesian Networks, and Prediction Under Manipulation

Brown, Laura E 11 December 2009 (has links)
The focus of my research was to develop several novel computational techniques for discovering informative patterns and complex relationships in biomedical data. First, an efficient, heuristic method was developed to search for the features with largest absolute weight in a polynomial Support Vector Machine (SVM) model. This algorithm provides a new ability to understand, conceptualize, visualize, and communicate polynomial SVM models. Second, a new variable selection algorithm, called Feature Space Markov Blanket (FSMB), was designed. FSMB combines the advantages of kernel methods and Markov Blanket-based techniques for variable selection. FSMB was evaluated on several simulated, "difficult" distributions, where it identified the Markov Blankets with high sensitivity and specificity. Additionally, it was run on several real-world data sets; the resulting classification models are parsimonious (for two data sets, the models consisted of only 2-3 features). On another data set, the Markov Blanket-based method performed poorly; FSMB's improved performance suggests the existence of a complex, multivariate relationship in the underlying domain. Third, a well-cited algorithm for learning Bayesian networks (Max-Min Hill-Climbing, MMHC) was extended to locally learn a region of a Bayesian network. This local method was compared to MMHC in an empirical evaluation. The local method took, as expected, a fraction of the time needed by MMHC to learn regions; of particular interest, it learned regions of equal or better quality. Finally, an approach using the formalism of causal Bayesian networks was designed to make predictions under manipulations; this approach was used in a submission to the Causality Challenge. The approach required the use and combination of the three methods from this research and many state-of-the-art techniques to build and evaluate models. The results of the competition (the submission performed best on one of the four tasks presented) illustrate some of the strengths and weaknesses of causal discovery methods and point to new directions in the field. The methods explored are introductory steps along research paths toward understanding SVM models, selecting variables in non-faithful problems, identifying causal relations in large domains, and learning under manipulation.
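For orientation only: the dissertation's first contribution is a heuristic that finds the largest-magnitude terms of a polynomial SVM without expanding the feature space. The sketch below shows the brute-force baseline that such a heuristic approximates — explicit polynomial expansion followed by inspection of linear-SVM weights — using scikit-learn; it is not the dissertation's method.

```python
# Baseline illustration: enumerate polynomial terms explicitly and rank them by
# absolute weight in a linear SVM. The dissertation's heuristic avoids this
# explicit (and combinatorially large) expansion.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import LinearSVC

def top_polynomial_terms(X, y, feature_names, degree=2, k=10):
    poly = PolynomialFeatures(degree=degree, include_bias=False)
    X_poly = poly.fit_transform(X)                    # all monomials up to `degree`
    clf = LinearSVC(C=1.0, max_iter=10000).fit(X_poly, y)
    weights = clf.coef_.ravel()
    names = poly.get_feature_names_out(feature_names)
    order = np.argsort(-np.abs(weights))[:k]          # k largest |weight| terms
    return [(names[i], weights[i]) for i in order]
```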
55

Algorithms for shotgun proteomics spectral identification and quality assessment

Ma, Zeqiang 28 March 2012 (has links)
Tandem mass spectrometry-based shotgun proteomics has become a widespread technology for analyzing complex protein mixtures. Assessing the full information content of shotgun proteomics experiments has motivated a series of powerful bioinformatics advances. Here I present three bioinformatics tools for shotgun proteomics spectral identification and quality assessment. The IDBoost tool is a post-identification analysis tool that rescues spectral identifications and corrects identification errors by incorporating the relationships inferred through spectral clustering. The ScanRanker tool offers a way to recover unidentified high-quality spectra for additional analysis via the assessment of tandem mass spectral quality. The QuaMeter tool focuses on the quality assessment of shotgun proteomics experiments and provides objective criteria for the evaluation of analytical system variability. Each tool was developed to solve one aspect of the problem, but together they work in concert to provide an improved shotgun proteomics data analysis pipeline. The source code and binaries of these tools are available from http://fenchurch.mc.vanderbilt.edu/.
56

A framework for accurate, efficient private record linkage

Durham, Elizabeth Ashley 09 April 2012 (has links)
Record linkage is the task of identifying records from multiple data sources that refer to the same individual. Private record linkage (PRL) is a variant of the task in which data holders wish to perform linkage without revealing identifiers associated with the records. PRL is desirable in various domains, including health care, where it may not be possible to reveal an individual's identity due to confidentiality requirements. In medicine, PRL can be applied when datasets from multiple care providers are aggregated for biomedical research, thus enriching data quality by reducing duplicate and fragmented information. Additionally, PRL has the potential to improve patient care and minimize the costs associated with replicated services by bringing together all of a patient's information. This dissertation is the first to address the entire life cycle of PRL and introduces a framework for its design and application in practice. Additionally, it addresses how PRL relates to policies that govern the use of medical data, such as the HIPAA Privacy Rule. To accomplish these goals, the framework addresses three crucial and competing aspects of PRL: 1) computational complexity, 2) accuracy, and 3) security. As such, this dissertation is divided into several parts. First, the dissertation begins with an evaluation of current approaches for encoding data for PRL and identifies a Bloom filter-based approach that provides a good balance of these competing aspects. However, such encodings may reveal information when subject to cryptanalysis, so, second, the dissertation presents a refinement of the encoding strategy to mitigate vulnerability without sacrificing linkage accuracy. Third, this dissertation introduces a method to significantly reduce the number of record pair comparisons required, and thus computational complexity, for PRL via the application of locality-sensitive hash functions. Finally, this dissertation reports on an extensive evaluation of the combined application of these methods with real datasets, which illustrates that they outperform existing approaches.
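A minimal sketch of the Bloom-filter encoding commonly used in the PRL literature, which the abstract identifies as the starting point: field values are split into character bigrams, each bigram is hashed into a bit array, and two encodings are compared with the Dice coefficient. The parameters below are arbitrary, and this does not include the dissertation's hardening against cryptanalysis or its locality-sensitive-hashing blocking step.

```python
# Illustrative Bloom-filter encoding for private record linkage.
import hashlib

def bigrams(value):
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}

def bloom_encode(value, m=1000, k=20):
    """Hash each bigram k times into an m-bit Bloom filter (bit positions as a set)."""
    bits = set()
    for gram in bigrams(value):
        for i in range(k):
            digest = hashlib.sha256(f"{i}:{gram}".encode()).hexdigest()
            bits.add(int(digest, 16) % m)
    return bits

def dice(a, b):
    """Dice coefficient between two encodings; 1.0 means identical."""
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 1.0

# Similar names remain similar after encoding, despite a typo.
print(dice(bloom_encode("elizabeth durham"), bloom_encode("elizabet durham")))
```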
57

REFINING COMPARATIVE PROTEOMICS BY SPECTRAL COUNTING TO ACCOUNT FOR SHARED PEPTIDES AND MULTIPLE SEARCH ENGINES

Chen, Yao-Yi 15 May 2012 (has links)
Spectral counting has become a widely used approach for comparing protein abundance in label-free shotgun proteomics. However, when analyzing complex samples, the ambiguity of matching between peptides and proteins greatly affects the assessment of peptide and protein differentiation. Meanwhile, the configuration of the database search algorithms that assign peptides to MS/MS spectra may produce different results. Here, I present three strategies to improve comparative proteomics through spectral counting. I show that comparing spectral counts for peptide groups rather than for protein groups forestalls problems introduced by shared peptides. I present four models for combining four popular search engines, which lead to significant gains in spectral counting differentiation. Among these models, I demonstrate a powerful vote counting model that scales well to multiple search engines. I also show that semi-tryptic searching outperforms tryptic searching for comparative proteomics. Overall, these techniques considerably improve protein differentiation on the basis of spectral count tables.
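To convey the flavor of a vote-counting combination (only one of the four models mentioned, and in heavily simplified form): each search engine casts a vote for every peptide group it flags as differentially abundant, and groups are ranked by total votes. The engine names, thresholds, and input structure below are placeholders, not the dissertation's configuration.

```python
# Illustrative vote-counting combination of search-engine results.
from collections import Counter

def vote_count(calls_by_engine):
    """calls_by_engine: dict mapping engine name -> set of peptide-group IDs
    flagged as differential by that engine. Returns groups ranked by votes."""
    votes = Counter()
    for flagged in calls_by_engine.values():
        votes.update(flagged)
    return votes.most_common()

calls = {  # placeholder engine names and peptide-group IDs
    "engine_A": {"PEPGRP_1", "PEPGRP_2", "PEPGRP_5"},
    "engine_B": {"PEPGRP_1", "PEPGRP_3"},
    "engine_C": {"PEPGRP_1", "PEPGRP_2"},
    "engine_D": {"PEPGRP_1", "PEPGRP_5"},
}
print(vote_count(calls))  # PEPGRP_1 receives 4 votes, PEPGRP_2 and _5 two each
```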
58

Chemotherapy Plan Abstraction Method

Bhatia, Haresh 10 July 2012 (has links)
Purpose: Chemotherapy plan abstraction is an important clinical and research task in medicine. Providers review the treatment plan history and the response to treatment to make decisions about continuing or changing the current treatment. Likewise, medical researchers want to know the treatment plan history for a cohort of patients under analysis. It is difficult for providers and researchers to efficiently abstract the sequence and nature of treatment plans from discrete drug events as recorded by clinical documentation procedures. I hypothesize that an automated plan abstraction method can accurately abstract medication plans from the temporal sequence of medication event records across multiple cancer domains. Methods: I have developed a data-driven plan abstraction method that takes as input pharmacy chemotherapy dispensing records and produces a sequence of chemotherapy plans. The framework consists of a pre-processing method, the plan abstraction method, and cohort analysis. The performance of the method was tested against a manually annotated gold standard set of chemotherapy plans. The method was first trained and tested on a data set limited to breast cancer and lung cancer patients. The generalizability of the method was then tested on a separate data set that includes all solid tumor cancer diagnoses other than breast and lung cancer. The method's utility was then demonstrated for cohort plan analysis using a data set of medication events from a large breast cancer cohort. Across-plan and within-plan analyses were performed on the treatment plan history obtained by applying the method to this breast cancer cohort. Results: For performance evaluation, the plan abstraction method was tested on a sample of 341 breast cancer and lung cancer patients with 6,050 chemotherapy medication events, and a sample of 168 non-breast cancer and non-lung cancer patients with solid tumors who had 3,366 chemotherapy medication events. For these two sets, the recall rate was 0.913 and 0.899, and the precision rate was 0.768 and 0.696, respectively. Treatment plan analysis was performed on a separate breast cancer cohort of 554 patients. This cohort consisted of 11,789 chemotherapy medication events, with 1,126 total plans and 107 unique plans identified. The analysis of the 5 most frequently prescribed plans shows concordance with national guideline recommendations for plan sequencing, cycle frequency, and number of cycles. A separate analysis of the fulvestrant plan, a breast cancer chemotherapeutic for metastatic cancer, showed concurrence with published results for the median time to disease progression in a randomized clinical trial. Conclusion: The plan abstraction method can accurately produce a well-structured sequence of chemotherapy plans from individual drug events across multiple cancer domains. Furthermore, I have demonstrated the utility of this method for cohort analysis of a large data set. I believe this plan abstraction method could become an important tool for clinicians and researchers to automatically extract chemotherapy treatment history from electronic health record data sources.
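A deliberately simplified illustration of the general idea of plan abstraction — grouping chemotherapy dispensing events into candidate plans and cycles by drug combination and the gap between administration dates. The gap threshold, drug names, and grouping rule are invented for the example; this is not the dissertation's data-driven method, and it only conveys the shape of the input and output.

```python
# Toy plan abstraction: consecutive events with the same drug combination and a
# bounded inter-event gap are treated as cycles of one plan.
from datetime import date

def abstract_plans(events, max_gap_days=28):
    """events: list of (date, frozenset_of_drugs).
    Returns a list of (drug_combination, first_date, last_date, n_cycles)."""
    plans = []
    for event_date, drugs in sorted(events):
        if plans and plans[-1][0] == drugs and \
                (event_date - plans[-1][2]).days <= max_gap_days:
            combo, first, _, n = plans[-1]
            plans[-1] = (combo, first, event_date, n + 1)  # extend current plan
        else:
            plans.append((drugs, event_date, event_date, 1))  # start new plan
    return plans

events = [  # hypothetical dispensing records
    (date(2011, 1, 3), frozenset({"doxorubicin", "cyclophosphamide"})),
    (date(2011, 1, 24), frozenset({"doxorubicin", "cyclophosphamide"})),
    (date(2011, 2, 14), frozenset({"doxorubicin", "cyclophosphamide"})),
    (date(2011, 3, 14), frozenset({"paclitaxel"})),
]
print(abstract_plans(events))  # one 3-cycle AC plan followed by a paclitaxel plan
```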
59

Understanding Delivery of Computer-based Intensive Insulin Therapy

Campion, Jr., Thomas Richmond 02 August 2010 (has links)
Intensive insulin therapy (IIT), a nurse-driven protocol combining frequent blood glucose testing and insulin administration to tightly control blood glucose, became the standard of critical care following a 2001 study. Many institutions subsequently implemented computer-based clinical decision support systems (CDSSs) for IIT. However, recent studies question IIT's benefit and safety. Whereas previous research investigated effects of patient characteristics on IIT performance, this dissertation evaluated IIT CDSS with respect to the interaction of people, process, and technology. An organizational analysis using institutional theory explored the influence of peers, regulators, and professions in IIT adoption. A literature review and case study demonstrated the underreported role of social, organizational, and contextual factors affecting IIT CDSS. A quantitative analysis of system records established the frequency and effect of blood glucose data mismatches as well as the characteristics and effects of nurse dosing overrides on IIT CDSS performance. An ethnographic study of nurse workflow yielded an understanding of how IIT CDSS functions with respect to other clinical information systems and care processes. Using a mixed quantitative and qualitative approach informed by social theory, this research demonstrates how sociotechnical interactions affect IIT CDSS and may be leveraged to improve care delivery.
60

THE SYSTEMATIC ASCERTAINMENT OF STRUCTURED FAMILY HEALTH INFORMATION USING AN ONLINE PATIENT PORTAL

Holt, Jonathan Andrew 30 July 2012 (has links)
Family Health Information (FHI) helps identify individuals who are at increased risk for adverse health conditions due to inherited genetic or environmental predisposition. Appropriate stratification of patients based on familial risk relies on the clinician's ability to ascertain, and the patient's ability to report, complete and accurate FHI. Complicating matters, the collection of detailed FHI often requires more time than is available in the typical patient encounter in the primary care setting. As a result, FHI is often inconsistently and ineffectively communicated during clinical encounters, leading to FHI that is incomplete, thus limiting its potential use for clinical decision-making. Yet FHI represents a cost-effective strategy and is critical to the emerging practice of genome-informed and personalized medicine. This thesis describes the development and evaluation of www.MyFamilyatVanderbilt.com (MyFaV), a web-based portal for ascertaining structured FHI directly from patients.
