Global ETD Search

81	A PROCESS MODELING STRATEGY TO LEARN ISCHEMIC STROKE TREATMENT PATTERNS FROM ELECTRONIC MEDICAL RECORDS Sulieman, Lina Mahmoud 30 July 2014 Process mining is a collection of methodologies designed to extract knowledge from event logs (e.g., time-stamped events) and describe the underlying processes of a system. Various approaches have been developed and successfully applied to characterize, as well as assess the efficiency of, the processes in traditional information management systems. In many instances, the clinical setting can be represented as a sequence of events that are aligned to deliver the best outcome. As such, there have been several attempts to apply process mining techniques to learn and describe clinical workflows by learning frequent patterns from the event logs of electronic medical record (EMR) systems. However, the existing techniques are designed to work with highly structured data and systematic processes, such as those that occur immediately before and after a surgery. As a result, the clinical processes that can be learned via such methods are limited in that they 1) are cumbersome and highly detailed, which makes them difficult to read and analyze, and 2) fail to describe the actions invoked to treat subpopulations within a cohort of patients admitted for the same disease. This thesis introduces a multi-step process mining strategy, called Treatment Mining using Frequent Sequential Patterns (TM-FSP), to learn clinical workflows from high-dimensional patient episodes. TM-FSP filters the time-ordered sets of medication classes and laboratory test types into frequent events to represent the data in a lower-dimensional form. Next, patient event sequences are subject to a multiple sequence alignment strategy and clustered based on the similarity of their aligned event patterns. Finally, the common actions for each cluster are extracted and reported as workflows.
We evaluated TM-FSP with a cohort of 133 patients diagnosed with ischemic stroke at the Vanderbilt University Medical Center. The results illustrate that 7 medications and 12 laboratory tests form 2,020 patterns that are associated with the treatment of this cohort. Moreover, we discovered that treatment varied for subgroups of patients affected by lipid metabolism disorders, whose treatment course excluded beta blockers and insulin. Biomedical Informatics
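The first TM-FSP step in entry 81, reducing each patient's time-ordered events to the frequent ones, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `filter_frequent_events`, the `min_support` threshold (the fraction of patients in whom an event must appear), and the toy episodes are all assumptions made for illustration.

```python
from collections import Counter


def filter_frequent_events(sequences, min_support=0.5):
    """Keep only events that occur in at least min_support of the sequences.

    Hypothetical helper: the support definition (share of patients with the
    event) is an assumption, not taken from the thesis.
    """
    n = len(sequences)
    # Count each event once per patient so repeats do not inflate support.
    support = Counter(event for seq in sequences for event in set(seq))
    frequent = {e for e, c in support.items() if c / n >= min_support}
    # Preserve the original time order while dropping infrequent events.
    return [[e for e in seq if e in frequent] for seq in sequences]


# Toy, made-up episodes (not data from the thesis).
episodes = [
    ["statin", "glucose_test", "aspirin"],
    ["aspirin", "glucose_test", "insulin"],
    ["aspirin", "statin", "lipid_panel"],
]
filtered = filter_frequent_events(episodes, min_support=2 / 3)
```

The filtered sequences would then feed the alignment and clustering steps, which this sketch does not attempt.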
82	Discovery and Replication of Pathway-Based Trans-Expression Quantitative Trait Loci Wiley, Laura Katherine 17 July 2014 A logical mechanism by which SNPs affect the pathophysiology of disease is through altering the expression of genes. Several studies have explored how SNPs alter expression of nearby genes (cis-eQTLs), but far fewer studies have explored distant effects (trans-eQTLs). This is likely due to the dramatic expansion of statistical tests required and the limited interpretability of results. We hypothesize that distant effects seen in trans-eQTLs are propagated or mediated by biological pathways. To investigate this hypothesis, we performed a focused trans-eQTL analysis on SNPs with known cis-effects by applying Signaling Pathway Impact Analysis (SPIA) to two independent datasets that have both genotype and gene-expression data. Fifteen SNP-Pathway associations were identified and replicated after correction for multiple testing. Given our requirement that all SNPs have cis-effects, we performed conditional analyses to determine the effect of the cis-gene expression on our SNP-Pathway associations. Additionally, we annotated these results for functional elements from the ENCODE project to determine biological plausibility and generalizability. In summary, we identify trans-eQTL effects within the context of biological pathways that replicate across multiple ethnic populations. Biomedical Informatics
83	A Computational Analysis on Gene Fusions in Human Cancer Harrell, Morgan Rachel 22 July 2014 Gene fusions are instances where two discrete genes incorrectly join together. They are common mutations in cancer, and, since the advent of next generation sequencing technology, many gene fusions in cancer tissues have been discovered and cataloged. We utilized the rapidly growing pool of information on gene fusions in human cancer to form projections on gene fusion mutations. We tested two hypotheses: 1) identifiable motifs and entropy patterns exist at breakpoints that form fusions, and 2) gene fusions are more connected than randomly generated mutations in the biological networks. This thesis project has three related computational analyses: 1) motif discovery to examine common sequence patterns at and around breakpoints that form fusions, 2) entropy sliding-window analysis to determine structural characteristics at and around breakpoints that form fusions, and 3) gene-fusion network analysis to visualize and compare cancer-associated gene fusion metrics versus controls. We found no over-represented motifs at breakpoints that form gene fusions. We characterized a common entropy change at breakpoints. This feature may help us to predict gene fusions as part of prediction algorithms. Finally, we found that network metrics may be useful toward understanding the role gene fusions have in cancers. Biomedical Informatics
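The entropy sliding-window analysis mentioned in entry 83 can be sketched as follows. The abstract does not say which entropy measure or window size was used, so this example assumes Shannon entropy over nucleotide frequencies and an arbitrary window of four bases; the function names and the toy sequence are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (in bits) of the symbol frequencies in a window."""
    counts = Counter(window)
    total = len(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def sliding_entropy(sequence, window_size):
    """Entropy of every full window as it slides one position at a time."""
    return [
        shannon_entropy(sequence[i:i + window_size])
        for i in range(len(sequence) - window_size + 1)
    ]


# Toy sequence (made up): a low-complexity run followed by a mixed region.
profile = sliding_entropy("AAAAACGT", 4)
```

In a profile like this, a change in entropy along the sequence marks a shift in local sequence complexity, which is the kind of signal the abstract reports looking for around breakpoints.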
84	Using Evolutionarily-Based Correlation Measures and Machine Learning to Improve Protein Structure Prediction in BCL::Fold Teixeira, Pedro Luis, Jr. 26 June 2014 De novo protein structure prediction is a challenge due to the sheer size of the search space. One can limit the set of potential models with long-range contact restraints (positions distant in the primary sequence but known to be in close proximity within the tertiary structure). Most available contact prediction methods achieve accuracies insufficient for de novo protein folding. Direct Information (DI), which finds the minimal set of correlations that explains all global correlation, is a notable exception. DI has been used to determine the structures of some membrane and soluble proteins with large numbers of homologous sequences compiled into large sequence alignments. However, DI has many limitations. I have leveraged machine learning methods to predict contacts more accurately by combining DI with sequence information, thereby improving protein structure prediction accuracy in the Biochemical Library (BCL). The BCL is a C++ library developed in the Meiler lab. This innovative resource will augment the elucidation of traditionally challenging membrane protein structures, specifically larger proteins, which are computationally difficult to address. Biomedical Informatics
85	A Prediction Model for Disease-Specific 30-Day Readmission Mize, Dara Lee Eckerle 23 March 2018 The Hospital Readmissions Reduction Program (HRRP) permits Centers for Medicare and Medicaid Services (CMS) to reduce reimbursement to hospitals with excess 30-day unplanned readmissions. Diabetes disproportionately impacts the hospitalized patient population, affecting 25-30% of admissions and increasing the risk for unplanned readmission. We hypothesized that a readmission risk prediction model for hospitalized patients with type 2 diabetes using machine learning and a diagnosis-specific 30-day readmission outcome will outperform traditional prediction models. We demonstrate that L1 penalized logistic regression and random forest show improved discriminatory performance over LACE, a commonly used logistic regression-based model predicting all-cause readmission. L1 penalized logistic regression is also well-calibrated, efficient and produces interpretable results through feature selection. Random forest was less well-calibrated, consistent with its use in other areas of the biomedical literature. In the setting of class imbalance, all of our models suffered from low precision at low thresholds near the outcome prevalence. Random forest precision improved when evaluated at higher thresholds, enabling application in a clinical setting. Using an approach that includes a diagnosis-specific outcome enables actionable models for use by disease-specific service lines. Prospective evaluation is needed to assess the validity of this approach and to evaluate for overfitting in the setting of class imbalance. Biomedical Informatics
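Entry 85 notes that precision was low at thresholds near the outcome prevalence and improved at higher thresholds. The sketch below shows how precision at a chosen risk threshold can be computed. The scores and labels are made-up toy values, not results from the thesis, and `precision_at_threshold` is a hypothetical helper.

```python
def precision_at_threshold(scores, labels, threshold):
    """Precision among cases whose predicted risk is at or above threshold."""
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    if not flagged:
        return None  # no case flagged, so precision is undefined
    return sum(flagged) / len(flagged)


# Toy predicted risks and true outcomes (1 = readmitted); values are made up.
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.60, 0.80]
labels = [0, 0, 0, 1, 0, 0, 1, 1]

low = precision_at_threshold(scores, labels, 0.10)   # permissive threshold
high = precision_at_threshold(scores, labels, 0.50)  # strict threshold
```

With imbalanced outcomes, a permissive threshold flags many patients who are not readmitted, so precision is diluted; a stricter threshold flags fewer patients but a larger share of them are true readmissions.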
86	Constraint on Rare Protein-Coding Variation: Pathogenicity Prediction and Phenotypic Discovery Sivley, Robert Michael 22 January 2018 Patterns of genetic variation along the human genome provide insight into functional and evolutionary constraints on different loci. Quantifying these patterns of constraint improves our ability to identify functional regions and interpret the phenotypic effects of genetic mutations. Building on exome-sequencing data from tens of thousands of individuals, we are now able to quantify constraint on a large scale. In this work, we explore three avenues by which constraint on rare protein-coding variation can be used to better understand human biology and elucidate the genetic drivers of disease. We first present a novel algorithm to classify variants of unknown significance (VUS) using patterns of spatial constraint on disease-causing variation in protein structure. We demonstrate its utility in classifying VUS in RTEL1, a helicase protein, from patients with familial interstitial pneumonia. Next, we quantify spatial constraint on somatic mutations in 3D protein structures and identify patterns indicative of driver mutations in several proteins. Finally, we perform phenome-wide association studies (PheWAS) to interrogate the phenotypic impact of rare protein-coding variants in genes intolerant to loss-of-function mutations. This dissertation makes significant advances in our understanding of how evolutionary constraint on protein-coding genetic variants is related to their contribution to human disease. In particular, we leveraged this progress to develop powerful approaches to variant pathogenicity prediction, the detection of putative driver mutations in cancer, and the identification of novel phenotype associations for highly constrained genes. Biomedical Informatics
87	Augmenting Communication With Before Visit Questionnaires Kumah-Crystal, Yaa A 25 September 2017 Barriers faced by patients with diabetes can prevent them from adhering to their prescribed plan of care. An aspect of the clinical encounter that detracts from patient-provider engagement is the work required for a provider to collect and document a patient's interval medical history. A workflow that allows patients to complete a computerized Before Visit Questionnaire (BVQ) prior to their clinic visit can support communication during a clinical encounter by highlighting the patient's barriers to adherence and using the patient's responses to facilitate provider documentation. We created a patient-facing BVQ to collect information about patients' histories and barriers in a format that could generate a summary note for their provider. Patients agreed that the BVQ helped prepare them for their clinic visit (79%) and improved their clinic visit (80%). All providers agreed it was beneficial to review a generated interval summary prior to their patient encounter. Analysis of notes produced by the providers before and after their patients completed BVQs reveals that use of BVQs increased provider documentation about patient adherence problems and barriers. Providers who incorporated the generated summaries into their clinic notes did 50% less additional typing to document histories. This research demonstrates that a workflow that supports the practice of using patient-completed BVQs to produce provider documentation is agreeable to both providers and patients. Using patient-completed questionnaires to generate provider documentation is an effective method of supporting communication and facilitating care. Biomedical Informatics
88	Extracting Detailed Tobacco Exposure From The Electronic Health Record Osterman, Travis John 09 August 2017 Lung cancer is the leading cause of cancer-related death in the United States and worldwide. Natural language processing (NLP) tools exist to determine smoking status (ever-smoker vs. never-smoker) from electronic health record data, but no system to date extracts the detailed smoking data needed to assess a patient's eligibility for lung cancer screening. Here we describe the Smoking History And Pack-year Extraction System (SHAPES), a rules-based NLP system to quantify tobacco exposure from electronic clinical notes. SHAPES was developed based on 261 patient records with 9,573 clinical notes and validated on 352 randomly selected patient records with 4,040 notes. F-measures for never-smoking status, ever-smoking status, rate of smoking, duration of smoking, quantity of cigarettes, and years quit were 0.86, 0.82, 0.79, 0.62, 0.64, and 0.61, respectively. Sixteen of 22 individuals eligible for lung cancer screening were identified (precision = 0.94, recall = 0.73). SHAPES was compared to a previously validated smoking classification system using a phenome-wide association study (PheWAS). SHAPES predicted similar significant associations with a 66% smaller sample size (10,000 vs. 35,788), and detected 411 (268%) more associations in the full dataset than when using just ever/never smoking status. Using smoking data from SHAPES, a smoking genome by environment interaction study found 57 statistically significant interactions between smoking and diseases, including previously described interactions between ischemic heart disease and rs1746537, obesity and rs10871777, and type 2 diabetes and rs2943641. These studies support the use of SHAPES for lung cancer screening and other research requiring quantitative smoking history. External validation needs to be performed prior to implementation at other medical centers. Biomedical Informatics
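The screening figures in entry 88 (16 of 22 eligible individuals identified, precision = 0.94, recall = 0.73) can be reproduced with a short calculation. The abstract does not state how many individuals SHAPES flagged in total; the single false positive used below is an assumption inferred from the reported precision (16/17 rounds to 0.94). The function name is hypothetical.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return precision, recall, and their harmonic mean (F-measure)."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# 16 of 22 eligible individuals found, so 6 were missed (from the abstract).
# One false positive is an inferred assumption, not a reported number.
precision, recall, f1 = precision_recall_f1(
    true_positives=16, false_positives=1, false_negatives=6
)
```

The same function applies to the per-field F-measures listed in the abstract, given the corresponding counts, which the abstract does not report.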
89	THE EVOLUTION OF SECONDARY METABOLISM REGULATION AND PATHWAYS IN THE ASPERGILLUS GENUS Lind, Abigail Lee 20 July 2017 Filamentous fungi produce a diverse array of secondary metabolites (SMs) that play ecological roles in defense, virulence, and inter- and intra-species communication. Fungal SMs have both deleterious and beneficial impacts on human health; some are carcinogenic toxins found in contaminated food supplies, while others, such as lovastatin and penicillin, have been repurposed as successful therapeutics. SMs are narrowly taxonomically distributed and highly diverse between species, and the biosynthetic genes and pathways that produce them are among the fastest-evolving in filamentous fungal genomes. SM production is triggered by both biotic and abiotic factors and is controlled by widely conserved transcriptional regulators. To understand how these master transcriptional regulators influence SM production and impact fungal lifestyle, I examined the genome-wide regulatory role of several master SM regulators in different species of the filamentous fungal genus Aspergillus, in different environmental conditions, and during different developmental stages. To further gain insight into the evolution of SM pathways, I leveraged population genomics in the human pathogen Aspergillus fumigatus to determine the drivers of SM genetic diversity. The findings presented here indicate that master SM regulators undergo extensive transcriptional rewiring, interact with multiple abiotic signals, and coordinate with developmental regulators to control SM production, and that novel SMs evolve through extensive genomic reorganization and through incorporation of foreign DNA. Biomedical Informatics
90	Exploring the Utility of Ratio-based Co-expression Networks using a GPU Implementation of Semantic Similarity Greer, Michael J. 21 November 2017 The reduced cost of sequencing has made it feasible to acquire multi-tissue site expression data from the same patient. In the field of cancer research, this has led to an accumulation of cancer-type-specific tumor expression data sets with matched adjacent normal samples. Co-expression network analysis is a common technique used to analyze expression data; however, it is unknown whether integrating multi-tissue site data into network construction or constructing pan-cancer networks will improve gene function prediction. One method of evaluating network performance relies on semantic similarity scores; however, computing these scores is computationally intensive. Here, I develop a GPU implementation of a commonly used semantic similarity measure and evaluate its performance compared to CPU-based approaches. Next, I explore whether constructing co-expression networks using the ratio of tumor to matched adjacent normal mRNA or a pan-cancer consensus network produces superior performance compared to networks constructed with tumor expression alone. The findings presented here indicate that the GPU-based approach offers significant performance improvement over CPU-based approaches. However, the ratio- and pan-cancer networks produce only a modest improvement over tumor-based networks. Biomedical Informatics
