Global ETD Search

101	A Translational Bioinformatics Approach to Parsing and Mapping ISCN Karyotypes: A Computational Cytogenetic Analysis of Chronic Lymphocytic Leukemia (CLL) Abrams, Zachary 26 September 2016 (has links) No description available. Bioinformatics
102	Computational approaches for metatranscriptomic profiling in translational medicine and pulmonary diseases Nankya, Ethel 11 January 2024 (has links) Use of total RNA-seq in host and microbiome analysis allows for multi-omic interrogation of microbial profiles, assessment of their function and their interaction with host immune and metabolic pathways. This type of analysis calls for novel computational techniques. However, existing tools for analyzing microbial multi-omic data are lacking, as they typically address a single data type. For example, there are many available tools for the characterization of microbial communities, but these are unable to investigate microbial-host interactions. To address this need, I developed a novel computational pipeline that integrates existing methods for microbial and host expression profiling. This pipeline provides insight into possible personalized medical interventions in translational medicine. This dissertation utilized transcriptomics and metatranscriptomics to: 1) interrogate host-microbial interactions in people with indeterminate pulmonary nodules, 2) investigate the role of Human Endogenous Retroviruses in the early onset of ageing observed in virologically suppressed HIV-positive individuals, and 3) characterize humoral responses to SARS-CoV-2 peptides in COVID-19 patients. Specifically, to address the host-microbial interactions in people with indeterminate pulmonary nodules, I addressed sources of batch effects in the data, and I utilized statistical approaches to identify differentially abundant microbes in current and former smokers and in malignant and benign samples. Lastly, I linked abundant microbes in both datasets to human pathways and tested for their strength of association. This approach aided in providing insight into the possible functional profile of these microbes and their role in lung cancer.
Furthermore, I investigated the role of Human Endogenous Retroviruses (HERVs) in the early onset of ageing observed in virologically suppressed HIV-positive individuals. In this project, I utilized Telescope software to generate HERV counts. Differential analyses were then performed to identify differentially expressed HERVs in people living with HIV (PLHIV). Using the computational pipeline that was developed for multi-omic analyses, I then investigated the association of differentially expressed HERVs with pathways involved in inflammageing and with inflammatory markers. Taken together, this work identified HERVs that could act as therapeutic and diagnostic targets in the HIV setting. Lastly, for the third project, I sought to characterize IgG and IgM humoral responses to SARS-CoV-2 at the epitope level, where discriminating epitopes for disease severity were identified. I also investigated epitopes that were conserved between SARS-CoV-2 and other human coronaviruses, allowing the investigation of associations with less severe disease outcomes. These epitopes could serve as discriminative markers for COVID-19 disease severity. / 2026-01-11T00:00:00Z Bioinformatics
103	Computational characterization of long non-coding RNAs (lncRNAs) and study their role in rodent liver disease, xenobiotic exposure, and sex-specific responses using bulk and single cell RNA-sequencing Karri, Kritika 20 March 2024 (has links) LncRNAs comprise a heterogeneous class of thousands of RNA-encoding genes whose functions are largely unknown. This thesis describes systematic computational approaches to discover liver-expressed lncRNAs globally and then deduce their regulatory roles in response to foreign chemical and hormonal exposures. In a first study, bulk liver RNA-seq data was used to discover liver-expressed lncRNAs responsive to multiple xenobiotics in a rat model. Ortholog analysis combined with co-expression data and causal inference methods was used to infer lncRNA function and deduce gene regulatory networks, including causal effects of lncRNAs on biological pathways. This work provides a framework for understanding the widespread transcriptome-altering actions of foreign chemicals in a key-responsive mammalian tissue. In a second study, single-cell RNA-seq was employed to develop a reference catalog of 48,261 mouse liver-expressed lncRNAs, a majority novel, by transcriptome reconstruction from > 2,000 bulk public mouse liver RNA-seq datasets. Single cell RNA-seq was sufficiently sensitive to detect >30,000 mouse liver lncRNAs and characterize their dysregulation in mouse models of high fat diet-induced non-alcoholic steatohepatitis (NASH), carbon tetrachloride-induced liver fibrosis, and hepatotoxicity induced by the Ah receptor agonist TCDD. Trajectory inference algorithms uncovered lncRNA zonation patterns in five major hepatic cell populations and their dysregulation in diseased states. LncRNAs expressed in NASH-associated macrophages, closely linked to disease progression, and in collagen-producing myofibroblasts, a key source of the fibrous scar in fibrotic liver, were identified. 
Regulatory network analysis linked individual lncRNAs with key biological pathways, and gene centrality metrics identified network-essential regulatory lncRNAs in each liver disease model. In a third study, single nucleus RNA-seq combined with single nucleus ATAC-seq mapping of open chromatin regions elucidated functional linkages between cis- and trans-regulatory elements and their downstream gene targets, notably genes showing sex differences in expression that impact metabolism and disease risk. Liver cell type-specific chromatin accessibility signatures were identified, as were sex-specific accessibility signatures for hepatocytes and their associated DNA regulatory region motifs. Integrative modalities were employed to elucidate transcription factor-based mechanisms involved in sex-specific growth hormone-regulated gene expression by identifying transcriptional and epigenetic changes during feminization of mouse liver. Together, these studies characterize lncRNA function and can motivate future experiments. / 2026-03-20T00:00:00Z Bioinformatics
104	Methods and tools for characterizing microbial communities in the context of chronic diseases Odom, Aubrey R. 08 May 2024 (has links) The human microbiome, a complex ecosystem of microorganisms inhabiting various body sites, plays a crucial role in the immune system and overall human health. A comprehensive understanding of the microbiome and its interactions with the host is essential for advancing scientific knowledge and potential therapeutic interventions. This work focuses on two aspects of the microbiome space: taxonomic profiling for the identification of resident microbes, and longitudinal analysis to unravel the dynamics of microbial communities over time. A fundamental step in microbiome analysis is taxonomic profiling: the identification of resident microbes in samples. Numerous tools have been developed to cater to different sequencing types (e.g., 16S versus WGS) and contexts. However, despite significant advances in the profiling field, further work is needed to establish optimal methods for metagenomic classification. To address this gap, we introduce MetaScope, a comprehensive R-based package for accurate microbial composition identification at a strain-level resolution within a sample. We have performed benchmarking against mock microbial communities to validate MetaScope's performance against popular competitors using 16S datasets. Microbial time-series data presents unique challenges, including intricate covariate dependencies and diverse longitudinal study designs. Existing methods often fall short in addressing these challenges, lacking versatility, data type specificity, or the ability to account for the compositional nature of the data. In response, this work introduces LegATo, an open-source suite comprising modeling, visualization, and statistical tools tailored for analyzing microbiome dynamics. 
LegATo, with its user-friendly interface, accommodates various study structures and incorporates Generalized Estimating Equation (GEE) models, Hotelling’s T-squared tests, and several visualization functions. This toolkit enables researchers to identify microbial taxa affected by perturbations over time, such as the onset of disease or lifestyle changes, and predict their effects on the composition or stability of commensal bacteria. To illustrate the practical application of LegATo, we present two case studies focusing on the nasopharyngeal microbiomes of Zambian infants exposed to HIV and experiencing fatal acute febrile illness. These applications showcase the efficacy of LegATo for unraveling the complex dynamics of microbial communities, providing insight into the impact of specific perturbations on the microbiome. In conclusion, this research contributes to the advancement of microbiome analysis by enhancing taxonomic profiling methodologies and addressing the challenges posed by longitudinal data. The presented tools, MetaScope and LegATo, provide valuable resources for researchers exploring the intricate interactions between the microbiome and host over time, paving the way for a deeper understanding of microbial dynamics and their implications for human health. Bioinformatics
105	Approaches for identifying lung cell type responses to perturbation Corbett, Sean E. 01 August 2019 (has links) The use of genomic profiling can provide indications of underlying molecular responses to chemical perturbation, and the characterization of these responses can provide an increased understanding of the greater physiological effects of an exposure and inform clinical decision making. This approach has proven to be effective in understanding the effects of environmental exposures such as cigarette smoke on the airway epithelium, and how they may contribute to associated disease pathogenesis. Because of the existing body of work in genomic profiling towards understanding the effects of environmental exposures, it has relevant applications towards the study of the effects of emerging exposures such as electronic cigarettes, which remain poorly understood. Further, current approaches for genomic profiling could be improved through the development of data resources and computational methods which can identify not only tissue- or sample-level molecular responses to perturbation, but also responses specific to individual cells or cell types. In light of these issues, I investigated the molecular response in airway epithelium to a novel inhaled exposure, and developed methods to support more detailed characterization of such effects. In this dissertation, I describe a clinical observational study in which I examined the gene expression effects of electronic cigarettes on the airway epithelium, and compare these effects to those of conventional cigarettes (Aim 1). Next, I describe CELDA, a novel computational method for identifying cell subpopulations and the co-expressed modules of genes that identify them in single cell RNA-seq (scRNA-seq) data (Aim 2). Finally, I describe the Lung Connectivity Map (Lung CMap), a platform for interrogating lung cell type specific responses to a large set of chemical and molecular perturbations (Aim 3). 
Collectively, this work encompasses both observational and computational approaches for detailed characterization of the molecular responses to perturbation, and the determination of the relative effects of these novel perturbations versus their more well-described counterparts. / 2021-07-31T00:00:00Z Bioinformatics
106	From lesions to treatment: a multi-species multi-transcriptomics study of oral (pre-) cancer conditions Khan, Mohammed Muzamil 17 May 2024 (has links) Cancer incidence rates in 2024 are estimated to increase, according to the latest statistics from the American Cancer Society, with cancer-related death rates declining overall due to improved intervention and treatment strategies. However, these strategies remain only partially effective for some cancers due to their sub-clonal evolution and associated molecular heterogeneity, with added complexities related to socioeconomic status. One such type comprises head and neck squamous cell carcinomas (HNSCCs) that arise from the mucosal epithelium in the upper aerodigestive tract, with the oral cavity representing a major subsite that presents with oral squamous cell carcinoma (OSCC) pathology. The five-year survival rate for OSCC is ~66% for localized stages and reduces to ~35% upon locoregional spread and metastasis. OSCC etiology is associated with tobacco usage, excessive alcohol consumption, and the usage of carcinogen-containing substances. Other HNSCC subsites, such as the oropharynx and nasopharynx, are associated with human papillomavirus (HPV) and Epstein-Barr virus infections, respectively. The HPV-negative OSCC subtype is highly heterogeneous in its molecular and cellular composition and remains understudied compared to other solid tumor types; hence, there is a need for early detection strategies in addition to effective targeted therapeutics for improving treatment outcomes. With the advances in computational methodologies to study human diseases, multi-omic data modalities are becoming a crucial tool to improve patient care and have already led to remarkable discoveries.
In my thesis, bioinformatics, statistical, and machine learning methodologies were leveraged to study the transcriptome-wide changes across different modalities (bulk and single-cell) of pre-cancer oral conditions in humans and of treatment effects on OSCC tumors in murine models. In my first aim, the transcriptional changes of early oral lesions and their progression to OSCC were studied. OSCC arises from oral epithelial dysplasia through a series of clinical and histopathological changes. A subset of OSCC develops from oral leukoplakia, classified as “oral potentially malignant disorders” (OPMD), clinically defined as either localized homogenous leukoplakia (LL), erythroplakia, or proliferative verrucous leukoplakia (PVL), with a malignant transformation rate ranging from 15% to 90%. The association of leukoplakia with dysplasia is a strong predictor of its capacity to progress to OSCC. In addition, leukoplakia with hyperkeratosis/hyperplasia or hyperkeratosis non-reactive (HkNR) can develop into OSCC. While these lesions have been characterized clinically, a rigorous molecular characterization of these lesions and of the concomitant microbiota is lacking. In this project, I leveraged bulk RNA sequencing to study the molecular profiles of oral lesions at different stages of transformation, and to characterize their defining pathways and host/microbiome interactions. To this end, I used data collected from tissue biopsies isolated from a cohort of 66 human patients representing distinct histopathology groups: healthy oral mucosa, premalignant lesions (PMLs) comprising HkNR and dysplasia, and OSCC. These samples were profiled using total RNA-sequencing technology, allowing us to study the global transcriptome, including host and microbiome, of the oral mucosa from these groups. Our data revealed that PMLs were enriched in gene signatures associated with cellular plasticity, such as partial EMT (p-EMT) phenotypes, and with immune response.
Integrated analyses of the host transcriptome and microbiome further highlighted a significant association between differential microbial abundance and PML pathway activity, suggesting a contribution of the oral microbiome to the evolution of PML to OSCC. Collectively, this study revealed molecular processes associated with PML progression that may aid early diagnosis and disease interception. My second aim was focused on elucidating the mechanisms of action of candidate therapeutics. As the efficacy rate of FDA-approved drugs for HNSCC tumors remains relatively low due to a combination of factors, including tumor heterogeneity, late diagnosis, and drug resistance, developing targeted therapies that focus on selected cancer pathways represents a promising avenue. In previous studies, our laboratories have shown that pharmacological blockade of Wnt/β-catenin/CBP activity with small molecule inhibitors effectively abolished oncogenic cell phenotypes in OSCC. To determine the identity of additional cell subtypes contributing to OSCC progression to advanced disease, we induced oral tumors in immunocompetent mice using a tobacco-associated carcinogen, 4-Nitroquinoline 1-oxide (4-NQO), that causes DNA damage and recapitulates the genomic and histopathological changes observed with the progressive development of OSCC in human oral epithelia. Following 4-NQO treatment, mice were treated with E7386, an orally active small molecule modulator of β-catenin/CBP activity, at two concentrations, 25 mg/kg and 50 mg/kg. We generated a single-cell RNA sequencing dataset with ~50K cells to explore the E7386 treatment-associated effects on OSCC development and to better understand the underlying molecular and cellular changes that might provide novel insights into its mechanisms of action.
Our analyses revealed shifts in cellular diversity between treatment groups, with the proportion of epithelial cells decreasing upon treatment (consistent with greatly diminished tumor volumes), while endothelial and fibroblast populations increased compared to the 4-NQO control group, providing further evidence for the antitumor activity of E7386. In the immune compartment, we found enrichment of effector T-cell (CD8+ and CD4+) activity and a decrease in disease-associated neutrophil activity in the inhibitor-treated profiles. Additionally, epithelial sub-typing using curated markers of cell identities revealed a decrease in basal cancer stem-like cells (Krt5+, Krt14+) concomitant with an increase in cycling cells (Top2a+, Cdc20+). Finally, we identified a decrease in a stress cell phenotype associated with the AP-1 complex (Jun+, Fos+), and decreased cell plasticity in response to E7386, confirmed by functional validations through immunofluorescence staining experiments. Overall, this study provides cellular characterizations of murine oral tumors and presents further evidence of Wnt/β-catenin/CBP inhibition as a promising therapeutic strategy in OSCC. In summary, my studies have contributed to advancing early detection and treatment strategies and to unveiling molecular mechanisms of treatment response in OSCC. / 2025-05-17T00:00:00Z Bioinformatics
107	Dynamics of natural selection on human genomic variants Pivirotto, Alyssa, 0000-0002-2409-2574 05 1900 (has links) Evolutionary adaptation in humans is shaped primarily by the selection of beneficial alleles. Classical population genetic theory predicts that alleles under selection will experience a rapid increase in frequency. However, the effects of weakly deleterious and neutral alleles propelled to high frequency due to drift complicate the identification of sites under positive selection. Evolutionary probability uses vertebrate alignments and divergence times to estimate a site’s evolutionary history, assigning probability values to each potential amino acid at that site. For each mutation, a probability value can be calculated indicating whether the mutation is favored or disfavored evolutionarily. Because sites under selection will increase in frequency more quickly than they would due to genetic drift alone, it is expected that both beneficial and deleterious mutations will be younger on average than neutral variants of the same frequency. In chapter two, this was found to be true for the disfavored, deleterious mutations, which were younger on average. Notably, beneficial mutations were found to be older on average than neutral mutations of the same frequency. One possible model suggests that the enrichment of old, beneficial alleles segregating in modern humans can be explained by linked, weakly deleterious variants hindering the fixation of beneficial mutations until recombination allows for the escape of the beneficial mutation. Assessing allele age estimation methods is crucial for understanding the potential selection a mutation is undergoing. While whole genome sequencing data is becoming increasingly accessible, a large amount of the currently available data for large population datasets exists in the form of whole exome sequencing data.
In chapter three, the accuracy of three allele age estimators, Genealogical Estimation of Variant Age (GEVA), Relate, and time of coalescence, is tested for both whole genome and whole exome sequencing datasets. Relate was found to outperform both other estimators of allele age under both a simple (Pearson: 0.64) and a complex (Pearson: 0.68) demographic model, with the estimates based on whole exome data showing an average drop in performance of 16 percent in comparison with the whole genome estimates. Beyond investigating segregating variants, phylogenetic methods such as evolutionary probability allow for the analysis of fixed candidate variants and the investigation into potential mechanisms by which these favored alleles arose. In chapter four, derived sites that have become fixed in modern humans, where non-human primates all share the ancestral amino acid, are identified. Utilizing the fixed, derived sites and the corresponding evolutionary probability values, it can be tested whether adaptation occurs due to novel, low evolutionary probability mutations. A second hypothesis can also be tested, in which adaptation instead occurs due to a mutation to a more evolutionarily stable amino acid. It was found that while the majority of substitutions in modern humans arose by way of novel amino acids, there is no evidence that these substitutions are driving phenotypic adaptation in modern humans. / Biology / Accompanied by one .txt file : 1) Pivirotto_temple_0225E_171/primary_refcores.txt Bioinformatics
108	The fundamentals of genome-scale metabolic models and their application to the study of evolution and cancer Moyer, Devlin 07 January 2025 (has links) Hundreds to thousands of distinct metabolic reactions occur in all cells, forming a densely interconnected metabolic network that transforms similarly numerous metabolites into each other. Genome-Scale Metabolic Models (GSMMs) encode all existing knowledge about the structures of these metabolic networks, the enzymes responsible for catalyzing their reactions, the genes that encode those enzymes, and the metabolites that they interact with. Integration of different forms of high-throughput data within a single GSMM has facilitated numerous biological insights, ranging from strategies for engineering the metabolisms of microbes to produce commercially and/or medically valuable compounds to identifying novel drug targets for cancer, diabetes, inborn errors of metabolism, and infectious diseases, among others. Due to the complexity of cellular metabolic networks and the limited availability of relevant experimental data, the predictive utility of GSMMs is often limited by missing or inaccurate reactions. Furthermore, common approaches to predicting metabolic fluxes from GSMMs often focus on identifying a single optimal flux state, which frequently leads to inaccurate predictions for specific cell types or disease states where biologically plausible metabolic optima are unknown or challenging to formally define.
This dissertation addresses several limitations of existing approaches to creating and using GSMMs, with particular emphasis on the following challenges: (i) testing for the presence of reactions which can sustain unrealistically high fluxes, duplicate reactions, and missing or misannotated reactions; and (ii) predicting biologically and statistically sound distributions of steady-state fluxes through GSMMs, including methods which involve the incorporation of transcriptomics and/or proteomics data from particular conditions, which are especially relevant for the development of tissue-, disease- and patient-specific GSMMs. In addition, I extend the techniques for predicting fluxes through GSMMs to artificial chemistry networks, abstract models of simplified chemical reaction networks that have been used to study general principles governing the behavior of such networks while avoiding the incompleteness of our understanding of real biochemistry. Specifically, I use these artificial chemistry networks to study general principles governing the evolution of the structures of metabolic networks, and demonstrate the importance of the biomass composition in determining intracellular network architecture. Throughout the dissertation, I present multiple tools and recommendations for improving the predictive quality of GSMMs and demonstrate their utility by correcting several hundred errors in the most recent GSMM of generic human cells, with possible broad implications for the field of metabolic modeling and its applications. Bioinformatics
109	Dynamics of Microbial Genome Evolution Hooper, Sean January 2003 (has links) The success of microbial life on Earth can be attributed not only to environmental factors, but also to the surprising hardiness, adaptability and flexibility of the microbes themselves. They are able to quickly adapt to new niches or circumstances through gene evolution and also by sheer strength of numbers, where statistics favor otherwise rare events. An integral part of adaptation is the plasticity of the genome: losing and acquiring genes depending on whether they are needed or not. Genomes can also be the birthplace of new gene functions, by duplicating and modifying existing genes. Genes can also be acquired from outside, transcending species boundaries. In this work, the focus is set primarily on duplication, deletion and import (lateral transfer) of genes, three factors contributing to the versatility and success of microbial life throughout the biosphere. We have developed a compositional method of identifying genes that have been imported into a genome, and the rate of import/deletion turnover has been appreciated in a number of organisms. Furthermore, we propose a model of genome evolution by duplication, where through the principle of gene amplification, novel gene functions are discovered within genes with weak or secondary protein functions. Subsequently, the novel function is maintained by selection and eventually optimized. Finally, we discuss a possible synergic link between lateral transfer and duplicative processes in gene innovation. Bioinformatics Bioinformatik
110	Predicting Function of Genes and Proteins from Sequence, Structure and Expression Data Hvidsten, Torgeir R. January 2004 (has links) Functional genomics refers to the task of determining gene and protein function for whole genomes, and requires computational analysis of large amounts of biological data including DNA and protein sequences, protein structures and gene expression. Machine learning methods provide a powerful tool to this end by first inducing general models from such data and already characterized genes or proteins and then by providing hypotheses on the functions of the remaining, uncharacterized cases. This study contains four parts giving novel contributions to functional genomics through the analysis of different biological data and different aspects of biological functions. Gene Ontology played an important part in this research, providing a controlled vocabulary for describing the cellular roles of genes and proteins in terms of specific molecular functions and broad biological processes. The first part used gene expression time profiles to learn models capable of predicting the participation of genes in biological processes. The models consist of IF-THEN rules associating biological processes with minimal sets of discrete changes in expression level over limited periods of time. The models were used to hypothesize new biological processes for both characterized and uncharacterized genes. The second part investigated the combinatorial nature of gene regulation by inducing IF-THEN rules associating minimal combinations of sequence motifs common to genes with similar expression profiles.
Such combinations were shown to be significantly correlated to function, and provided hypotheses on the mechanisms behind the regulation of gene expression in several biological responses. The third part used a novel concept of local descriptors of protein structure to investigate sequence patterns governing protein structure at a local level and to predict the topological class (fold) of protein domains from sequence. Finally, the fourth part used local descriptors to represent protein structure and induced IF-THEN rule models predicting molecular function from structure. Bioinformatics Bioinformatik

Search results