Global ETD Search

81	Integrated Analysis of Multi-Omics Data Using Sparse Canonical Correlation Analysis Castleberry, Alissa 29 July 2019 No description available. Bioinformatics
82	Determination of the Structure of Human Testis Protein Maelstrom and Examination of Functional Differences among Human Genome Variants Wobser, Madison June 30 April 2019 No description available. Bioinformatics
83	Noncoding RNA-Involved Interactions for Cancer Prognosis: A Prostate Cancer Study Wang, Leying 08 October 2020 No description available. Bioinformatics
84	THE ROLE OF COMPLEX EVOLUTIONARY DYNAMICS IN MOLECULAR SEQUENCE ANALYSIS Lucaci, Alexander January 2023 Codon substitution models can be used to quantify selective pressure on molecular sequences. The work contained within this dissertation represents my efforts to create, validate, and test codon models, and to apply them to biologically diverse data sets. Specifically, my objective is to determine the impact and consequences of instantaneous multi-nucleotide mutational events (MH) on the statistical inference of evolutionary rate parameters. In evolutionary terms, MH events represent rare phenomena which may have strong effects on protein-coding genes. Evidence for MH events directly impacts the patterns and processes of protein-coding gene evolution across species and time. My central hypothesis is that accounting for multi-nucleotide mutational events alters the estimation of evolutionary rate parameters and represents an alternative path a gene may embark upon during its evolutionary history. This hypothesis is fundamentally based on a synthesis of my own work, and that of others within the field, on the relationship of the evolutionary properties of functional coding regions of the genome. My rationale is that completion of my dissertation will result in the generation of new statistical methodologies and readily available computational implementations designed to identify key targets of evolutionary mechanisms, genomic patterns, and biological processes important for the functional adaptation of the genome as it relates to MH. My long-term goal is to develop novel strategies to improve the biological realism of statistical models of molecular sequence evolution. I accomplish this goal by addressing gaps in conventional protein-coding gene model assumptions and by including additional biological, statistical, physiological, and evolutionary information.
In this effort, I have created a codon model to account for MH events, with details described in Chapter 2. We have also made the model easily available on Datamonkey.org and provided several useful visualizations to aid in the interpretation of results. To better understand the evolution of protein-coding genes, I applied my methods and other existing methods in the field to explore potentially novel biological insights into a gene family, the heat shock proteins, which play an important role as protein chaperones, and in another paper, Brain-Derived Neurotrophic Factor (BDNF), which plays an important role in brain development. I developed software pipelines, curated data, and provided visualizations for the interpretation of results. The publications associated with this work are highlighted in Chapters 3 and 4, respectively. To enable the exploration of molecular virology and the analysis of viral pathogens, I developed “Rapid Assessment of Selection in CLades (RASCL)”: a novel application for the rapid assessment of molecular sequence evolution in viral clades. I used RASCL to study the emergence and ongoing evolution of SARS-CoV-2 lineages and to identify key sites subject to adaptive evolution and the development of new viral lineages. Near-real-time pathogen molecular surveillance is an important part of understanding the spread of disease. Our development of scalable tools to analyze big datasets of viral pathogen sequences is a critical step forward for global public health. My methods and results can be used to translate existing molecular sequence data into novel insights and to improve the understanding of important evolutionary systems, and together they constitute an accessible platform for quantifying selective pressure on molecular sequences. / Biology Bioinformatics
85	Investigation of HIV-TB co-infection through analysis of the potential impact of host genetic variation on host-pathogen protein interactions Heekes, Alexa Storme 29 August 2022 HIV and Mycobacterium tuberculosis (Mtb) co-infection causes treatment and diagnostic difficulties, placing a major burden on health care systems in settings with high prevalence of both infectious diseases, such as South Africa. Human genetic variation adds further complexity, with variants affecting disease susceptibility and response to treatment. The identification of variants in African populations is affected by reference mapping bias, especially in complex regions like the Major Histocompatibility Complex (MHC), which plays an important role in the immune response to HIV and Mtb infection. We used a graph-based approach to identify novel variants in the MHC region within African samples without mapping to the canonical reference genome. We generated a host-pathogen functional interaction network made up of inter- and intraspecies protein interactions, gene expression during co-infection, drug-target interactions, and human genetic variation. Differential expression and network centrality properties were used to prioritise proteins that may be important in co-infection. Using the interaction network, we identified 28 human proteins that interact with both pathogens ("bridge" proteins). Network analysis showed that while MHC proteins did not have significantly higher centrality measures than non-MHC proteins, bridge proteins had a significantly shorter distance to MHC proteins. Proteins that were significantly differentially expressed during co-infection or contained variants clinically associated with HIV or TB also had significantly stronger network properties. Finally, we identified common and consequential variants within prioritised proteins that may be clinically associated with HIV and TB.
The integrated network was extensively annotated and stored in a graph database that enables rapid and high-throughput prioritisation of sets of genes or variants, facilitates detailed investigations, and allows network-based visualisation. Bioinformatics
86	A Computational Approach for Diagnostic Long-Read Genome Sequencing Kautto, Esko Antero 30 August 2022 No description available. Bioinformatics
87	Leveraging transcriptomic regulation to understand, diagnose and intercept early lung cancer pathogenesis Ning, Boting 07 November 2023 Lung cancer is the leading cause of cancer death in the US, largely due to the lack of treatment options to intercept the progression of early lung cancers and of methods to diagnose lung cancer at early stages. Prior studies indicated that the lack of immune surveillance is associated with the progression of bronchial premalignant lesions (PMLs) and that gene alterations in the nasal epithelium can be leveraged for the early detection of lung cancer. Yet the regulatory mechanism of these gene expression alterations remains poorly understood. Thus, there are unmet needs to study gene expression regulation for better disease management of early lung cancer, including further understanding the biology of early lung cancer development, identifying potential interception strategies, and improving lung cancer diagnosis. My dissertation addresses these challenges by investigating the transcriptional and post-transcriptional gene expression regulators, including transcription factors and microRNAs (miRNAs), to facilitate the understanding, interception, and diagnosis of early lung cancer. First, I explored the miRNA regulatory landscape to identify miRNA-gene regulatory relationships associated with bronchial PML progression and molecular subtypes. Using matched gene and microRNA expression profiles from patients with bronchial premalignant lesions, I identified epithelial miR-149-5p to be a key regulator of gene expression contributing to PML progression. By suppressing NLRC5, miR-149-5p inhibits MHC-I gene expression of epithelial cells, promoting early immune depletion and lesion progression.
I also developed a novel statistical framework, Differential Regulation Analysis of miRNA (DReAmiR), that characterizes miRNA-mediated gene regulatory network rewiring across multiple groups from transcriptomic profiles, and identified regulatory network differences across PML molecular subtypes. Second, I investigated the alterations in the Hippo pathway to identify potential drug targets to intercept the progression of bronchial PMLs. I found that Hippo pathway effectors YAP/TAZ, together with transcription factors TEAD and TP63, cooperatively promote basal cell proliferation and repress signals associated with interferon responses and immune cell communication. Further in silico drug screening with external datasets identified small compounds that can reverse the directly regulated gene signature to potentially intercept bronchial PML progression. Lastly, I integrated miRNA and gene expression profiles in the nasal epithelium to distinguish malignant from benign indeterminate pulmonary nodules. I built an ensemble classifier consisting of nasal epithelial miRNA expression features, miRNA-gene top scoring pairs, and clinical features. The performance of the ensemble classifier exceeded that of the classifier built with clinical features alone. Collectively, my thesis investigated the gene expression regulation mechanisms to facilitate the understanding, interception, and diagnosis of early lung cancer pathogenesis. / 2025-11-06T00:00:00Z Bioinformatics
88	Multimodal, longitudinal, and mega-analysis of biomedical data Schiffer, Lucas 07 November 2023 Biomedical data science is a multi-disciplinary field concerned with the collection, storage, and interpretation of biomedical data that uses annotation, algorithms, and analysis to extract knowledge and insights from structured and unstructured data to be used in the development and evaluation of diagnostic tests, prognostic predictions, and therapeutic interventions. Biomedical data scientists perform this work using biomedical data that arises when samples are subjected to biochemical assays to quantitatively or qualitatively investigate their pathophysiological characteristics. Increasingly, biomedical data are generated at single-cell resolution and have consequently become far more hierarchical and multimodal in nature; that is, levels of organization encapsulate one another (e.g., samples belonging to subjects are made up of cells) and multiple biological modalities are profiled simultaneously. The paradigm shift adds significant complexity to the collection, storage, management, and analysis of biomedical data, but brings with it the promise of unprecedented insights to be gained from integrative analyses. These analyses are the focus of this dissertation, where the challenges of integrating biomedical data across multiple modalities, timepoints, and studies are examined through three research projects. Challenges related to multimodal analysis of biomedical data will be explored through the development of MultimodalExperiment, a data structure that appropriately and efficiently represents multiomics data that is hierarchical, multimodal, and/or longitudinal in nature. A schematic of and methods for the data structure will be presented along with example usage to demonstrate how current challenges of alternative data structures are overcome, ease of data management is improved, and computational/storage efficiency is optimized.
Challenges related to longitudinal analysis of biomedical data will be explored in the context of a cohort study of cancer patients being treated with anti-programmed cell death protein 1/programmed cell death ligand 1 immunotherapies at Boston Medical Center. The progression-free survival status of study participants will be analyzed using linear mixed effects models which incorporate longitudinal high-dimensional metabolomics data. Maps of metabolic pathways and a hypothesis will be presented to explain serum metabolites that are associated with progression-free survival status and possibly therapeutic efficacy. Challenges related to mega-analysis of biomedical data will be explored through the creation of a pipeline to preprocess transcriptomics data from human hosts infected with tuberculosis to support machine learning and other tasks. The details of original software developed to provide more than 10,000 samples of clean, high-quality, machine-learning-ready data from all related and eligible studies in the Gene Expression Omnibus repository will be illustrated. The importance of improving diagnostic testing and therapeutic interventions for tuberculosis disease will be highlighted in the context of these data, and the specifics of why they represent a key ingredient for machine learning that helps overcome current challenges in the field will be explained. Bioinformatics
89	Enhancing protein interaction prediction using deep learning and protein language models Hashemi, Nasser 30 August 2023 Proteins are large macromolecules that play critical roles in many cellular activities in living organisms. These include catalyzing metabolic reactions, mediating signal transduction, DNA replication, responding to stimuli, and transporting molecules, to name a few. Proteins perform their functions by interacting with other proteins and molecules. As a result, determining the nature of such interactions is critically important in many areas of biology and medicine. The primary structure of a protein refers to its specific sequence of amino acids, while the tertiary structure refers to its unique 3D shape, and the quaternary structure refers to the interaction of multiple protein subunits to form a larger, more complex structure. While the number of experimentally determined tertiary and quaternary structures is limited, databases of protein sequences continue to grow at an unprecedented rate, providing a wealth of information for training and improving sequence-based models. Recent developments in sequence-based models using machine learning and deep learning have shown significant progress toward solving protein-related problems. Specifically, attention-based transformer models, a recent breakthrough in Natural Language Processing (NLP), have shown that large models trained on unlabeled data are able to learn powerful representations of protein sequences and can lead to significant improvements in understanding protein folding, function, and interactions, as well as in drug discovery and protein engineering. The research in this thesis has pursued two objectives using sequence-based modeling. The first is to use deep learning techniques based on NLP to address an important problem in cellular immune system studies, namely, predicting Major Histocompatibility Complex (MHC)-Peptide binding.
The second is to improve the performance of the ClusPro docking server, a well-known protein-protein docking tool, in three ways: (i) integrating ClusPro with AlphaFold2, a well-known accurate protein structure predictor, for enhanced protein model docking, (ii) predicting distance maps to improve docking accuracy, and (iii) using regression techniques to rank protein clusters for better results. Bioinformatics
90	Modeling premalignant lung squamous carcinoma via gene expression changes associated with EP300 knockout Fu, Dany 14 June 2023 Lung cancer is the third most common type of cancer and the leading cause of cancer death in both men and women, and prognosis for lung carcinoma remains poor due to late diagnosis. While lung squamous cell carcinoma (LUSC) makes up 20-30% of all lung cancer cases, identification of genetic signatures and successful targeted therapies remain limited. An ongoing effort is being made to create an in vitro system for modeling the early stages of lung squamous carcinoma and premalignancy, which will ultimately serve as a model for drug discovery. A previous effort performed whole exome and targeted DNA sequencing to reveal the somatic mutations in endobronchial biopsies that harbored lung squamous premalignant histology. EP300 was identified as a candidate gene which may act as a driver for carcinogenesis, but it remains understudied when compared to prominent oncogenic driver genes such as TP53, NOTCH1, or NFE2L2 in LUSC. The p300 protein is a histone acetyltransferase that regulates gene expression by means of chromatin remodeling and has been implicated in various diseases, including cancer. My objective as part of my thesis was to first generate stable EP300 knockout (KO) clones from the NL20 bronchial epithelial cell line utilizing the CRISPR/Cas gene editing system. Using the NL20 clones and EP300 KO clones in the HBEC-3KT cell line generated in a previous effort, I then validated the knockouts at the DNA, RNA, and protein levels. A literature review was also conducted to identify possible cellular pathways that EP300 participates in and to validate its role in those pathways by observing changes in downstream protein targets. Finally, I generated RNA sequencing data from the functionally validated clones to identify differentially expressed genes and cellular pathways perturbed by EP300 knockout.
Through these efforts, I developed sets of gene signatures unique to each cell line and found that EP300 is associated with bronchial carcinogenesis progression and likely functions as an oncogene in LUSC. Bioinformatics
