211 |
Metabolic Modeling of Cystic Fibrosis Airway Microbiota from Patient Samples
Vyas, Arsh 20 October 2021
Cystic Fibrosis (CF) is a genetic disorder, found with higher prevalence in the Caucasian population, affecting > 30,000 individuals in the United States and > 70,000 worldwide. Because the high mortality among CF patients is attributed to respiratory failure brought on by chronic bacterial infections and subsequent airway inflammation, much work has focused on systematically analyzing CF airway communities. Pseudomonas aeruginosa has traditionally been regarded as the most threatening and persistent CF colonizer because of its high antibiotic resistance, but recent studies have elucidated the roles of other pathogens, and it is now widely accepted that the CF airway hosts a complex, codependent community of bacteria, viruses, and fungi.
To elucidate the interplay among the members of this community, within the constraints of the lung uptake regime, I developed a community metabolic network model comprising >380 metabolites. The model covers the 39 most abundant bacterial genera, accounting for >89% of reads across samples, identified by 16S rRNA gene sequencing of 279 sputum specimens collected from 79 individuals over 10 years in a study by LiPuma et al. The community metabolic model was contrasted with the 16S relative abundance data using standard data mining techniques for multidimensional data. I further attempted to quantify the correlations among patient lung function, disease progression, community diversity, microbial composition, and metabolic capabilities using classical hypothesis testing methods.
Comparison of the 16S data and the model data through linear dimensionality reduction (PCA) revealed slightly higher variance explained by the model, indicating a relatively smaller number of metabolite-based than 16S-based polymicrobial communities. A deeper analysis revealed two phenomena: the consolidation of compositionally different communities due to metabolic closeness, and the splitting of other communities into metabolically distinct clusters due to minor changes in composition and an increase in diversity. Clustering of the 16S-based relative abundance data and the model data revealed that the rare Burkholderia infections are metabolically distinct from other CF communities and are heavily dominated by this genus. The results also reaffirmed that Achromobacter infections are highly resilient to treatment. Linear regression analysis between lung function and microbiota diversity revealed no strong correlation across the population; however, diversity was found to first increase and then decrease drastically with disease severity.
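As a rough illustration of the diversity analysis described above, the sketch below computes a per-sample diversity score from relative abundances and correlates it with a lung function measure. The abstract does not say which diversity index was used; the Shannon index is an assumption here, and the sample values, the `fev1` list, and both function names are hypothetical.

```python
import math


def shannon_diversity(abundances):
    """Shannon index H = -sum(p * ln p) over genera with non-zero abundance."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: relative abundances of three genera in three samples,
# and a made-up lung function value (e.g. percent predicted FEV1) per sample.
samples = [[0.5, 0.5, 0.0], [0.9, 0.1, 0.0], [1.0, 0.0, 0.0]]
fev1 = [80.0, 60.0, 40.0]

diversities = [shannon_diversity(s) for s in samples]
r = pearson_r(fev1, diversities)
```

A single correlation coefficient summarizes only a linear trend, so it would not capture the rise-then-fall pattern of diversity with disease severity that the abstract reports.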
|
212 |
NEW BIOINFORMATIC METHODS OF BACTERIOPHAGE PROTEIN STUDY
Emily A Kerstiens (10716540) 05 May 2021
<p>Bacteriophages are viruses that infect and kill bacteria. They are the most abundant organisms on the planet and the largest source of untapped genetic information. Every year, more bacteriophages are isolated from the environment, purified, and sequenced. Once sequenced, their genomes are annotated to determine the location and putative function of each gene expressed by the phage. Phages have been used in the past for genetic engineering, and new research is examining how they can be used for the treatment of disease, water safety, agriculture, and food safety.</p>
<p>Despite the influx of sequenced bacteriophages, a majority of the annotated genes encode hypothetical proteins, also known as No Known Function (NKF) proteins. They are expressed by the phages, but research has not identified a possible function. Wet lab research into the functions of the hundreds of NKF phage genes would be costly and could take years. Bioinformatics methods could be used to determine putative functions and functional categories for these hypothetical proteins. A new bioinformatics method combining Domain Assignments, Hidden Markov Models, Structure Prediction, Sub-Cellular Localization, and iterative algorithms is proposed here. This new method was tested on the bacteriophage genome PotatoSplit and reduced the number of NKF genes from 57 to 40; a total of 17 new functions were found. The functional class was identified for an additional six proteins, though no specific functions were named. Structure Prediction and Simulations were tested with a focus on two NKF proteins within lytic phages, and both returned possible functional categories with high confidence.</p>
<p>Additionally, this research addresses the possibility of phage therapy and FDA regulation. A database of phage proteins was built and tested using R Statistical Analysis to determine proteins significant to phages infecting <i>M. tuberculosis</i> and to the lytic cycle of phages. The statistical methods were also tested on pharmaceutical products recalled by the FDA between 2012 and 2018, to determine ingredients/manufacturing steps that could affect product quality, and on FDA Adverse Event Reporting System (FAERS) data, to determine whether adverse event reports (AERs) could be used to judge the quality of a product. Many significant excipients/manufacturing steps were identified and used to score products on their quality. The AERs were evaluated on two case studies with mixed results.</p>
|
213 |
The labyrinth of protein classification: a pipeline for selection and classification of biological data
Pelosi, Benedetta January 2022
Recent progress in fundamental biological sciences and medicine has considerably increased the quantity of data that can be studied and processed. The main limitation now is not retrieving data, but rather extracting useful biological insights from the large datasets accumulated. More recent advances have provided detailed high-density data regarding metabolism (metabolomics) and protein expression (proteomics). Clearly, no single analytic method can provide a comprehensive understanding. Rather, the ability to link available data together in a coherent manner is required to obtain a complete view. The improving application of Machine Learning (ML) techniques provides the means to make continuous progress in processing complex data sets. A brief discussion is offered on the advantages of ML, the state of the art in Deep Learning (DL) for protein predictions, and the importance of ML in biological data processing. Noise stemming from incorrect classification or arbitrary/ambiguous labelling of data may arise when ML techniques are applied to large datasets. Furthermore, the stochasticity of biological systems needs to be considered to evaluate the outputs correctly. Here we show the potential of a workflow to answer biological questions while taking into consideration a perturbation of the biological data. To control the applicability of models and maximize predictivity, in silico filtering schemes can usefully be applied as an “Ockham’s razor” before using any ML technique. After reviewing different DL approaches for protein prediction, this work shows that a computational approach to the filtering steps is a valuable tool for protein classification when biological features are not fully annotated or reviewed. The in silico approach has identified putative proline transporters in fungi and plants as well as carotenoid biosynthetic gene products in the plant family Brassicaceae. The proposed method is suitable for extracting classification features and then maximizing the use of a DL approach.
|
214 |
Computational prediction of cell-cell interactions in the brain-tumour microenvironment
Camargo Romera, Paula January 2023
Glioblastoma is the fastest-growing and most common malignant brain tumour in adults. It is normally treated with surgery and radio- or chemotherapy, but life expectancy is approximately 15 months, with a high probability of recurrence. Therefore, there is a need to decrease its severity. Bulk and single-cell RNA sequencing allow the identification of cellular states in tumours affected by cell-intrinsic and extrinsic factors. Four different cellular states have been identified in glioblastoma: neural progenitor-like, oligodendrocyte progenitor-like, astrocyte-like, and mesenchymal-like. As glioblastoma is an immunosuppressive tumour, it can alter the immune system and increase the tumour's immune escape by secreting immunosuppressive factors or interacting with the brain microenvironment.
Two datasets were used in this study to explore whether the localization of the tumour in the brain microenvironment and the tendency of glioblastomas to activate microglial cells are due to particular ligand-receptor interactions. Data quality control was applied to both datasets, and the SingleCellSignalR and CellphoneDB packages were used to predict the possible interactions. A total of seven experiments were designed for this study. The first dataset, GBmap, allowed a comparison between tumour cells and microglia, between tumour cells and other cell types in the brain, and between the four cellular states of glioblastoma and microglia and macrophages. Next, healthy microglia from GBmap were compared with the tumour bulk data from the second dataset, HGCC. The bootstrap technique was used to compare bulk data with single-cell data, and a comparison between tumour cells and microglia or other cell types was analysed.
Results showed specific and shared interactions between cell types or cellular states, revealing that the different localization of the tumour cells depends on the expressed ligand-receptor pairs. Also, four patterns of interactions, each with a different tendency to activate microglial cells, were found in the 50 samples; these are promising results for further exploring drugs that interfere with these interactions or how the interactions relate to patient survival. Furthermore, although glioblastoma is a heterogeneous disease, more interactions were predicted with microglial/macrophage cells, without a uniform pattern between patients. This study is therefore a starting point, and further in vitro studies would be needed to examine the predicted interactions as potential targets to stop the progression of this type of cancer.
|
215 |
Predicting Biomarkers/ Candidate Genes involved in iALL, using Rough Sets based Interpretable Machine Learning Model.
Pulinkala, Girish January 2023
Acute lymphoblastic leukemia (ALL) is a hematological malignancy that originates in the bone marrow and gains a proliferative advantage. One of the more common genetic alterations in ALL is the KMT2A rearrangement, which accounts for 80% of ALL cases in infants. Patients carrying the KMT2A rearrangement have a poor prognosis and will eventually develop drug resistance. This project aimed to find new therapeutic targets that would help in the development of novel drugs. We designed a model that uses gene expression data to infer the expression of oncogenes and of genes that could be associated with immune pathways. The data were extracted and transformed by removing batch effects and identifying the biotypes of these genes for more focused research. Because we used exome RNA-seq, it was necessary to reduce the high dimensionality of the data. The dimensionality reduction was performed using Monte Carlo Feature Selection, which yielded a list of highly significant genes. These genes were used in a machine learning model built with R.ROSETTA, which produces rule-based results grounded in rough set theory. The rules were visualized using VisuNet, an interactive tool that creates networks from the rules. Among others, we identified expression levels of genes such as JAK3, TOX3, and DMRTA1 and their relations to other genes using the machine learning model. These significant genes were also used for pathway analysis with pathfindR, which allowed us to infer the oncogenic pathways. The pathway analysis helped us deduce pathways, such as immunodeficiency and other signaling pathways, that could be potential drug targets.
|
216 |
Improving specimen identification: Informative DNA using a statistical Bayesian method
Lou, Melanie 04 1900
<p>This work investigates the assignment of unknown sequences to their species of origin. In particular, I examine four questions: Is existing (GenBank) data reliable for accurate species identification? Does a segregating sites algorithm make accurate species identifications, and how does it compare to another Bayesian method? Does broad sampling of reference species improve the information content of reference data? And does an extended model (of the theory of segregating sites) better describe the genetic variation in a set of sequences (of a species or population)? Though we did not find unusually similar between-species sequences in GenBank, there was evidence of unusually divergent within-species sequences, suggesting that GenBank data should be used with caution and a firm understanding of the GenBank species involved. To address challenging identifications resulting from an overlap between within- and between-species variation, we introduced a Bayesian treeless statistical assignment method that makes use of segregating sites. Assignments with simulated and <em>Drosophila</em> (fruit fly) sequences show that this method can provide fast, high-probability assignments for recently diverged species. To address reference sequences with low information content, the addition of even one broadly sampled reference sequence can increase the number of correct assignments. Finally, an extended theory of segregating sites generates more realistic probability estimates of the genetic variability of a set of sequences. Species are dynamic entities, and this work highlights ideas and methods to address dynamic genetic patterns in species.</p> / Doctor of Philosophy (PhD)
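For readers unfamiliar with the term, a segregating site is an alignment column at which the sequences differ. The sketch below only counts such sites in a set of pre-aligned sequences; it is not the Bayesian assignment method of the thesis, and the function name and the choice to ignore gaps and ambiguous bases are assumptions made for illustration.

```python
def count_segregating_sites(aligned_sequences):
    """Count alignment columns that contain more than one distinct base.

    Assumes the sequences are already aligned to equal length. Characters
    other than A, C, G, T (gaps, N, ...) are ignored when comparing a column.
    """
    if not aligned_sequences:
        raise ValueError("need at least one sequence")
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("sequences must be aligned to the same length")

    count = 0
    for column in zip(*aligned_sequences):
        bases = {base.upper() for base in column if base.upper() in "ACGT"}
        if len(bases) > 1:
            count += 1
    return count


# Hypothetical toy alignment: columns 1 and 4 vary, columns 2 and 3 do not.
toy_alignment = ["ACGT", "ACGA", "TCGA"]
n_sites = count_segregating_sites(toy_alignment)
```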
|
217 |
Understanding lineage-specific biology through comparative genomics
Li, Yang January 2014
A major challenge in biology is to identify how different species arose and acquired distinct phenotypic traits. High-throughput sequencing is transforming our understanding of biology by allowing us to study genomes and cellular processes at genome-wide levels. Only a decade after the publication of the first human genome draft, genome assemblies of hundreds of organisms have been produced. Yet genome analysis remains challenging, and advances have lagged far behind our sequencing abilities and other technological advances. The next generation of comparative genomicists must therefore understand, invent, and apply a wide range of computational tools in order to study biology in the most efficient manner and to pose the most interesting questions. This thesis spans evolutionary genomics, gene regulation, and computational methods development. A major aim was to understand how genetic variation contributes to variation in phenotypic traits. This was approached using a large variety of evolutionary and comparative genomics tools. In particular, high-throughput sequencing datasets were analysed to study single-cell transcriptomics, gene duplications, gene architecture evolution, and alternative splicing. Additionally, where off-the-shelf analysis tools did not exist, novel pipelines and programs were designed and implemented to solve algorithmic problems such as scaffolding genome assemblies and mapping short reads onto small exons.
|
218 |
Uncovering the Transcription Factor Network Underlying Mammalian Sex Determination
Natarajan, Anirudh January 2014
<p>Understanding transcriptional regulation in development and disease is one of the central questions in modern biology. The current working model is that Transcription Factors (TFs) combinatorially bind to specific regions of the genome and drive the expression of groups of genes in a cell-type specific fashion. In organisms with large genomes, particularly mammals, TFs bind to enhancer regions that are often several kilobases away from the genes they regulate, which makes identifying the regulators of gene expression difficult. In order to overcome these obstacles and uncover transcriptional regulatory networks, we used an approach combining expression profiling and genome-wide identification of enhancers followed by motif analysis. Further, we applied these approaches to uncover the TFs important in mammalian sex determination.</p><p>Using expression data from a panel of 19 human cell lines we identified genes showing patterns of cell-type specific up-regulation, down-regulation and constitutive expression. We then utilized matched DNase-seq data to assign DNase Hypersensitivity Sites (DHSs) to each gene based on proximity. These DHSs were scanned for matches to motifs and compiled to generate scores reflecting the presence of TF binding sites (TFBSs) in each gene's putative regulatory regions. We used a sparse logistic regression classifier to classify differentially regulated groups of genes. Comparing our approach to proximal promoter regions, we discovered that using sequence features in regions of open chromatin provided significant performance improvement. Crucially, we discovered both known and novel regulators of gene expression in different cell types. For some of these TFs, we found cell-type specific footprints indicating direct binding to their cognate motifs.</p><p>The mammalian gonad is an excellent system to study cell fate determination processes and the dynamic regulation orchestrated by TFs in development. 
At embryonic day (E) 10.5, the bipotential gonad initiates either testis development in XY embryos or ovarian development in XX embryos. Genetic studies over the last three decades have revealed about 30 genes important in this process, but there are still significant gaps in our understanding. Specifically, we do not know the network of TFs and their specific combinations that cause the rapid changes in gene expression observed during gonadal fate commitment. Further, more than half the cases of human sex reversal are as yet unexplained. </p><p>To apply our methods for identifying regulators of gene expression to the gonad, we took two approaches. First, we carried out a careful dissection of the transcriptional dynamics during gonad differentiation in the critical window between E11.0 and E12.0. We profiled the transcriptome at 6 equally spaced time points and developed a Hidden Markov Model to reveal the cascades of transcription that drive the differentiation of the gonad. We discovered that while the ovary maintains its transcriptional state at this early stage, concurrent up- and down-regulation of hundreds of genes is orchestrated by the testis pathway. We also compared two different strains of mice with differential susceptibility to XY male-to-female sex reversal. This analysis revealed that in the C57BL/6J strain, the male pathway is delayed by ~5 hours, likely explaining the increased susceptibility to sex reversal in this strain. Finally, we validated the function of Lmo4, a transcriptional co-factor up-regulated in XY gonads at E11.6 in both strains. RNAi-mediated knockdown of Lmo4 in primary gonadal cells led to the down-regulation of male pathway genes, including key regulators such as Sox9 and Fgf9. </p><p>To find the enhancers in the XY gonad, we conducted DNase-seq in E13.5 XY supporting cells. In addition, we conducted ChIP-seq for H3K27ac, a mark correlated with enhancer activity. 
Further, we conducted motif analysis to reveal novel regulators of sex determination. Our work is an important step towards combining expression and chromatin profiling data to assemble transcriptional networks and is applicable to several systems.</p> / Dissertation
|
219 |
Computational discovery of cis-regulatory modules in human genome by genome comparison
Mok, Kwai-lung., 莫貴龍. January 2008
published_or_final_version / Biochemistry / Master / Master of Philosophy
|
220 |
Finding motif pairs from protein interaction networks
Siu, Man-hung., 蕭文鴻. January 2008
published_or_final_version / Computer Science / Master / Master of Philosophy
|