12 March 2016
Host-virus systems biology seeks to elucidate the complex interactions between a virus and its host, and to determine the downstream consequences of these interactions for the host. Traditional studies of host-virus interactions, conducted one at a time, yield high-quality results but have limited scope. By contrast, systems biology uses a holistic approach to examine many interactions simultaneously, thereby increasing the breadth of interactions revealed. However, these studies have largely focused on common human pathogens (e.g., influenza or HIV), and their results may not apply to unrelated viruses, such as those that cause hemorrhagic fevers. Combining experimental and computational techniques can yield novel information about host-virus interactions that traditional virological or purely computational systems-biology methods cannot uncover. In this thesis, I demonstrate the utility of combined experimental and computational approaches by: (1) revealing general principles of host-virus interactions, broadly applicable to a wide range of viruses; and (2) probing a specific host-virus interaction system to identify transcriptional signatures that elucidate the host response to Ebola virus. I identify general mechanisms governing host-virus protein-protein interactions (PPIs) using domain-resolved PPI networks. This method identifies mechanistic differences between virus-human and within-human interactions, such as the preference viral proteins exhibit for binding human proteins containing linear motif-binding domains. Using domain-resolved PPIs also reveals novel signatures of pleiotropy, economy, and convergent evolution in the viral-host interactome that have not previously been identified in other PPI networks. I further identify transcriptional signatures of host response to Ebola virus (EBOV) infection by pairing high-throughput microarray data with advanced pathway analyses.
I compare EBOV-infected non-human primates with and without anticoagulant treatment to identify transcriptional signatures associated with survival following infection. Having found that CCAAT-enhancer binding proteins (CEBPs) are associated with survival, I determine the role of CEBPs in EBOV infection using comparative microarray analysis of multiple viral infections of hemorrhagic and non-hemorrhagic origin. I also identify unique transcriptional changes in the host that distinguish EBOV infection from other viral infections, such as influenza. Integrating these two areas of research provides information about universally applicable patterns of viral infection, while simultaneously examining the consequences of specific host-pathogen interactions.
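The abstract does not say which statistic underlies the reported preference of viral proteins for human proteins with linear motif-binding domains. One standard way to quantify that kind of enrichment is a one-sided hypergeometric test. The sketch below assumes a background of human proteins (for example, all proteins in a human PPI network), and all counts in the usage line are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, big_k: int, big_n: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(big_n, big_k, n).

    big_n: human proteins in the background set (assumed: a human PPI network)
    big_k: background proteins carrying a linear motif-binding domain
    n:     human proteins bound by at least one viral protein
    k:     virus-bound proteins carrying a linear motif-binding domain
    """
    if not (0 <= big_k <= big_n and 0 <= n <= big_n):
        raise ValueError("inconsistent counts")
    total = comb(big_n, n)
    # Exact integer sum over every outcome at least as extreme as k.
    tail = sum(comb(big_k, i) * comb(big_n - big_k, n - i)
               for i in range(k, min(n, big_k) + 1))
    return tail / total


# Hypothetical counts, for illustration only.
p = hypergeom_enrichment_p(k=40, n=100, big_k=1000, big_n=10000)
print(f"enrichment p-value: {p:.3g}")
```

Exact integer arithmetic avoids underflow in the individual terms. For very large networks a library routine, such as a hypergeometric survival function, would be the more practical choice.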
08 April 2016
Infectious diseases are caused by organisms such as viruses, bacteria, fungi and parasites, which can be passed from person to person, transmitted via insect or animal bites, or acquired through ingestion of contaminated food or water or through environmental exposure. Infectious diseases cause roughly 20% of annual deaths worldwide, including many among children under the age of five. In developing countries, these diseases remain a major public health problem. They can also impose societal and economic burdens through lifelong disability. A better understanding of these diseases is needed, with a view toward prevention and cure. The advent of whole-genome transcriptional profiling technology and powerful computational resources has made it possible to study infectious diseases on a genome-wide scale. Such studies can lead to improved diagnostic tools as well as preventive measures such as vaccines. This thesis comprises a number of projects with the common thread of developing and applying computational methods to extract biological information from high-throughput transcriptional data related to infectious diseases. These include (1) the identification of gene signatures related to B-cell proliferation that predict an influenza vaccine-induced antibody response; (2) the study of the physiological state of the Plasmodium falciparum malaria parasite when sequestered in human tissue; and (3) the identification of similarities and differences in the responses to five antiviral vaccines. To achieve the scientific goals of these projects, I developed two new computational methods that can be used more broadly for the downstream interpretation of results from enrichment analyses of whole-transcriptome profiles.
These are the Constellation Map, a combined visualization and annotation approach, and the Leading Edge Metagene Detector, which systematically consolidates functionally related genes from multiple gene sets representing highly enriched biological pathways and processes in a comparison of expression data from two biological phenotypes. Applying these computational approaches and tools in this dissertation enabled a better understanding of the biological mechanisms related to human vaccine response. The software packages developed are freely available for use by biological investigators across many fields.
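The abstract does not describe how the Leading Edge Metagene Detector consolidates genes. The sketch below shows only the standard GSEA-style ingredient it builds on: a weighted running-sum enrichment score and its leading-edge subset, using the common weighting exponent p = 1. As a hypothetical stand-in for consolidation, it then keeps genes that recur in the leading edges of at least two positively enriched sets. Gene names, scores and set names are illustrative.

```python
from collections import Counter


def leading_edge(ranked, scores, gene_set, p=1.0):
    """GSEA-style weighted running sum; returns (enrichment score, leading edge).

    ranked: genes sorted by decreasing ranking metric
    scores: ranking metric for each gene, in the same order
    """
    hits = [g in gene_set for g in ranked]
    n_miss = len(ranked) - sum(hits)
    norm = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    running, es, peak = 0.0, 0.0, 0
    for i, (s, h) in enumerate(zip(scores, hits)):
        # Hits step up by their weighted score; misses step down uniformly.
        running += abs(s) ** p / norm if h else -1.0 / n_miss
        if abs(running) > abs(es):
            es, peak = running, i
    if es >= 0:
        # Positive ES: set members at or before the peak.
        edge = [g for g, h in zip(ranked[:peak + 1], hits) if h]
    else:
        # Negative ES: set members at or after the peak.
        edge = [g for g, h in zip(ranked[peak:], hits[peak:]) if h]
    return es, edge


def shared_leading_edge(ranked, scores, gene_sets, min_sets=2):
    """Hypothetical consolidation rule: genes found in the leading edge of at
    least `min_sets` positively enriched gene sets."""
    counts = Counter()
    for members in gene_sets.values():
        es, edge = leading_edge(ranked, scores, set(members))
        if es > 0:
            counts.update(edge)
    return {g for g, c in counts.items() if c >= min_sets}


ranked = ["A", "B", "C", "D", "E", "F"]
scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
sets = {"set1": ["A", "B"], "set2": ["A", "B", "E"], "set3": ["F"]}
print(sorted(shared_leading_edge(ranked, scores, sets)))  # ['A', 'B']
```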
02 February 2018
This dissertation comprises four projects. I) Glycosylation is a post-translational modification that affects many physiological processes, including protein folding, cell interaction and host immune response. PglC, a phosphoglycosyl transferase (PGT) involved in the biosynthesis of N-linked glycoproteins in Campylobacter jejuni, is one of the structurally simplest members of the small bacterial PGT family. This research uses sequence similarity network and evolutionary covariance analyses to identify the catalytic core of PglC, and then models its three-dimensional structure using the covariance data as constraints. II) The rapid growth of fragment-based drug discovery requires accurate screening of fragment libraries against targets of interest to find strong, specific binders. While many high-resolution biophysical methods for fragment screening work well, docking-based virtual screening is highly desirable because of its speed and cost efficiency. Beyond key performance-determining factors such as the scoring function and search method, the goal is to learn from the experimentally determined fragment-bound structures in the PDBbinder database and to evaluate the profile of side-chain flexibility at the interface and its contribution to docking performance. III) Protein docking procedures predict the structure of a protein–protein complex from the known structures of the individual protein components. However, the structure of one or both components frequently must be obtained by homology modeling based on known structures. This work presents a benchmark dataset of experimentally determined target complexes, with a large set of sufficiently diverse template complexes identified for each target. The dataset allows developers to test algorithms that combine homology modeling and docking, in order to determine the factors that critically influence prediction performance.
IV) Human Eukaryotic Initiation Factor 4AI (heIF4AI) is the enzymatic component of the highly efficient heIF4F complex. As a helicase, it binds mRNA and unwinds its secondary structure at the 5' end, and thus plays a crucial role in translation initiation. This research focuses on the C-terminal domain of heIF4AI and investigates its potential as an anti-cancer target by integrating solvent mapping, docking, crystallization and NMR.
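The first project relies on a sequence similarity network (SSN). In an SSN, sequences are nodes, and an edge joins two sequences whose pairwise similarity meets a chosen threshold. The sketch below assumes the pairwise scores are already computed (for example, from all-against-all alignment). It treats connected components as candidate families. The identifiers, scores and threshold are hypothetical, and the covariance-based structure modeling is not shown.

```python
def ssn_components(ids, pair_scores, threshold):
    """Build a sequence similarity network and return (edges, clusters).

    pair_scores maps (id_a, id_b) to a pairwise similarity score; an edge is
    drawn when the score meets `threshold`. Clusters are connected components.
    """
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    edges = []
    for (a, b), score in pair_scores.items():
        if score >= threshold:
            edges.append((a, b))
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb             # union the two components
    groups = {}
    for i in ids:
        groups.setdefault(find(i), set()).add(i)
    return edges, sorted(sorted(g) for g in groups.values())


scores = {("A", "B"): 80.0, ("B", "C"): 60.0, ("C", "D"): 20.0}
edges, clusters = ssn_components(["A", "B", "C", "D"], scores, threshold=50.0)
print(clusters)  # [['A', 'B', 'C'], ['D']]
```

Raising the threshold splits the network into tighter groups. In practice, SSN studies usually examine several thresholds before choosing one.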
Inferring dinoflagellate genome structure, function, and evolution from short-read high-throughput RNA-Seq
Gibbons, Theodore Robert
19 February 2016
Dinoflagellates are a diverse and ancient lineage of globally abundant algae that have adapted to fill a diverse array of important ecological roles. Despite their importance, dinoflagellate genomes remain relatively poorly understood because of their enormous size. It is suspected that dinoflagellate genomes have expanded through rampant gene duplication, possibly via a lineage-specific mechanism that involves reinsertion of mature transcripts back into the genome and that may rely on spliced leader trans-splicing for reactivation and processing of recycled transcripts. Draft genomes have recently been published for two extremely small endosymbiotic species. These genomes confirm the expansion of nearly 10,000 gene families relative to other eukaryotes. In the more complete genome, evidence for transcript recycling based on relict spliced leader sequences was found in over 5,500 genes. Genomic efforts in larger dinoflagellates have instead focused on transcriptome sequencing, but transcriptomes assembled from short-read HTS data contain very little evidence for rampant gene duplication or for trans-splicing. I have shown that the apparent disagreement with hypotheses of ubiquitous trans-splicing and widespread gene duplication is the result of technological limitations. By leveraging the statistical power of high-throughput sequencing, I found that spliced leader suffixes as short as six nucleotides are sufficient for positive identification. I also found that isoform sequences from families of conserved paralogs are systematically collapsed during assembly, but that many of these consensus sequences can be identified using a custom SNP-calling procedure. This procedure can be combined with traditional clustering based on pairwise sequence alignment to obtain a more complete picture of gene duplication in dinoflagellates.
Efficient, automated homology detection based on pairwise sequence alignment is an equally challenging problem with much room for improvement. I explored alternative metrics for scoring alignments between sequences in a popular procedure based on BLAST and Markov clustering, and showed that simplified metrics perform as well as or better than more popular alternatives. I also found that, when compared against manual curation, Markov clustering of protein sequences suffers from a serious false-positive problem. This suggests that it is more appropriate for pre-clustering very large data sets than as a complete clustering solution.
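For readers unfamiliar with Markov clustering (MCL), the core loop alternates two steps on a column-stochastic matrix. Expansion (a matrix power) spreads random-walk flow. Inflation (an elementwise power followed by renormalization) strengthens strong flows and weakens weak ones. The pure-Python sketch below implements that loop for a small dense matrix. The edge weights, which in a BLAST-based pipeline would be derived from alignment scores, are assumed inputs, and the parameter defaults are illustrative rather than those used in the dissertation.

```python
def _normalize_columns(m):
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def _matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adj, expansion=2, inflation=2.0, max_iter=100, tol=1e-9, eps=1e-6):
    """Markov clustering of a non-negative weighted adjacency matrix.

    Returns a set of clusters, each a frozenset of node indices.
    """
    n = len(adj)
    # Add self-loops (the usual MCL convention), then make columns stochastic.
    m = _normalize_columns(
        [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = m
        for _ in range(expansion - 1):            # expansion: M ** e
            expanded = _matmul(expanded, m)
        inflated = _normalize_columns(            # inflation: elementwise power
            [[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that keep mass act as attractors; their non-zero columns form a cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > eps)
        if members:
            clusters.add(members)
    return clusters


# Two disconnected triangles (hypothetical similarity graph).
tri = [[0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]]
clusters = mcl(tri)
print(sorted(sorted(c) for c in clusters))  # [[0, 1, 2], [3, 4, 5]]
```

Higher inflation values yield finer clusters. Because MCL merges anything connected by strong flow, weak spurious edges can pull unrelated sequences together, which is one route to the false positives noted above.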
Topological Analysis of Biological Pathways: Genes, MicroRNAs and Pathways Involved in Hepatocellular Carcinoma
January 2017
Rewired biological pathways and/or rewired microRNA (miRNA)-mRNA interactions might also influence the activity of biological pathways. Here, rewired biological pathways are defined as the differential (rewiring) effect of genes on the topology of biological pathways between controls and cases. Similarly, rewired miRNA-mRNA interactions are defined as the differential (rewiring) effects of miRNAs on the topology of biological pathways between controls and cases. This dissertation discusses how rewired biological pathways (Chapter 1) and/or rewired miRNA-mRNA interactions (Chapter 2) aberrantly influence the activity of biological pathways, and their association with disease. It proposes two PageRank-based analytical methods, Pathways of Topological Rank Analysis (PoTRA) and miR2Pathway, discussed in Chapter 1 and Chapter 2, respectively. PoTRA detects pathways whose number of hub genes differs between two phenotypes. PoTRA rests on two observations: loss of connectivity is a common topological trait of cancer networks, and a normal biological network is scale-free, meaning its degree distribution follows a power law in which a small number of nodes are hubs and a large number are non-hubs. In the transition from normal to cancer, the loss of connectivity may disrupt this scale-free structure, so the number of hub genes in cancer may differ from that in normal samples. Hence, it is hypothesized that if the number of hub genes in a pathway differs between normal and cancer samples, the pathway might be involved in cancer. MiR2Pathway quantifies the differential effects of miRNAs on the activity of a biological pathway when miRNA-mRNA connections are altered from normal to disease, and ranks rewired miRNA-mediated biological pathways by disease risk.
This dissertation explores how rewired gene-gene interactions and rewired miRNA-mRNA interactions lead to aberrant activity of biological pathways, and ranks pathways by disease risk. The two proposed methods can complement existing genomic analysis methods to facilitate the study of the biological mechanisms behind disease at the systems level.
Dissertation/Thesis: Doctoral Dissertation, Molecular and Cellular Biology, 2017
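The abstract does not give PoTRA's network construction or its exact hub criterion. The sketch below illustrates only the underlying idea: score the genes of one pathway network per phenotype with power-iteration PageRank, count genes whose score exceeds a cutoff, and compare the counts. The undirected graphs, the damping factor of 0.85 and the hub rule (PageRank above 1.5 times the uniform score 1/n) are all assumptions made for illustration.

```python
def pagerank(nodes, edges, d=0.85, max_iter=200, tol=1e-12):
    """Power-iteration PageRank on an undirected graph."""
    nbrs = {v: set() for v in nodes}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Mass on isolated nodes is spread uniformly so scores still sum to 1.
        dangling = sum(pr[v] for v in nodes if not nbrs[v])
        new = {v: (1 - d) / n
                  + d * (sum(pr[u] / len(nbrs[u]) for u in nbrs[v]) + dangling / n)
               for v in nodes}
        delta = sum(abs(new[v] - pr[v]) for v in nodes)
        pr = new
        if delta < tol:
            break
    return pr


def hub_count(nodes, edges, factor=1.5):
    """Hypothetical hub rule: PageRank above `factor` times the uniform score 1/n."""
    pr = pagerank(nodes, edges)
    return sum(1 for s in pr.values() if s > factor / len(nodes))


genes = ["G1", "G2", "G3", "G4", "G5", "G6"]
normal_edges = [("G1", g) for g in genes[1:]]                       # star: one hub
cancer_edges = [(genes[i], genes[(i + 1) % 6]) for i in range(6)]   # ring: no hub
print(hub_count(genes, normal_edges), hub_count(genes, cancer_edges))  # 1 0
```

A pathway whose hub count changes between phenotypes, as in this toy star-versus-ring example, is the kind of candidate PoTRA flags. The real method would also need a significance test for the difference.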
Immunosignature is a technology that retrieves information from the immune system. It is based on microarrays of peptides chosen from random sequence space. My thesis focuses on improving the immunosignature platform and using immunosignatures to improve disease diagnosis. I first contributed to the optimization of the immunosignature platform by introducing scoring metrics to select optimal parameters, considering performance as well as practicality. Next, I worked primarily on identifying a signature, shared across various pathogens, that distinguishes infected individuals from the healthy population. I further retrieved consensus epitopes from this common disease signature. By studying the enrichment of the common signature in pathogen proteomes, I proposed that most pathogens could share it. Following this, I studied cancer samples from different stages and correlated the immune response with whether the epitopes presented by the tumor are similar to pathogen proteomes. An effective immune response is defined as an increase in antibody titer followed by a decrease, suggesting elimination of the epitope. I found that an effective immune response usually correlates with epitopes that are more similar to pathogens. This suggests that the immune system might cover a limited epitope space and be effective only against epitopes that resemble those of pathogens. I then participated in an effort to address antibiotic resistance by developing a classification algorithm that distinguishes bacterial from viral infection. This algorithm outperforms other currently available classification methods. Finally, I worked on deriving a single number to represent all the data on the immunosignature platform. The idea is analogous to body temperature, which gives an approximate measure of whether an individual is healthy.
Immune Entropy was found to work best as a single measurement describing the immune-system information derived from the immunosignature. Entropy is relatively invariant across the healthy population but differs significantly when healthy donors are compared with patients who are infected with a pathogen or have cancer.
Dissertation/Thesis: Doctoral Dissertation, Molecular and Cellular Biology, 2018
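The abstract does not define Immune Entropy. One plausible reading, assumed here rather than taken from the dissertation, is the Shannon entropy of a sample's peptide intensities treated as a probability distribution over the array. Under that reading, an even spread of reactivity gives high entropy and a few dominant reactivities give low entropy. The clipping of negative values and the example intensities are hypothetical.

```python
import math


def immune_entropy(intensities):
    """Shannon entropy (bits) of one sample's peptide intensities, treated as a
    probability distribution over the array's peptides (assumed definition)."""
    # Clip negative background-corrected values to zero.
    vals = [max(float(x), 0.0) for x in intensities]
    total = sum(vals)
    if total <= 0:
        raise ValueError("sample has no positive signal")
    h = 0.0
    for x in vals:
        if x > 0:
            p = x / total
            h -= p * math.log2(p)
    return h


flat = [100, 100, 100, 100]    # signal spread evenly: maximal entropy for 4 peptides
peaked = [1000, 10, 10, 10]    # one dominant reactivity: lower entropy
print(immune_entropy(flat), immune_entropy(peaked) < immune_entropy(flat))  # 2.0 True
```

For an array of n peptides the value lies between 0 and log2(n). Comparing samples therefore requires arrays with the same peptide count.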
05 March 2017
The human microbiome plays numerous, yet poorly understood, beneficial roles in human health. It can shape the immune response and provide essential vitamins and enzymes to the host. The different environments present in the human host are a major determinant of community composition. Conversely, the presence of certain bacteria in specific parts of the human body is sometimes associated with an increased risk of disease. Advances in DNA sequencing have increased our understanding of the relationship between microbes and their environment. However, sequencing data alone are unlikely to provide such understanding without appropriate computational models and analyses. In the first part of this thesis, I applied to the infant gut microbiome an approach previously used to understand the order of colonization of microbial biofilms. Available metagenomic sequencing data from infant fecal samples collected over 2.5 years were queried to test whether gut colonization is a multi-step process. In such a process, the organisms prevalent at a given time would be closely related, in their metabolic capabilities, to the organisms present at the previous time step. I further used network expansion algorithms, previously developed for the study of large-scale biogeochemical evolution, to explore the dynamics and diet dependency of the gut microbiome. These analyses suggest that metabolic relatedness among organisms is an important factor in the colonization process. The second part of my thesis explores the role of H. pylori in gastric cancer. I analyzed public microarray data for gastric AGS cancer cell lines infected with H. pylori strains that differ in pathogenicity. Relative to uninfected AGS cell lines, cells infected with a low-pathogenicity H. pylori strain displayed no major metabolic dysregulation, consistent with the fact that H. pylori does not cause inflammation or gastric cancer in the majority of the human population.
However, gastric AGS cell lines infected with highly pathogenic strains showed more significant differences, including upregulation of purine metabolism, possibly consistent with an inflammatory response. The results of this dissertation thus offer insights into how the interplay between the metabolic activity of human-associated microbes and their surrounding environment plays an important role in colonization as well as in pathogenesis.
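Network expansion, as used above, starts from a seed set of available compounds. It repeatedly fires every reaction whose substrates are all present and adds that reaction's products, until nothing new appears. The resulting set is the seeds' "scope". The sketch below implements that fixed-point loop. The toy reaction table and metabolite names are hypothetical. Swapping the seed set is one way to model a change in diet.

```python
def network_expansion(seeds, reactions):
    """Expand a seed set of metabolites to its scope.

    reactions maps a reaction id to (substrates, products); a reversible
    reaction should be listed once per direction.
    """
    scope, fired = set(seeds), set()
    changed = True
    while changed:
        changed = False
        for rid, (substrates, products) in reactions.items():
            # A reaction fires once all of its substrates are in the scope.
            if rid not in fired and set(substrates) <= scope:
                fired.add(rid)
                scope |= set(products)
                changed = True
    return scope, fired


toy = {
    "r1": (["glucose", "ATP"], ["g6p", "ADP"]),
    "r2": (["g6p"], ["f6p"]),
    "r3": (["f6p", "X"], ["Y"]),   # blocked: X is not in the seed set
}
scope, fired = network_expansion({"glucose", "ATP"}, toy)
print(sorted(fired))  # ['r1', 'r2']
```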
Hayes, John A.
01 January 2004
No description available.
Algorithms for reconstruction and analysis of metabolic networks, with an application to Neurospora crassa
Dreyfuss, Jonathan M.
12 March 2016
In this work, I have developed optimization-based algorithms to reconstruct and analyze metabolic network models, and I have applied them to the metabolism of the filamentous fungus Neurospora crassa. The developed algorithms are: (1) LInear MEtabolite Dilution Flux Balance Analysis (limed-FBA), which predicts flux while linearly accounting for metabolite dilution; (2) One-step functional Pruning (OnePrune), which removes blocked reactions with a single compact linear program; and (3) Consistent Reproduction Of growth/no-growth Phenotype (CROP), which reconciles differences between in silico and experimental gene essentiality faster than previous approaches. Together, these algorithms comprise Fast Automated Reconstruction of Metabolism (FARM). FARM was applied to reconstruct the first genome-scale model of N. crassa metabolism. This organism has played a central role in the development of twentieth-century genetics, biochemistry and molecular biology, and continues to serve as a model organism for eukaryotic biology. The N. crassa model comprises 836 metabolic genes, 257 pathways and 6 cellular compartments, and is supported by extensive manual curation drawing on 491 literature citations. Against an independent test set of more than 300 essential/non-essential genes that were not used to train the model, it achieves 93% sensitivity and specificity. The model was also used to simulate the biochemical genetics experiments originally performed on N. crassa. It comprehensively predicted nutrient rescue of essential genes and synthetic lethal interactions, and provided detailed pathway-based mechanistic explanations of the predictions. The model provides a reliable computational framework for the integration and interpretation of ongoing experimental efforts in N. crassa, and the algorithms will enhance the reconstruction and analysis of high-quality genome-scale metabolic models in general.
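For reference, limed-FBA builds on the standard flux balance analysis linear program, written below. Here S is the stoichiometric matrix, v the reaction flux vector, c the objective coefficients (typically selecting a biomass reaction), and v^min, v^max the flux bounds. The linear metabolite-dilution term that distinguishes limed-FBA is not specified in this abstract and is not shown. The essentiality metrics use the usual definitions. This assumes essential genes are counted as positives, a convention the abstract does not state.

```latex
\begin{aligned}
\max_{v}\;& c^{\top} v \\
\text{s.t.}\;& S\,v = 0, \\
& v^{\min} \le v \le v^{\max},
\end{aligned}
\qquad
\text{sensitivity} = \frac{TP}{TP + FN},
\qquad
\text{specificity} = \frac{TN}{TN + FP}.
```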
RNA sequencing differential expression and small RNA analyses of obesity and BMI with post-mortem human brain
Wake, Christian
29 September 2019
Obesity, the excessive accumulation of body fat, may cause serious negative health effects, including increased risk of heart disease, type 2 diabetes, stroke and certain cancers. RNA sequencing studies of the human brain related to obesity have not previously been undertaken. I conducted both large and small RNA sequencing of hypothalamus (207 samples) and nucleus accumbens (276 samples). Based on longitudinal BMI measures, the individuals sampled were defined as consistently obese (124 samples), consistently normal weight as controls (148 samples), or selected without respect to BMI and falling within neither the case nor the control definition (211 samples). The samples were provided by three cohort studies with brain donation programs: the Framingham Heart Study, the Religious Orders Study and the Memory Aging Project. For each brain region and each large/small RNA sequencing set, differential expression analysis was performed for obesity, BMI, brain region and sex. Sixteen mRNAs and five microRNAs are differentially expressed (adjusted p < 0.05) by obesity or BMI in these tissues. Some genes, such as APOBR and CES1, and some gene sets, such as Reactome’s “opioid signaling”, yielded findings with interesting implications. The small RNA sequencing data were used for novel analyses of microRNAs (miRNAs): discovering novel miRNAs and characterizing post-transcriptionally edited miRNAs (isomiRs). A custom miRNA identification pipeline was built that uses miRDeep* for miRNA identification and filters results based on false-positive rate estimates. With this analysis I discovered over 300 novel miRNAs. My isomiR analysis included isomiR-specific read filtering based on genome alignment. It generated a set of isomiR reads showing editing patterns that are non-random with respect to the position and nucleotide of the edit. Specifically, purine substitution, pyrimidine substitution, and 3’ polyadenylation and polyuridylation are commonly observed.
The editing patterns revealed that some miRNAs are almost always edited while others are very rarely edited. I developed a novel statistical test to determine differences in the isomiR profiles of individual miRNAs between two sets of samples. This method revealed 58 miRNAs with differentially edited isomiRs between the two brain regions, but none when comparing obese with control samples or male with female samples.
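The abstract mentions 3’ polyadenylation and polyuridylation detected with genome-alignment-based filtering but gives no procedure. The sketch below is a simplified, hypothetical illustration, not the dissertation's pipeline. It walks a read along the genomic locus from the mature start, treats everything after the first mismatch as a non-templated 3' tail, and labels all-A tails and all-U tails. The locus and reads are made up. Tails are reported in the DNA alphabet (T for U).

```python
def classify_3p_addition(read, genome, start):
    """Split a read aligned at `start` of `genome` into a templated prefix and a
    3' non-templated tail, and label the tail.

    Simplification: the first mismatch is treated as the start of the tail, so
    internal substitutions are not separated from 3' additions, and an added
    nucleotide that happens to match the genome is counted as templated.
    """
    read = read.upper().replace("U", "T")
    genome = genome.upper().replace("U", "T")
    n = 0
    while n < len(read) and start + n < len(genome) and read[n] == genome[start + n]:
        n += 1
    tail = read[n:]
    if not tail:
        return "templated", ""
    if set(tail) == {"A"}:
        return "polyadenylation", tail
    if set(tail) == {"T"}:
        return "polyuridylation", tail
    return "other", tail


# Hypothetical locus: 2 nt of flank, a 22-nt mature sequence, 2 nt of flank.
mature = "UAGCAGCACGUAAAUAUUGGCG"
locus = "GG" + mature + "CC"
for read in (mature, mature + "AA", mature + "UU", mature + "AG"):
    print(classify_3p_addition(read, locus, start=2))
```

Counting these labels per miRNA in two groups of samples yields the kind of isomiR profile that the dissertation's statistical test compares.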