Global ETD Search

1	SEARCHING THE EDGES OF THE PROTEIN UNIVERSE USING DATA SCIENCE Mengmeng Zhu (8775917) 30 April 2020 <p>Data science uses the latest techniques in statistics and machine learning to extract insights from data. With the increasing amount of protein data, a number of novel research approaches have become feasible.</p><p>Micropeptides are an emerging field in the protein universe. They are small proteins with <= 100 amino acid residues (aa) and are translated from small open reading frames (sORFs) of <= 303 base pairs (bp). Traditionally, their existence was ignored because of the technical difficulties in isolating them. With technological advances, a growing number of micropeptides have been characterized and shown to play vital roles in many biological processes. Yet we lack bioinformatics methods for predicting them directly from DNA sequences, which could substantially facilitate research in this field at minimal cost. With the increasing amount of data, developing new methods to address this need has become possible. We therefore developed MiPepid, a machine-learning-based method specifically designed for predicting micropeptides from DNA sequences, by curating a high-quality dataset and training a logistic regression model on 4-mer features. MiPepid performed exceptionally well on holdout test sets and much better than existing methods. MiPepid is available for download, easy to use, and runs sufficiently fast.</p><p>Long noncoding RNAs (lncRNAs) are transcripts of > 200 bp that do not encode a protein. Contrary to their “noncoding” definition, an increasing number of lncRNAs have been found to be translated into functional micropeptides. Therefore, whether most lncRNAs are translated is an open question of great significance. 
To address this question, by harnessing the availability of large-scale human variation data, we have explored the relationships between lncRNAs, micropeptides, and canonical proteins (> 100 aa) from the perspective of genetic variation, which has long been used to study natural selection and to infer functional relevance. Through rigorous statistical analyses, we find that lncRNAs share a similar genetic variation profile with proteins regarding single nucleotide polymorphism (SNP) density, SNP spectrum, enrichment of rare SNPs, etc., suggesting that lncRNAs are under negative selection of similar strength to that acting on proteins. Our study revealed similarities between micropeptides, lncRNAs, and canonical proteins and is the first attempt to explore the relationships between the three groups from a genetic variation perspective.</p><p>Deep learning has been tremendously successful in 2D image recognition. Prediction of protein binding ligands is a fundamental topic in protein research, as most proteins bind ligands to function. Proteins are 3D structures and can be considered as 3D images. Prediction of the binding ligands of proteins can then be converted to a 3D image classification problem. In addition, a large amount of protein structure data is now available. We therefore utilized deep learning to predict protein binding ligands by designing a 3D convolutional neural network from scratch and by building a large 3D image dataset of protein structures. The trained model achieved an average F1 score of over 0.8 across 151 classes on the holdout test set. Compared to existing methods, our model performed better. In summary, we showed the feasibility of deploying deep learning in protein structure research.</p><p>In conclusion, by exploring various edges of the protein universe from the perspective of data science, we showed that the increasing amount of data and the advancement of data science methods have made it possible to address a wide variety of pressing biological questions. 
We showed that for a successful data science study, all three components (goal, data, and method) are indispensable. We provided three successful data science studies: careful data cleaning and selection of the machine learning algorithm led to the development of MiPepid, which meets the urgent need for a micropeptide prediction method; identifying the question and exploring it from a different angle led to the key insight that lncRNAs resemble micropeptides; and applying deep learning to protein structure data led to a new approach to the long-standing question of protein-ligand binding. The three studies serve as excellent examples of how to approach a wide range of data science problems.</p> Bioinformatics Computational Biology Molecular Evolution Bioinformatics Software data science micropeptide Small ORF sORF coding noncoding lncRNA machine learning small protein genetic variation SNP natural selection
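The 4-mer features named in the MiPepid abstract above can be illustrated with a minimal sketch. It is not the thesis code: the function names, the normalization to frequencies, and the handling of ambiguous bases are all assumptions made for illustration. The second helper shows one reading of the 303 bp limit, namely 100 codons plus one stop codon; the abstract itself does not state how that limit was derived.

```python
from itertools import product
from typing import List

K = 4
ALPHABET = "ACGT"
# All 4**4 = 256 possible 4-mers in a fixed order, so that every
# sequence is mapped onto the same feature layout.
KMERS = ["".join(p) for p in product(ALPHABET, repeat=K)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_frequencies(seq: str) -> List[float]:
    """Return the normalized 4-mer frequency vector of a DNA sequence.

    Hypothetical helper: overlapping windows are counted, and windows
    containing anything other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = [0] * len(KMERS)
    total = 0
    for i in range(len(seq) - K + 1):
        idx = KMER_INDEX.get(seq[i:i + K])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        # Sequence shorter than K, or no unambiguous window at all.
        return [0.0] * len(KMERS)
    return [c / total for c in counts]


def max_sorf_length_bp(max_residues: int = 100) -> int:
    """Length in bp of an ORF coding for max_residues amino acids
    plus one stop codon (assumed reading of the 303 bp limit)."""
    return 3 * max_residues + 3


features = kmer_frequencies("ATGAAAATG")
```

A vector like `features` could then be passed to any logistic regression implementation; which library and which hyperparameters the thesis used is not stated in the abstract.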
2	Role of a Mitochondrial Micropeptide in Regulating Innate Immune Responses Bhatta, Ankit 29 September 2020 Short ORF-encoded peptides (SEPs) are increasingly being identified as functional elements in various cellular processes. Current computational methods and experimental molecular biochemistry allow us to discover putative SEPs, or micropeptides, from proteogenomic datasets and to validate them experimentally. Here, we identified a micropeptide produced from a putative long noncoding RNA (lncRNA), 1810058I24Rik, which is downregulated in both human and murine myeloid cells exposed to lipopolysaccharide (LPS), as well as to other TLR ligands and inflammatory cytokines. Analysis of lncRNA 1810058I24Rik subcellular localization revealed that this transcript is localized in the cytosol, prompting us to evaluate its coding potential. In vitro translation with 35S-labeled methionine resulted in translation of a 47 amino acid micropeptide. Microscopy and subcellular fractionation studies in macrophages demonstrated endogenous expression of this peptide on the mitochondrion. We thus named this gene ‘Mitochondrial micropeptide-47 (Mm47)’. Functional studies using siRNA and CRISPR-Cas9-mediated deletion in primary cells showed that the transcriptional response downstream of TLR4 was not affected by Mm47 loss of function. In contrast, both the CRISPR-Cas9- and siRNA-targeted BMDM cells were compromised for Nlrp3 inflammasome responses. However, the primary macrophages derived from the Mm47 knockout mice do not require Mm47 for Nlrp3 activation, likely due to basal downregulation of Mir-223, a microRNA that negatively regulates Nlrp3. Notably, the Mm47-deficient mice are susceptible to influenza virus infection and succumb despite antiviral and inflammatory responses comparable to those of wild-type mice. 
We hypothesize that Mm47 deficiency may affect the antiviral resilience of mice through a secondary mitochondria-dependent immunometabolic defect or a failure to recover from immune pathology, which warrants further investigation. This study therefore identifies a novel mitochondrial micropeptide, Mm47, that is required for activation of the Nlrp3 inflammasome in cells and for resistance to influenza virus infection. Broadly, this work highlights the presence of translatable ORFs in annotated noncoding RNA transcripts and underscores their importance in innate immunity and virus infection. LncRNA long noncoding RNA noncoding RNA Micropeptide Short-ORF-encoded peptides SEPs inflammation inflammasome NLRP3 Nod-like receptor bacterial LPS mitochondria innate immune signaling influenza A virus IAV Immunology and Infectious Disease Microbiology
3	Synthese und Untersuchung von Nukleobasen-funktionalisierten Peptiden / Synthesis and analysis of nucleobase-functionalized peptides Jede, Nadine 03 May 2006 No description available. 540 Chemistry Mathematics and Computer Science peptides nucleobases <i>α</i>-helix hegas PNA micropeptide MCoTI-29 35 Chemistry SU 000 Organic Chemistry
