Global ETD Search

1	Machine Learning for Variant Detection and Population Analysis in Heterogenerous Cancer Sample / Jiao, Wei / 28 November 2013 / Cancer is a complex and deadly disease caused by genetic lesions in somatic cells. Further research on computational methods for detecting and characterizing somatic mutations is necessary to build a comprehensive systems-level model of the roles these lesions play in cancer development. In the first project, I trained a set of supervised machine learning classifiers to distinguish false-positive from true-positive somatic single nucleotide variants (SNVs), and showed an improvement in somatic SNV detection on the data set over the reported classifier. In the second project, we developed the PhyloSub model, which uses a nonparametric Bayesian prior over a set of trees to cluster SNVs and to infer the subclonal phylogenetic structure of tumors, with uncertainty, from SNV sequencing data. Experiments showed that the PhyloSub model could infer the subclonal phylogenetic structure from both single and multiple tumor samples. / Single nucleotide variant; Machine learning; Cancer heterogeneity; 0715
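3	Hepatic inflammation facilitates transcription-associated mutagenesis via AID activity and enhances liver tumorigenesis / 肝炎はAIDによる転写依存性の遺伝子変異導入を促進し肝発癌を助長する / Matsumoto, Tomonori / 23 March 2017 / Kyoto University / 0048 / Doctorate by coursework (new system) / Doctor of Medical Science / Degree No. Kō 20249 / Medical Doctorate No. 4208 / 新制||医||1020 (University Library) / Kyoto University Graduate School of Medicine, Major in Medicine / (Chief examiner) Professor 清水章 (Akira Shimizu), Professor 松田道行 (Michiyuki Matsuda), Professor 武田俊一 (Shunichi Takeda) / Conferred under Article 4, Paragraph 1 of the Degree Regulations / DFAM / AID; hepatitis; liver cancer; single nucleotide variant; 490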
4	Protein Conformational Dynamics In Genomic Analysis / January 2016 / Proteins are essential for most biological processes that constitute life. The function of a protein is encoded within its 3D folded structure, which is determined by its sequence of amino acids. A variation of a single nucleotide in the DNA during transcription (nSNV) can alter the amino acid sequence (i.e., a mutation in the protein sequence), which can adversely impact protein function and sometimes cause disease. These mutations are the most prevalent form of variation in humans, and each individual genome harbors tens of thousands of nSNVs that can be benign (neutral) or lead to disease. The primary way to assess the impact of nSNVs on function is through evolutionary approaches based on positional amino acid conservation. These approaches are largely inadequate in the regime where positions evolve at a fast rate. We developed a method called the dynamic flexibility index (DFI) that measures the site-specific conformational dynamics of a protein, which is paramount in exploring the mechanisms by which nSNVs affect function. In this thesis, we demonstrate that DFI can distinguish disease-associated from neutral nSNVs, particularly at fast-evolving positions where evolutionary approaches lack predictive power. We also describe an additional dynamics-based metric, the dynamic coupling index (DCI), which measures the dynamic allosteric residue coupling of distal sites on the protein with the functionally critical (i.e., active) sites. Using DCI, we analyzed 200 disease mutations of a specific enzyme called GCase and performed a proteome-wide analysis of 75 human enzymes containing 323 neutral and 362 disease mutations. In both cases we observed that sites with high dynamic allosteric residue coupling with the functional sites (i.e., DARC spots) have an increased susceptibility to harboring disease nSNVs.
Overall, our comprehensive proteome-wide analysis suggests that incorporating these novel position-specific, conformational-dynamics-based metrics into genomics can complement current approaches to increase the accuracy of diagnosing disease nSNVs. Furthermore, they provide mechanistic insights about disease development. Lastly, we introduce a new, purely sequence-based model that can estimate the dynamics profile of a protein using only coevolution information, eliminating the requirement of the 3D structure for determining dynamics. / Dissertation/Thesis / Doctoral Dissertation Physics 2016 / Biophysics; computational biophysics; disease prediction; precision medicine; protein conformational dynamics; protein evolution; single nucleotide variant
5	Uma abordagem integrativa usando dados de interação proteína-proteína e estudos genéticos para priorizar genes e funções biológicas em transtorno de déficit de atenção e hiperatividade / An integrative approach using protein-protein interaction data and genetic studies to prioritize genes and biological functions in attention-deficit/hyperactivty disorder / Lima, Leandro de Araujo / 22 July 2015 / Attention-deficit/hyperactivity disorder (ADHD) is the most common neurodevelopmental disorder of childhood, affecting about 5.8% of children and adolescents worldwide. Many studies have attempted to investigate the genetic susceptibility to ADHD, without much success. The present study aimed to analyze rare and common variants contributing to the genetic architecture of ADHD. We generated the first ADHD exome data from 30 Brazilian trios in which the child was diagnosed with sporadic ADHD. We analyzed both single-nucleotide variants (SNVs) and copy-number variants (CNVs) in these trios and across other datasets, including a Brazilian sample of 503 child/adolescent controls, previously published results from four CNV studies of ADHD involving child/adolescent Caucasian samples, and a meta-analysis of genome-wide association studies. Both the trios and the controls belong to the High Risk Cohort Study for the Development of Childhood Psychiatric Disorders of the National Institute of Developmental Psychiatry (INPD). The results from the Brazilian trios showed three major patterns: cases with inherited variations plus either de novo SNVs only or de novo CNVs only, and cases with only inherited variations. Although the sample size is small, we could see that various comorbidities are more frequent in cases with only inherited variants.
After exploring the rare variant composition in our 30 cases, we selected genes with variations (SNVs or located in CNV regions) in our trio analysis that are recurrent in the families analyzed or in public data sets. Moreover, using only genes expressed in the brain (post-mortem samples from the Brain Atlas and Genotype-Tissue Expression projects), we constructed an in silico protein-protein interaction (PPI) network, with physical interactions confirmed by at least two sources. Topological and functional analyses of the genes in this network uncovered genes related to synapse, cell adhesion, and glutamatergic and serotonergic pathways, both confirming findings of previous studies and capturing new genes and genetic variants in these pathways. / ADHD; CNV; copy-number variant; PPI; protein-protein interaction networks; single-nucleotide variant; SNV
7	Proteogenomics for personalised molecular profiling / Schlaffner, Christoph Norbert / January 2018 / Technological advancements in mass spectrometry that allow quantification of almost complete proteomes make proteomics a key platform for generating unique functional molecular data. Furthermore, the integrative analysis of genomic and proteomic data, termed proteogenomics, has emerged as a new field revealing insights into gene expression regulation, cell signalling, and disease processes. However, the lack of software tools for high-throughput integration and for unbiased modification and variant detection hinders efforts towards large-scale proteogenomics studies. The main objective of this work is to address these issues by developing and applying new software tools and data analysis methods. Firstly, I address the mapping of peptide sequences to reference genomes. I introduce a novel tool for high-throughput mapping and highlight its unique features, which facilitate quantitative and post-translational modification mapping while accounting for amino acid substitutions. The performance is benchmarked. Furthermore, I offer an additional tool that permits generation of web-accessible hubs of genome-wide mappings. To enable unbiased identification of post-translational modifications and amino acid substitutions in high-resolution mass spectrometry data, I present algorithmic updates to the mass-tolerant blind spectrum comparison tool 'MS SMiV'. I demonstrate the applicability of the changes by benchmarking against a published mass-tolerant database search of a high-resolution tandem mass spectrometry dataset. I then present the application of 'MS SMiV' to a panel of 50 colorectal cancer cell lines. I show that the adaptation of 'MS SMiV' outperforms traditional sequence-database-based identification of single amino acid variants.
Furthermore, I highlight the utility of mass-tolerant spectrum matching, in combination with isobaric-labelled quantitative proteomics, in distinguishing between post-translational modifications and amino acid variants of similar mass. In the last part of this work, I integrate both tools with a high-throughput proteogenomic identification pipeline and apply it to a pilot study of chondrocytes derived from 12 osteoarthritic individuals. I show the value of this approach in identifying variation between individuals and between molecular levels, and illustrate it with individual examples. I show that multiplexed proteogenomics can be used to infer the genotypes of individuals.
