101

Associação entre polimorfismos em genes relacionados à resposta inflamatória e a suscetibilidade, progressão e prognóstico do câncer gástrico / Association between polymorphisms in inflammatory response related-genes and the susceptibility, progression and prognosis of gastric cancer

Furuya, Tatiane Katsue 17 February 2017
INTRODUCTION: A chronically inflamed microenvironment in the stomach has been described as a critical component of both tumor initiation and progression. Furthermore, genetic variants have been shown to influence interindividual variation in the inflammatory response. We therefore aimed to investigate whether polymorphisms in inflammatory response-related genes are associated with the risk of developing gastric cancer, with its clinicopathological features, and with overall and disease-free survival in a sample of the Brazilian population.
METHODS: Sixteen selected genetic variants in eleven genes (COX-2, OGG1, TNFB, TNFA, HSPA1L, HSPA1B, VEGFA, IL17F, LGALS3, PHB and TP53) were genotyped in 262 control individuals and 178 patients diagnosed with gastric cancer. Genetic association analyses were performed under different models (Genotype, Allele, Dominant and Recessive), both for the total sample of cases (N=178) and stratified for cases with the diffuse histological subtype of Lauren's classification (N=112). We also assessed linkage disequilibrium among the polymorphisms, and haplotype association analyses were carried out with the Haploview and PLINK software.
RESULTS: In the case-control study, carriers of the Pro allele of rs1042522 (TP53) had an approximately 2-fold higher risk of developing gastric cancer in a multivariate analysis, and this association was even stronger when only cases with the diffuse subtype were analyzed. In contrast, the A allele of rs699947 (VEGFA) was associated with protection against gastric cancer. Regarding the clinicopathological features, rs689466 (COX-2); rs1052133 (OGG1); rs699947, rs833061 and rs2010963 (VEGFA); rs4644 (LGALS3) and rs1042522 (TP53) were significantly associated with features of worse disease progression, whereas rs5275 (COX-2); rs2227956 (HSPA1L) and rs3025039 (VEGFA) were associated with features of better progression in the total sample of cases. The polymorphism rs909253 (TNFB) predicted better disease progression when only cases with the diffuse subtype were considered. Regarding survival, rs909253 (TNFB) was associated with a better prognosis in both the overall and the disease-free survival curves, whereas carriers of the His allele of rs4644 (LGALS3) had a worse prognosis, with shorter disease-free survival. Finally, in the haplotype analyses, the CTC haplotype (formed by rs699947, rs833061 and rs2010963 of VEGFA) was associated with greater susceptibility to gastric cancer. We also observed associations between the GG haplotype (TNFB/TNFA) and perineural invasion; the ACG haplotype (VEGFA) and venous vascular invasion; the CTC haplotype (VEGFA) and invasion of other organs; and the GT haplotype (rs689466 and rs5275 of COX-2) and the intestinal histological subtype.
CONCLUSIONS: These results help to clarify the potential role of polymorphisms in genes that modulate the inflammatory response in the pathogenesis of gastric cancer, and indicate that host genetic variants act together with other factors to influence the susceptibility to, and the progression and prognosis of, this disease.
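The Allele, Dominant and Recessive models named in the methods can be illustrated with a minimal sketch. The genotype counts below are invented for illustration and are not the study's data, the function names are hypothetical, and the odds ratios are unadjusted, unlike the multivariate analysis reported in the abstract.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Unadjusted odds ratio of a 2x2 table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def model_odds_ratios(cases, controls):
    """Collapse genotype counts into 2x2 tables under three genetic models.

    cases, controls: tuples (n_AA, n_Aa, n_aa), where 'a' is the candidate
    risk allele (a labelling assumed for this sketch).
    """
    ca_AA, ca_Aa, ca_aa = cases
    co_AA, co_Aa, co_aa = controls
    return {
        # Dominant: carrying at least one 'a' allele counts as exposed.
        "dominant": odds_ratio(ca_Aa + ca_aa, ca_AA, co_Aa + co_aa, co_AA),
        # Recessive: only 'aa' homozygotes count as exposed.
        "recessive": odds_ratio(ca_aa, ca_AA + ca_Aa, co_aa, co_AA + co_Aa),
        # Allele: count chromosomes instead of individuals.
        "allele": odds_ratio(ca_Aa + 2 * ca_aa, 2 * ca_AA + ca_Aa,
                             co_Aa + 2 * co_aa, 2 * co_AA + co_Aa),
    }


# Invented counts, for illustration only.
ratios = model_odds_ratios(cases=(50, 30, 20), controls=(100, 40, 10))
print(ratios)
```

An odds ratio near 2 under the dominant model corresponds to the kind of "about 2-fold higher risk" statement made for carriers of an allele, although a real analysis would also adjust for covariates and report confidence intervals.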
102

De novo algorithms to identify patterns associated with biological events in de Bruijn graphs built from NGS data / Algorithmes de novo pour l'identification de motifs associés à des événements biologiques dans les graphes de De Bruijn construits à partir de données NGS

Ishi Soares de Lima, Leandro 23 April 2019
The main goal of this thesis is the development, improvement and evaluation of methods to process massive sequencing data, mainly short and long RNA-sequencing reads, to help the community answer biological questions, especially in the contexts of transcriptomics and alternative splicing. Our initial objective was to develop methods to process second-generation RNA-seq data with de Bruijn graphs in order to contribute to the literature on alternative splicing, which was explored in the first three works. The first paper (Chapter 3, paper [77]) explored the problems that repeats cause for transcriptome assemblers if they are not addressed properly. We showed that the sensitivity and the precision of our local alternative splicing assembler increased significantly when repeats were formally modeled. The second (Chapter 4, paper [11]) shows that annotating alternative splicing events with a single approach misses a large number of candidates, many of which are significant. Thus, to comprehensively explore the alternative splicing events in a sample, we advocate the combined use of mapping-first and assembly-first approaches. Given that de Bruijn graphs built from real RNA-seq data contain a huge number of bubbles, which is infeasible to analyse in practice, in the third work (Chapter 5, papers [1, 2]) we explored theoretically how to represent the bubble space efficiently and compactly through a bubble generator. Exploring and analysing the bubbles in the generator is feasible in practice and can complement state-of-the-art algorithms that analyse a subset of the bubble space.
Collaborations and advances in sequencing technology encouraged us to work in other subareas of bioinformatics, such as genome-wide association studies, error correction and hybrid assembly. Our fourth work (Chapter 6, paper [48]) describes an efficient method to find and interpret unitigs strongly associated with a phenotype, especially antibiotic resistance, making genome-wide association studies more amenable to bacterial panels, especially those containing plastic bacteria. In our fifth work (Chapter 7, paper [76]), we evaluate the extent to which existing long-read DNA error correction methods are capable of correcting high-error-rate RNA-seq long reads. We conclude that no tool outperforms all the others across all metrics or is best suited to every situation, and that the choice should be guided by the downstream analysis. RNA-seq long reads provide a new perspective on how to analyse transcriptomic data, since they can describe the full-length sequences of mRNAs, which in several cases was not possible with short reads, even with state-of-the-art transcriptome assemblers. As such, in our last work (Chapter 8, paper [75]) we explore a hybrid alternative splicing assembly method that uses both short and long reads to list alternative splicing events comprehensively, thanks to the short reads, guided by the full-length context provided by the long reads.
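To make the notion of a bubble in a de Bruijn graph concrete, here is a minimal sketch. It is a toy illustration, not the thesis's algorithm: the function names, the two short sequences and the choice k=3 are all assumptions made for the example.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Node-centric de Bruijn graph: each (k-1)-mer maps to its successor (k-1)-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer contributes one edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def branching_nodes(graph):
    """Nodes with more than one successor, i.e. places where a bubble may open."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


# Two toy sequences that differ in a single base (hypothetical data).
reads = ["ACGTAC", "ACGGAC"]
graph = build_de_bruijn(reads, k=3)
print(branching_nodes(graph))
```

In this toy graph the two paths diverge at node `CG` and rejoin at node `AC`, which is the pattern referred to as a bubble. Real RNA-seq graphs contain very many such structures, which is why the abstract argues for a compact representation of the bubble space.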
103

Interrogation of Nucleic Acids by Parallel Threading

Pettersson, Erik January 2007
Advancements in the field of biotechnology are expanding the scientific horizon, and a promising era of personalized medicine for improved health is envisioned. The amount of genetic data is growing at an ever-escalating pace owing to novel technologies that allow massively parallel sequencing and whole-genome genotyping, supported by advancements in computer science and information technology. As the amount of information stored in databases throughout the world grows and our knowledge deepens, genetic signatures of significant importance are discovered. Signatures that surface in the data-mining process may include causative or marker single nucleotide polymorphisms (SNPs), revealing predisposition to disease, or gene expression signatures, profiling a pathological state. When targeting a reduced set of signatures in a large number of samples for diagnostic or fine-mapping purposes, efficient interrogation and scoring require appropriate preparations. These needs are met by miniaturized and parallelized platforms that allow low sample and template consumption. This doctoral thesis describes an attempt to tackle some of these challenges through the design and implementation of a novel assay denoted Trinucleotide Threading (TnT). The method permits multiplex amplification of a medium-sized set of specific loci and was adapted to genotyping, gene expression profiling and digital allelotyping. Utilizing a reduced number of nucleotides permits specific amplification of targeted loci while preventing the generation of spurious amplification products. This method was applied to genotype 96 individuals for 75 SNPs. In addition, the accuracy of genotyping from minute amounts of genomic DNA was confirmed. This procedure was performed using a robotic workstation running custom-made scripts, and a software tool was implemented to facilitate the assay design.
Furthermore, a statistical model was derived from the molecular principles of the genotyping assay, and an Expectation-Maximization algorithm was chosen to call the generated genotypes automatically. The TnT approach was also adapted to profiling signature gene sets for the Swedish Human Protein Atlas Program. Here, 18 protein epitope signature tags (PrESTs) were targeted in eight different cell lines employed in the program, and the results demonstrated high concordance rates with real-time PCR approaches. Finally, an assay for digital estimation of allele frequencies in large cohorts was set up by combining the TnT approach with a second-generation sequencing system. Allelotyping was performed by targeting 147 polymorphic loci in a genomic pool of 462 individuals. Subsequent interrogation was carried out on a state-of-the-art massively parallelized Pyrosequencing instrument. The experiment generated more than 200,000 reads and, with bioinformatic support, clonally amplified fragments and the corresponding sequence reads were converted to a precise set of allele frequencies.
104

Functional Analysis of the TRIB1 Locus in Coronary Artery Disease

Douvris, Adrianna 21 July 2011
The TRIB1 locus (8q24.13) is a novel locus associated with plasma TGs and CAD risk. Trib1 is a regulator of MAPK activity, and has been shown to regulate hepatic lipogenesis and VLDL production in mice. However, the functional relationship between common SNPs at the TRIB1 locus and plasma lipid traits is unknown; TRIB1 has not been identified as an eQTL. This cluster of SNPs falls within an intergenic region 25kb to 50kb downstream of the TRIB1 coding region. By phylogenetic footprinting analysis and DNA genotyping, we identified an evolutionarily conserved region (CNS1) within the risk locus that harbours two common SNPs in tight LD with GWAS risk SNPs and significantly associated with CAD. We investigated the regulatory function of CNS1 by luciferase reporter assays in HepG2 cells and demonstrate that this region has promoter activity. In addition, the rs2001844 risk allele significantly reduces luciferase activity, suggesting that altered expression of the EST-based gene may be associated with plasma TGs. We identified an EST within the risk locus directly downstream of CNS1. We performed 5'/3' RACE using HepG2 RNA, identified multiple variants of this EST-based gene, and confirmed its transcription start site within CNS1. We hypothesize that this EST is a long noncoding RNA due to low abundance, poor conservation, and absence of significant ORF. Over-expression of a short variant implicates its function in the regulation of target gene transcription, although the mechanism of action remains unknown. We conclude that the risk locus at 8q24.13 harbours a novel EST-based gene that may explain the relationship between GWAS SNPs at this locus and plasma lipid traits.
107

Genetic architecture of complex disease in humans: a cross-population exploration

Martínez Marigorta, Urko, 1983- 12 November 2012
The aetiology of common diseases is shaped by the effects of genetic and environmental factors. Great effort has been devoted to unravelling the genetic basis of disease, in the hope that it will help to develop new therapeutic treatments and to achieve personalized medicine. With the development of high-throughput genotyping technologies, hundreds of association studies have described many loci associated with disease. However, the depiction of disease architecture remains incomplete. The aim of this work is to perform exhaustive comparisons across human populations to evaluate pressing questions. Our results provide new insights into the allele frequency of risk variants, their sharing across populations and the likely architecture of disease.
108

Dissecting heterogeneity in GWAS meta-analysis

Magosi, Lerato Elaine January 2017
Statistical heterogeneity refers to differences among the results of studies combined in a meta-analysis beyond those expected by chance. On the one hand, excessive heterogeneity can diminish power to discover genetic signals; on the other, moderate heterogeneity can reveal important biological differences among studies. Given its double-edged nature, this thesis dissects heterogeneity in genetic association meta-analyses from three vantage points. First, a novel multi-variant statistic, M, is proposed to detect genome-wide (systematic) heterogeneity patterns in genetic association meta-analyses. This was motivated by the limited availability of appropriate methodology to measure the impact of heterogeneity across genetic signals, since traditional metrics (Q, I² and T²) measure heterogeneity at individual variants. Second, given that meta-analyses comprising small numbers of studies typically report imprecise summary effect estimates, GWAS-derived empirical heterogeneity priors are used to improve precision in the estimation of average genetic effects and heterogeneity in smaller meta-analyses (e.g. ≤ 10 studies). Third, a critical evaluation of the Han-Eskin random-effects model shows how it can identify small-effect heterogeneous loci overlooked by traditional fixed- and random-effects methods. This work draws attention to the existence of genome-wide heterogeneity patterns, which reveal systematic differences among the ascertainment criteria of participating studies in a meta-analysis of coronary artery disease (CAD) risk. Furthermore, simulation studies with the Han-Eskin random-effects model revealed inflated genetic signals at small-effect loci when heterogeneity levels were high. However, the model did reveal an additional CAD risk variant overlooked by traditional meta-analysis methods.
We therefore recommend a holistic approach to exploring heterogeneity in meta-analyses, one that assesses heterogeneity of genetic effects both at individual variants with traditional statistics and across multiple genetic signals with the M statistic. Furthermore, it is critically important to review forest plots for small-effect loci identified using the Han-Eskin random-effects model amidst moderate-to-high heterogeneity (I² ≥ 40%).
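The traditional single-variant metrics mentioned above can be sketched as follows. This is a minimal illustration of the standard Cochran's Q and I² definitions with inverse-variance weights, not code from the thesis, and it does not implement the M statistic. The function name and the example numbers are assumptions.

```python
def cochran_q_i2(effects, std_errors):
    """Fixed-effect pooled estimate, Cochran's Q and I^2 (in percent) for one variant.

    effects: per-study effect estimates (e.g. log odds ratios).
    std_errors: the corresponding standard errors.
    """
    weights = [1.0 / se ** 2 for se in std_errors]  # inverse-variance weights
    pooled = sum(w * b for w, b in zip(weights, effects)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, effects))
    df = len(effects) - 1
    # I^2 is the share of variation beyond chance, truncated at zero.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2


# Hypothetical two-study example with clearly discordant effects.
pooled, q, i2 = cochran_q_i2(effects=[0.0, 2.0], std_errors=[0.5, 0.5])
print(pooled, q, i2)
```

With these made-up inputs, I² is well above the 40% level that the abstract treats as moderate-to-high heterogeneity, so such a variant would be one whose forest plot deserves a closer look.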
109

Associação entre polimorfismos em genes relacionados à resposta inflamatória e a suscetibilidade, progressão e prognóstico do câncer gástrico / Association between polymorphisms in inflammatory response related-genes and the susceptibility, progression and prognosis of gastric cancer

Tatiane Katsue Furuya 17 February 2017 (has links)
INTRODUÇÃO: A persistência de um microambiente cronicamente inflamado no estômago tem sido descrita como um componente crítico tanto para a iniciação, quanto para a progressão tumoral. Além disso, variações genéticas tem demonstrado influenciar na variabilidade interindividual da resposta inflamatória. Desta forma, esse estudo teve como objetivo investigar a associação de polimorfismos em genes relacionados à resposta inflamatória com o risco para o desenvolvimento do câncer gástrico, com suas variáveis anatomopatológicas e com a sobrevida global e livre de doença em uma amostra da população Brasileira. MÉTODOS: Dezesseis variantes genéticas selecionadas em onze genes (COX-2, OGG1, TNFB, TNFA, HSPA1L, HSPA1B, VEGFA, IL17F, LGALS3, PHB e TP53) foram genotipadas em 262 indivíduos controles e 178 pacientes diagnosticados com câncer gástrico. As análises de associações genéticas foram realizadas em diferentes modelos (Genótipos, Alelos, Dominante e Recessivo) considerando a amostra total de casos (N=178) e estratificada somente para os casos com o subtipo histológico difuso de Lauren (N=112). Também foi investigado o desequilíbrio de ligação entre os polimorfismos e as análises de associação com os haplótipos formados foram realizadas por meio dos softwares Haploview e PLINK. RESULTADOS: No estudo caso-controle, indivíduos portadores do alelo Pro do polimorfismo rs1042522 (TP53) apresentaram risco cerca de duas vezes maior em desenvolver o câncer gástrico em análise multivariada, sendo esse risco ainda maior quando considerado somente os casos com o subtipo difuso. Por outro lado, a presença do alelo A do polimorfismo rs699947 (VEGFA) foi associada com uma proteção para o câncer gástrico. 
Em relação às variáveis anatomopatológicas, os polimorfismos rs689466 (COX-2); rs1052133 (OGG1); rs699947, rs833061 e rs2010963 (VEGFA); rs4644 (LGALS3) e rs1042522 (TP53) foram significativamente associados com características de pior progressão da doença, enquanto que rs5275 (COX-2); rs2227956 (HSPA1L) e rs3025039 (VEGFA) foram associados às variáveis de melhor progressão da doença, na amostra total de casos. Também foi observado que o polimorfismo rs909253 (TNFB) foi capaz de predizer uma melhor progressão da doença quando considerado somente os casos com subtipo difuso. Além disso, em relação ao impacto sobre a sobrevida, rs909253 (TNFB) foi associado a um melhor prognóstico quando analisadas ambas as curvas de sobrevida global e livre de doença, enquanto que portadores do alelo His do polimorfismo rs4644 (LGALS3) apresentaram um pior prognóstico com menor tempo de sobrevida livre de doença. Por fim, nas análises de associação com os haplótipos, identificamos que o haplótipo CTC (formado por rs699947, rs833061 e rs2010963 do gene VEGFA), demonstrou ser um fator de maior suscetibilidade ao câncer gástrico. Também foram observadas associações entre o haplótipo GG (TNFB/TNFA) e invasão perineural; haplótipo ACG (VEGFA) e invasão sanguínea; haplótipo CTC (VEGFA) e invasão para outros órgãos; haplótipo GT (rs689466 e rs5275 do gene COX-2) e subtipo histológico intestinal. CONCLUSÕES: Os resultados desse estudo nos ajudaram a esclarecer o potencial papel desses polimorfismos em genes envolvidos com a modulação da resposta inflamatória na patogênese do câncer gástrico, indicando que variantes genéticas do hospedeiro atuam conjuntamente com outros fatores, influenciando na suscetibilidade, progressão e prognóstico dessa doença / INTRODUCTION: The chronic inflammatory microenvironment in the stomach has been described as a critical component for both tumor initiation and progression. 
Furthermore, genetic variants have shown to influence the interindividual variations in the inflammatory response. Therefore, we aimed to investigate whether polymorphisms in inflammatory response related-genes were associated with risk for gastric tumor development, clinical outcomes, overall and disease free survival of this disease in a Brazilian population sample. METHODS: Sixteen selected genetic variants in eleven genes (COX-2, OGG1, TNFB, TNFA, HSPA1L, HSPA1B, VEGFA, IL17F, LGALS3, PHB and TP53) were genotyped in 262 control individuals and 178 gastric cancer patients. Genetic association analyses were investigated in different models (Genotype, Allele, Dominant and Recessive) in both total sample (N=178) and stratified for the diffuse histological subtype based on Lauren´s classification (N=112). We also calculated the linkage disequilibrium among the polymorphisms and the haplotype associations were carried out using Haploview and PLINK softwares. RESULTS: In the case-control study, rs1042522 (TP53) Pro allele carriers presented about 2-fold higher risk for developing gastric cancer in a multivariate analysis and this association was even stronger when analyzing only cases with the diffuse subtype. On the other hand, the presence of A allele of rs699947 (VEGFA) was associated with a protection for developing gastric cancer. About the significant associations detected with the clinicopathological features, we found that rs689466 (COX-2); rs1052133 (OGG1); rs699947, rs833061 and rs2010963 (VEGFA); rs4644 (LGALS3) and rs1042522 (TP53) were able to predict outcomes associated with a worse progression of the disease while rs5275 (COX-2); rs2227956 (HSPA1L) and rs3025039 (VEGFA) were associated with better outcomes in the total sample. We also observed that the polymorphism rs909253 (TNFB) was able to predict a better outcome only for the individuals diagnosed with the diffuse subtype. 
Additionally, regarding the impact on the survival curves, rs909253 (TNFB) was associated with a better prognosis in both overall and disease-free survival, while carriers of the His allele of rs4644 (LGALS3) had a worse prognosis, with shorter disease-free survival. Finally, concerning the haplotype associations, we found that the CTC haplotype (composed of rs699947, rs833061 and rs2010963 of VEGFA) was associated with greater susceptibility to gastric cancer. We also observed associations between the GG haplotype (TNFB/TNFA) and perineural invasion; the ACG haplotype (VEGFA) and venous vascular invasion; the CTC haplotype (VEGFA) and invasion of other organs; and the GT haplotype (rs689466 and rs5275 of the COX-2 gene) and the intestinal histological subtype. CONCLUSIONS: These results helped us clarify the potential role of these polymorphisms in genes involved in the modulation of the inflammatory response in the pathogenesis of gastric cancer, highlighting that host genetic variants act together with other factors to influence the susceptibility, progression and prognosis of gastric cancer.
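
The four association models named in the abstract (Genotype, Allele, Dominant and Recessive) differ only in how the three genotype counts are collapsed before cases and controls are compared. The following is a minimal sketch of that collapsing step, not the authors' analysis (which was multivariate and used Haploview and PLINK). The function names and the genotype counts are hypothetical and chosen purely for illustration; they are not data from the study. The chi-square test is the plain Pearson statistic without continuity correction.

```python
from math import erfc, sqrt

def collapse(genotype_counts, model):
    """Collapse (ref_hom, het, risk_hom) counts into (exposed, unexposed)."""
    ref_hom, het, risk_hom = genotype_counts
    if model == "dominant":    # at least one risk allele vs. none
        return het + risk_hom, ref_hom
    if model == "recessive":   # two risk alleles vs. at most one
        return risk_hom, ref_hom + het
    if model == "allele":      # count chromosomes instead of individuals
        return het + 2 * risk_hom, 2 * ref_hom + het
    raise ValueError(f"unknown model: {model}")

def odds_ratio(cases, controls, model):
    """Odds ratio of the collapsed 2x2 table (cases vs. controls)."""
    a, b = collapse(cases, model)
    c, d = collapse(controls, model)
    return (a * d) / (b * c)

def chi_square_2x2(cases, controls, model):
    """Pearson chi-square (1 degree of freedom) and its p-value."""
    a, b = collapse(cases, model)
    c, d = collapse(controls, model)
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper tail is erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))

# Hypothetical genotype counts (ref_hom, het, risk_hom); NOT study data.
hypothetical_cases = (20, 50, 30)
hypothetical_controls = (40, 45, 15)

for model in ("dominant", "recessive", "allele"):
    ratio = odds_ratio(hypothetical_cases, hypothetical_controls, model)
    chi2, p = chi_square_2x2(hypothetical_cases, hypothetical_controls, model)
    print(f"{model:9s} OR={ratio:.2f} chi2={chi2:.2f} p={p:.4f}")
```

The Genotype model is omitted here because it compares all three genotype classes at once (a 2x3 table with two degrees of freedom) rather than a collapsed 2x2 table.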
110

Um método para seleção de atributos em dados genômicos / A method for attribute selection in genomic data

Oliveira, Fabrízzio Condé de 26 November 2015 (has links)
Submitted by Renata Lopes (renatasil82@gmail.com) on 2016-05-05T18:05:07Z / Approved for entry into archive by Adriana Oliveira (adriana.oliveira@ufjf.edu.br) on 2016-06-07T15:41:26Z (GMT) / Made available in DSpace on 2016-06-07T15:41:26Z (GMT). No. of bitstreams: 1 fabrizziocondedeoliveira.pdf: 6115188 bytes, checksum: 9810536208119e2012e4ee9015470c3e (MD5) Previous issue date: 2015-11-26 / CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Genome-wide association studies seek to find SNP-type molecular markers that are directly or indirectly associated with a phenotype of interest, such as one or more characteristics of an individual or even a disease. The SNP may be the causal mutation itself or may be correlated with it because the two are inherited together. To identify the region that causes or promotes the phenotype, which is not known a priori, thousands or millions of SNPs are genotyped in samples of hundreds or thousands of individuals. This raises the challenge of selecting the most informative SNPs in the genotype data set, where the number of attributes is usually much larger than the number of individuals, where highly correlated attributes may exist, and where there may be interactions between pairs, trios or combinations of SNPs of any order. The methods most commonly used in genome-wide association studies use the p-value of each SNP in statistical hypothesis tests, regression-based for continuous phenotypes and based on the chi-square test or similar tests in classification for discrete phenotypes, as a filter to select the most significant SNPs.
However, this class of methods captures only SNPs with additive effects, because the assumed relationship is linear. In an attempt to overcome the limitations of established procedures, this work proposes a new SNP selection method based on Machine Learning and Computational Intelligence techniques, named SNP Markers Selector (SMS). The model is built on an approach that divides the SNP selection problem into three distinct phases: the first concerns the analysis of marker relevance; the second defines the set of relevant markers to be considered, by means of a cut strategy based on a marker relevance threshold; and, finally, a third phase refines the cut process, generally to reduce false-positive markers. In SMS, these three stages were implemented using Random Forests, Support Vector Machines and Genetic Algorithms, respectively. SMS aims to create a workflow that maximizes the selection potential of the model through complementary stages. It is thus expected to increase the potential of SMS to capture additive and/or non-additive effects with moderate interaction between pairs and trios of SNPs, or even higher-order interactions with minimally detectable effects. SMS can be applied to both regression (continuous phenotype) and classification (discrete phenotype) problems. Numerical experiments were carried out to evaluate the potential of the strategy, with the method applied to seven simulated data sets and to one real data set, in which the predicted milk production capacity of dairy cows was measured as a continuous phenotype.
In addition, the proposed method was compared with p-value-based methods and with the Bayesian Lasso, showing, in general, better results in terms of true-positive SNPs on the simulated data with additive effects together with interactions between pairs and trios of SNPs. On the real data set, based on 56,947 SNPs and a single phenotype related to milk production, the method identified 245 QTLs associated with milk production and composition and 90 candidate genes associated with mastitis and with milk production and composition, these QTLs and genes having been identified by previous studies using other selection methods. The method thus proved competitive against the comparison methods in complex scenarios, with simulated or real data, which indicates its potential for genome-wide association studies in humans, animals and plants. / Genome-wide association studies aim to discover SNP-type molecular markers associated directly or indirectly with a specific phenotype, related to one or more characteristics of an individual or even a disease. The SNP could be the causative mutation itself or be correlated with the causative mutation due to common inheritance. To identify the causal or promoter region of the phenotype, which is unknown a priori, thousands or millions of SNPs are genotyped in samples composed of hundreds or thousands of individuals. This raises the challenge of selecting the most informative SNPs in a genotype data set where the number of attributes is usually much higher than the number of individuals. In addition, the possibility of highly correlated attributes should be considered, as well as interactions between pairs, trios or higher-order combinations of SNPs. The methods most commonly applied in genome-wide association studies adopt the p-value of each SNP as a filter to select the most significant SNPs.
For continuous phenotypes a regression-based statistical hypothesis test is used, and for the classification of discrete phenotypes the chi-square test or a similar test. However, this class of methods captures only SNPs with additive effects, due to the linear relationship assumed. In an attempt to overcome the limitations of established procedures, this work proposes a new SNP selection method, named SNP Markers Selector (SMS), based on Machine Learning and Computational Intelligence strategies. The model is built on an approach that divides the SNP selection problem into three distinct phases: the first related to the evaluation of marker relevance; a second responsible for defining the set of relevant markers to be considered, by means of a cut strategy based on a marker relevance threshold; and, finally, a phase for the refinement of the cut process, usually to reduce false-positive markers. In SMS, these three steps were implemented using Random Forests, Support Vector Machines and Genetic Algorithms, respectively. SMS intends to create a workflow that maximizes the SNP selection potential of the model through the adoption of complementary steps. In this way, SMS is expected to be better able to capture additive effects, moderate non-additive interactions between pairs and trios of SNPs, or even higher-order interactions with minimally detectable effects. SMS can be applied both to regression problems (continuous phenotype) and to classification problems (discrete phenotype). Numerical experiments were performed to evaluate the potential of the strategy, with the method applied to seven simulated data sets and to one real data set, in which the predicted milk production capacity of dairy cows was measured as a continuous phenotype.
In addition, the comparison of the proposed method with p-value-based methods and with the Bayesian Lasso indicates, in general, competitive results in terms of true-positive SNPs on simulated data sets with additive effects in conjunction with interactions between pairs and trios of SNPs. On the real data, based on 56,947 SNPs and a single phenotype of milk production, the method identified 245 QTLs associated with milk production and composition and 90 candidate genes associated with mastitis and with milk production and composition; these QTLs and genes had been identified by previous studies using other selection methods. Thus, the experiments showed the potential of the method relative to other strategies in complex scenarios with simulated or real data, indicating that the workflow developed to guide the construction of the method should be considered for genome-wide association studies in humans, animals and plants.
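
The abstract's point that a per-SNP p-value filter captures only marginal (additive) effects can be illustrated with a small sketch. This is not the SMS method and not the thesis code: it shows only the baseline filter the abstract describes, using a Pearson chi-square test on binary-coded SNPs. The function names, the SNP names, the 0/1 coding and the toy data are all assumptions made for illustration. A phenotype driven purely by the interaction of two SNPs (here an exclusive-or) leaves every single-SNP table perfectly balanced, so the filter selects nothing.

```python
from itertools import product
from math import erfc, sqrt

def chi_square_p(snp, phenotype):
    """p-value of a Pearson chi-square test on the 2x2 table snp x phenotype."""
    pairs = list(zip(snp, phenotype))
    a = pairs.count((1, 1))
    b = pairs.count((1, 0))
    c = pairs.count((0, 1))
    d = pairs.count((0, 0))
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:          # a constant column carries no information
        return 1.0
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    return erfc(sqrt(chi2 / 2))   # upper tail for 1 degree of freedom

def p_value_filter(snps, phenotype, alpha):
    """Keep the SNPs whose single-marker p-value falls below alpha."""
    return [name for name, column in snps.items()
            if chi_square_p(column, phenotype) < alpha]

# Toy data (hypothetical): every 0/1 combination of three SNPs, 10 copies each.
rows = [combo for combo in product((0, 1), repeat=3) for _ in range(10)]
snps = {
    "snp_a": [r[0] for r in rows],
    "snp_b": [r[1] for r in rows],
    "snp_c": [r[2] for r in rows],
}

# Phenotype with a marginal effect: determined by snp_c alone.
marginal_phenotype = list(snps["snp_c"])
# Phenotype with a pure pairwise interaction: snp_a XOR snp_b.
interaction_phenotype = [a ^ b for a, b in zip(snps["snp_a"], snps["snp_b"])]

print(p_value_filter(snps, marginal_phenotype, alpha=0.05))
print(p_value_filter(snps, interaction_phenotype, alpha=0.05))
```

In a real genome-wide study the threshold would also need a multiple-testing correction, which this sketch leaves out.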
