Global ETD Search

271	Efficient analysis of complex, multimodal genomic data Acharya, Chaitanya Ramanuj January 2016 Our primary goal is to better understand complex diseases using statistically disciplined approaches. As multi-modal data streams out of consortium projects such as the Genotype-Tissue Expression (GTEx) project, which collects samples from various tissue sites in order to understand tissue-specific gene regulation, new approaches are needed that can efficiently model groups of data with minimal loss of power. For example, the GTEx project delivers RNA-Seq, microarray gene expression and genotype data (SNP arrays) from a vast number of tissues in a given individual subject. In order to analyze this type of multi-level (hierarchical) multi-modal data, we proposed a series of efficient-score-based tests, or score tests, and leveraged groups of tissues or gene isoforms in order to map genomic biomarkers. We model group-specific variability as a random effect within a mixed effects model framework. In one instance, we proposed a score-test-based approach to map expression quantitative trait loci (eQTL) across multiple tissues. To do so, we jointly model all the tissues and make use of all the information available to maximize the power of eQTL mapping and to investigate an overall shift in gene expression combined with tissue-specific effects due to genetic variants. In the second instance, we showed the flexibility of our model framework by expanding it to include tissue-specific epigenetic data (DNA methylation) and mapping eQTL by leveraging both tissues and methylation. Finally, we also showed that our methods are applicable to different data types such as whole transcriptome expression data, which is designed to analyze genomic events such as alternative gene splicing.
In order to accomplish this, we proposed two different models that exploit gene expression data of all available gene isoforms within a gene to map biomarkers of interest (either genes or gene sets) in paired early-stage breast tumor samples before and after treatment with external beam radiation. Our efficient-score-based approaches have very distinct advantages. They have a computational edge over existing methods because they do not need parameter estimation under the alternative hypothesis. As a result, model parameters only have to be estimated once per genome, significantly decreasing computation time. Also, the efficient score test is locally most powerful and is therefore guaranteed theoretical optimality over all other approaches in a neighborhood of the null hypothesis. This theoretical performance is borne out in extensive simulation studies, which show that our approaches consistently outperform existing methods in both statistical power and computational speed. We applied our methods to publicly available datasets. It is important to note that all of our methods also accommodate the analysis of next-generation sequencing data. / Dissertation Bioinformatics Genetics Biostatistics DNA methylation efficient score test expression quantitative trait loci genome wide study multimodal genomic data next-generation sequencing data
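The computational argument in the abstract above (null parameters are estimated once and reused for every variant) can be illustrated with a small sketch. This is not the mixed-effects model of the dissertation: it assumes a plain fixed-effects linear model, y = mu + beta * g + noise, and all names and toy values (`fit_null`, `score_statistic`, `snp_linked`, `snp_unlinked`) are hypothetical.

```python
from math import fsum

def fit_null(y):
    """Fit the intercept-only null model once: residuals and ML variance."""
    n = len(y)
    mean = fsum(y) / n
    resid = [v - mean for v in y]
    sigma2 = fsum(r * r for r in resid) / n
    return resid, sigma2

def score_statistic(resid, sigma2, g):
    """Score statistic for H0: beta = 0, using only the null-model fit.

    In this simple model the statistic equals n times the squared Pearson
    correlation between g and y; it is compared to a chi-square with 1 df.
    """
    n = len(g)
    g_mean = fsum(g) / n
    g_centered = [v - g_mean for v in g]
    u = fsum(gc * r for gc, r in zip(g_centered, resid))       # score
    info = sigma2 * fsum(gc * gc for gc in g_centered)         # its variance
    if info == 0:
        raise ValueError("genotype or phenotype has zero variance")
    return u * u / info

# Toy data (made up): one expression vector, two genotype vectors.
expression = [1.0, 3.0, 5.0, 1.0, 3.0, 5.0]
genotypes = {
    "snp_linked": [0, 1, 2, 0, 1, 2],
    "snp_unlinked": [1, 0, 1, 1, 2, 1],
}

resid, sigma2 = fit_null(expression)  # estimated once, reused for every SNP
stats = {name: score_statistic(resid, sigma2, g) for name, g in genotypes.items()}
for name, value in stats.items():
    print(name, round(value, 6))
```

The null fit is done a single time outside the loop over variants, which is the source of the speed-up the abstract describes; no model is refitted under the alternative.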
272	Genome-wide analysis of selection in mammals, insects and fungi Ridout, Kate E. January 2012 Characterising and understanding factors that affect the rate of molecular evolution in proteins has played a major part in the development of evolutionary theory. The early analyses of amino acid substitutions stimulated the development of the neutral theory of molecular evolution, which later evolved into the nearly neutral theory. More recent work has led to a better understanding of the role selection plays at the molecular level, but there is still limited understanding of how higher levels of protein organisation affect the way natural selection acts. The investigation of this question is the central aim of this thesis, which is addressed via the analysis of selective pressures in secondary protein structures in insects, mammals and fungi. The analyses for the first two groups were conducted using publicly available datasets. To conduct the analyses in fungi, genome sequence data from the fungal genus Microbotryum (sequenced in our laboratory) was assembled and annotated, resulting in the development of a number of bioinformatics tools which are described here. The fungal, insect and mammalian datasets were interrogated with regard to a number of structural features, such as protein secondary structure, position of a site with regard to adaptively evolving sites, hydropathy and solvent accessibility. These features were correlated with the signals of positive and purifying selection detected using phylogenetic maximum likelihood and Bayesian approaches. I conclude that all of the factors examined can have an effect on the rate of molecular evolution. In particular, disordered and hydrophilic regions of the protein are found to experience fewer physicochemical constraints and contain a higher proportion of adaptively evolving sites.
It is also revealed that positively selected residues are ‘clustered’ together spatially, and these trends persist across the three taxa. Finally, I show that this variation in adaptive evolution is a result of both selective events and physicochemical constraint. 572.838
273	Identification des bases génétiques des myopathies à multi-minicores avec ou sans cardiomyopathie Chauveau, Claire 09 1900 Thèse réalisée en cotutelle avec l'Université Pierre et Marie Curie, Paris 6 (UPMC, Paris, France). / Bien que les bases physiopathologiques de beaucoup de maladies musculaires soient dorénavant connues, les myopathies congénitales à cores (MCs), maladies génétiques qui se présentent dès la naissance avec un retard du développement moteur, une faiblesse musculaire et des complications respiratoires et/ou cardiaques parfois mortelles, demeurent mal comprises. Des mutations dans RYR1, SEPN1, TTN, ACTA1, CFL2 et MEGF10 ont été associées aux MCs, pourtant, dans plus de 50% des cas, le gène responsable reste à identifier. L’objectif de ma thèse a été de clarifier les mécanismes physiopathologiques des MCs par l’identification de nouveaux gènes ou de nouvelles mutations. Cette thèse a eu une dimension internationale concrétisée par la mise en place d’une cotutelle UPMC (France) et UdeM (Québec). J’ai développé deux axes de recherche complémentaires. D’une part j’ai étudié 21 familles informatives avec MC récessive, scoliose et atteinte respiratoire, en combinant clonage positionnel et étude de gènes candidats et en utilisant des outils variés allant du génotypage au séquençage de nouvelle génération (NGS). En parallèle, j’ai étudié 24 familles avec une MC autosomique récessive affectant les muscles cardiaque et squelettiques et dont le phénotype était semblable à celui observé chez des patients avec des délétions dans les 6 derniers exons de TTN. Ainsi pour l'analyse de cette deuxième cohorte, nous avons appliqué une stratégie de séquençage de gène candidat ciblée sur ces exons et de NGS pour le reste du gène. Pendant mon doctorat j'ai identifié les défauts génétiques de 8 des 45 familles étudiées (18 %), et caractérisé 3 nouvelles entités médicales, dont deux MCs dues à des nouvelles mutations de TTN.
Ces résultats ont servi à l’identification de nouvelles interactions protéiques de la titine et contribuent à définir TTN comme une cause majeure de pathologies musculaires cardiaques et/ou squelettiques. Une troisième nouvelle forme de MC est provoquée par une mutation d'un coactivateur transcriptionnel peu connu et jamais associé à une maladie. Ces résultats ont révélé un nouvel acteur clef et une nouvelle voie de signalisation dans la physiopathologie du muscle, ont eu un bénéfice direct en termes de conseil génétique et ouvrent la voie pour le développement de thérapies. / While the pathophysiological bases of many muscular diseases are now well known, congenital core myopathies (CMs) remain poorly understood. CMs are genetic diseases which generally present at birth with delayed motor development, muscle weakness, and sometimes fatal respiratory or cardiac complications. Mutations in RYR1, SEPN1, ACTA1, TTN and MEGF10 have been associated with various CMs, yet for about 50% of CM cases the responsible gene has not been identified. The objective of my thesis was to clarify the pathophysiological mechanisms of new forms of CM through the identification of new genes or new mutations in known genes. This thesis had an international dimension in the form of joint supervision by UPMC (France) and UdeM (Québec). I developed two complementary axes of research. First, I studied 21 informative families with a recessive CM with scoliosis and respiratory failure, for which I combined positional cloning and candidate gene studies, using various tools from genotyping to next generation sequencing (NGS). The second part of this work consisted of the analysis of 24 families with recessive CM affecting both cardiac and skeletal muscles. Their phenotype was similar to that previously observed in cases with deletions in the last 6 exons of the giant gene TTN.
Thus we applied a candidate gene strategy through direct Sanger sequencing coupled with NGS for the analysis of this second cohort. During my PhD work I identified the molecular defect in 8 out of the 45 families included (18%), which led to the identification and characterization of 3 novel medical entities, including two new CMs due to novel defects of TTN. These results served to identify new titin protein interactions, and help establish TTN defects as a major cause of both cardiac and skeletal muscle conditions. A third new form of CM is due to mutations of a poorly known transcriptional coactivator whose role in striated muscle physiology was unknown and which had never been associated with a human condition. Overall, these results unveiled a novel important protein and pathway in muscle pathophysiology, have direct health benefits (molecular diagnosis) and open the way for therapeutic investigations. Myopathie congénitale Cardiomyopathie Titine Analyse de liaison Séquençage de nouvelle génération Congenital myopathy Cardiomyopathy Titin Linkage analysis Next generation sequencing
274	Typage de la classe génotypique du gène PRDM9 à partir de données de séquençage de Nouvelle Génération Ang Houle, Marie-Armande 07 1900 Les positions des évènements de recombinaison s’agrègent ensemble, formant des hotspots déterminés en partie par la protéine à évolution rapide PRDM9. En particulier, ces positions de hotspots sont déterminées par le domaine de doigts de zinc (ZnF) de PRDM9 qui reconnait certains motifs d’ADN. Les allèles de PRDM9 contenant le ZnF de type k ont été préalablement associés avec une cohorte de patients affectés par la leucémie aigüe lymphoblastique. Les allèles de PRDM9 sont difficiles à identifier à partir de données de séquençage de nouvelle génération (NGS), en raison de leur nature répétitive. Dans ce projet, nous proposons une méthode permettant la caractérisation d’allèles de PRDM9 à partir de données de NGS, qui identifie le nombre d’allèles contenant un type spécifique de ZnF. Cette méthode est basée sur la corrélation entre les profils représentant le nombre de séquences nucléotidiques uniques à chaque ZnF retrouvés chez les lectures de NGS simulées sans erreur d’une paire d’allèles et chez les lectures d’un échantillon. La validité des prédictions obtenues par notre méthode est confirmée grâce à une analyse basée sur les simulations. Nous confirmons également que la méthode peut correctement identifier le génotype d’allèles de PRDM9 qui n’ont pas encore été identifiés. Nous conduisons une analyse préliminaire identifiant le génotype des allèles de PRDM9 contenant un certain type de ZnF dans une cohorte de patients atteints de glioblastomes multiforme pédiatrique, un cancer du cerveau caractérisé par les mutations récurrentes dans le gène codant pour l’histone H3, la cible de l’activité épigénétique de PRDM9. Cette méthode ouvre la possibilité d’identifier des associations entre certains allèles de PRDM9 et d’autres types de cancers pédiatriques, via l’utilisation de bases de données de NGS de cellules tumorales.
/ The positions of recombination events cluster tightly together in recombination hotspots, which are determined in part by the rapidly evolving protein PRDM9 via its trimethyltransferase activity. The locations of hotspots are determined by the repetitive ZnF array of PRDM9, which binds to DNA. Alleles of PRDM9 containing the k-ZnF have previously been associated with patients affected with childhood acute lymphoblastic leukaemia. PRDM9 alleles are notoriously difficult to type due to the repetitive nature of the ZnF arrays. Here, we propose a method to characterize the alleles of PRDM9 from next-generation sequencing samples, by identifying the number of alleles containing a specific ZnF type. Our method is based on the correlation between profiles from the sample, representing the counts of nucleotide sequences unique to each ZnF, and from ideal sets of short reads representing an allele pair. We conduct a simulation analysis to examine the validity of the predictions obtained by our method with all pairs of known alleles. We confirm that the method can accurately genotype previously unobserved PRDM9 alleles. We also conducted a preliminary analysis to identify the PRDM9 k-ZnF genotype in a cohort of paediatric glioblastoma (pGBM), a childhood cancer characterized by recurrent mutations in the coding sequence of histone H3, the target of the enzymatic activity of PRDM9. Although no association with k-ZnF-containing PRDM9 alleles was found in our pGBM cohort, this method opens the possibility of identifying associations between certain PRDM9 alleles and other types of early-onset childhood cancers, through a data-mining effort in public cancer databases. PRDM9 Hotspots de recombinaison Cancers pédiatriques Modifications sur les histones Séquençage de Nouvelle Génération Recombination Hotspots Childhood Cancer Histone Modifications Next-Generation Sequencing
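The profile-correlation step described in the abstract above can be sketched as follows. This is only a minimal illustration of the idea, not the thesis implementation: the ZnF type labels, allele names, counts, and the functions `pearson` and `best_allele_pair` are all hypothetical, and real profiles would be built by counting ZnF-specific sequences in the reads.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length count profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(vx * vy)

def best_allele_pair(sample_profile, ideal_profiles):
    """Return the allele pair whose ideal profile correlates best with the sample."""
    scores = {pair: pearson(sample_profile, profile)
              for pair, profile in ideal_profiles.items()}
    return max(scores, key=scores.get), scores

# Hypothetical ideal profiles: expected number of ZnF-specific sequences of
# four ZnF types (the last column standing in for the k-ZnF) per allele pair.
ideal_profiles = {
    ("A", "A"): [2, 8, 4, 0],   # no allele carries the k-ZnF
    ("A", "C"): [2, 7, 4, 1],   # one allele carries it
    ("C", "C"): [2, 6, 4, 2],   # both alleles carry it
}

# Hypothetical observed counts from a sample's reads (same column order).
sample_profile = [41, 139, 83, 22]

best, scores = best_allele_pair(sample_profile, ideal_profiles)
print(best)
```

Because Pearson correlation is invariant to scale, the sample counts do not need to be normalised for sequencing depth before comparison, which is one plausible reason to use correlation for this kind of matching.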
275	Évolution du génome des spartines polyploïdes envahissant les marais salés : apport des nouvelles techniques de séquençage haut-débit / Genome evolution of polyploid Spartina species invading salt-marshes : Contribution of Next-generation Sequencing technologies Ferreira de Carvalho, Julie 19 February 2013 Les Spartines jouent un rôle écologique majeur sur les marais salés. Elles représentent un excellent modèle pour appréhender les conséquences écologiques de la spéciation par hybridation et polyploïdie dans le contexte d'invasion biologique. On s'intéresse plus particulièrement à l'hybridation récente entre une espèce hexaploïde d'origine américaine Spartina alterniflora et une espèce hexaploïde européenne S. maritima ayant donné deux hybrides F1 (S. x townsendii et S. x neyrautii) et la nouvelle espèce envahissante allododécaploïde (S. anglica). Les nouvelles technologies de séquençage haut-débit facilitent l'exploration de ces génomes peu connus. L'assemblage et l'annotation d'un transcriptome de référence ont permis d'annoter 16 753 gènes chez les spartines hexaploïdes et d'identifier des gènes d'intérêts écologique et évolutif. Une sélection de ces gènes a ensuite été analysée à travers une étude d'expression par PCR quantitative sur les populations naturelles des 5 espèces du complexe. Les résultats ont permis de mettre en évidence une expression homogène intra-populations mais une grande variabilité entre les espèces. L'analyse du génome des Spartines a ciblé prioritairement le développement de ressources génomiques concernant l'espèce S. maritima pour l'analyse des compartiments codant et répété à l'aide de séquençage d'une banque BAC et d'un run de pyroséquençage d'ADN génomique. Les analyses ont permis d'évaluer une proportion d'éléments répétés représentant près de 30% du génome.
Les données générées ont alors été comparées avec les génomes séquencés phylogénétiquement proches et ont permis de premières comparaisons entre les spartines et les autres Poaceae. / Spartina species play an important ecological role on salt marshes. They represent an excellent system to study the ecological consequences of hybrid and polyploid speciation in biological invasion contexts. In this study, we examined the effects of hybridization between the hexaploid American-native species Spartina alterniflora and the European species S. maritima, which gave rise to two F1 hybrids (S. x townsendii in England and S. x neyrautii in France) and the new invasive allododecaploid species (S. anglica). Next-generation sequencing technologies offer new perspectives to explore these previously poorly known genomes. The assembly of a reference transcriptome (from 454 Roche pyrosequencing) allowed annotation of 16,753 genes in hexaploid Spartina and identification of ecologically and evolutionarily important genes. Expression levels of a subset of these genes were analyzed by quantitative PCR in Spartina natural populations. The results indicate homogeneous expression within populations but extreme variability between species. The European S. maritima benefited from genomic resource development through a BAC library and one pyrosequencing run. Our analyses estimated the relative proportion of repetitive sequences at about 30% and identified the main transposable element families. The data generated were also compared to closely related sequenced species and provided the first insights into the evolution of Spartina genomes in the Poaceae family. Evolution Ecologie Spartina Poaceae Séquençage haut-débit Transcriptome Génomique comparative Bioinformatique Evolution Ecology Spartina Poaceae Next-generation Sequencing Transcriptome Comparative genomics Bioinformatics
276	Etude des différents niveaux de régulation du stress oxydatif chez Hevea brasiliensis : implication des miRNAs / Oxidative stress regulation levels in hevea brasiliensis : miRNAs involvement Gebelin, Virginie 04 April 2012 Hevea brasiliensis est cultivé pour le caoutchouc naturel contenu dans le latex. Une exploitation intensive combinée aux stress environnementaux affectent la production de latex. L'encoche sèche ou Tapping Panel Dryness (TPD) est déclenché par un désordre physiologique complexe au sein des laticifères. Il est responsable d'une perte annuelle de production de 10 à 40% en fonction de l'âge de la plantation et des clones d'hévéa utilisés. Le stress oxydatif, point de départ de la maladie, affecte l'écoulement, par la coagulation in situ des particules du caoutchouc. Chez les plantes, la réponse adaptative aux stress abiotiques dépend de la finesse de la régulation de l'expression des gènes. Ce contrôle se fait au niveau transcriptionnel mais également au niveau post-transcriptionnel. Les micro-ARNs jouent un rôle crucial en menant les ARN messagers cibles à la dégradation ou au blocage de leur traduction. L'objectif de cette thèse est de comprendre et d'identifier la régulation du stress oxydatif en étudiant l'implication des micro-ARNs en réponse aux stress abiotiques et suite à l'apparition du TPD chez l'hévéa. L'isolement et le séquençage à haut débit de petits-ARNs ont permis l'identification de micro-ARNs d'hévéa. Dans un premier temps, une banque de petits ARNs a été effectuée à partir de vitroplants soumis ou non à des stress abiotiques, à laquelle s'est ajoutée dans un second temps deux banques fabriquées à partir de latex d'arbres en exploitation atteints ou non par le TPD. L'analyse de la population de petits ARNs montre une diminution de la taille des séquences en réponse à la maladie, la majorité des séquences de petits ARNs de latex étant de 21 nucléotides chez les arbres malades et 24 nucléotides chez les arbres sains.
En combinant le pipeline LeARN et les données transcriptomiques, soixante huit familles de micro-ARNs conservés entre les espèces et quinze nouvelles familles de micro-ARNs ont été identifiées chez l'hévéa. Les gènes codant pour la voie de biogenèse des micro-ARNs sont présents dans le latex, suggérant leur production dans ce compartiment cellulaire particulier. L'identification des séquences de trente précurseurs de micro-ARNs ont permis d'étudier l'expression des gènes MIR en réponse aux stress abiotiques et en réponse au TPD. Les gènes MIR étudiés sont différentiellement exprimés chez des hévéas immatures en réponse au stress abiotiques et aux traitements par l'éthylène et le méthyl-jasmonate. L'abondance relative de transcrits des gènes MIR est fortement réduite par le TPD dès 5% de longueur d'encoche sèche à l'exception d'un gène. Les cibles potentielles des 83 familles de micro-ARNs ont été prédites. Ces micro-ARNs sont impliqués dans les voies de détoxication des espèces activées de l'oxygène, dans les voies de biosynthèse du caoutchouc naturel, dans les voies de biosynthèse et de signalisation de l'éthylène et du jasmonate. Trois cibles ont été validées expérimentalement dont la CuZnSOD chloroplastique, enzyme importante du système antioxydant. / Hevea brasiliensis is cultivated for natural rubber produced in latex cells. Intensive harvesting systems combined with environmental cues affect latex production. Tapping Panel Dryness (TPD), a complex physiological disorder, causes a loss of production of 10-40%. Oxidative stress, the starting point of the disease, affects latex flow because of in situ coagulation of rubber particles. In plants, the adaptation to abiotic stress relies on the fine tuning of gene expression at the transcriptional and post-transcriptional levels.
MicroRNAs play a crucial role by leading target mRNAs to degradation or to repression of their translation. The aim of this thesis is to understand and identify the regulation of oxidative stress by studying the involvement of microRNAs in the response to abiotic stress and in TPD occurrence in Hevea. Isolation and high-throughput sequencing of small RNAs allowed the identification of Hevea microRNAs. Firstly, a small RNA library was constructed from in vitro plantlets subjected or not to abiotic stress, and secondly, two other small RNA libraries were constructed with latex from healthy and TPD-affected trees. Analyses of the small RNA population showed a decrease in the size of the reads in response to TPD, the majority of the small RNAs from latex being 21 nucleotides in TPD-affected trees and 24 nucleotides in healthy trees. Combining the LeARN pipeline and transcriptomic data, sixty-eight microRNA families conserved between plant species and fifteen new families were identified in Hevea. Genes involved in microRNA biogenesis are present in latex, suggesting their production in this particular cellular compartment. Identification of thirty microRNA precursors allowed expression analyses of the corresponding MIR genes in response to abiotic stress and upon TPD occurrence. MIR genes are differentially expressed in young plants in response to abiotic stress and in response to ethylene and methyl jasmonate treatments. Moreover, the relative transcript abundance of MIR genes is strongly repressed upon TPD occurrence, from as little as 5% of dry cut length, except for one MIR gene. Putative targets were predicted for the 83 families. MicroRNAs are involved in ROS detoxification, natural rubber biosynthesis, and ethylene and jasmonate biosynthesis and signalling pathways. Three targets were experimentally validated, including the chloroplastic isoform of CuZnSOD, which is an important enzyme of the ROS-scavenging system.
Hevea brasiliensis Micro-ARN Séquençage haut débit Stress abiotique Expression de gène Rubber tree Micro-RNA Next generation sequencing Abiotic stress Gene expression
277	Identification de gènes responsables d'épilepsies de l'enfant / Identification of genes implicated in childhood epilepsies Dimassi, Sarra 10 July 2017 L'épilepsie est une affection neurologique chronique qui se définit par la répétition de crises épileptiques, signe de l'hyperactivité paroxystique d'un groupe de neurones corticaux. Ces dernières années, plusieurs gènes responsables d'épilepsies monogéniques ont été mis en évidence. Notre travail avait pour objectif l'identification d'anomalies génétiques responsables ou favorisants certaines formes d'épilepsies de l'enfant. Ce travail est composé de quatre études complémentaires. La première était l'exploration pangénomique d'une cohorte de 47 patients porteurs d'épilepsie à paroxysme rolandique (EPR) par CGH array, à la recherche de variations de nombre de copies (CNV) récurrentes. Nous avons ainsi pu mettre en évidence plusieurs CNVs emportant des gènes impliqués dans l'épilepsie, dont PRRT2 et GRIN2A. La deuxième reposait sur la même approche appliquée à une cohorte de 8 patients tunisiens présentant des spasmes infantiles. Elle a permis d'identifier une délétion 9q34.3 emportant le gène EHMT1, responsable du syndrome de Kleefstra et une duplication 15q13.1, région impliquée dans des troubles du neurodéveloppement. Pour la troisième étude, nous avons comparé deux techniques de capture pour séquençage à haut débit d'un panel de gènes impliqués dans les épilepsies de l'enfant, à partir des échantillons de 24 patients épileptiques. Cette approche nous a permis de mettre au point un logiciel d'analyse de couverture, que nous avons nommé DeCovA. Lors de la dernière étude, nous avons appliqué une stratégie de séquençage d'exome en trio pour explorer 10 patients porteurs des spasmes infantiles. Nous avons ainsi pu mettre en évidence des variants pathogènes de novo chez quatre patients, impliquant les gènes KCNQ2, SCN1A, NR2F1 et ALG13.
Nos résultats confirment ainsi la place importante de la génétique et l'intérêt majeur des nouvelles technologies dans l'exploration étiologique des épilepsies de l'enfant. / Epilepsy is a chronic neurological disorder characterized by repeated epileptic seizures, a sign of paroxysmal hyperactivity of cortical neurons. In recent years, several genes responsible for monogenic epilepsies have been identified. The aim of our work is to identify new genetic abnormalities responsible for childhood epilepsies. This work is divided into four complementary studies. First, we searched for copy number variations (CNVs) by pangenomic exploration of a cohort of 47 patients with Rolandic epilepsy (RE) using array CGH. We identified several CNVs carrying genes involved in epilepsy, including PRRT2 and GRIN2A. Secondly, the same approach was applied to a cohort of 8 Tunisian patients with infantile spasms. It allowed the identification of a 9q34.3 deletion including EHMT1, implicated in Kleefstra syndrome, and a 15q13.1 duplication, known to be involved in neurodevelopmental disorders. For the third study, we compared two library-building methods for a gene-targeted panel for the diagnosis of monogenic childhood epilepsies, in a cohort of 24 epileptic patients. This approach allowed us to develop coverage analysis software, which we named DeCovA. In the last study, we used a trio-based exome-sequencing approach to look for de novo mutations in 10 patients with infantile spasms. We found de novo pathogenic variants in four patients, involving KCNQ2, SCN1A, NR2F1, and ALG13. Our results confirm the increasing role of genetics and the major value of new technologies in the etiological exploration of childhood epilepsy. Épilepsie CGH array Séquençage massif en parallèle Panel de genes Séquençage de l’exome Epilepsy Array CGH Next Generation sequencing Gene panel Exome sequencing 612.8
278	Characterising copy number polymorphisms using next generation sequencing data Li, Zhiwei January 2019 We developed a pipeline to identify copy number polymorphisms (CNPs) in the Northern Swedish population using whole genome sequencing (WGS) data. Two different methodologies were applied to discover CNPs in more than 1,000 individuals. We also studied the association between the identified CNPs and the expression levels of 438 plasma proteins collected in the same population. The identified CNPs were summarized and filtered into a population copy number matrix for 1,021 individuals at 243,987 non-overlapping CNP loci. For the 872 individuals with both WGS and plasma protein biomarker data, we conducted linear regression analyses with age and sex as covariates. From the analyses, we detected 382 CNP loci, clustered in 30 collapsed copy number variable regions (CNVRs), that were significantly associated with the levels of 17 plasma protein biomarkers (p < 4.68×10⁻¹⁰). structural variations copy number variations copy number polymorphisms next generation sequencing Genome-wide association study Bioinformatics (Computational Biology) Bioinformatik (beräkningsbiologi)
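The abstract above does not say how its significance threshold was chosen. As a hedged piece of arithmetic, the reported value is numerically consistent with a Bonferroni correction of a 0.05 level over every locus-protein pair (243,987 loci × 438 proteins); the 0.05 level and the Bonferroni reading are assumptions, and the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_loci, n_traits):
    """Per-test significance level when alpha is split over all locus-trait tests."""
    return alpha / (n_loci * n_traits)

# Counts taken from the abstract; alpha = 0.05 is an assumed family-wise level.
threshold = bonferroni_threshold(0.05, 243_987, 438)
print(f"{threshold:.2e}")  # prints 4.68e-10
```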
279	Variantes nos genes OCA2 e HERC2 associadas a fenótipos clássicos de pigmentação e estruturas secundárias presentes na íris em amostra miscigenada da população brasileira / Variants within OCA2 and HERC2 genes associated with classical pigmentation phenotypes and iris features in Brazilian admixed population sample Debortoli, Guilherme 20 June 2018 A pigmentação dos olhos, cabelos e pele, bem como presença ou ausência de sardas, está entre os exemplos mais visíveis da variação fenotípica humana. O estudo da diversidade genética em genes de pigmentação tem beneficiado diferentes áreas do conhecimento, como a área da genética e antropologia forense, bem como a área relacionada a saúde e bem-estar. Adicionalmente, a presença de estruturas secundárias na íris tem sido reportada como importante fator na percepção de cor de olho observada que um indivíduo pode ter referente a íris e também a fatores de risco para algumas doenças oculares, ainda que as bases genéticas envolvidas nestas características sejam pouco conhecidas. Os genes OCA2 e HERC2 representam dois genes associados à variação normal da pigmentação. Este trabalho avaliou a relação de polimorfismos nas regiões regulatórias e codificantes destes dois genes com os fenótipos de pigmentação e estruturas secundárias presentes na íris encontrados em uma amostra populacional de 340 indivíduos do estado de São Paulo, por meio de sequenciamento de nova geração. Análises de regressão logística e linear para as variáveis qualitativas e quantitativas da cor dos olhos e estruturas secundárias presentes na íris foram realizadas. 170 pontos de variação ao longo das regiões estudadas foram identificados, dos quais 18 estão associadas a pelo menos um fenótipo de pigmentação e estruturas secundárias presentes na íris.
Destaca-se a existência de muitos polimorfismos que não se mostraram associados quando avaliados independentemente, porém foram associados quando analisados sob a ótica de interações epistáticas, considerada uma possível explicação para a variabilidade encontrada nestes fenótipos, principalmente aqueles intermediários, como a cor dos olhos verdes e mel. O uso de variáveis quantitativas para os olhos revelou pela primeira vez a associação do polimorfismo não sinônimo rs201872292 no gene HERC2 com olhos claros, independente do efeito do polimorfismo rs12913832. Ainda, a associação do polimorfismo rs58358300 localizado em um íntron do gene HERC2 com pigmentação da esclera, o que representa a primeira vez que um polimorfismo é associado a esta característica. Este foi o primeiro estudo no Brasil que se propôs a analisar polimorfismos genéticos em genes candidatos à variação normal da pigmentação humana com estruturas secundárias presentes na íris. Os resultados confirmam a hipótese de que polimorfismos dos genes OCA2 e HERC2 podem contribuir para a formação dos fenótipos clássicos de pigmentação de olhos, pele, cabelos e estruturas secundárias presentes na íris humana dos indivíduos da população brasileira. / The pigmentation of the eyes, hair and skin, as well as the presence or absence of freckles, is amongst the most visible examples of human phenotypic variation. The study of genetic diversity in pigmentation genes has contributed greatly to the fields of forensic genetics, anthropological genetics and public health. In addition, the presence of iris features has been reported to influence the perception of overall iris color and to be a risk factor for some ocular diseases, although very little is known about the genetic basis of these traits. The OCA2 and HERC2 genes have been associated with normal variation of pigmentation in diverse populations.
The present study evaluated the relationship of polymorphisms in the regulatory and coding regions of these two genes with the pigmentation phenotypes and iris features found in a population sample of 340 individuals from the state of São Paulo, Brazil, through next-generation sequencing. Logistic and linear regression analyzes for the qualitative and quantitative variables were performed. A total of 170 points of variation throughout the studied regions were identified, of which 18 were associated with at least one pigmentation phenotype when analyzed as qualitative and/or quantitative variables and iris features. It is worth mentioning that many associations that were not observed when evaluated independently, were indeed associated when analyzed from the perspective of epistatic effects, which is considered a possible explanation for the variability found in these phenotypes, especially those presented as intermediate, such as green and hazel eye colors. The use of quantitative variables to evaluate the eye color, acquired from photographs, revealed for the first time the association of the nonsynonymous mutation rs201872292 in the HERC2 gene with light eyes, independently of the effect of the rs12913832 polymorphism. We highlight the association of the polymorphism rs58358300 located in an intron of the HERC2 gene with sclera pigmentation, which was the first time that a polymorphism is associated with this feature. This was the first study in Brazil to analyze genetic polymorphisms in candidate genes related to normal variation of human pigmentation and iris features by next-generation sequencing. The results confirm the hypothesis that OCA2 and HERC2 genes may contribute to classic pigmentation phenotypes of eyes, skin, hair, freckles and iris features in the Brazilian population. Epistasia Epistasis Estruturas secundárias da íris HERC2 HERC2 Human pigmentation Iris features Next-generation sequencing OCA2 OCA2 Pigmentação humana Sequenciamento de nova geração
280	Análise de marcadores forenses (STRs e SNPs) rotineiramente empregados na identificação humana utilizando sequenciamento de nova geração / Analysis of forensic markers (STRs and SNPs) routinely used in human identification assays by means of next generation sequencing Silva, Guilherme do Valle 05 October 2018 (has links) Forensic genetics continues to develop through new technologies and the implementation of new sets of DNA markers with higher levels of informativeness. Genetic markers are widely used in human identification because they allow individuals to be distinguished with high accuracy. Two of the most commonly used classes of markers are Short Tandem Repeats (STRs) and Single Nucleotide Polymorphisms (SNPs). STRs are highly informative and therefore useful in forensic practice. Newer kits such as GlobalFiler (Thermo Fisher Scientific) and PowerPlex Fusion System (Promega) can analyze more than 20 STR loci at once. Compared with STRs, SNPs are less informative per locus, so many more loci are needed to reach the informativeness of STR kits; however, they are advantageous for degraded DNA samples. Identification sets such as the 52-plex developed by the SNPforID Consortium and the IISNPs have been studied in many populations worldwide. With the development of next-generation sequencing (NGS) techniques, obtaining DNA profiles has become more accurate, and some platforms can generate profiles for up to 96 individuals simultaneously. The main goal of this study was to analyze 171 genetic markers (Amelogenin, Y-INDEL, 30 STRs and 139 SNPs) in 340 admixed individuals from the Ribeirão Preto region, SP, using the NGS platform MiSeq Personal Sequencer (Illumina Inc.), and to calculate allele and genotype frequencies, verify adherence to Hardy-Weinberg equilibrium and estimate forensic parameters for each set of markers. Ancestry analyses were performed for the SNP sets. The HaloPlex kit (Agilent Technologies, Inc) was used for library preparation, covering the markers from the GlobalFiler and PowerPlex Fusion System kits and the SNPs from the SNPforID consortium (52-plex) and IISNPs (92 SNPs) identification sets. A single SNP (rs763869) from the SNPforID set could not be analyzed due to technical issues. Only six of the 139 analyzed SNPs presented significant deviations from Hardy-Weinberg equilibrium expectations, a number expected by chance alone. The SNP sets exhibited high informativeness, with match probability ranging from 6.48 x 10^-21 (52-plex) to 4.91 x 10^-38 (IISNPs) and exclusion power of 0.9997 (52-plex) and 0.99999997 (IISNPs). In general, ancestry estimates obtained using these sets indicated a high European contribution (higher than 70%) and a low Amerindian contribution (less than 10%) in the population sample, while the individual admixture analyses were consistent, with the majority of individuals presenting high European ancestry. The results of the sex markers (Amelogenin, Y-INDEL and DYS391) agreed with the reported sex of the sample donors. The allele frequencies and forensic parameters calculated for the STRs revealed high informativeness: the combined match probability and the combined exclusion power were 1.19 x 10^-36 and 0.999999999997, respectively. Six of the 29 autosomal STRs presented significant deviations from Hardy-Weinberg equilibrium expectations, reflecting possible failures in sequencing and genotyping of these markers. Ancestry Forensic genetics Next generation sequencing Short Tandem Repeats Single Nucleotide Polymorphisms
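As a reading aid for entry 280, the statement that six Hardy-Weinberg deviations among 139 SNPs are "expected by chance alone" and the notion of combined forensic parameters can be illustrated with a small calculation. The sketch below is not taken from the thesis: it assumes a per-test significance level of 0.05 (the abstract does not state one), independence between tests and between loci, and the usual product rule for combining per-locus values. All function names and input values are hypothetical.

```python
import math


def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of significant tests if every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least(n_tests: int, alpha: float, k: int) -> float:
    """Binomial tail P(X >= k) for X ~ Binomial(n_tests, alpha)."""
    return sum(
        math.comb(n_tests, i) * alpha**i * (1 - alpha) ** (n_tests - i)
        for i in range(k, n_tests + 1)
    )


def combined_match_probability(per_locus_pm: list[float]) -> float:
    """Product rule: multiply per-locus match probabilities (assumes independent loci)."""
    return math.prod(per_locus_pm)


def combined_exclusion_power(per_locus_pe: list[float]) -> float:
    """Combined exclusion power: 1 minus the product of per-locus non-exclusion chances."""
    return 1 - math.prod(1 - pe for pe in per_locus_pe)


if __name__ == "__main__":
    # Assumed alpha = 0.05; 139 SNPs tested, 6 deviations reported in the abstract.
    print("expected deviations by chance:", round(expected_false_positives(139, 0.05), 2))
    print("P(at least 6 deviations):", round(prob_at_least(139, 0.05, 6), 3))
    # Hypothetical per-locus values, for illustration only.
    print("combined PM:", combined_match_probability([0.1, 0.2, 0.05]))
    print("combined PE:", combined_exclusion_power([0.5, 0.6, 0.7]))
```

Under these assumptions about seven of 139 tests would be significant by chance, so six observed deviations among the SNPs is unremarkable. The same expectation for 29 autosomal STRs is under two, which is consistent with the abstract attributing the six STR deviations to possible sequencing and genotyping failures rather than to chance.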
