Global ETD Search

1	Biophysical Analysis of the AP1-DNA Interaction Seldeen, Kenneth Ladd 16 June 2009 Jun and Fos are components of the AP1 family of transcription factors that bind to the promoters of a diverse multitude of genes involved in critical cellular responses such as cell growth and proliferation, cell cycle regulation, embryonic development and cancer. The specific protein-DNA interactions are driven by the binding of the basic zipper (bZIP) domains of Jun and Fos to the TPA response element (TRE) and cAMP response element (CRE) within the promoters of target genes. Here, using a diverse array of biophysical techniques, including in particular isothermal titration calorimetry in conjunction with molecular modeling and semi-empirical analysis, I characterize AP1-DNA interactions in thermodynamic and structural terms. My data show that the binding of the bZIP domains of the Jun-Fos heterodimer to TRE and CRE is under enthalpic control accompanied by an entropic penalty at physiological temperatures. This is in agreement with the notion that protein-DNA interactions are largely driven by electrostatic interactions and intermolecular hydrogen bonding. A larger than expected heat capacity change suggests that the basic regions within the bZIP domains are unstructured in the absence of DNA and interact in a coupled folding and binding manner. Further analysis demonstrates that the Jun-Fos heterodimer can tolerate single nucleotide variants of the TRE consensus sequence and binds in the biologically relevant micromolar to submicromolar range. Of particular interest is the observation that the Jun-Fos heterodimer binds to specific variants in a preferred orientation. 3D atomic models reveal that such preference in orientation results from asymmetric binding and may in part be attributable to chemically distinct but structurally equivalent residues within the basic regions of Jun and Fos. 
I further demonstrate that binding of the biologically relevant Jun-Jun homodimer to TRE and CRE occurs with favorable enthalpic contributions accompanied by entropic penalty at physiological temperatures in a manner akin to the binding of Jun-Fos heterodimer. However, anomalously large negative heat capacity changes provoke a model whereby Jun loads onto DNA as unfolded monomers coupled with subsequent folding and homodimerization upon association. The data also reveal that the heterodimerization of leucine zippers is modulated by the basic regions and these regions may undergo at least partial folding upon heterodimerization. Large negative heat capacity changes accompanying the heterodimerization of leucine zippers are consistent with the view that leucine zippers do not retain α-helical conformation in isolation and the formation of the native coiled coil α-helical dimer is attained through a coupled folding-dimerization mechanism. Taken together, this dissertation marks the first comprehensive thermodynamic analysis of an otherwise well-studied and vitally important transcription factor. My studies shed new light on the forces driving the AP1-DNA interaction in thermodynamic and structural terms. The implications of these novel findings on the development of novel therapies for the treatment of disease with greater efficacy coupled with low toxicity cannot be overemphasized.
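The thermodynamic terms used in the abstract above can be related with a short sketch. It assumes the standard relations ΔG = RT ln(Kd) (1 M standard state) and TΔS = ΔH − ΔG, and estimates the heat capacity change ΔCp as the slope of ΔH against temperature. The function names and every numeric value are hypothetical illustrations, not data or code from the dissertation.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temp_k):
    """Return dG = RT ln(Kd) in kcal/mol, for a 1 M standard state."""
    return R_KCAL * temp_k * math.log(kd_molar)


def entropic_term(delta_h, delta_g):
    """Return T*dS = dH - dG; a negative value is an entropic penalty."""
    return delta_h - delta_g


def heat_capacity_change(temps_k, delta_hs):
    """Return dCp as the least-squares slope of dH versus temperature."""
    n = len(temps_k)
    mean_t = sum(temps_k) / n
    mean_h = sum(delta_hs) / n
    num = sum((t - mean_t) * (h - mean_h) for t, h in zip(temps_k, delta_hs))
    den = sum((t - mean_t) ** 2 for t in temps_k)
    return num / den


if __name__ == "__main__":
    # Hypothetical inputs: Kd of 1 micromolar, dH of -20 kcal/mol at 25 C.
    dg = binding_free_energy(1e-6, 298.15)
    tds = entropic_term(-20.0, dg)
    print(f"dG = {dg:.2f} kcal/mol, TdS = {tds:.2f} kcal/mol")
    # Hypothetical dH values measured at three temperatures.
    dcp = heat_capacity_change([288.15, 298.15, 308.15], [-15.0, -20.0, -25.0])
    print(f"dCp = {dcp:.2f} kcal/(mol*K)")
```

With these made-up inputs, ΔH is more negative than ΔG, so TΔS comes out negative: binding is enthalpically driven with an entropic penalty, which is the pattern the abstract reports.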
2	The effect of genome variation on human proteins: understanding variants and improving their deleteriousness prediction through extensive contextualisation Raimondi, Daniele 15 May 2017 Rapid technological advances are providing unprecedented insights in the biological sciences, with massive amounts of data generated on genomic and protein sequences. These data continue to grow exponentially, and they are extremely valuable for computational tools where the effect of genomic variants on human health is predicted. State of the art tools in this field give varying results and only tend to agree in the case of single variants that are strongly correlated to disease. The aim of this work is to increase the reliability of these methods, as well as our understanding of the underlying biological mechanisms that lead to disease. We first developed machine learning (ML) based structural bioinformatics predictors that are able to predict molecular features of proteins from the sequence alone. We then used these tools for in silico analysis of the molecular effects of known variants on the affected proteins, and integrated these data with other heterogeneous sources of information, such as the essentiality of a gene, that put the variants into their broader biological context. With this information we created DEOGEN, a novel predictor in this field, which is able to deal with the two most common forms of genomic variation, namely Single Nucleotide Variants (SNVs) and short Insertions and DELetions (INDELs). DEOGEN performs at least on par with other state of the art methods in this field on different datasets. The method was then extended with additional contextual data and is now available as DEOGEN2 via a web server, which visualizes the predicted results for all variants in most human proteins through an interactive interface targeted to both bioinformaticians and clinicians. 
/ Doctorat en Sciences / info:eu-repo/semantics/nonPublished Sciences exactes et naturelles single nucleotide variants variant effect prediction bioinformatics cysteine oxidation machine learning
3	Screen Study of Potential Prostate Cancer Associated Genes via Single Nucleotide Variants Detection Al-Hasani, Hoor 19 December 2017 Prostate cancer (PCa) is the second most diagnosed cancer in men worldwide and the fifth leading cause of cancer-related death according to 2012 cancer statistics. Because the prostate gland is an internal part of the male reproductive system, testing it for abnormalities remains troublesome, inconvenient and, above all, inaccurate. Diagnostic practice starts with prostate-specific antigen (PSA) level testing, which is highly inconclusive and provokes overdiagnosis and overtreatment. Genomic alterations and single nucleotide variants (SNVs) are assumed to play a role in PCa progression. Within the RIBOLUTION project, which aims to find diagnostic biomarkers from RNA sequences, SNVs in RNA sequences were analysed to pinpoint potential candidate genes in PCa. Because the cohort provides whole-transcriptome data of prostatic tissue, it offers the possibility of obtaining comprehensive knowledge of the cancerous changes. The advantage of detecting SNVs in RNA sequences lies in focusing only on those that could be relevant to the gene’s function. However, methods for detecting and analysing SNVs solely in RNA sequences are not yet established. This study aimed to (1) establish fitting and applicable assays to identify, inspect and conclude the potential role of SNVs in RNA sequences, and (2) use the obtained knowledge to single out the genes that are potentially relevant for PCa. SNVs in the RIBOLUTION cohort were investigated. Prostate tissue was obtained from 40 PCa patients, and RNA was then sequenced using Next Generation Sequencing. In 16 patients, paired prostatic tissue samples were taken: one confirmed tumor tissue and one tumor-free tissue. 
As a control, samples from 8 men with benign prostatic hyperplasia were likewise sequenced. Different computational pipelines were established and successfully fulfilled the aim. The CVR Module (Calling Variants in RNA-Seq) is a computer-based pipeline intended to identify SNVs and discriminate between false positive and true positive calls. Validation of the SNVs reported by the completed module showed high sensitivity (> 80% validated SNVs). The same held for novel SNVs, which had ∼ 101% higher median calling quality than SNVs found in dbSNP, the Single Nucleotide Polymorphism Database. In agreement with current knowledge, novel SNVs were observed in tumor samples with a slight but significant increase vs. tumor-free tissue (P < 0.05, test on proportions). On top of that, a positive correlation between non-silent effect and novel SNVs in tumor samples was also observed (P < 0.05, r = 0.33, Pearson’s correlation). Moreover, more than 40% of the candidate genes were found in COSMIC, the Catalogue Of Somatic Mutations In Cancer; some of them are confirmed somatic mutations (cancer associated). About 11% were also reported in studies to be disease associated or observed in other diseases, mostly heredity related. Potential PCa-associated genes were identified via a combination of three different systematic methods: mutational clustering, mutational functional bias, and covariates of the mutated genes. The first method (mutational clustering), however, did not reveal any significant insight. The top candidate genes were then selected in accordance with the latter two methods. The list of top candidate genes includes > 50% genes with direct association with PCa and > 80% genes previously reported in other cancer types, while ∼ 35% are involved in PCa-associated complexes. Besides the well-known and validated PCa biomarker alpha-methylacyl-CoA racemase (AMACR), we identify for the first time, from a mutational perspective, 22% of the genes as potentially associated with PCa. 
Among those, one of the most promising candidate genes is NWD1 (NACHT and WD repeat domain containing 1). This gene was mentioned in a previous study as a potential player in PCa prognosis. We add to this our novel observation that NWD1 was found significantly mutated across the entire set of tumor samples. These significant findings were shown to be tumor-specific when compared with the available control and tumor-free samples (P < 0.05, non-parametric ranking). We conclude that analyzing SNVs from RNA is as useful and informative as DNA-based analysis, and that further benefits could be gained once the suggested methods are adapted. info:eu-repo/classification/ddc/610 ddc:610
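The two statistics quoted in the abstract above (a test on proportions and Pearson's correlation) can be illustrated as follows. This is a minimal sketch that assumes a pooled two-proportion z-test; the abstract does not say which proportion test was used. The function names and all counts are hypothetical and do not come from the thesis.

```python
import math


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical counts: novel SNVs among all SNVs, tumor vs. tumor-free.
    z, p = two_proportion_z_test(60, 100, 40, 100)
    print(f"z = {z:.3f}, p = {p:.4f}")
    # Hypothetical per-sample values for the correlation.
    print(f"r = {pearson_r([1, 2, 3, 4], [2, 1, 4, 3]):.2f}")
```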
4	The Evolving Faces of the SARS-CoV-2 Genome Schmidt, Maria, Arshad, Mamoona, Bernhart, Stephan H., Hakobyan, Siras, Arakelyan, Arsen, Loeffler-Wirth, Henry, Binder, Hans 09 May 2023 Surveillance of the evolving SARS-CoV-2 genome, combined with epidemiological monitoring and emerging vaccination, became a paramount task in controlling the pandemic, which is rapidly changing in time and space. Genomic surveillance must combine the generation and sharing of sequence data with appropriate bioinformatics monitoring and analysis methods. We applied molecular portrayal using self-organizing map machine learning (SOM portrayal) to characterize the diversity of the virus genomes, their mutual relatedness and their development since the beginning of the pandemic. The genetic landscape obtained visualizes the relevant mutations in a lineage-specific fashion and provides developmental paths in genetic state space from early lineages towards the variants of concern alpha, beta, gamma and delta. The different genes of the virus have specific footprints in the landscape reflecting their biological impact. SOM portrayal provides a novel option for ‘bioinformatics surveillance’ of the pandemic, with strong potential regarding visualization, intuitive perception and ‘personalization’ of the mutational patterns of the virus genomes. info:eu-repo/classification/ddc/610 ddc:610
5	Approches bio-informatiques appliquées aux technologies émergentes en génomique Lemieux Perreault, Louis-Philippe 02 1900 Genetic studies, such as linkage and association studies, have contributed greatly to a better understanding of the etiology of several diseases. Nonetheless, despite the tens of thousands of genetic studies performed to date, a large part of the heritability of diseases and traits remains unexplained. The last decade saw unprecedented progress in genomics. For example, the use of microarrays for high-density comparative genomic hybridization has demonstrated the existence of large-scale copy number variations and polymorphisms. These are now detectable using DNA microarrays or high-throughput sequencing. In addition, high-throughput sequencing has shown that the majority of variations in the exome are rare or unique to the individual. This has led to the design of a new type of DNA microarray that is enriched for rare variants and allows several thousand of them to be genotyped quickly and inexpensively for a large set of individuals at once. 
In this context, the general objective of this thesis is the development of methodological approaches and high-performance bioinformatics tools for the detection, at the highest quality standards, of copy number polymorphisms and rare single nucleotide variations. It is expected that by doing so, more of the missing heritability of complex traits can be accounted for, contributing to the advancement of knowledge of the etiology of diseases. We have developed an algorithm for the partition of copy number polymorphisms, making it feasible to use these structural changes in genetic linkage studies with family data. We have also conducted an extensive study, in collaboration with the Wellcome Trust Centre for Human Genetics of the University of Oxford, to characterize rare copy number definition metrics and their impact on study results with unrelated individuals. We have conducted a thorough comparison of the performance of genotyping algorithms when used with a new DNA microarray composed of a majority of very rare genetic variants. Finally, we have developed a bioinformatics tool for the fast and efficient processing of genetic data to increase quality and reproducibility of results and to reduce spurious associations. Bioinformatics Copy number variations and polymorphisms DNA microchip Genetic data quality control
7	The detection of high-qualified indels in exomes and their effect on cognition Younis, Nadine 12 1900 Genetic insertions/deletions (indels) have been linked to many neurodevelopmental disorders (NDDs) such as autism spectrum disorder (ASD) and intellectual disability (ID). 
However, although they are the second most common type of genetic variant, they remain to this day difficult to identify and verify, presenting a high number of false positives. We sought to find a method that would appropriately identify high-quality indels that are likely to be true positives. We built an indel “truth set” from two diagnosis-based family cohorts, keeping indels that passed a set of threshold values and were called by several variant calling tools, and used it to train three machine learning models to identify the highest quality indels. The two best performing models were then used to identify high-quality indels in a general population cohort that was called using only one variant calling technology. The machine learning models were able to identify higher quality indels that showed an association with IQ, although the effect size was small. The indels predicted by the models also affected a much smaller number of genes per individual than those selected using minimum thresholds alone. The models tend to show an overall improvement in the quality of the indels, but further work is required to determine whether they can predict indels that have a noticeable and significant effect on IQ. single-nucleotide variants IQ machine learning indels genetic scores statistical analysis ASD
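The “truth set” construction described in the abstract above (indels that pass threshold filters and are reported by several variant callers) can be sketched as a consensus filter. This is only an illustration under assumptions: the record fields, the threshold values and the minimum number of agreeing callers are hypothetical, and the thesis may define all of them differently. The machine learning step is not shown.

```python
from collections import defaultdict
from typing import NamedTuple


class IndelCall(NamedTuple):
    """One indel reported by one caller (hypothetical record layout)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    caller: str
    depth: int
    quality: float


def build_truth_set(calls, min_callers=2, min_depth=10, min_quality=30.0):
    """Keep indels that pass the thresholds in at least `min_callers` callers."""
    support = defaultdict(set)
    for call in calls:
        # Threshold filter: drop low-depth or low-quality calls first.
        if call.depth >= min_depth and call.quality >= min_quality:
            key = (call.chrom, call.pos, call.ref, call.alt)
            support[key].add(call.caller)
    # Consensus filter: require agreement between distinct callers.
    return {key for key, callers in support.items() if len(callers) >= min_callers}


if __name__ == "__main__":
    example_calls = [
        IndelCall("chr1", 100, "A", "AT", "callerA", 20, 50.0),
        IndelCall("chr1", 100, "A", "AT", "callerB", 15, 40.0),
        IndelCall("chr2", 200, "GC", "G", "callerA", 30, 60.0),
        IndelCall("chr2", 200, "GC", "G", "callerB", 5, 60.0),  # fails depth
    ]
    print(build_truth_set(example_calls))
```

Indels kept by such a filter would serve as positive training examples; how negatives were chosen is not stated in the abstract.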