1 |
From the inside out: determining sequence conservation within the context of relative solvent accessibility
Scherrer, Michael Paul 17 October 2013
Evolutionary rates vary vastly across intraspecific genes, and the determinants of these rates are of central concern to the field of comparative genomics. Tradition has held that preservation of protein function conserved the sequence; however, mounting evidence implicates the biophysical properties of proteins themselves as the elements that constrain sequence evolution. Of these properties, the exposure of a residue to solvent is the most prevalent determinant of its evolutionary rate, owing to pressures to maintain proper synthesis and folding of the structure. In this work, we have developed a model that considers the microenvironment of a residue when estimating its evolutionary rate. By working within the structural context of a protein's residues, we show that our model better captures the overall evolutionary trends affecting conservation of both the coding sequences and the protein structures, from the genomic level down to individual genes.
|
2 |
Genome analysis of the planarian Dugesia japonica / プラナリアDugesia japonicaゲノムの解析
An, Yang 23 March 2015
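Kyoto University / 0048 / New-system course doctorate / Doctor of Science / Kō No. 18831 / Rihaku No. 4089 / 新制||理||1588 (University Library) / 31782 / Division of Biological Sciences, Graduate School of Science, Kyoto University / (Chief examiner) Professor Kiyokazu Agata, Professor Hiroyuki Ogata, Professor Shoji Takada / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Science / Kyoto University / DFAM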
|
3 |
Common Features in lncRNA Annotation and Classification: A Survey
Klapproth, Christopher, Sen, Rituparno, Stadler, Peter F., Findeiß, Sven, Fallmann, Jörg 05 May 2023
Long non-coding RNAs (lncRNAs) are widely recognized as important regulators of gene expression. Their molecular functions range from miRNA sponging to chromatin-associated mechanisms, leading to effects in disease progression and establishing them as diagnostic and therapeutic targets. Still, only a few representatives of this diverse class of RNAs are well studied, while the vast majority is poorly described beyond the existence of their transcripts. In this review we survey common in silico approaches for lncRNA annotation. We focus on the well-established sets of features used for classification and discuss their specific advantages and weaknesses. While the available tools perform very well for the task of distinguishing coding sequence from other RNAs, we find that current methods are not well suited to distinguish lncRNAs or parts thereof from other non-protein-coding input sequences. We conclude that the distinction of lncRNAs from intronic sequences and untranslated regions of coding mRNAs remains a pressing research gap.
|
4 |
Computational Methods for Cis-Regulatory Module Discovery
Liang, Xiaoyu January 2010
No description available.
|
5 |
Engineering membrane proteins for production and topology
Toddo, Stephen January 2015
The genomes of diverse organisms are predicted to contain 20–30% membrane-protein-encoding genes, and more than half of all therapeutics target membrane proteins. However, only 2% of the crystal structures deposited in the Protein Data Bank represent integral membrane proteins. This reflects the difficulty of studying them with standard biochemical and crystallographic methods. The first problem frequently encountered when investigating membrane proteins is their low natural abundance, which is insufficient for biochemical and structural studies. One aim of my thesis was to provide a simple method to improve the production of recombinant proteins. One of the most commonly used methods to increase protein yields is codon optimization of the entire coding sequence. However, our data show that subtle synonymous codon substitutions in the 5’ region can be more efficient. This is consistent with the view that protein yields under normal conditions depend more on translation initiation than on elongation. mRNA secondary structures around the 5’ region are in large part responsible for this effect, although rare codons, as well as other factors, also contribute. We developed a PCR-based method to optimize the 5’ region for increased protein production in Escherichia coli. For those proteins produced in sufficient quantities, several additional hurdles remain before high-quality crystals can be obtained. A second aim of my thesis work was to provide a simple method for mapping the topology of membrane proteins. A topology map provides information about the orientation of transmembrane regions and the location of protein domains in relation to the membrane, which can give information on structure-function relationships. To this end we explored the split-GFP system, in which GFP is split between the 10th and 11th β-strands.
This results in one large and one small fragment, both of which are non-fluorescent but can re-anneal and regain fluorescence if localized to the same cellular compartment. Fusing the 11th β-strand to the termini of a protein of interest and expressing it, followed by expression of the detector fragment in the cytosol, allows determination of the topology of inner membrane proteins. Using this strategy the topology of three model proteins was correctly determined. We believe that this system could be used to predict the topology of a large number of additional proteins, especially single-spanning inner membrane proteins in E. coli. The methods for efficient protein production and topology mapping engineered during my thesis work are simple and cost-efficient and may be very valuable in future studies of membrane proteins. / At the time of the doctoral defense, the following paper was unpublished and had a status as follows: Paper 2: Manuscript.
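As a rough illustration of what "synonymous codon substitutions in the 5’ region" means in practice, the sketch below enumerates every coding sequence that differs from an input only in the choice of synonymous codons within the first few codons, leaving the start codon and the rest of the sequence untouched. It is not the PCR-based method of the thesis; the function names `translate` and `five_prime_variants`, the example sequence, and the use of the standard genetic code are all assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Map each amino acid (or stop, "*") to the list of codons encoding it.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(seq):
    """Translate a DNA coding sequence, ignoring any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def five_prime_variants(cds, n_codons):
    """Return all variants of `cds` whose first `n_codons` codons are replaced
    by synonymous codons. The start codon is kept fixed (hypothetical choice).
    Assumes len(cds) >= 3 * n_codons and an upper-case DNA alphabet."""
    codons = [cds[i:i + 3] for i in range(0, 3 * n_codons, 3)]
    choices = [[codons[0]]] + [SYNONYMS[CODON_TABLE[c]] for c in codons[1:]]
    tail = cds[3 * n_codons:]
    return ["".join(combo) + tail for combo in product(*choices)]


if __name__ == "__main__":
    for variant in five_prime_variants("ATGAAATTTGGG", 3):
        print(variant, translate(variant))
```

Every variant encodes the same protein, so any difference in yield between variants would have to come from the nucleotide sequence itself, for example through mRNA secondary structure near the start codon. The number of variants grows multiplicatively with the window size, so in practice a window of only a few codons would be enumerated exhaustively.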
|
6 |
Modélisation des biais mutationnels et rôle de la sélection sur l’usage des codons
Laurin-Lemay, Simon 10 1900
The acquisition of genomic data continues to grow, as does the appetite to interpret them. But determining the processes that shaped the evolution of coding sequences (and their relative importance) is a scientific challenge that requires the development of statistical models of evolution that increasingly take into account heterogeneities in mutation and selection processes.
Identifying selection is a task that typically requires comparing two models: a null model that does not allow for an adaptive evolutionary regime and an alternative model that does. When a test between these two models rejects the null, adaptive evolution is considered to have been detected. The task is all the more difficult when the signal is weak and confounded with various heterogeneities neglected by the models.
The detection of selection on codon usage is controversial, particularly in Vertebrates. There are several reasons for this controversy: (1) there is a sociological bias toward seeing selection as the main driver of evolution, to such an extent that heterogeneities in mutation processes have historically been neglected; (2) according to the principles of population genetics, the small effective size of vertebrate populations limits the power of selection on synonymous mutations, which themselves confer only a minimal advantage; (3) on the other hand, selection on codon usage could be highly localized along coding sequences, at specific sites subject to selective constraints related to sequence motifs used by the splicing machinery, for example.
Phylogenetic mutation-selection models are the preferred tools to address these issues, as they explicitly model mutation processes and selective constraints. All the heterogeneities neglected by the mutation-selection models of Yang and Nielsen [2008] can generate false positives, at rates ranging from 20% (site-specific amino acid preference) to 100% (hypermutability of transitions in CpG context) [Laurin-Lemay et al., 2018b]. In particular, the hypermutability of transitions in the CpG context alone can explain the selection on codon usage detected by Yang and Nielsen [2008].
However, modelling phenomena that involve interdependencies in the data (e.g., hypermutability of the CpG context) greatly increases the complexity of the likelihood function. In addition, today’s sophisticated models require high-dimensional parameter vectors to model the heterogeneity of the processes studied, in our case selective constraints on the protein.
Approximate Bayesian Computation (ABC) is used to bypass the calculation of the likelihood function. This approach differs from the Markov Chain Monte Carlo (MCMC) sampling commonly used to approximate the posterior distribution. We explored the idea of combining these approaches for a specific problem involving high-dimensional parameters and new parameters that take into account dependencies between sites. Under certain conditions, when the high-dimensional parameters are weakly correlated with the new parameters of interest, it is possible to infer the high-dimensional parameters with MCMC and then the parameters of interest with ABC. This new approach is called Conditional Approximate Bayesian Computation (CABC) [Laurin-Lemay et al., 2018a]. We were able to verify the effectiveness of the CABC method in a case study, namely the hypermutability of transitions in the CpG context within Eutheria [Laurin-Lemay et al., 2018a]. We find that 100% of the 137 genes tested have significant hypermutability of transitions. We have also shown that models incorporating hypermutability of transitions in CpG contexts predict a codon usage closer to that of the genes studied. This suggests that a significant part of codon usage can be explained by mutational processes alone, rather than by selection.
Finally, we explore several avenues of research emanating from our methodological developments: applying the detection of transition hypermutability in CpG contexts at the scale of Vertebrates; expanding the model to recognize contexts other than CpG alone (e.g., hypermutability of transitions and transversions in CpG and TpA contexts); and methodological perspectives to improve the performance of the CABC approach.
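The likelihood-free idea behind ABC, as summarized in the abstract above, can be sketched with plain rejection sampling on a toy problem: draw a parameter from the prior, simulate data, and keep the draw only if a summary statistic of the simulation is close to the observed one. This is generic rejection ABC, not the conditional CABC scheme of the thesis, and it involves no phylogenetic model; the coin-flip simulator, the uniform prior, the tolerance, and the name `abc_rejection` are all assumptions chosen for illustration.

```python
import random


def simulate_heads(p, n_flips, rng):
    """Toy simulator: number of successes in n_flips Bernoulli(p) trials."""
    return sum(1 for _ in range(n_flips) if rng.random() < p)


def abc_rejection(observed_heads, n_flips, n_draws, tolerance, seed=0):
    """Rejection ABC for a success probability p with a Uniform(0, 1) prior.

    A prior draw is accepted when the simulated summary statistic (the count
    of successes) lies within `tolerance` of the observed count. The accepted
    draws approximate the posterior without ever evaluating a likelihood.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        p = rng.random()                          # draw from the prior
        simulated = simulate_heads(p, n_flips, rng)  # simulate data under p
        if abs(simulated - observed_heads) <= tolerance:
            accepted.append(p)                    # keep draws that match
    return accepted


if __name__ == "__main__":
    posterior = abc_rejection(observed_heads=70, n_flips=100,
                              n_draws=5000, tolerance=2)
    mean = sum(posterior) / len(posterior)
    print(f"accepted {len(posterior)} draws, posterior mean ~ {mean:.3f}")
```

A smaller tolerance gives a better approximation of the posterior at the cost of fewer accepted draws. That trade-off becomes severe as the number of parameters grows, which is consistent with the abstract's motivation for inferring the high-dimensional parameters by MCMC and reserving ABC for the few parameters of interest.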
|