Global ETD Search

1	Towards the characterization of the eukaryotic selenoproteome: a computational approach Castellano Hereza, Sergi 23 July 2004 Although the genome sequence and gene content are available for an increasing number of organisms, eukaryotic selenoproteins remain poorly characterized. In these proteins, selenium (Se) is incorporated in the form of selenocysteine (Sec), the 21st amino acid. Selenocysteine is cotranslationally inserted in response to UGA codons (a stop signal in the canonical genetic code). The alternative decoding is mediated by a stem-loop structure in the 3' UTR of selenoprotein mRNAs (the SECIS element). Selenium is implicated in male infertility, cancer and heart diseases, viral expression and ageing. In addition, most selenoproteins have homologues in which Sec is replaced by cysteine (Cys). Genome biologists rely on the high-quality annotation of genomes to bridge the gap from the sequence to the biology of the organism. However, for selenoproteins, which mediate the biological functions of selenium, the dual role of the UGA codon confounds both the automatic annotation pipelines and the human curators. As a consequence, selenoproteins are misannotated in the majority of genome projects. Furthermore, finding novel selenoprotein families in newly released genome sequences remains a difficult task. In the last few years, we have contributed to the exhaustive description of the eukaryotic selenoproteome (the set of eukaryotic selenoproteins) through the development of a number of ad hoc computational tools. Our approach is based on the capacity to predict SECIS elements, standard genes and genes with an in-frame UGA codon in one or multiple genomes. Indeed, comparative analysis plays an essential role because 1) SECIS sequences are conserved between closely related species (e.g. human-mouse); and 2) sequence conservation across a UGA codon between genomes at greater phylogenetic distance strongly suggests a coding function (e.g. human-fugu).
Our analysis of the fly, human, Takifugu and Tetraodon genomes has resulted in 9 novel selenoprotein families. As a result, 20 distinct selenoprotein families have been described in eukaryotes to date. Most of these families are widely (but not uniformly) distributed across eukaryotes, either as true selenoproteins or as Cys-homologues. The correct annotation of selenoproteins is thus providing insight into the evolution of the usage of Sec. Our data indicate a discrete evolutionary distribution of selenoproteins in eukaryotes and suggest that, contrary to the prevalent thinking of an increase in the number of selenoproteins from less to more complex genomes, Sec-containing proteins are scattered all along the complexity scale. We believe that the particular distribution of each family is mediated by an ongoing process of Sec/Cys interconversion, in which contingent events could play a role as important as functional constraints. The characterization of eukaryotic selenoproteins illustrates some of the most important challenges involved in completing the gene annotation of genomes. Notable among them are the increasing number of exceptions to our standard theory of the eukaryotic gene and the need to sequence genomes at different evolutionary distances to achieve such a complete annotation. genetic aspects selenocysteine data processing amino acid sequence 575
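As a rough illustration of why in-frame UGA codons complicate the annotation problem described in the abstract above, the following Python sketch lists the in-frame TGA codons (the DNA form of UGA) that occur before the last codon of a coding sequence. It is not the software developed in the thesis: the function name `inframe_tga_positions` and the toy sequence are hypothetical, and a real selenoprotein search would also need SECIS prediction and cross-species conservation, as the abstract notes.

```python
# Illustrative sketch only; names and the example sequence are hypothetical.
OTHER_STOPS = {"TAA", "TAG"}  # stop codons that are never read as Sec


def inframe_tga_positions(cds: str) -> list[int]:
    """Return 0-based nucleotide offsets of in-frame TGA codons that are
    not the final codon of the sequence (candidate Sec codons)."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    hits = []
    # Skip the last codon: a terminal TGA is treated as an ordinary stop.
    for i in range(n_codons - 1):
        codon = cds[3 * i : 3 * i + 3]
        if codon == "TGA":
            hits.append(3 * i)
        elif codon in OTHER_STOPS:
            break  # TAA/TAG end the reading frame
    return hits


# Codons: ATG GCT TGA GGT TGC TAA
print(inframe_tga_positions("ATGGCTTGAGGTTGCTAA"))  # [6]
```

A standard gene finder would end the gene at offset 6 in this toy sequence, which is the kind of misannotation the abstract describes.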
2	Finding a needle in haystack: the Eukaryotic selenoproteome Chapple, Charles E. 15 July 2009 Selenoproteins are a diverse family of proteins containing the trace element selenium (Se) in the form of the non-canonical amino acid selenocysteine (Sec). Selenocysteine, the 21st amino acid, is similar to cysteine (Cys) but with Se replacing sulphur (S). In many cases the homologous gene of a known selenoprotein is present in a different genome with cysteine in the place of Sec. Selenoproteins are believed to be the effectors of the biological functions of selenium and have been implicated in male infertility, cancer and heart diseases, viral expression and ageing. Selenocysteine is encoded by the opal STOP codon (TGA). A number of factors combine to achieve the co-translational recoding of TGA to Sec. The 3' untranslated regions (UTRs) of eukaryotic selenoprotein transcripts contain a stem-loop structure called a Sec Insertion Sequence (SECIS) element. This is recognised by the SECIS Binding Protein 2 (SBP2), which binds to both the SECIS element and the ribosome. SBP2, in turn, recruits the Sec-specific elongation factor EFsec and the selenocysteine transfer RNA, tRNASec. Because of the dual meaning of the TGA codon, selenoprotein genes are often mispredicted by the standard annotation pipelines. The correct prediction of these genes therefore requires the development of specific methods. In the past few years we have contributed significantly to the description of the eukaryotic selenoproteome with the discovery of novel families (Castellano et al., 2005), the elaboration of novel methods (Taskov et al., 2005; Chapple et al., 2009) and the annotation of different genomes (Jaillon et al., 2004; Drosophila 12 genomes Consortium, 2007; Bovine Genome Sequencing and Analysis Consortium, 2009).
Finally, and perhaps most importantly, we have identified the first animal to lack selenoprotein genes (Drosophila 12 genomes Consortium, 2007; Chapple and Guigó, 2008). This last finding is particularly surprising because it had previously been believed that selenoproteins were essential for animal life. amino acid sequence - data processing cysteine - genetic aspects selenium - genetic aspects selenocysteine - genetic aspects 575
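The abstract above notes that a selenoprotein in one genome often has a homologue with cysteine in place of Sec in another. A minimal sketch of spotting such positions in a pairwise protein alignment follows, assuming the two sequences are already aligned, gaps are written as `-`, and selenocysteine uses its one-letter code `U`. The function name `sec_cys_columns` and the toy sequences are hypothetical and are not taken from the cited work.

```python
# Illustrative sketch only; the function name and sequences are hypothetical.
def sec_cys_columns(aln_a: str, aln_b: str) -> list[int]:
    """Return 0-based alignment columns where selenocysteine (U) in one
    sequence is paired with cysteine (C) in the other."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = []
    for i, (a, b) in enumerate(zip(aln_a.upper(), aln_b.upper())):
        # A Sec/Cys pair in either orientation gives exactly the set {U, C}.
        if {a, b} == {"U", "C"}:
            columns.append(i)
    return columns


# Column 2 pairs U with C; column 6 pairs C with C and is not reported.
print(sec_cys_columns("MKU-GLC", "MKCAGLC"))  # [2]
```

Only mixed U/C columns are reported, so positions where both homologues already carry cysteine are ignored.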
3	Comparative analysis of eukaryotic gene sequence features Abril Ferrando, Josep Francesc 17 May 2005 The constantly increasing number of available genome sequences, along with an increasing number of experimental techniques, will help to produce the complete catalog of cellular functions for different organisms, including humans. Such a catalog will define the base from which we will better understand how organisms work at the molecular level. At the same time it will shed light on which changes are associated with disease. Therefore, the raw sequence from genome sequencing projects is worthless without the complete analysis and further annotation of the genomic features that define those functions. This dissertation presents our contribution to three related aspects of gene annotation in eukaryotic genomes.
First, we compared the human and mouse genomes at the sequence level using a semi-automatic analysis pipeline. The SGP2 gene-finding tool was developed from procedures used in this pipeline. The concept behind SGP2 is that similarity regions obtained by TBLASTX are used to increase the score of exons predicted by geneid, in order to produce a more accurate set of gene structures. SGP2 provides a specificity that is high enough for its predictions to be experimentally verified by RT-PCR. The RT-PCR validation of predicted splice junctions also serves as an example of how combined computational and experimental approaches yield better results than either alone. Second, we performed a descriptive analysis at the sequence level of the splice site signals from a reliable set of orthologous genes for human, mouse, rat and chicken. We explored the differences at the nucleotide level between U2 and U12 sites for the set of orthologous introns derived from those genes. We found that orthologous splice signals between human and rodents, and within rodents, are more conserved than unrelated splice sites. However, this additional conservation can be explained mostly by background intron conservation. In contrast, conservation above background is detectable between orthologous mammalian and chicken splice sites. Our results also indicate that the U2 and U12 intron classes have evolved independently since the split of mammals and birds. We found neither a convincing case of interconversion between these two classes in our sets of orthologous introns, nor any single case of switching between the AT-AC and GT-AG subtypes within U12 introns. In contrast, switching between the GT-AG and GC-AG U2 subtypes does not appear to be unusual. Finally, we implemented visualization tools to integrate annotation features from gene-finding programs and comparative analyses. One of those tools, gff2ps, was used to draw the whole genome maps for human, fruitfly and mosquito.
gff2aplot and the accompanying parsers facilitate the task of integrating sequence annotations with the output of homology-based tools, like BLAST. We have also adapted the concept of pictograms to the comparative analysis of orthologous splice sites, by developing compi. amino acid sequences eukaryotic cells chicken gallus gallus rattus norvegicus rat mus musculus mouse gene prediction RT-PCR validation SGP2 evaluation geneid comparative computational gene finding anopheles gambiae genome map drosophila melanogaster fruitfly mosquito human compi gff2aplot gff2ps feature visualization U12 genome annotation U2 splice sites exonic gene structure genomics bioinformatics 575
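The SGP2 idea stated in the abstract above (similarity regions raising the scores of predicted exons) can be sketched in a few lines of Python. This is a toy under stated assumptions, not SGP2's actual scoring scheme, which the abstract does not give: the `Exon` class, the `boost_exon_scores` function, the `weight` parameter and the overlap-proportional bonus are all hypothetical choices made for illustration.

```python
# Illustrative sketch only; this is NOT SGP2's real scoring function.
from dataclasses import dataclass


@dataclass
class Exon:
    start: int    # 1-based, inclusive
    end: int      # inclusive
    score: float  # score from the ab initio predictor (hypothetical values)


def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of positions shared by two inclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def boost_exon_scores(exons, hits, weight=0.1):
    """Add a bonus to each exon for every similarity hit it overlaps.

    hits: iterable of (start, end, similarity_score) tuples.
    The bonus is the hit score scaled by `weight` and by the fraction
    of the hit covered by the exon (an assumed, simplified rule).
    """
    boosted = []
    for ex in exons:
        bonus = 0.0
        for h_start, h_end, h_score in hits:
            shared = overlap(ex.start, ex.end, h_start, h_end)
            bonus += weight * h_score * shared / (h_end - h_start + 1)
        boosted.append(Exon(ex.start, ex.end, ex.score + bonus))
    return boosted


exons = [Exon(100, 199, 2.0), Exon(500, 599, 2.0)]
hits = [(150, 249, 50.0)]  # one similarity region overlapping the first exon
for ex in boost_exon_scores(exons, hits):
    print(ex)
```

In this toy run only the first exon overlaps the similarity region, so only its score rises; the second exon keeps its original score. The value of `weight` is arbitrary here.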
