Global ETD Search

1	NOVEL APPROACH TO STORAGE AND SORTING OF NEXT GENERATION SEQUENCING DATA FOR THE PURPOSE OF FUNCTIONAL ANNOTATION TRANSFER Candelli, Tito January 2012 The functional annotation of novel sequences has been a significant problem for many laboratories that apply next generation sequencing techniques to less-studied species. Experiments such as transcriptome analysis suffer heavily from this problem, because their results cannot be placed in a relevant biological context. Several tools have been proposed to address it through homology annotation transfer. The principle behind this strategy is that homologous genes share common functions in different organisms, so annotations can be transferred between them. Commonly, BLAST reports are used to identify a suitable homologous gene in a well-annotated species, and the annotation is then transferred from the homologue to the novel sequence. Not all homologues, however, carry valid functional annotations. The aim of this project was to devise an algorithm that processes BLAST reports and provides a criterion for discriminating between homologues with biologically informative and uninformative annotations. In addition, all data obtained from the BLAST reports are stored in a relational database for ease of consultation and visualization. To test the robustness of the system, we used 750 novel sequences obtained by applying next generation sequencing techniques to Avena sativa samples. This species suits our needs particularly well because it is a typical target for homology annotation transfer: it lacks a reference genome, and functional annotation is difficult to attribute. The system was able to perform all the required tasks. 
Comparisons between the best hits as determined by BLAST and the best hits as determined by the algorithm showed a significant increase in the biological relevance of the results when the algorithm's sorting system was applied. homology annotation transfer blast parsing relational database functional information
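The abstract does not state the exact criterion used to separate informative from uninformative homologues. As a hypothetical sketch only, the Python below reads BLAST+ tabular output (assumed to be produced with `-outfmt "6 std stitle"`, so the subject title is the 13th column), flags hits whose title matches a hand-picked list of uninformative phrases such as "hypothetical protein", picks the highest-scoring informative hit per query, and stores every hit in an SQLite table. The keyword list, table layout and function names are illustrative assumptions, not the thesis's implementation.

```python
import csv
import io
import re
import sqlite3

# Hypothetical list of phrases marking a biologically uninformative annotation.
UNINFORMATIVE = re.compile(
    r"hypothetical|uncharacteri[sz]ed|unknown|predicted protein|unnamed",
    re.IGNORECASE,
)


def is_informative(title):
    """True if a subject title contains none of the uninformative phrases."""
    return UNINFORMATIVE.search(title) is None


def parse_hits(tsv_text):
    """Parse BLAST+ tabular output assumed to come from -outfmt "6 std stitle"."""
    hits = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hits.append({
            "qseqid": row[0],
            "sseqid": row[1],
            "evalue": float(row[10]),
            "bitscore": float(row[11]),
            "stitle": row[12],
        })
    return hits


def best_informative_hits(hits):
    """Per query, the top-scoring informative hit, else the top hit overall."""
    by_query = {}
    for h in hits:
        by_query.setdefault(h["qseqid"], []).append(h)
    best = {}
    for query, qhits in by_query.items():
        ranked = sorted(qhits, key=lambda h: (-h["bitscore"], h["evalue"]))
        informative = [h for h in ranked if is_informative(h["stitle"])]
        best[query] = (informative or ranked)[0]
    return best


def store(hits, conn):
    """Store every hit, with its informative flag, in a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hit (qseqid TEXT, sseqid TEXT, evalue REAL,"
        " bitscore REAL, stitle TEXT, informative INTEGER)"
    )
    conn.executemany(
        "INSERT INTO hit VALUES (?, ?, ?, ?, ?, ?)",
        [(h["qseqid"], h["sseqid"], h["evalue"], h["bitscore"], h["stitle"],
          int(is_informative(h["stitle"]))) for h in hits],
    )
    conn.commit()


# Toy report: the top hit by bitscore is uninformative, the second is not.
report = (
    "contig1\tsp|P1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200\thypothetical protein\n"
    "contig1\tsp|P2\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-40\t180\theat shock protein 70\n"
)
hits = parse_hits(report)
best = best_informative_hits(hits)
conn = sqlite3.connect(":memory:")
store(hits, conn)
print(best["contig1"]["sseqid"])  # sp|P2
```

Falling back to the best overall hit when no informative hit exists keeps every query annotated while still preferring informative homologues; a real criterion would probably need more than keyword matching.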
2	Bayesian Models for Multilingual Word Alignment Östling, Robert January 2015 In this thesis I explore Bayesian models for word alignment, how they can be improved through joint annotation transfer, and how they can be extended to parallel texts in more than two languages. In addition to these general methodological developments, I apply the algorithms to problems from sign language research and linguistic typology. In the first part of the thesis, I show that Bayesian alignment models estimated with Gibbs sampling are more accurate than previous methods for a range of different languages, particularly for languages with few digital resources available, which is unfortunately the state of the vast majority of languages today. Furthermore, I explore how different variations of the models and learning algorithms affect alignment accuracy. Then, I show how part-of-speech annotation transfer can be performed jointly with word alignment to improve word alignment accuracy. I apply these models to help annotate the Swedish Sign Language Corpus (SSLC) with part-of-speech tags, and to investigate patterns of polysemy across the languages of the world. Finally, I present a model for multilingual word alignment that learns an intermediate representation of the text. This model is then used with a massively parallel corpus containing translations of the New Testament to explore word order features in 1001 languages. word alignment parallel text Bayesian models MCMC linguistic typology sign language annotation transfer transfer learning
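To illustrate what estimating an alignment model with Gibbs sampling involves, here is a minimal collapsed Gibbs sampler for an IBM Model 1-style lexical model with a symmetric Dirichlet prior (concentration `alpha`) on each source word's translation distribution. Each target token is repeatedly re-linked to a source position, or to an assumed NULL token, with probability proportional to (n(e, f) + alpha) / (n(e) + alpha * |V_f|), where the counts exclude the link being resampled. This is a generic sketch; the models in the thesis are more elaborate, and the function name and parameters here are hypothetical.

```python
import random
from collections import defaultdict

NULL = "<null>"  # assumed empty-word token that target words may align to


def gibbs_align(corpus, iterations=50, alpha=0.01, seed=0):
    """Collapsed Gibbs sampling for a Model 1-style lexical alignment model.

    corpus: list of (source_tokens, target_tokens) sentence pairs.
    Returns, per sentence, the source position each target token links to
    (0 is NULL, i >= 1 is source token i - 1).
    """
    rng = random.Random(seed)
    vocab_f = len({f for _, tgt in corpus for f in tgt})
    pair = defaultdict(int)   # n(e, f): links between source e and target f
    total = defaultdict(int)  # n(e): all links from source e
    sents = [([NULL] + list(src), list(tgt)) for src, tgt in corpus]

    # Random initial alignment.
    links = []
    for src, tgt in sents:
        a = [rng.randrange(len(src)) for _ in tgt]
        for f, i in zip(tgt, a):
            pair[src[i], f] += 1
            total[src[i]] += 1
        links.append(a)

    for _ in range(iterations):
        for (src, tgt), a in zip(sents, links):
            for j, f in enumerate(tgt):
                # Remove the current link so counts describe all other links.
                old = src[a[j]]
                pair[old, f] -= 1
                total[old] -= 1
                # Posterior predictive under a symmetric Dirichlet(alpha) prior.
                weights = [(pair[e, f] + alpha) / (total[e] + alpha * vocab_f)
                           for e in src]
                i = rng.choices(range(len(src)), weights=weights)[0]
                a[j] = i
                pair[src[i], f] += 1
                total[src[i]] += 1
    return links


corpus = [
    ("the house".split(), "das haus".split()),
    ("the book".split(), "das buch".split()),
    ("a book".split(), "ein buch".split()),
]
alignments = gibbs_align(corpus, iterations=200)
print(alignments)
```

The sketch returns only the final sample; a practical aligner might instead average link counts over many samples after a burn-in period.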
3	Word order variation within cardinal numeral systems: Extracting word order typology from parallel texts / Numeral-dependent word order of cardinal numbers Kann, Amanda January 2019 Typological classification of the word order tendencies of cardinal numerals has generally assumed a binary division into prenominal and postnominal languages, but some language-internal variation in word order patterns between different cardinal numerals has been found among the world's languages. Access to parallel texts in many different languages enables large-scale quantitative typological analysis of syntactic phenomena such as this, given a suitable strategy for language-independent parsing of unannotated material. This study investigates aspects of numeral-dependent word order variation in 1336 languages through word alignment and annotation transfer in a massively parallel corpus of Bible translations. Source texts are marked up with syntactic and lexical annotation, which is transferred to unannotated, word-aligned data in other languages, and word order tendencies are measured statistically for each cardinal numeral and language. Evaluation of the method's classification of general cardinal numeral word order showed 87% agreement with data from the manually compiled WALS database, in line with previous evaluations of similar methods. Variation in word order patterns between individual cardinal numerals was found in a substantial proportion of the languages examined, which supports the value of a more detailed classification of cardinal numeral word order typology. An investigation of serial word order variation, where a serial threshold separates different dominant word order types within a language's cardinal numeral system, showed that by far the most common structure of serial variation in the data was a prenominal expression for 1 in languages whose dominant cardinal numeral order was classified as postnominal. 
/ Typological word order classification for cardinal numerals has generally used a binary pre- or postnominal model, but in some languages word order behaviour has been shown to vary between individual cardinal numerals. This phenomenon can be studied quantitatively on a larger typological scale using massively parallel texts, given a cross-language method for parsing unannotated texts. In this study, numeral-dependent word order variation is extracted from Bible translations in 1336 languages through word alignment and annotation transfer from syntactically and lexically annotated source texts to all translations in the corpus. Classification of dominant numeral word order using the transferred annotations agreed with the manually gathered classifications in the WALS database for 87% of the languages covered by both, which is in line with previous similar studies. Possible numeral-dependent word order variation was identified in a significant number of languages in the sample, supporting the case for a more nuanced word order classification. Analysis of serial word order variation, where a cardinal numeral of a certain value separates continuous numeral sequences with different dominant word orders, showed that the most common structure for this type of variation is the numeral 1 preceding the noun while all other numerals follow the noun they modify. cardinal numerals word order typology annotation transfer word alignments General Language Studies and Linguistics
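The transfer-and-measure step can be sketched as follows, under simplifying assumptions: the annotated source text supplies (numeral value, numeral index, head noun index) triples, the word alignment is given as a one-to-one mapping from source to target token indices, and a numeral counts as prenominal in a language when more than half of its projected occurrences precede the noun. The function names and the 0.5 threshold are hypothetical; the abstract does not specify the study's own statistical measure.

```python
from collections import Counter, defaultdict


def project_order(numeral_pairs, links):
    """Project numeral-noun order from an annotated source to a target text.

    numeral_pairs: (numeral value, source numeral index, source noun index)
    links: dict mapping source token index -> target token index (assumed 1:1)
    Yields (value, "pre" | "post") when both words align to distinct tokens.
    """
    for value, num_i, noun_i in numeral_pairs:
        if num_i in links and noun_i in links:
            t_num, t_noun = links[num_i], links[noun_i]
            if t_num != t_noun:
                yield value, "pre" if t_num < t_noun else "post"


def classify(observations, threshold=0.5):
    """Dominant order per numeral: "pre" if the prenominal share exceeds threshold."""
    counts = defaultdict(Counter)
    for value, order in observations:
        counts[value][order] += 1
    return {value: "pre" if c["pre"] / (c["pre"] + c["post"]) > threshold else "post"
            for value, c in counts.items()}


# Toy target language: "1" precedes its noun, "2" and "3" follow it.
obs = []
obs.extend(project_order([(1, 0, 1)], {0: 0, 1: 1}))
obs.extend(project_order([(2, 0, 1)], {0: 1, 1: 0}))
obs.extend(project_order([(3, 0, 1)], {0: 2, 1: 0}))
print(classify(obs))  # {1: 'pre', 2: 'post', 3: 'post'}
```

The toy data reproduce the serial pattern reported in the abstract (1 prenominal, the other numerals postnominal); pairs where either word is unaligned are simply skipped rather than guessed.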
4	On the study of 3D structure of proteins for developing new algorithms to complete the interactome and cell signalling networks Planas Iglesias, Joan, 1980- 21 January 2013 Proteins are indispensable players in virtually all biological events. The functions of proteins are determined by their three-dimensional (3D) structure and coordinated through intricate networks of protein-protein interactions (PPIs). Hence, a deep understanding of such networks is crucial for understanding cell biology. Computational approaches have become critical tools for analysing PPI networks. In silico methods take advantage of existing PPI knowledge both to predict new interactions and to predict the function of proteins. Several methods for predicting PPIs have already been developed. However, recent findings show that such methods could benefit from knowledge of non-interacting protein pairs (NIPs). For predicting the function of proteins, the Guilt-by-Association (GBA) principle can be exploited to extend the functional annotation of proteins over PPI networks. In this thesis, a new algorithm for PPI prediction and a protocol to complete cell signalling networks are presented. iLoops is a method that uses NIP data and structural information of proteins to predict the binding fate of protein pairs. A novel protocol for completing signalling networks, a task related to predicting the function of a protein, has also been developed. The protocol is based on the application of the GBA principle to PPI networks. / Proteins play an indispensable role in virtually every biological process. The functions of proteins are determined by their three-dimensional (3D) structure and are coordinated through a complex network of protein interactions (protein-protein interactions, PPIs). 
Thus, an in-depth understanding of these networks is fundamental to understanding cell biology. In recent years, computational techniques have become fundamental to the analysis of protein interaction networks. In silico methods exploit current knowledge about protein interactions to predict new interactions or the functions of proteins. Various methods currently exist for predicting new protein interactions. However, recent results show that these methods can benefit from knowledge about non-interacting protein pairs (NIPs). For the task of predicting protein function, the guilt-by-association (GBA) principle is used to extend the annotation of proteins of known function across protein interaction networks. This thesis presents a new method for predicting protein interactions and a new protocol for completing cell signalling networks. iLoops is a method that uses data on non-interacting pairs and knowledge of the 3D structure of proteins to predict protein interactions. A new protocol for completing cell signalling networks, a task related to predicting the functions of proteins, has also been developed. This protocol is based on applying the GBA principle to protein interaction networks. Structural Biology Protein-protein interactions Protein-protein interaction networks Protein-protein interaction prediction Protein loops Negative protein interaction models Annotation transfer Apoptosis 577
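The guilt-by-association principle mentioned in the abstract can be illustrated with its simplest form, a neighbour majority vote: an unannotated protein receives the function label that occurs most often among its annotated interaction partners. This Python sketch is not the thesis's protocol for completing signalling networks; the graph representation, the `min_votes` parameter and the example protein names and labels are assumptions for illustration.

```python
from collections import Counter


def gba_predict(edges, annotations, min_votes=1):
    """Guilt-by-association by neighbour majority vote.

    edges: iterable of (protein_a, protein_b) undirected interactions.
    annotations: dict protein -> set of known function labels.
    Returns dict unannotated protein -> most frequent neighbour label.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    predictions = {}
    for protein, nbrs in neighbours.items():
        if protein in annotations:
            continue  # only extend annotation to proteins without one
        votes = Counter(label for n in nbrs for label in annotations.get(n, ()))
        if votes:
            label, count = votes.most_common(1)[0]
            if count >= min_votes:
                predictions[protein] = label
    return predictions


edges = [("X", "A"), ("X", "B"), ("X", "C"), ("Y", "C")]
annotations = {"A": {"apoptosis"}, "B": {"apoptosis"}, "C": {"DNA repair"}}
pred = gba_predict(edges, annotations)
print(pred)
```

Ties are resolved by `Counter.most_common`, which orders equal counts by first occurrence; because neighbours are stored in sets here, the choice among tied labels is effectively arbitrary, so a practical version would need an explicit scoring or tie-breaking rule.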
