Global ETD Search

1	Investigation of the COGENT biological database for the addition of literature information and nucleotide sequence (DNA) information
Χριστοπούλου, Δέσποινα, 09 October 2009
Today there is free access via the Internet to hundreds of public biological databases. Nevertheless, exploiting the data stored in heterogeneous databases turns out to be a particularly difficult and time-consuming process, for a variety of reasons. These include the chaotic volume of biological data, the ever-growing number of biological databases, the overabundance of data types and formats, the diversity of bioinformatics data-access techniques and, of course, the heterogeneity of the biological databases themselves. Thanks to international sequencing efforts, genomic data sets have grown geometrically over the last decade. In 2003, for example, the GenBank database doubled in size within 15 months. With such rapid growth, genomic data and their associated structures have become too large to fit in a computer's main memory. The most important problem this raises is that much of the information sought in this huge and ever-growing data mine is ultimately lost, so building suitable tools to mine the required information from it is unavoidable. This diploma thesis focuses on extending an existing biological database of complete genomes, COGENT.
COGENT was developed in 2003 by the Computational Genomics Group (CGG) at the European Bioinformatics Institute (EBI), and the final technical goal of the thesis is to add literature data as well as nucleotide (DNA) sequence information to the COGENT database. / Today, hundreds of public biological databases are accessible via the Internet. However, taking advantage of data stored in heterogeneous biological databases can be a difficult, time-consuming task for a multitude of reasons. These reasons include the vast volume of biological data, the growing number of biological databases, the rapid rate of data growth, the overabundance of data types and formats, the wide variety of bioinformatics data-access techniques, and database heterogeneity. Thanks to international sequencing efforts, genome data sets have been growing exponentially in the past few years. The GenBank database, for example, has doubled every 15 months. With such rapid growth, genome data and the associated access structures have become too large to fit in the main memory of a computer, leading to a large number of disk accesses (and therefore slow response times) for homology searches and other queries. Much of the important information in this enormous and exponentially growing gold mine will be wasted if we do not develop proper tools to access and mine it efficiently. The focus of this thesis was to extend an existing biological database for the complete tracking of genomes, the COGENT database, produced in 2003 by the Computational Genomics Group at the European Bioinformatics Institute in Cambridge, so that it can incorporate literature and DNA sequence information.
Subjects: Genomics; Biological databases; 572.802 85
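The growth figure quoted in this record can be made concrete with a short calculation. It assumes, as a simplification, that GenBank's size doubles at a constant rate every 15 months, which the abstract states only as an observed trend; the symbols $N_0$ (initial size) and $t$ (elapsed months) are notation introduced here.

```latex
N(t) = N_0 \cdot 2^{t/15} \qquad (t \text{ in months})

\frac{N(12)}{N_0} = 2^{12/15} = 2^{0.8} \approx 1.74
\qquad
\frac{N(120)}{N_0} = 2^{120/15} = 2^{8} = 256
```

Under that assumption the database grows by roughly 74% per year and by a factor of 256 over ten years, which illustrates how quickly such data outgrow a computer's main memory.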
2	Bioinformatic analysis of the transcriptome of mycorrhizal fungi Tuber melanosporum and Glomus intraradices
Tisserant, Emilie, 15 December 2011
Mycorrhizal symbiosis is a mutualistic interaction formed between the roots of land plants and soil fungi. The morpho-anatomical changes associated with the development of this symbiosis are accompanied by changes in the regulation of gene expression. Studying transcriptomic profiles is therefore fundamental to characterizing the molecular mechanisms that govern mycorrhizal symbiosis. The recent development of high-throughput transcriptomic approaches offers new perspectives for understanding these mechanisms. The work undertaken in this thesis aimed to characterize in silico the symbiotic transcriptome of the ectomycorrhizal fungus Tuber melanosporum and the endomycorrhizal fungus Glomus intraradices. This involved setting up the bioinformatic tools and protocols needed to exploit transcriptomic data from new sequencing technologies, in order to characterize the transcripts expressed by the symbionts and to identify the genes regulated during symbiosis. This original work highlighted features shared by the expression profiles of mycorrhizal fungi. Moreover, characterizing the G. intraradices transcriptome made it possible to establish the first genome-wide gene repertoire for an endomycorrhizal fungus.
This genomic study improves knowledge of the molecular processes underlying mycorrhizal symbiosis and constitutes a unique resource for future research on the gene networks controlling symbiosis. / Mycorrhizal symbiosis is a mutualistic interaction involving the roots of terrestrial plants and soil fungi. Morphological changes associated with the development of this symbiosis are accompanied by changes in gene expression. The study of transcriptomic profiles is thus essential to characterize the molecular mechanisms that govern the mycorrhizal symbiosis. The recent development of high-throughput transcriptomic approaches provides new insights into these mechanisms. The work undertaken during this thesis aimed to characterize in silico the transcriptome of the ectomycorrhizal fungus Tuber melanosporum and the endomycorrhizal fungus Glomus intraradices. In order to characterize transcripts expressed by the symbionts and to identify genes regulated during symbiosis, bioinformatic tools and protocols were implemented to process transcriptomic data derived from new sequencing technologies. This work highlighted common features in the expression profiles of mycorrhizal fungi. In addition, characterization of the G. intraradices transcriptome made it possible to establish the first genome-wide repertoire of genes for an endomycorrhizal fungus. The study helps to improve knowledge about the molecular processes underlying the mycorrhizal symbiosis and provides a unique resource for future research on the gene networks controlling symbiosis.
Subjects: Symbiosis; Fungi; Mycorrhiza; Transcriptomics; Bioinformatics; 454; RNA-Seq; 572.802 85; 579.617 85
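Identifying genes regulated during symbiosis, as this record describes, is commonly approached by comparing normalized read counts between conditions. The following is a minimal Python sketch of that idea, not the thesis's pipeline: the two conditions (free-living mycelium versus symbiotic tissue), the gene names and counts, the counts-per-million normalization, the pseudocount of 1 and the 2-fold cutoff (|log2 fold change| >= 1) are all assumptions chosen for illustration. Published RNA-Seq analyses normally rely on replicate-aware statistical tests (for example DESeq2 or edgeR) rather than a bare fold-change threshold.

```python
import math


def cpm(counts):
    """Counts per million: scale each gene's raw read count by library size."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_fold_changes(free_living, symbiotic, pseudocount=1.0):
    """Return log2(symbiotic / free-living) per gene on CPM-normalized values.

    Genes missing from one condition are treated as having zero reads there;
    the pseudocount keeps the ratio finite in that case.
    """
    a, b = cpm(free_living), cpm(symbiotic)
    return {
        gene: math.log2((b.get(gene, 0.0) + pseudocount) / (a.get(gene, 0.0) + pseudocount))
        for gene in set(a) | set(b)
    }


def regulated_genes(lfc, threshold=1.0):
    """Split genes into up- and down-regulated lists by |log2 fold change| >= threshold."""
    up = sorted(g for g, v in lfc.items() if v >= threshold)
    down = sorted(g for g, v in lfc.items() if v <= -threshold)
    return up, down


if __name__ == "__main__":
    # Hypothetical read counts for three genes in two conditions.
    free_living = {"geneA": 100, "geneB": 400, "geneC": 500}
    symbiotic = {"geneA": 400, "geneB": 100, "geneC": 500}
    lfc = log2_fold_changes(free_living, symbiotic)
    print(regulated_genes(lfc))  # (['geneA'], ['geneB'])
```

Normalizing by library size before taking ratios keeps a gene from looking regulated merely because one sample was sequenced more deeply.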
3	Application of the BLAST algorithm in the recognition of mutations in biological sequences
Ντάλλα, Μαρία, 03 October 2011
Aligning biological sequences, that is, protein and gene sequences, is among the most demanding and, at the same time, most widely applicable problems in bioinformatics. Sequence alignment yields a substantial volume of information that answers evolutionary questions but, above all, finds application in many fields, such as the diagnosis and treatment of diseases. The topic attracted the attention of the international computer science community only in the second half of the last century, so the field still leaves considerable room for research. After providing the necessary biological background, this thesis first presents the main algorithms proposed so far for performing alignments, explains their key structural and functional differences, and gives an initial assessment of their effectiveness as reported in the literature. It then focuses on the local alignment algorithm BLAST: its operation is analyzed step by step, and its main versions, its inputs and outputs, and the mathematical foundation of its implementation are presented. The experimental part examines to what extent, and with what error, BLAST can identify a mutated sequence, both with respect to the gene it derives from and with respect to the mutation type and its possible consequences for the organism in which it is expressed. Starting from the Homo sapiens BRCA1 gene, a series of mutations is generated and then translated.
The full set of resulting nucleotide and amino-acid sequences is searched with BLAST against appropriate databases to test its sensitivity to mutations of different type and extent. The results show that, although BLAST identifies the gene with very small error even when the original sequence is heavily mutated, the distribution of results is much less clear-cut when it comes to identifying the mutation type. / The goal of this thesis is to examine the sensitivity of the local alignment algorithm BLAST on a set of mutated biological sequences. The algorithm's sensitivity is measured against three basic criteria:
- identification of the relation to the original gene
- identification of the mutation type
- prediction of the possible influence on the organism in question
In the first, theoretical part of the thesis, a general biological background is offered, followed by a presentation of both the history and the latest achievements in the field of sequence alignment. The main topic introduced is the structure and functionality of BLAST, together with its principal versions, its inputs and outputs, and its underlying mathematical foundation. In the experimental part of the thesis, the BRCA1 gene is taken from the Homo sapiens genome, and its sequence is subjected to a number of mutations of different type and extent. Each produced mutation is translated into the corresponding protein. The entire set of resulting sequences is submitted to a BLAST search to test the algorithm's sensitivity to each mutation under examination. The results indicate that BLAST recognizes the gene from which the mutated sequences were derived with extremely low error. Its sensitivity in identifying the mutation type, however, is significantly lower.
The main proposal drawn from this is therefore to implement a pattern recognition system that integrates artificial intelligence methods to link patterns found in the input sequence with diseases reported in the relevant documentation.
Subjects: Mutation; Alignment of biological sequences; BLAST; FASTA; 572.802 85
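The experimental design in this record (mutating a coding sequence, translating each mutant, and asking what kind of mutation it is) can be sketched in a few lines of Python. This is a simplified illustration, not the thesis's code: it uses the standard genetic code (NCBI translation table 1), assumes the input is an in-frame coding sequence read from its first base, and classifies a mutant by comparing its translation with the original's. The toy sequence is hypothetical and is not BRCA1; the category names are standard terminology rather than labels taken from the thesis.

```python
# Standard genetic code (NCBI translation table 1); bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate codon by codon from the first base; '*' marks a stop codon.

    Trailing bases that do not complete a codon are ignored.
    """
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def substitute(dna, pos, base):
    """Replace the base at 0-based position pos."""
    return dna[:pos] + base + dna[pos + 1:]


def delete(dna, pos, length=1):
    """Remove `length` bases starting at 0-based position pos."""
    return dna[:pos] + dna[pos + length:]


def insert(dna, pos, bases):
    """Insert `bases` before 0-based position pos."""
    return dna[:pos] + bases + dna[pos:]


def classify(original, mutant):
    """Classify a mutant coding sequence relative to the original."""
    diff = len(mutant) - len(original)
    if diff % 3 != 0:
        return "frameshift"
    if diff != 0:
        return "in-frame indel"
    p0, p1 = translate(original), translate(mutant)
    if p0 == p1:
        return "synonymous"
    # Same length, different proteins: inspect the first changed residue.
    old, new = next((a, b) for a, b in zip(p0, p1) if a != b)
    if new == "*":
        return "nonsense"
    if old == "*":
        return "stop-loss"
    return "missense"


if __name__ == "__main__":
    toy = "ATGGCCTGGAAA"  # hypothetical toy sequence (not BRCA1)
    print(translate(toy))                          # MAWK
    print(classify(toy, substitute(toy, 5, "T")))  # synonymous (GCC -> GCT, Ala)
    print(classify(toy, substitute(toy, 8, "A")))  # nonsense (TGG -> TGA, stop)
    print(classify(toy, substitute(toy, 3, "A")))  # missense (GCC -> ACC, Thr)
    print(classify(toy, delete(toy, 4)))           # frameshift
```

Comparing translations rather than individual codons keeps the classifier independent of where the edit falls; for a mutant carrying several substitutions it reports only the first changed residue.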
4	Development of an integrated system for mining and visualizing knowledge from biological data
Γκαντούνα, Βασιλική, 25 January 2012
In the late 20th century, parallel advances and the development of innovative methods and tools in different research areas led to the appearance of the so-called "emerging technologies". It was in this context that Bioinformatics came to the fore, a science at the intersection of biology and computer science. Rapid technological progress has led to an explosive increase in the rate at which biological data are produced, making their efficient and effective management imperative. Biological databases were created precisely to meet this need, and today they have exceptional momentum and scope for application. The main research areas in biological databases can be classified into three broad categories. The first concerns organizing biological data as efficiently as possible so that they can be stored effectively; this is precisely why biological databases were created. The second concerns developing tools and methods for analyzing and processing biological data in order to facilitate knowledge discovery from them. Here, data mining techniques play an important role: applied to large collections of biological data, they typically lead to the discovery of new relationships and patterns hidden in the data. Finally, the third concerns developing tools that facilitate the biological interpretation of the mining results.
In this last category, techniques for visualizing the extracted knowledge play an essential role, presenting the conclusions as comprehensibly as possible to the human user, who then decides which of them are truly useful. The motivation for this thesis was the creation of an integrated system that couples the techniques of all three categories, together with the need to exploit a large, previously unexploited collection of biological data. The aim is to develop an integrated system that uses Microsoft PivotViewer technology to visualize this data collection at a high level of representation and to record the frequencies of mutations and other genetic variants by population group worldwide. The system can serve as a modern educational and diagnostic tool for the population-based study of the pathogenesis and treatment of diseases caused by a genetic disorder. Through an easy-to-use, friendly interface, users can zoom in from the large collection to a specialized subset related to a particular disease, study, or population, viewing the data from a different perspective that may help them discover new patterns and relationships of notable biological significance. / In the late 20th century, parallel advances and the development of innovative methods and tools in different research areas resulted in the appearance of the so-called "emerging technologies". In this context, Bioinformatics came to the fore as the science at the intersection of biology and informatics.
The rapid growth of technology has led to an explosive increase in the rate of production of biological data, which calls for efficient and effective data management. Biological databases were created to meet exactly this need, and today they show great momentum and scope for application. The main research areas in biological databases can be classified into three broad categories. The first category concerns the better organization of biological data so as to enable efficient storage; this is the reason biological databases were developed. The second category concerns the development of tools and methods for analyzing and processing biological data to facilitate knowledge discovery. In this category, data mining techniques play an important role: applied to large collections of biological data, they often lead to the discovery of new relationships and patterns hidden in the data. Finally, the third category involves the development of tools that facilitate understanding and visualizing the biological meaning of the data mining results. Here, visualization techniques play an essential role in presenting the data mining results meaningfully to the scientists, who will eventually decide which of these results are really useful and reliable. The motivation for this thesis was the development of an integrated system that couples the techniques of the three categories above, together with the need to utilize a large, previously unexploited collection of biological data. This work aims to develop an integrated system that uses Microsoft PivotViewer technology to represent this collection with high-level visualization and to record the frequencies of causative genetic variations worldwide.
This system can serve as a modern educational and diagnostic tool for the population-based study of the pathogenesis and treatment of diseases caused by a genetic disorder. Through a user-friendly interface, users can zoom in from the massive amount of data to disease-specific, study-specific, or population-specific subsets and observe the data from a different perspective, which may help them discover new patterns and relationships of remarkable biological importance.
Subjects: Data mining; Biological databases; 572.802 85
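The zoom-in interaction this record describes (narrowing a large variant collection down to one disease, study, or population, and reading off variant frequencies per population) amounts to faceted filtering. The Python sketch below illustrates that idea only: it does not use or reproduce the Microsoft PivotViewer API, and the record fields, variant and population names, and frequency values are hypothetical placeholders rather than data from the thesis. Averaging frequencies across studies without weighting by sample size is a further simplification.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class VariantRecord:
    """One hypothetical observation of a genetic variant in a population."""
    variant: str
    disease: str
    study: str
    population: str
    frequency: float  # allele frequency in this population, between 0 and 1


def facet_filter(records, **facets):
    """Keep records matching every given facet, e.g. facet_filter(rows, disease="DiseaseX")."""
    return [r for r in records if all(getattr(r, name) == value for name, value in facets.items())]


def facet_counts(records, facet):
    """Count how many records fall under each value of one facet."""
    return Counter(getattr(r, facet) for r in records)


def mean_frequency_by_population(records, variant):
    """Average the reported frequencies of one variant per population (unweighted)."""
    per_population = defaultdict(list)
    for r in records:
        if r.variant == variant:
            per_population[r.population].append(r.frequency)
    return {pop: mean(freqs) for pop, freqs in per_population.items()}


if __name__ == "__main__":
    # Placeholder rows; names and numbers are invented for illustration only.
    rows = [
        VariantRecord("VAR1", "DiseaseX", "Study1", "PopA", 0.10),
        VariantRecord("VAR1", "DiseaseX", "Study2", "PopA", 0.20),
        VariantRecord("VAR1", "DiseaseX", "Study1", "PopB", 0.05),
        VariantRecord("VAR2", "DiseaseY", "Study3", "PopA", 0.30),
    ]
    subset = facet_filter(rows, disease="DiseaseX")  # zoom in on one disease
    print(len(subset))                               # 3
    print(facet_counts(subset, "population"))        # Counter({'PopA': 2, 'PopB': 1})
    print(mean_frequency_by_population(subset, "VAR1"))
```

Each additional keyword argument to `facet_filter` narrows the view further, mirroring how successive facet selections in a faceted browser shrink the visible collection.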
