221

Cache structures based on the execution stack for high level languages /

Borgwardt, Peter Arthur. January 1981 (has links)
Thesis (Ph. D.)--University of Washington, 1981. / Vita. Bibliography: leaves [173]-177.
222

Exploiting flow relationships to improve the performance of distributed applications

Shang, Hao. January 2006 (has links)
Dissertation (Ph.D.)--Worcester Polytechnic Institute. / Keywords: aggregation, flow relationship, performance, TCP, time. Includes bibliographical references (p. 203-213).
223

Algorithms and data structures for cache-efficient computation: theory and experimental evaluation /

Chowdhury, Rezaul Alam. January 1900 (has links)
Thesis (Ph. D.)--University of Texas at Austin, 2007. / Vita. Includes bibliographical references.
224

Automatic generation of interfaces using constraints. /

Ege, Raimund K. January 1987 (has links)
Thesis (Ph. D.)--Oregon Graduate Center, 1987.
225

Linking Moving Object Databases with Ontologies

King, Kraig January 2007 (has links) (PDF)
No description available.
226

Αποτελεσματικοί αλγόριθμοι και δομές δεδομένων με εφαρμογές στην ανάκτηση πληροφορίας και στις τεχνολογίες διαδικτύου / Efficient algorithms and data structures with applications in information retrieval and internet technologies

Αντωνίου, Δημήτρης 23 May 2011 (has links)
The subject of this doctoral dissertation is the study and modification of fundamental data structures, with the aim of creating new solutions and adapting existing ones, with applications in Information Retrieval, Bioinformatics, and the Web. Emphasis is first placed on the development and experimental validation of algorithmic techniques for the design of self-organizing data structures. To date, the only plausible candidate for a tree-search algorithm that may be O(1)-competitive is the splay tree, introduced by Sleator and Tarjan [1]. In addition, several alternative self-organization techniques are studied ([2],[3],[4],[5],[6]), and the upper bounds that hold for the performance of splay trees are confirmed for these techniques as well. The development of these algorithmic techniques finds applications in data compression. Data compression algorithms can improve the efficiency with which data are stored or transmitted by reducing the amount of redundant information. The use of such algorithms in both cryptography and image processing is effective and of great research interest. More generally, self-organizing data structures deserve particular attention in the context of on-line algorithms. Specifically, in this dissertation, compression is applied to biological data as well as to text, using both the classical splay tree [10] and its log log n-competitive variant. In addition, randomized versions of the above structures are presented and likewise applied to data compression. The log log n-competitive structures have better performance, in terms of complexity, than the classical splay structure. This is confirmed experimentally: the achieved compression is in most cases better than that of the classical structure. Furthermore, the application of fundamental data structures to the Web is of particular research interest. We pursue the development and theoretical validation of algorithms for problems such as hotlink assignment [7], website reorganization, and information retrieval ([8],[9]). In a first stage, heuristic algorithms are proposed for assigning hotlinks and improving the topology of a website ([12],[13],[14]). The goal of the algorithm is to promote a site's popular webpages by assigning links to them from pages that are related to them in content and that, at the same time, reduce their distance from the home page. The model of the algorithm is presented, along with metrics used for the quantitative evaluation of its efficiency with respect to specific characteristics of a website, such as its entropy. In a second stage, webpage personalization techniques are studied [11]. Specifically, the goal is to implement an algorithm that detects a user's increased demand for a category A of webpages and, exploiting the recorded behavior of other users, recommends categories of pages preferred by users who similarly showed increased interest in that category. The phenomenon of bursts of visits is analyzed, together with its exploitation in the field of webpage personalization.
The algorithm is implemented using two data structures, binary heaps and splay trees, and its time and space complexity is analyzed. In addition, the correct and efficient execution of the algorithm is confirmed experimentally. It is worth noting that, by its nature, the proposed algorithm uses an amount of space that allows it to run in RAM. Finally, the proposed algorithm can also be applied to the personalization of pages based on their semantic content, in correspondence with their division into categories. In a third stage, a novel webpage recommendation technique [15] using splay trees is presented. In this case, particular weight is given to finding the pages that exhibit a burst of visits and recommending them to the users of a website. First, the value of finding a page that receives a burst of visits is documented. A burst of visits is defined with respect both to the number of visits and to the time interval in which they occur. Finding such pages is achieved by modeling the website as a splay tree. By modifying the tree through the use of timestamps, the algorithm can return, at any moment, the webpage that has received the most recent burst of visits. The algorithm is analyzed with respect to its space and time complexity and compared with alternative solutions. Of major importance is the possibility of applying the algorithm to other everyday phenomena through analogous modeling. For example, if a transportation network is represented as a graph, the recommendation algorithm can always return the traffic node exhibiting the most recent congestion. Finally, in the field of information retrieval, the dissertation focuses on a novel and complete methodology for evaluating the quality of a software system according to the ISO/IEC-9126 quality standard. Its main characteristic is that it completes the evaluation of a software system by incorporating the assessment not only of the user-oriented characteristics, but also of the more technical ones that concern a system's software engineers. In this dissertation, weight is given to applying data-mining methods to the results of measuring the metrics that compose the source-code characteristics, as defined by the ISO/IEC-9126 quality standard [16][17]. In particular, clustering algorithms are applied in order to find code segments with distinctive characteristics that deserve attention. / In this dissertation we take an in-depth look at the use of effective and efficient data structures and algorithms in the fields of data mining and web technologies. The main goal is to develop algorithms based on appropriate data structures, in order to improve performance at all levels of web applications. In the first chapter the reader is introduced to the main issues studied in this dissertation. In the second chapter, we propose novel randomized versions of splay trees. We have evaluated the practical performance of these structures, in comparison with the original version of splay trees and with their log log n-competitive variations, in the application field of compression. Moreover, we show that the Chain Splay tree achieves O(log n) worst-case cost per query.
In order to evaluate performance, we use plain splay trees, their log log n-competitive variations, and the proposed randomized version with the Chain Splay technique to compress data. It is observed experimentally that the compression achieved by the log log n-competitive technique is, as expected, better than that of plain splay trees. The third chapter focuses on hotlink assignment techniques. Enhancing the web browsing experience is an open issue frequently dealt with by assigning hotlinks, shortcuts from one node to another, between webpages. Our aim is to provide a novel, more efficient approach that minimizes the expected number of steps needed to reach the desired pages when browsing a website. We present a randomized algorithm which combines the popularity of the webpages, the website structure, and, for the first time to the best of the authors' knowledge, the contextual similarity between pages in order to suggest the placement of suitable hotlinks. We verify experimentally that users need fewer page transitions to reach the desired information pages when browsing a website enhanced by the proposed algorithm. In the fourth chapter we investigate the problem of web personalization. The explosive growth in the size and use of the World Wide Web continuously creates great new challenges and needs. The need to predict users' preferences, in order to expedite and improve browsing through a site, can be met by personalizing websites. Recommendation and personalization algorithms aim at suggesting webpages to users based on their current visit and on past users' navigational patterns. The problem that we address is the case where a few webpages become very popular for short periods of time and are accessed very frequently within a limited temporal window. Our aim is to detect these bursts of visits and suggest the highly accessed pages to future users with common interests. Hence, we propose a new web personalization technique based on advanced data structures, namely splay trees and binary heaps. We describe the architecture of the technique, analyze its time and space complexity, and prove its performance. In addition, we compare the proposed technique both theoretically and experimentally to another approach to verify its efficiency. Our solution achieves O(P²) space complexity and runs in O(k log P) time, where k is the number of pages and P the number of categories of webpages. Extending this algorithm, we propose an algorithm which efficiently detects bursts of visits to webpages. As an increasing number of websites consist of multiple pages, it is more difficult for visitors to rapidly reach their target. This results in an urgent need for intelligent systems that effectively support users' navigation to high-demand web content. In many cases, due to specific conditions, web pages become very popular and receive an excessively large number of hits. Therefore, there is a high probability that these web pages will be of interest to the majority of visitors at a given time. The data structure used for the purposes of the recommendation algorithm is the splay tree. We describe the architecture of the technique, analyze its time and space complexity, and show its performance. The dissertation's last chapter elaborates on how to use clustering for the evaluation of a software system's maintainability according to the ISO/IEC-9126 quality standard.
More specifically, it proposes a methodology that combines clustering and multicriteria decision-aid techniques for knowledge acquisition, by integrating groups of data from source code with the expertise of a software system's evaluators. A process for extracting elements from source code and an Analytic Hierarchy Process (AHP) step for assigning weights to these data are provided; the k-Attractors clustering algorithm is then applied to these data in order to produce system overviews and deductions. The methodology is evaluated on Apache Geronimo, a large open-source application server; results are discussed and conclusions are presented together with directions for future work.
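The splay operation underpins both the compression and the burst-detection structures described in this record. The following is a minimal sketch of the classic top-down splay of Sleator and Tarjan, not the dissertation's log log n-competitive, randomized, or timestamped variants: each access moves the sought key to the root, so frequently accessed items stay near the top, which is precisely the property the compression and recommendation schemes exploit.

```python
class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def splay(t, key):
    """Top-down splay: restructure the tree so that `key` -- or the last
    node on its search path -- becomes the root."""
    if t is None:
        return None
    header = Node(None)              # header for the assembled side trees
    left = right = header
    while True:
        if key < t.key:
            if t.left is None:
                break
            if key < t.left.key:     # zig-zig: rotate right
                y = t.left
                t.left, y.right = y.right, t
                t = y
                if t.left is None:
                    break
            right.left = t           # link current root into the right tree
            right, t = t, t.left
        elif key > t.key:
            if t.right is None:
                break
            if key > t.right.key:    # zig-zig: rotate left
                y = t.right
                t.right, y.left = y.left, t
                t = y
                if t.right is None:
                    break
            left.right = t           # link current root into the left tree
            left, t = t, t.right
        else:
            break
    left.right, right.left = t.left, t.right    # reassemble the three trees
    t.left, t.right = header.right, header.left
    return t

def insert(t, key):
    """Naive BST insert followed by a splay of the inserted key."""
    if t is None:
        return Node(key)
    cur = t
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = Node(key); break
            cur = cur.left
        elif key > cur.key:
            if cur.right is None:
                cur.right = Node(key); break
            cur = cur.right
        else:
            break
    return splay(t, key)

root = None
for k in [5, 2, 8, 1, 3, 7, 9]:
    root = insert(root, k)
for _ in range(3):                   # repeated accesses keep 3 at the root
    root = splay(root, 3)
print(root.key)                      # -> 3
```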
227

Function-based Algorithms for Biological Sequences

Mohanty, Pragyan Paramita 01 December 2015 (has links)
AN ABSTRACT OF THE DISSERTATION OF PRAGYAN P. MOHANTY, for the Doctor of Philosophy degree in ELECTRICAL AND COMPUTER ENGINEERING, presented on June 11, 2015, at Southern Illinois University Carbondale. TITLE: FUNCTION-BASED ALGORITHMS FOR BIOLOGICAL SEQUENCES. MAJOR PROFESSOR: Dr. Spyros Tragoudas. Two problems at two different abstraction levels of computational biology are studied. At the molecular level, efficient pattern-matching algorithms for DNA sequences are presented. For gene order data, an efficient data structure is presented, capable of storing all gene re-orderings in a systematic manner. A common characteristic of the presented methods is the use of binary decision diagrams, which store and manipulate Boolean functions. Searching for a particular pattern in a very large DNA database is a fundamental and essential component of computational biology. In the biological world, pattern matching is required for finding repeats in a particular DNA sequence, finding motifs, aligning sequences, and so on. Due to the immense amount and continuous increase of biological data, the search process requires very fast algorithms, as well as encoding schemes that store the sequences efficiently for these search processes to operate on. With continuous progress in genome sequencing, genome rearrangement and the construction of evolutionary genome graphs, which represent the relationships between genomes, become challenging tasks. Previous approaches are largely based on distance measures, so that relationships between phylogenetic species can be established with certain required rearrangement operations, and hence within a certain computational time. However, because of the large volume of the available data, the storage space and construction time for such an evolutionary graph remain a problem. In addition, it is important to keep track of all possible rearrangement operations for a particular genome, as biological processes are uncertain. This study presents a binary-function-based tool set for efficient DNA sequence storage. A novel scalable method is also developed for fast offline pattern searches in large DNA sequences. The study also presents a method that efficiently stores all the gene sequences associated with all possible genome rearrangements, such as transpositions, and constructs the evolutionary genome structure much faster for multiple species. The developed methods benefit from the use of Boolean functions, their compact storage in a canonical data structure, and the existence of built-in operators for these data structures. The time complexities depend on the size of the data structures used to store the functions that represent the DNA sequences and/or gene sequences. It is shown that the presented approaches exhibit sublinear time complexity in the sequence size. The number of nodes in the DNA data structure, the string search time on these data structures, the depths of the genome graph structure, and the time of the rearrangement operations are reported. Experiments on DNA sequences from the NCBI database are conducted for the DNA sequence storage and search process. Experiments on large gene order data sets, such as human mitochondrial data and plant chloroplast data, are conducted, and the depth of this structure is studied for evolutionary processes on gene sequences. The results show that the developed approaches are scalable.
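The core encoding idea — representing where each base occurs as a Boolean function — can be sketched without a full BDD package. The toy below stores a sequence's characteristic function as an explicit set of satisfying assignments (a real implementation would keep it as a reduced, ordered BDD with built-in Boolean operators) and expresses a pattern search as a conjunction of per-position checks. The 2-bit base encoding is an illustrative assumption, not necessarily the dissertation's.

```python
# 2-bit encoding of the DNA alphabet (an illustrative choice)
BASE_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}

def seq_to_function(seq):
    """Characteristic function of a DNA sequence: the set of assignments
    (position bits + base bits) on which f = 1, i.e. seq[p] == base."""
    n = len(seq)
    width = max(1, (n - 1).bit_length())      # bits for the position index
    ones = set()
    for p, base in enumerate(seq):
        p_bits = tuple((p >> i) & 1 for i in range(width))
        ones.add(p_bits + BASE_BITS[base])
    return ones, width

def match_at(ones, width, p, base):
    """Evaluate f on one (position, base) assignment."""
    p_bits = tuple((p >> i) & 1 for i in range(width))
    return (p_bits + BASE_BITS[base]) in ones

def find_pattern(seq, pattern):
    """Pattern search as a conjunction of per-position membership checks."""
    ones, width = seq_to_function(seq)
    return [p for p in range(len(seq) - len(pattern) + 1)
            if all(match_at(ones, width, p + j, b)
                   for j, b in enumerate(pattern))]

print(find_pattern("ACGTACGGT", "ACG"))   # -> [0, 4]
```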
228

Analyse statique de programmes manipulant des tableaux / Analysis of programs using arrays

Perrelle, Valentin 21 February 2013 (has links)
Static program analysis is a crucial area in compilation, optimization, and software validation. Complex data structures (arrays, lists, graphs...), which are ubiquitous in programs, pose difficult problems, because they represent data sets of large or unknown size and because the addressing of data within these sets is computed (indexing, indirection). Most work on the analysis of data structures concerns verifying the correctness of data accesses (checking that array indices are within bounds, that pointers are not null, "shape analysis"). The analysis of the contents of data structures has received little attention so far. At Verimag, this area was addressed recently and yielded first results on the analysis of one-dimensional arrays. An analysis method for simple programs was proposed [1] that discovers properties of array contents, for example that the result of a sorting program is indeed a sorted array. Another type of property, called "non-positional", has also been considered [2]; it concerns the global contents of an array, independently of their arrangement: for example, one shows that the result of a sort is a permutation of the initial array. These first results are very encouraging, but still embryonic. The objective of the proposed thesis work is to extend them in several directions. Our analysis of positional properties can discover point-to-point relations between array "slices" (sets of consecutive cells). The extensions envisaged concern multidimensional arrays, sets of not-necessarily-consecutive cells, and more general data structures. Concerning non-positional properties, the first results are limited to equalities of array contents; they must be extended to more complex relations (inclusions, disjoint sums...) and to other data structures. This work takes place within the ASOPT project ("Analyse statique et optimisation"), accepted in the ANR's Arpège program in 2008. References: [1] N. Halbwachs, M. Péron. Discovering properties about arrays in simple programs. ACM Conference on Programming Language Design and Implementation, PLDI 2008, Tucson (Az.), June 2008. [2] V. Perrelle. Analyse statique du contenu de tableaux, propriétés non positionnelles. M2R report, Master Parisien de Recherche en Informatique, September 2008. / Static analysis is a key area in compilation, optimization, and software validation. Complex data structures (arrays, dynamic lists, graphs...) are ubiquitous in programs and can be challenging, because they can be large or of unbounded size and accesses are computed (through indexing or indirection). Whereas verifying the validity of array accesses was one of the initial motivations of abstract interpretation, the discovery of properties about array contents was only addressed recently. Most analyses of array contents are based on a partitioning of the arrays; they then try to discover properties of each fragment of the partition. The choice of this partition is a difficult problem, and each existing method has its flaws. Moreover, classical representations of array partitions induce an exponential complexity for these analyses.
In this thesis, we generalize the concept of array partitioning into a concept of "fragmentation" that allows overlapping fragments, handles potentially empty fragments, and selects specialized relations. We also propose an abstraction of these fragmentations in terms of graphs called "slice diagrams", together with the operations to manipulate them, ensuring a polynomial complexity. Finally, we propose a new criterion for computing a semantic fragmentation, inspired by existing criteria, which attempts to correct their flaws. These methods have been implemented in a static analyzer, and experiments show that the analyzer can efficiently and precisely prove some challenging examples in the field of static analysis of programs manipulating arrays.
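To make the two property classes concrete, here is a runnable illustration (not the analyzer itself, which infers these facts statically and symbolically): an insertion sort instrumented so that the positional property "the slice a[0..i-1] is sorted" and the non-positional property "a remains a permutation of the input" are asserted at each iteration of the outer loop.

```python
from collections import Counter

def insertion_sort_checked(a):
    """Insertion sort instrumented with the two kinds of array-content
    properties discussed above, checked concretely at run time."""
    bag = Counter(a)                 # multiset of the initial contents
    for i in range(1, len(a)):
        # positional property: the slice a[0..i-1] is sorted
        assert all(a[k] <= a[k + 1] for k in range(i - 1))
        # non-positional property: a is a permutation of the input
        assert Counter(a) == bag
        x, j = a[i], i - 1
        while j >= 0 and a[j] > x:   # shift larger elements right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
    return a

print(insertion_sort_checked([3, 1, 4, 1, 5, 9, 2, 6]))
```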
229

Um Interpretador Gráfico de Comandos baseado na JVM como ferramenta de ensino de Programação, Algoritmos e Estruturas de Dados / A JVM-based graphical command interpreter as a teaching tool for Programming, Algorithms, and Data Structures

Sousa, Tiago Davi Neves de 29 July 2013 (has links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / In disciplines of Programming, Data Structures, and Algorithms in Computer Science courses, tools that let students visualize how the data structures change throughout the execution of a program are very useful, because they help students learn how the algorithms operate on the data structures. Many tools have been proposed since the pioneering work of [Brown e Sedgewick 1984]. In some of them, the graphical visualization of the data structures through animations can only be produced by the users' own programming, and others lack features that prevent their use throughout the whole pedagogical process. Thus, in this work an interpreter for the IGED (Interpretador Gráfico de Estruturas de Dados, Graphical Interpreter of Data Structures) teaching tool was developed. This interpreter was designed based on the JVM and enables code implementing various algorithms in an object-oriented language to be executed by the tool, which generates graphical visualizations of the data structures as output. The architecture of the interpreter developed in this work and its components are detailed, and the functional requirements it should have as a teaching tool, making it useful for other Computer Science disciplines, are defined. Furthermore, the choice to implement a custom interpreter for the IGED, even though widely used JVM implementations are available, is justified. The experiments demonstrate that the interpreter can execute code with characteristics that are useful for these disciplines.
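The central design idea — an interpreter that emits a visualization event after every executed instruction, which a graphical front end can render — can be sketched as follows. The instruction set and event format are illustrative assumptions, not the IGED architecture or the JVM instruction set.

```python
# A toy stack-machine interpreter in the spirit described above: after
# every instruction it emits a "visualization event" describing the
# current state of the data structure, which a GUI could render.

def run(program, on_event=print):
    """Execute a list of (opcode, *args) tuples, reporting state changes."""
    stack = []
    for pc, (op, *args) in enumerate(program):
        if op == "PUSH":
            stack.append(args[0])
        elif op == "POP":
            stack.pop()
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
        # the event carries everything a front end needs to draw the stack
        on_event({"pc": pc, "op": op, "stack": list(stack)})
    return stack

run([("PUSH", 2), ("PUSH", 3), ("ADD",)])
```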
230

Indexação multimídia escalável e busca por similaridade em alta dimensionalidade / Scalable multimedia indexing and similarity search in high dimensionality

Akune, Fernando Cesar, 1976- 08 January 2011 (has links)
Advisor: Ricardo da Silva Torres / Dissertation (Master's) - Universidade Estadual de Campinas, Instituto de Computação / Abstract: The spread of large collections of images, videos, and music has increased the demand for indexing methods and multimedia information retrieval systems. For images, the most promising search engines are content-based: instead of textual annotations, they use feature vectors to represent visual properties such as color, texture, and shape. The matching of the feature vectors of a query image against those of the database images is implemented by similarity search. Its most common form is the k-nearest-neighbors search, which aims to find the k vectors closest to the query vector. In large image databases, an index structure is essential to speed up those queries. The problem is that feature vectors may have many dimensions, which seriously affects the performance of indexing methods. Above roughly 10 dimensions, it is often necessary to resort to approximate methods, trading effectiveness for speed.
Among the several solutions proposed, there is an approach based on fractal curves known as space-filling curves. These curves map a multidimensional space onto a single dimension, so that points that are close on the curve correspond to points that are close in the space. The great problem with this alternative is the existence of discontinuity regions on the curves: points near those regions are not mapped close together on the curve. The main contribution of this dissertation is an indexing method for high-dimensional feature vectors, using a single space-filling curve and multiple surrogates for each data point. This method, called MONORAIL, generates the surrogates by exploiting the geometric properties of the curve. The result is a gain in the effectiveness of similarity search, when compared to the baseline method. Another non-trivial contribution of this work is the rigorous experimental design used for the comparisons: the experiments were carefully designed to ensure statistically sound results. The scalability of MONORAIL is tested with three databases of different sizes, the largest one with more than 130 million vectors. / Master's program / Computer Science / Master in Computer Science
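The general technique of combining a space-filling curve with multiple representatives per point can be sketched as follows. This is an assumption-laden illustration, not MONORAIL itself: it uses a Z-order (Morton) curve and random coordinate shifts as the surrogates, whereas MONORAIL derives its surrogates from the geometric properties of its curve. Points are sorted by curve position, and an approximate k-NN query scans a small window around the query's position in each shifted copy to counteract curve discontinuities.

```python
import bisect, heapq, random

def morton_key(point, bits=8):
    """Interleave coordinate bits (Z-order / Morton code): points that are
    close in space tend to be close in this one-dimensional order."""
    key = 0
    for b in range(bits):
        for d, x in enumerate(point):
            key |= ((x >> b) & 1) << (b * len(point) + d)
    return key

def build_index(points, shifts):
    """One sorted list per shift; the shifted copies act as multiple
    'surrogates' per point, mitigating curve-discontinuity effects."""
    return [sorted((morton_key([x + s for x in p]), i)
                   for i, p in enumerate(points))
            for s in shifts]

def knn_approx(index, points, shifts, q, k, window=8):
    """Approximate k-NN: gather candidates near the query's curve position
    in every shifted list, then rank them by true squared distance."""
    cand = set()
    for s, keyed in zip(shifts, index):
        pos = bisect.bisect_left(keyed, (morton_key([x + s for x in q]), -1))
        cand.update(i for _, i in keyed[max(0, pos - window):pos + window])
    sqdist = lambda i: sum((a - b) ** 2 for a, b in zip(points[i], q))
    return heapq.nsmallest(k, cand, key=sqdist)

random.seed(0)
pts = [[random.randrange(200) for _ in range(4)] for _ in range(1000)]
shifts = [0, 7, 31]                        # illustrative shift amounts
idx = build_index(pts, shifts)
print(knn_approx(idx, pts, shifts, q=[50, 50, 50, 50], k=3))
```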
