Spelling suggestions: "subject:"data indexing"" "subject:"mata indexing""
1 |
Biological sequence indexing using persistent JavaPustułka-Hunt, Elżbieta Katarzyna January 2001 (has links)
No description available.
|
2 |
Implementation Of X-tree With 3d Spatial Index And Fuzzy Secondary IndexKeskin, Sinan 01 December 2010 (has links) (PDF)
Multidimensional datasets are getting more extensively used in Geographic
Information Systems (GIS) applications in recent years. Due to large volume of these
datasets efficient querying becomes a significant problem. For this purpose, before
creating index structure with these enormous datasets, choosing an efficient index
structure is an urgent necessity.
The aim of this thesis is to develop an efficient, flexible and extendible index
structure which comprises 3D spatial data in primary index and fuzzy attributes in
secondary index. These primary and secondary indexes are handled in a coupled
structure. Firstly, a 3D spatial primary index is built by using X-tree structure, and
then a fuzzy secondary index is overlaid over the X-tree structure. The coupled
structure is shown more efficient on a certain class of queries than uncoupled index
structures comprising 3D spatial data in primary index and fuzzy attributes in
secondary index separately. In uncoupled index structure, we provided 3D spatial
primary index by using X-tree index structure and fuzzy secondary index by using
BPlusTree index structure.
|
3 |
Dynamic cubing for hierarchical multidimensional data spaceAhmed, Usman 18 February 2013 (has links) (PDF)
Data warehouses are being used in many applications since quite a long time. Traditionally, new data in these warehouses is loaded through offline bulk updates which implies that latest data is not always available for analysis. This, however, is not acceptable in many modern applications (such as intelligent building, smart grid etc.) that require the latest data for decision making. These modern applications necessitate real-time fast atomic integration of incoming facts in data warehouse. Moreover, the data defining the analysis dimensions, stored in dimension tables of these warehouses, also needs to be updated in real-time, in case of any change. In this thesis, such real-time data warehouses are defined as dynamic data warehouses. We propose a data model for these dynamic data warehouses and present the concept of Hierarchical Hybrid Multidimensional Data Space (HHMDS) which constitutes of both ordered and non-ordered hierarchical dimensions. The axes of the data space are non-ordered which help their dynamic evolution without any need of reordering. We define a data grouping structure, called Minimum Bounding Space (MBS), that helps efficient data partitioning of data in the space. Various operators, relations and metrics are defined which are used for the optimization of these data partitions and the analogies among classical OLAP concepts and the HHMDS are defined. We propose efficient algorithms to store summarized or detailed data, in form of MBS, in a tree structure called DyTree. Algorithms for OLAP queries over the DyTree are also detailed. The nodes of DyTree, holding MBS with associated aggregated measure values, represent materialized sections of cuboids and tree as a whole is a partially materialized and indexed data cube which is maintained using online atomic incremental updates. We propose a methodology to experimentally evaluate partial data cubing techniques and a prototype implementing this methodology is developed. The prototype lets us experimentally evaluate and simulate the structure and performance of the DyTree against other solutions. An extensive study is conducted using this prototype which shows that the DyTree is an efficient and effective partial data cubing solution for a dynamic data warehousing environment.
|
4 |
Generalizing association rules in n-ary relations : application to dynamic graph analysisNguyen, Thi Kim Ngan 23 October 2012 (has links) (PDF)
Pattern discovery in large binary relations has been extensively studied. An emblematic success in this area concerns frequent itemset mining and its post-processing that derives association rules. In this case, we mine binary relations that encode whether some properties are satisfied or not by some objects. It is however clear that many datasets correspond to n-ary relations where n > 2. For example, adding spatial and/or temporal dimensions (location and/or time when the properties are satisfied by the objects) leads to the 4-ary relation Objects x Properties x Places x Times. Therefore, we study the generalization of association rule mining within arbitrary n-ary relations: the datasets are now Boolean tensors and not only Boolean matrices. Unlike standard rules that involve subsets of only one domain of the relation, in our setting, the head and the body of a rule can include arbitrary subsets of some selected domains. A significant contribution of this thesis concerns the design of interestingness measures for such generalized rules: besides a frequency measures, two different views on rule confidence are considered. The concept of non-redundant rules and the efficient extraction of the non-redundant rules satisfying the minimal frequency and minimal confidence constraints are also studied. To increase the subjective interestingness of rules, we then introduce disjunctions in their heads. It requires to redefine the interestingness measures again and to revisit the redundancy issues. Finally, we apply our new rule discovery techniques to dynamic relational graph analysis. Such graphs can be encoded into n-ary relations (n ≥ 3). Our use case concerns bicycle renting in the Vélo'v system (self-service bicycle renting in Lyon). It illustrates the added-value of some rules that can be computed thanks to our software prototypes.
|
5 |
Vyhledávání ve videu / Video RetrievalČerný, Petr January 2012 (has links)
This thesis summarizes the information retrieval theory, the relational model basic and focuses on the data indexing in relational database systems. The thesis focuses on multimedia data searching. It includes description of automatic multimedia data content extraction and multimedia data indexing. Practical part discusses design and solution implementation for improving query effectivity for multidimensional vector similarity which describes multimedia data. Thesis final part discusses experiments with this solution.
|
6 |
Generalizing association rules in n-ary relations : application to dynamic graph analysis / Généralisation des règles d'association dans des relations n-aires : application à l'analyse de graphes dynamiquesNguyen, Thi Kim Ngan 23 October 2012 (has links)
Le calcul de motifs dans de grandes relations binaires a été très étudié. Un succès emblématique concerne la découverte d'ensembles fréquents et leurs post-traitements pour en dériver des règles d'association. Il s'agit de calculer des motifs dans des relations binaires qui enregistrent quelles sont les propriétés satisfaites par des objets. En fait, de nombreux jeux de données se présentent naturellement comme des relations n-aires (avec n > 2). Par exemple, avec l'ajout de dimensions spatiales et/ou temporelles (lieux et/ou temps où les propriétés sont enregistrées), la relation binaire Objets x Propriétés est étendue à une relation 4-aire Objets x Propriétés x Lieux x Temps. Nous avons généralisé le concept de règle d'association dans un tel contexte multi-dimensionnel. Contrairement aux règles usuelles qui n'impliquent que des sous-ensembles d'un seul domaine de la relation, les prémisses et les conclusions de nos règles peuvent impliquer des sous-ensembles arbitraires de certains domaines. Nous avons conçu des mesures de fréquence et de confiance pour définir la sémantique de telles règles et c'est une contribution significative de cette thèse. Le calcul exhaustif de toutes les règles qui ont des fréquences et confiances suffisantes et l'élimination des règles redondantes ont été étudiés. Nous proposons ensuite d'introduire des disjonctions dans les conclusions des règles, ce qui nécessite de retravailler les définitions des mesures d'intérêt et les questions de redondance. Pour ouvrir un champ d'application original, nous considérons la découverte de règles dans des graphes relationnels dynamiques qui peuvent être codés dans des relations n-aires (n ≥ 3). Une application à l'analyse des usages de bicyclettes dans le système Vélo'v (système de Vélos en libre-service du Grand Lyon) montre quelques usages possibles des règles que nous savons calculer avec nos prototypes logiciels. / Pattern discovery in large binary relations has been extensively studied. An emblematic success in this area concerns frequent itemset mining and its post-processing that derives association rules. In this case, we mine binary relations that encode whether some properties are satisfied or not by some objects. It is however clear that many datasets correspond to n-ary relations where n > 2. For example, adding spatial and/or temporal dimensions (location and/or time when the properties are satisfied by the objects) leads to the 4-ary relation Objects x Properties x Places x Times. Therefore, we study the generalization of association rule mining within arbitrary n-ary relations: the datasets are now Boolean tensors and not only Boolean matrices. Unlike standard rules that involve subsets of only one domain of the relation, in our setting, the head and the body of a rule can include arbitrary subsets of some selected domains. A significant contribution of this thesis concerns the design of interestingness measures for such generalized rules: besides a frequency measures, two different views on rule confidence are considered. The concept of non-redundant rules and the efficient extraction of the non-redundant rules satisfying the minimal frequency and minimal confidence constraints are also studied. To increase the subjective interestingness of rules, we then introduce disjunctions in their heads. It requires to redefine the interestingness measures again and to revisit the redundancy issues. Finally, we apply our new rule discovery techniques to dynamic relational graph analysis. Such graphs can be encoded into n-ary relations (n ≥ 3). Our use case concerns bicycle renting in the Vélo'v system (self-service bicycle renting in Lyon). It illustrates the added-value of some rules that can be computed thanks to our software prototypes.
|
7 |
Dynamic cubing for hierarchical multidimensional data space / Cube de données dynamique pour un espace de données hiérarchique multidimensionnelAhmed, Usman 18 February 2013 (has links)
De nombreuses applications décisionnelles reposent sur des entrepôts de données. Ces entrepôts permettent le stockage de données multidimensionnelles historisées qui sont ensuite analysées grâce à des outils OLAP. Traditionnellement, les nouvelles données dans ces entrepôts sont chargées grâce à des processus d’alimentation réalisant des insertions en bloc, déclenchés périodiquement lorsque l’entrepôt est hors-ligne. Une telle stratégie implique que d’une part les données de l’entrepôt ne sont pas toujours à jour, et que d’autre part le système de décisionnel n’est pas continuellement disponible. Or cette latence n’est pas acceptable dans certaines applications modernes, tels que la surveillance de bâtiments instrumentés dits "intelligents", la gestion des risques environnementaux etc., qui exigent des données les plus récentes possible pour la prise de décision. Ces applications temps réel requièrent l’intégration rapide et atomique des nouveaux faits dans l’entrepôt de données. De plus, ce type d’applications opérant dans des environnements fortement évolutifs, les données définissant les dimensions d’analyse elles-mêmes doivent fréquemment être mises à jour. Dans cette thèse, de tels entrepôts de données sont qualifiés d’entrepôts de données dynamiques. Nous proposons un modèle de données pour ces entrepôts dynamiques et définissons un espace hiérarchique de données appelé Hierarchical Hybrid Multidimensional Data Space (HHMDS). Un HHMDS est constitué indifféremment de dimensions ordonnées et/ou non ordonnées. Les axes de l’espace de données sont non-ordonnés afin de favoriser leur évolution dynamique. Nous définissons une structure de regroupement de données, appelé Minimum Bounding Space (MBS), qui réalise le partitionnement efficace des données dans l’espace. Des opérateurs, relations et métriques sont définis pour permettre l’optimisation de ces partitions. Nous proposons des algorithmes pour stocker efficacement des données agrégées ou détaillées, sous forme de MBS, dans une structure d’arbre appelée le DyTree. Les algorithmes pour requêter le DyTree sont également fournis. Les nœuds du DyTree, contenant les MBS associés à leurs mesures agrégées, représentent des sections matérialisées de cuboïdes, et l’arbre lui-même est un hypercube partiellement matérialisé maintenu en ligne à l’aide des mises à jour incrémentielles. Nous proposons une méthodologie pour évaluer expérimentalement cette technique de matérialisation partielle ainsi qu’un prototype. Le prototype nous permet d’évaluer la structure et la performance du DyTree par rapport aux autres solutions existantes. L’étude expérimentale montre que le DyTree est une solution efficace pour la matérialisation partielle d’un cube de données dans un environnement dynamique. / Data warehouses are being used in many applications since quite a long time. Traditionally, new data in these warehouses is loaded through offline bulk updates which implies that latest data is not always available for analysis. This, however, is not acceptable in many modern applications (such as intelligent building, smart grid etc.) that require the latest data for decision making. These modern applications necessitate real-time fast atomic integration of incoming facts in data warehouse. Moreover, the data defining the analysis dimensions, stored in dimension tables of these warehouses, also needs to be updated in real-time, in case of any change. In this thesis, such real-time data warehouses are defined as dynamic data warehouses. We propose a data model for these dynamic data warehouses and present the concept of Hierarchical Hybrid Multidimensional Data Space (HHMDS) which constitutes of both ordered and non-ordered hierarchical dimensions. The axes of the data space are non-ordered which help their dynamic evolution without any need of reordering. We define a data grouping structure, called Minimum Bounding Space (MBS), that helps efficient data partitioning of data in the space. Various operators, relations and metrics are defined which are used for the optimization of these data partitions and the analogies among classical OLAP concepts and the HHMDS are defined. We propose efficient algorithms to store summarized or detailed data, in form of MBS, in a tree structure called DyTree. Algorithms for OLAP queries over the DyTree are also detailed. The nodes of DyTree, holding MBS with associated aggregated measure values, represent materialized sections of cuboids and tree as a whole is a partially materialized and indexed data cube which is maintained using online atomic incremental updates. We propose a methodology to experimentally evaluate partial data cubing techniques and a prototype implementing this methodology is developed. The prototype lets us experimentally evaluate and simulate the structure and performance of the DyTree against other solutions. An extensive study is conducted using this prototype which shows that the DyTree is an efficient and effective partial data cubing solution for a dynamic data warehousing environment.
|
8 |
An Efficient, Extensible, Hardware-aware Indexing KernelSadoghi Hamedani, Mohammad 20 June 2014 (has links)
Modern hardware has the potential to play a central role in scalable data management systems. A realization of this potential arises in the context of indexing queries, a recurring theme in real-time data analytics, targeted advertising, algorithmic trading, and data-centric workflows, and of indexing data, a challenge in multi-version analytical query processing. To enhance query and data indexing, in this thesis, we present an efficient, extensible, and hardware-aware indexing kernel. This indexing kernel rests upon novel data structures and (parallel) algorithms that utilize the capabilities offered by modern hardware, especially abundance of main memory, multi-core architectures, hardware accelerators, and solid state drives.
This thesis focuses on presenting our query indexing techniques to cope with processing queries in data-intensive applications that are susceptible to ever increasing data volume and velocity. At the core of our query indexing kernel lies the BE-Tree family of memory-resident indexing structures that scales by overcoming the curse of dimensionality through a novel two-phase space-cutting technique, an effective Top-k processing, and adaptive parallel algorithms to operate directly on compressed data (that exploits the multi-core architecture). Furthermore, we achieve line-rate processing by harnessing the unprecedented degrees of parallelism and pipelining only available through low-level logic design using FPGAs. Finally, we present a comprehensive evaluation that establishes the superiority of BE-Tree in comparison with state-of-the-art algorithms.
In this thesis, we further expand the scope of our indexing kernel and describe how to accelerate analytical queries on (multi-version) databases by enabling indexes on the most recent data. Our goal is to reduce the overhead of index maintenance, so that indexes can be used effectively for analytical queries without being a heavy burden on transaction throughput. To achieve this end, we re-design the data structures in the storage hierarchy to employ an extra level of indirection over solid state drives. This indirection layer dramatically reduces the amount of magnetic disk I/Os that is needed for updating indexes and localizes the index maintenance. As a result, by rethinking how data is indexed, we eliminate the dilemma between update vs. query performance and reduce index maintenance and query processing cost substantially.
|
9 |
An Efficient, Extensible, Hardware-aware Indexing KernelSadoghi Hamedani, Mohammad 20 June 2014 (has links)
Modern hardware has the potential to play a central role in scalable data management systems. A realization of this potential arises in the context of indexing queries, a recurring theme in real-time data analytics, targeted advertising, algorithmic trading, and data-centric workflows, and of indexing data, a challenge in multi-version analytical query processing. To enhance query and data indexing, in this thesis, we present an efficient, extensible, and hardware-aware indexing kernel. This indexing kernel rests upon novel data structures and (parallel) algorithms that utilize the capabilities offered by modern hardware, especially abundance of main memory, multi-core architectures, hardware accelerators, and solid state drives.
This thesis focuses on presenting our query indexing techniques to cope with processing queries in data-intensive applications that are susceptible to ever increasing data volume and velocity. At the core of our query indexing kernel lies the BE-Tree family of memory-resident indexing structures that scales by overcoming the curse of dimensionality through a novel two-phase space-cutting technique, an effective Top-k processing, and adaptive parallel algorithms to operate directly on compressed data (that exploits the multi-core architecture). Furthermore, we achieve line-rate processing by harnessing the unprecedented degrees of parallelism and pipelining only available through low-level logic design using FPGAs. Finally, we present a comprehensive evaluation that establishes the superiority of BE-Tree in comparison with state-of-the-art algorithms.
In this thesis, we further expand the scope of our indexing kernel and describe how to accelerate analytical queries on (multi-version) databases by enabling indexes on the most recent data. Our goal is to reduce the overhead of index maintenance, so that indexes can be used effectively for analytical queries without being a heavy burden on transaction throughput. To achieve this end, we re-design the data structures in the storage hierarchy to employ an extra level of indirection over solid state drives. This indirection layer dramatically reduces the amount of magnetic disk I/Os that is needed for updating indexes and localizes the index maintenance. As a result, by rethinking how data is indexed, we eliminate the dilemma between update vs. query performance and reduce index maintenance and query processing cost substantially.
|
10 |
Algoritmos de remoção para a estrutura de indexação Onion-treeMarrach, Debora Gonçalves Rodrigues 27 August 2013 (has links)
Made available in DSpace on 2016-06-02T19:06:10Z (GMT). No. of bitstreams: 1
5601.pdf: 3183108 bytes, checksum: 0ac17d1e4d1f1556e3258bf2bd169cf2 (MD5)
Previous issue date: 2013-08-27 / The Onion-tree is an efficient metric access method based on main memory for similarity search. The Onion-tree has already provided algorithms for insertion and processing of similarity queries (range query and k-nearest neighbors query). However, in the literature no algorithm has been proposed for removing elements in Onion-tree. For this index be incorporated into a database management system, it is necessary the proposal and implementation of at least one algorithm of deletion. This master's research focused primarily on the implementation and performance evaluation of the algorithms proposed for logical deletion in (CARÉLO et al., 2011). The proposal presented in (CARÉLO et al., 2011) led to the implementation of three algorithms, called LogicalDelete, ReplaceReducing and ReplaceGrowing. The first algorithm applies the logic deletion, while the other two algorithms are specializations adding special treatment for the deletion of elements in internal nodes with children exclusively leaf. The ReplaceReducing algorithm allows the reduction of the radius of the node that contains de deleted element. On the other hand, the ReplaceGrowing algorithm allows increasing this radius. In addition, algorithms have been proposed and evaluated for physical deletion that can be applied at any level of the Onion-tree. The algorithm ReorgAll rearranges all the elements in the hierarchy of the node that contains de deleted element, by physically removing the elements and reinserting them using the insertion algorithm, and algorithm PromoteNode, which extends the algorithm ReorgAll, promotes, when exists conditions for such operation, other node to replace the one that contains the deleted element. Experimental evaluation of the algorithms LogicalDelete, ReplaceReducing and ReplaceGrowing showed that the algorithm LogicalDelete is more cost effective than the algorithms ReplaceReducing and ReplaceGrowing in query processing after the deletion of elements. Experimental evaluation of physical removal algorithms showed that the promotion of a node to replace the removed node has advantages over the simple reorganization of the hierarchy of the node that contains the deleted element. Besides presenting lower cost of deletion of elements, the algorithm PromoteNode also outperformed the algorithm ReorgAll in query processing after removing elements. When compared with the logic deletion algorithm, for a large amount of deletion operations, the algorithms ReorgAll and PromoteNode produced performance gain of 21.6% in range query processing. However, in the same comparison, these algorithms have a much higher cost of deletion. / A Onion-tree é um método de acesso métrico eficiente baseado em memória primária para pesquisa por similaridade. Esta estrutura de indexação já provê algoritmos para a inserção de elementos e o processamento de consultas por similaridade dos tipos Range Query (consulta por abrangência) e KNN (consulta aos k-vizinhos mais próximos). Entretanto, ainda não foi proposto na literatura um algoritmo para a remoção de elementos na Onion-tree. Para que a Onion-tree possa ser efetivamente incorporada a um Sistema Gerenciador de Banco de Dados, portanto, é necessário a proposta e a implementação de, pelo menos, um algoritmo de remoção. Esta pesquisa de mestrado se concentrou primeiramente na implementação e na avaliação de desempenho do algoritmo de remoção lógica proposto em (CARÉLO et al., 2011). A proposta feita em (CARÉLO et al., 2011) deu origem à implementação de três algoritmos de remoção lógica, denominados LogicalDelete, ReplaceReducing e ReplaceGrowing. O algoritmo LogicalDelete aplica a remoção lógica, enquanto os algoritmos ReplaceReducing e ReplaceGrowing são especializações da remoção lógica, adicionando tratamento especial para a remoção de elementos em nós internos com filhos exclusivamente folha. O algoritmo ReplaceReducing permite a diminuição do raio do nó que sofreu a remoção. De forma antagônica, o algoritmo ReplaceGrowing permite o aumento deste raio. Adicionalmente, foram propostos e avaliados algoritmos de remoção física que podem ser aplicados em qualquer nível da estrutura da Onion-tree: O algoritmo ReorgAll reorganiza todos os elementos da hierarquia do nó que sofreu a remoção, removendo-os fisicamente e reinserindo-os no índice usando o algoritmo de inserção de elementos; e o algoritmo PromoteNode, o qual estende o algoritmo ReorgAll, promovendo, quando houver condições para tal, outro nó em substituição àquele que sofreu a remoção. Os testes experimentais dos algoritmos de remoção LogicalDelete, ReplaceReducing e ReplaceGrowing mostraram que o algoritmo LogicalDelete tem melhor relação custo/benefício que os algoritmos ReplaceReducing e ReplaceGrowing no processamento de consultas por abrangência após a remoção de elementos. Os testes experimentais dos algoritmos de remoção física mostraram que a promoção de um nó, em substituição ao nó removido, efetuada pelo algoritmo PromoteNode apresenta vantagens em relação a simples reorganização da hierarquia que sofreu a remoção. Além de apresentar menor custo de remoção dos elementos no índice, o algoritmo PromoteNode também apresenta desempenho superior no processamento de consultas por abrangência após a remoção de elementos. Quando comparados com o algoritmo de remoção lógica, para uma grande quantidade de operações de remoção, os algoritmos ReorgAll e PromoteNode produziram melhora de 21,6% no desempenho do processamento de consultas por abrangência. Porém, na mesma comparação, estes algoritmos apresentaram custo de remoção muito maior.
|
Page generated in 0.0938 seconds