Global ETD Search

161	A Formalism for Visual Query Interface Design Huo, Jiwen 27 October 2008 (has links) The massive volumes and the huge variety of large knowledge bases make information exploration and analysis difficult. An important activity is data filtering and selection, in which both querying and visualization play important roles. Interfaces for data exploration environments normally include both, integrating them as tightly as possible. But many features of information exploration environments, such as visual representation of queries, visualization of query results, and interactive data selection from visualizations, have only been studied separately. The intrinsic connections between them have not been described formally. The lack of formal descriptions inhibits the development of techniques that produce new representations for queries, and the natural integration of visual query specification with query result visualization. This thesis presents a formalism that describes the basic components of information exploration and their relationships in information exploration environments. The key aspect of the formalism is that it unifies querying and visualization within a single framework, which provides a foundation for designing and analysing visual query interfaces. Various innovative designs of visual query representations can be derived from the formalism. Simply comparing them with existing ones is not enough; it is more important to discover why one visual representation is better or worse than another. To do this it is necessary to understand users’ cognitive activities, and to know how these cognitive activities are enhanced or inhibited by different presentations of a query, so that novel interfaces can be created and improved based on user testing. This thesis presents a new experimental methodology for evaluating query representations, which uses stimulus onset asynchrony to separate different aspects of query comprehension.
This methodology was used to evaluate a new visual query representation based on Karnaugh maps, and showed that there are two qualitatively different approaches to comprehension: deductive and inductive. The Karnaugh map representation scales extremely well with query complexity, and the experiment shows that its good scaling properties occur because it strongly facilitates inductive comprehension. formalism visual query interface visualization experiment methodology Computer Science
162	CGU: A common graph utility for DL Reasoning and Conjunctive Query Optimization Palacios Villa, Jesus Alejandro January 2005 (has links) We consider the overlap between reasoning involved in conjunctive query optimization (CQO) and in tableaux-based approaches to reasoning about subsumption in description logics (DLs). In both cases, an underlying graph is created, searched and modified. This process is determined by a given query and database schema in the first case and by a given description and terminology in the second. The opportunities for overlap derive from an abundance of reductions of various schema languages to terminologies for common DL dialects, and from the fact that descriptions can in turn be viewed as queries that compute a single column.
Our main contributions are as follows. We present the design and implementation of a common graph utility that integrates the requirements for both CQO and DL reasoning. We then verify this model by also presenting the design and implementation for two drivers, one that implements a query optimizer for a conjunctive query language extended with descriptions, and one that implements a complete DL reasoner for a feature based DL dialect. Computer Science Description Logics Databases Conjunctive Query Optimization Tableaux Algorithms
163	Evaluation of Shortest Path Query Algorithm in Spatial Databases Lim, Heechul January 2003 (has links) Many variations of algorithms for finding the shortest path in a large graph have been introduced recently due to the needs of applications like the Geographic Information System (GIS) or Intelligent Transportation System (ITS). The primary subjects of those algorithms are materialization and hierarchical path views. Some studies focus on the materialization and sacrifice the pre-computational costs and storage costs for faster computation of a query. Other studies focus on the shortest-path algorithm, which has less pre-computation and storage but takes more time to compute the shortest path. The main objective of this thesis is to accelerate the computation time for the shortest-path queries while keeping the degree of materialization as low as possible. This thesis explores two different categories: 1) the reduction of the I/O-costs for multiple queries, and 2) the reduction of search spaces in a graph. The thesis proposes two simple algorithms to reduce the I/O-costs, especially for multiple queries. To tackle the problem of reducing search spaces, we give two different levels of materializations, namely, the boundary set distance matrix and x-Hop sketch graph, both of which materialize the shortest-path view of the boundary nodes in a partitioned graph. Our experiments show that a combination of the suggested solutions for 1) and 2) performs better than the original Disk-based SP algorithm [7], on which our work is based, and requires much less storage than HEPV [3]. Computer Science Shortest Path Query Spatial Database Pruning Algorithm
165	Query Optimization in Dynamic Environments El-Helw, Amr January 2012 (has links) Most modern applications deal with very large amounts of data. Having to deal with such huge amounts of data is in itself a challenge. This challenge is complicated even more by the fact that, in many cases, this data is constantly changing and evolving. For instance, relational databases that handle the data of day-to-day transactional applications often have tables with very high data change rates. It is not uncommon to even have temporary or volatile tables that get created from scratch and completely dropped over the course of one query workload. This dissertation focuses on optimizing structured queries over dynamic and constantly changing data sets. Our work addresses this issue and some of the challenges related to it. We address the issue of database statistics becoming stale and inaccurate due to constantly changing data. We introduce ways to automatically analyze the existing statistics and recommend and collect the necessary statistics to optimize a single query or a query workload. We introduce a mechanism to automate the recommendation and collection of statistical views for a given query workload. We also compare two methods of using these statistical views in selectivity estimation. We evaluate our methods and techniques with experimental studies using prototypes that we built into commercial database systems. query optimization statistics just-in-time statviews Computer Science
166	NAAK-Tree: An Index for Querying Spatial Approximate Keywords Liou, Yen-Guo 11 July 2012 (has links) In recent years, geographic information system (GIS) databases have developed quickly and play a significant role in many applications. Many of these applications allow users to find objects with keywords and spatial information at the same time. Most research on spatial keyword queries considers only exact matches between the textual information in the database and in the query. Since users may not know how to spell the exact keyword, they may issue a query with an approximate keyword instead. Therefore, how to process approximate-keyword queries in a spatial database has become an important research topic. Alsubaiee et al. have proposed the Location-Based-Approximate-Keyword-tree (LBAK-tree) index structure, which augments a tree-based spatial index with approximate-string indexes such as a gram-based index. However, the LBAK-tree is an R-tree-based index structure. The nodes of the R-tree have to be split and reinserted when they become full. Because of this, it cannot index the spatial attribute and the textual attribute at the same time. It stores the keywords in the nodes after the R-tree is already built. Because it is based on the R-tree, it has to search all the children in a node to insert a new item and to answer a query. Moreover, after the needed keywords are found by using the approximate index, the nodes are probed by checking the intersection of the similar keyword sets and the keywords stored in the nodes. However, the higher the level of a node, the larger the number of keywords stored in it, so it takes a long time to check the intersections. Furthermore, the LBAK-tree checks all the intersections even if one of them is already an empty set.
Therefore, in this thesis, we propose the Nine-Area-Approximate-Keyword-tree (NAAK-tree) index structure to process the spatial approximate-keyword query. We do not have to partition the space to construct the spatial index. We do not have to reinsert the children when splitting nodes, so we can deal with the keywords at the same time. We can use the spatial number to find the nodes that satisfy the spatial condition of the query. We also augment the NAAK-tree with signatures to speed up the evaluation of the textual condition. We use the union of the bit strings of the keywords in a node to represent them in the node. Therefore, we can efficiently filter out the nodes that contain no keyword corresponding to the query by checking the signatures just once, without checking all the keywords stored in the nodes. With our NAAK-tree, if one of the similar keyword sets is empty, we do not check all the similar keyword sets. Our simulation results show that the NAAK-tree is more efficient than the LBAK-tree at building the index and answering the spatial approximate-keyword query. Signature Index Structure Approximate-Keyword Spatial Database Range Query
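The signature filter mentioned in entry 166 (a node is represented by the union of its keywords' bit strings) can be illustrated with a minimal Python sketch of superimposed coding. The signature width, the number of bits set per keyword, the SHA-256-based hash, and all function names are assumptions made for illustration; the abstract does not specify how the thesis derives its bit strings.

```python
import hashlib

SIG_BITS = 64          # assumed signature width
BITS_PER_KEYWORD = 3   # assumed number of bit positions set per keyword


def keyword_signature(word):
    """Map a keyword to a bit string with up to BITS_PER_KEYWORD bits set."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    sig = 0
    for k in range(BITS_PER_KEYWORD):
        # Use two bytes of the digest to choose each bit position.
        pos = int.from_bytes(digest[2 * k:2 * k + 2], "big") % SIG_BITS
        sig |= 1 << pos
    return sig


def node_signature(keywords):
    """Union (bitwise OR) of the signatures of all keywords in a node."""
    sig = 0
    for word in keywords:
        sig |= keyword_signature(word)
    return sig


def may_contain(node_sig, word):
    """False means the node certainly lacks the keyword; True means 'maybe'."""
    ks = keyword_signature(word)
    return (node_sig & ks) == ks
```

A single bitwise test per node replaces a comparison against every stored keyword. The filter can report false positives (a "maybe" for an absent keyword) but never false negatives, so nodes that fail the test can be skipped safely.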
167	A Count-Based Partition Approach to the Design of the Range-Based Bitmap Indexes for Data Warehouses Lin, Chien-Hsiu 29 July 2004 (has links) Data warehouses contain data consolidated from several operational databases and provide historical and summarized data, which is more appropriate for analysis than detailed, individual records. On-Line Analytical Processing (OLAP) provides advanced analysis tools to extract information from data stored in a data warehouse. Fast response time is essential for on-line decision support. A bitmap index can reach this goal in read-mostly environments. When data has high cardinality, we prefer to use the Range-Based Index (RBI), which divides the attribute values into several partitions and uses a bitmap vector to represent each range. With RBI, however, the number of records assigned to different ranges can be highly unbalanced, resulting in different disk-access times for different queries. Wu et al. proposed an algorithm for RBI, DBEC, which takes the data distribution into consideration. But the DBEC strategy cannot guarantee a partition result with the given number of bitmap vectors, PN. Moreover, different data records with the same value may be partitioned into different bitmap vectors, which lengthens disk I/O time. Therefore, we propose the IPDF, CP, and CP* strategies for constructing dynamic range-based indexes for the case in which data has high cardinality and is not uniformly distributed. The IPDF strategy decides each partition according to the Probability Density Function (p.d.f.). The CP strategy sorts the data and partitions it into PN groups, one for every w consecutive records. The CP* strategy is an improved version of the CP strategy that adjusts the cutting points so that data records with the same value are assigned to the same partition. On the other hand, we can take the history of users' queries into consideration.
Based on the greedy approach, we propose the GreedyExt and GreedyRange strategies. The GreedyExt strategy is used for answering exact queries and the GreedyRange strategy is used for answering range queries. The two strategies decide the set of queries used to construct the bitmap vectors such that the average response time of answering queries can be reduced. Moreover, a bitmap index consists of a set of bitmap vectors, and the size of the bitmap index could be much larger than the capacity of the disk. We propose the FZ strategy to compress each bitmap vector, reducing the storage space and providing efficient bitwise operations without decompressing these bitmap vectors. Finally, our performance analysis shows that the performance of the CP* strategy can be better than that of the CP strategy in terms of the number of disk accesses. Our simulation shows that the ranges divided by the IPDF and CP* strategies are more uniform than those divided by the DBEC strategy. The GreedyExt and GreedyRange strategies can provide fast response time in most situations. Moreover, the FZ strategy can reduce the storage space more than the WAH strategy. bitmap index range query data warehouse compress OLAP
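The count-based partitioning idea in entry 167 can be sketched in Python as follows. This is only one plausible reading of the abstract, not the thesis's algorithm: it assumes that w is the ceiling of the record count divided by PN, and that a cutting point falling inside a run of equal values is moved forward to the end of that run. The function name is hypothetical.

```python
import math


def cp_star_partitions(values, pn):
    """Count-based partitioning with cut points adjusted for equal values.

    Assumptions (not stated in the abstract): w = ceil(n / pn), and a cut
    that would separate equal values is pushed forward past the whole run.
    """
    data = sorted(values)
    w = math.ceil(len(data) / pn)  # target number of records per partition
    parts = []
    start = 0
    while start < len(data):
        end = min(start + w, len(data))
        # Move the cut forward while it would split records of equal value.
        while end < len(data) and data[end] == data[end - 1]:
            end += 1
        parts.append(data[start:end])
        start = end
    return parts
```

Under these assumptions, records sharing a value always land in the same partition, at the cost of partitions that may exceed w records and a result that may contain fewer than PN partitions when long runs of equal values occur.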
168	A HyBrid Approach-Based Signature Extraction Method for Similarity Yeh, Wei-Horng 18 July 2001 (has links) A symbolic image database system is a system in which a large amount of image data and their related information are represented by both symbolic images and physical images. How to perceive spatial relationships among the components in a symbolic image is an important criterion for finding a match between the symbolic image of the scene object and the one stored as a model in the symbolic image database. Spatial reasoning techniques have been applied to pictorial databases; in particular, those using 2D strings as an index representation have been successful. In this thesis, we extend the existing three levels of type-i similarity to more levels to make similarity retrieval more precise. Lee and Hsu introduced 13 spatial operators to completely represent spatial relationships in 1D space. However, they simply combined the 13 spatial relationships on the x- and y-axes to represent the spatial relationships in 2D space by 13 times 13 = 169 spatial relationships. The 169 spatial relationships are still not sufficient to show all kinds of spatial relationships between any two objects in 2D space. For example, directional relationships, like North or South West, exist in 2D space and are difficult to deduce from those 13 spatial operators. Thus, we add the nine directional relationships to the 169 spatial relationships in 2D space. In this way, we can distinguish up to 289 spatial relationships in 2D space. Moreover, our proposed strategy also takes care of the problem caused by minimum bounding rectangles (MBRs). Most previous approaches to iconic indexing, for simplicity, apply the MBRs of two objects to define the spatial relationship between them. The topological relationships between objects, however, can be quite different from the spatial relationship of their respective MBRs.
Therefore, it is sometimes hard to correctly describe the spatial relationship of the objects in terms of the relationships between their corresponding MBRs. To remedy this drawback of MBRs, we adopt the concept of topological relationships in our proposed strategy. Good access methods for large image databases are important for efficient retrieval. Signature files can be viewed as a preselection search filter to prune off the unsatisfied images. In order to resolve the ambiguity of the MBRs and to represent the spatial relationships in two-dimensional space completely, we propose a hybrid approach-based signature extraction method for similarity retrieval. Our simulation study shows that our approach provides a higher rate of correct matches and requires a smaller storage cost than Lee et al.'s 2D B-based signature approach. In some cases, the correct match rate of our proposed strategy can be up to 42.18%, while it is just 16.66% in Lee et al.'s strategy. Moreover, the worst-case storage cost of our proposed strategy is 1686 bits, whereas Lee et al.'s strategy always needs 2015 bits. pictorial query similarity retrieval spatial reasoning image database
169	A Recursive Relative Prefix Sum Approach to Range Queries in Data Warehouses Wu, Fa-Jung 07 July 2002 (has links) Data warehouses contain data consolidated from several operational databases and provide historical and summarized data, which is more appropriate for analysis than detailed, individual records. On-Line Analytical Processing (OLAP) provides advanced analysis tools to extract information from data stored in a data warehouse. OLAP is designed to provide aggregate information that can be used to analyze the contents of databases and data warehouses. A range query applies an aggregation operation over all selected cells of an OLAP data cube, where the selection is specified by providing ranges of values for numeric dimensions. Range sum queries are very useful in finding trends and in discovering relationships between attributes in the database. The prefix sum method promises that any range sum query on a data cube can be answered in constant time by precomputing some auxiliary information. However, it is hampered by its update cost. Today's interactive data analysis applications, which provide current or "near current" information, require fast response times and reasonable update times. Since the size of a data cube is exponential in the number of its dimensions, rebuilding the entire data cube can be very costly and is not realistic. To cope with this dynamic data cube problem, several strategies have been proposed. They all use specific data structures, which require extra storage cost, to answer range sum queries quickly. For example, the double relative prefix sum method makes use of three components: a block prefix array, a relative overlay array and a relative prefix array to store auxiliary information. Although the double relative prefix sum method improves the update cost, it increases the query time.
In this thesis, we present a method, called the recursive relative prefix sum method, which tries to provide a compromise between query and update cost. In the recursive relative prefix sum method with k levels, we use a relative prefix array and k relative overlay arrays. Our performance study shows that the update cost of our method is always less than that of the prefix sum method. In most cases, the update cost of our method is less than that of the relative prefix sum method. Moreover, in most cases, the query cost of our method is less than that of the double relative prefix sum method. Compared with the dynamic data cube method, our method has lower storage cost and shorter query time. Consequently, our recursive relative prefix sum method has a reasonable response time for ad hoc range queries on the data cube, while at the same time greatly reducing the update cost. In some applications, however, updates in some regions may happen more frequently than in others. We also provide a solution, called the weighted relative prefix sum method, for this situation. This method can therefore also provide a compromise between the range sum query cost and the update cost when the update probabilities of different regions are considered. data warehouse aggregation operation range sum query update data cube
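The trade-off that entry 169 starts from, constant-time range sums versus costly updates in the basic prefix sum method, can be illustrated with a minimal two-dimensional Python sketch. It shows only the plain prefix sum method, not the recursive relative prefix sum method of the thesis; the function names and the restriction to two dimensions are assumptions made for illustration.

```python
def build_prefix(cube):
    """Return P where P[i][j] is the sum of cube[0..i-1][0..j-1]."""
    rows, cols = len(cube), len(cube[0])
    P = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            P[i + 1][j + 1] = cube[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]
    return P


def range_sum(P, r1, c1, r2, c2):
    """Sum of cube[r1..r2][c1..c2] (inclusive) using four lookups."""
    return P[r2 + 1][c2 + 1] - P[r1][c2 + 1] - P[r2 + 1][c1] + P[r1][c1]


def cells_touched_by_update(rows, cols, r, c):
    """Number of prefix entries that change when cube[r][c] is updated."""
    return (rows - r) * (cols - c)
```

Every query costs four array lookups regardless of the range size, but changing the cell at the origin forces every prefix entry to be recomputed, which is the update cost that the relative and recursive variants aim to reduce.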
170	Design and Analysis of Nearest Neighbor Search Strategies Chen, Hue-Ling 10 July 2002 (has links) With the proliferation of wireless communications and rapid advances in technologies, algorithms for efficiently answering queries about large amounts of spatial data are needed. Spatial data consists of spatial objects, including data of higher dimension. Neighbor finding is one of the most important spatial operations in the field of spatial data structures. In recent years, many researchers have focused on finding efficient solutions to the nearest neighbor (NN) problem, which involves determining the point in a data set that is nearest to a given query point. It is frequently used in Geographical Information Systems (GIS). A block B is said to be the neighbor of another block A if block B has the same property as block A and covers an equal-sized neighbor of block A. Jozef Voros has proposed a neighbor finding strategy on images represented by quadtrees, in which the four equal-sized neighbors (in the east, west, north, and south directions) of block A can be found. However, Voros's strategy ignores the case in which the nearest neighbor occurs in the diagonal directions (the northeast, northwest, southeast, and southwest directions). Moreover, there is no total ordering that preserves proximity when mapping spatial data from a higher-dimensional space to a 1D space. One way of effecting such a mapping is to utilize space-filling curves. Space-filling curves pass through every point in a space and give a one-to-one correspondence between the coordinates and the 1D sequence number of the point. The Peano curve, proposed by Orenstein, which obtains the 1D coordinate of a point by simply interleaving the bits of its X and Y coordinates in 2D space, can be easily used in neighbor finding. But with the data ordered by the RBG curve or the Hilbert curve, neighbor finding would be complex.
The RBG curve achieves savings in random accesses on the disk for range queries and the Hilbert curve achieves the best clustering for range queries. Therefore, in this thesis, we first show the missing case in Voros's strategy and show ways to find it. Next, we show that the Peano curve is the best mapping function for nearest neighbor finding. We also show the transformation rules between the Peano curve and the other curves such that we can efficiently find the nearest neighbor when the data is linearly ordered by the other curves. Our simulation shows that our two proposed strategies work correctly and faster than the conventional strategies in nearest neighbor finding. Finally, we present a revised version of NA-Trees, which can work for exact match queries and range queries from a large, dynamic index, where an exact match query means finding the specific data object in a spatial database and a range query means reporting all data objects which are located in a specific range. By large, we mean that most of the index must be stored in secondary memory. By dynamic, we mean that insertions and deletions are intermixed with queries, so that the index cannot be built beforehand. nearest neighbor NA-tree spatial query quadtree space-filling curve
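The bit-interleaving mapping that entry 170 attributes to the Peano curve can be illustrated with a minimal Python sketch. The bit order (X in the even positions, Y in the odd positions), the 16-bit coordinate width, and the function names are assumptions made for illustration; the abstract does not fix them.

```python
def interleave(x, y, bits=16):
    """Interleave the bits of x and y into one 1D key.

    Assumed layout: bit i of x goes to position 2*i, bit i of y to 2*i + 1.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def deinterleave(key, bits=16):
    """Recover (x, y) from a key produced by interleave."""
    x = y = 0
    for i in range(bits):
        x |= ((key >> (2 * i)) & 1) << i
        y |= ((key >> (2 * i + 1)) & 1) << i
    return x, y
```

Because the key is built purely from the coordinate bits, a neighbor's key can be obtained by decoding, adjusting one coordinate by one, and re-encoding, which hints at why the abstract calls neighbor finding easy for this curve compared with the RBG and Hilbert curves.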

Search results