Global ETD Search

251	Important Extrema of Time Series: Theory and Applications Gandhi, Harith Suman 23 March 2004 (has links) We describe techniques for fast compression of time series and hierarchical indexing of compressed series based on the assignment of importance levels to the extrema of time series and their derivatives. We formalize the distance functions used in compression and retrieval techniques. We describe retrieval techniques that use the developed compression and indexing techniques for fast retrieval of series from a database that match a given pattern. minima maxima indexing trees derivative series distance American Studies Arts and Humanities
252	Facilitating Retrieval of Sound Recordings for Use By Professionals Treating Children with Asperger's Syndrome Dena L Belvin 1 August 2007 (has links) Since the 1970s, music librarians have been discussing the challenges of cataloging music media. In the 1990s, they began work on a Music Thesaurus to provide a multi-faceted approach to indexing, cataloging, and retrieving music media. In 1999 Indiana University proposed a digital music library, to allow for better indexing and retrieval in addition to content-based music retrieval. In 2000, a commercial venture, The Music Genome Project ©, began cataloging sound recordings of popular music by hundreds of musical characteristics and has created a user interface that allows listeners to enter the title and artist of a certain piece of music and receive recommendations for similar music to then purchase via Pandora.com. The following paper will address the question: how might current analyzing and classifying methods be used to provide additional indexing that facilitates retrieval and use of sound recordings by special populations, specifically professionals treating children with Asperger’s syndrome?
253	Upper and Lower Bounds for Text Indexing Data Structures Golynski, Alexander 10 December 2007 (has links) The main goal of this thesis is to investigate the complexity of a variety of problems related to text indexing and text searching. We present new data structures that can be used as building blocks for full-text indices that occupy minute space (FM-indexes) and for wavelet trees. These data structures can also be used to represent labeled trees and posting lists. Labeled trees are applied in XML documents, and posting lists in search engines. The main emphasis of this thesis is on lower bounds for time-space tradeoffs for the following problems: the rank/select problem, the problem of representing a string of balanced parentheses, the text retrieval problem, the problem of computing a permutation and its inverse, and the problem of representing a binary relation. These results are divided into two groups: lower bounds in the cell probe model and lower bounds in the indexing model. The cell probe model is the most natural and widely accepted framework for studying data structures. In this model, we are concerned with the total space used by a data structure and the total number of accesses (probes) it performs to memory, while computation is free of charge. The indexing model imposes an additional restriction on the storage: the object in question must be stored in its raw form together with a small index that facilitates an efficient implementation of a given set of queries, e.g. finding rank, select, matching parenthesis, or an occurrence of a given pattern in a given text (for the text retrieval problem). We propose a new technique for proving lower bounds in the indexing model and use it to obtain lower bounds for the rank/select problem and the balanced parentheses problem. 
We also improve the existing techniques of Demaine and Lopez-Ortiz using compression and present stronger lower bounds for the text retrieval problem in the indexing model. The most important result of this thesis is a new technique for cell probe lower bounds. We demonstrate its strength by proving new lower bounds for the problem of representing permutations, the text retrieval problem, and the problem of representing binary relations. (Previously, there were no non-trivial results known for these problems.) In addition, we note that the lower bounds for the permutations problem and the binary relations problem are tight for a wide range of parameters, e.g. the running time of queries, the size and density of the relation. theoretical computer science data structures lower bounds text indexing rank/select problem representation of permutations Computer Science
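The rank/select problem and the indexing model mentioned in this abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis: the class name `RankSelectBitVector`, the `block_size` parameter, and the block-count layout are assumptions chosen for illustration. The bit string is kept in raw form, and a small auxiliary index (the number of 1 bits before each block) speeds up the two queries.

```python
from bisect import bisect_left


class RankSelectBitVector:
    """Raw bit string plus a small index of per-block counts (illustrative only)."""

    def __init__(self, bits, block_size=4):
        self.bits = list(bits)          # the object itself, stored in raw form
        self.block_size = block_size    # hypothetical tuning parameter
        # block_ranks[b] = number of 1 bits before block b (the "small index")
        self.block_ranks = [0]
        for start in range(0, len(self.bits), block_size):
            ones = sum(self.bits[start:start + block_size])
            self.block_ranks.append(self.block_ranks[-1] + ones)

    def rank1(self, i):
        """Return the number of 1 bits in bits[0:i]."""
        if not 0 <= i <= len(self.bits):
            raise IndexError("position out of range")
        block = i // self.block_size
        start = block * self.block_size
        # one index lookup, then a scan of at most block_size raw bits
        return self.block_ranks[block] + sum(self.bits[start:i])

    def select1(self, k):
        """Return the position of the k-th 1 bit, counting k from 1."""
        if not 1 <= k <= self.block_ranks[-1]:
            raise ValueError("k out of range")
        # binary search the index for the block that holds the k-th 1 bit
        block = bisect_left(self.block_ranks, k) - 1
        count = self.block_ranks[block]
        start = block * self.block_size
        end = min(start + self.block_size, len(self.bits))
        for pos in range(start, end):
            count += self.bits[pos]
            if self.bits[pos] == 1 and count == k:
                return pos
        raise AssertionError("index inconsistent with raw bits")


bv = RankSelectBitVector([1, 0, 1, 1, 0, 0, 1, 0, 1])
ones_before_7 = bv.rank1(7)      # counts the 1 bits among the first seven positions
fourth_one_at = bv.select1(4)    # position of the fourth 1 bit
```

A larger `block_size` shrinks the index but lengthens the scan inside a block, which is one simple instance of the time-space tradeoff that the lower bounds in the abstract address.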
255	Semantic Routed Network for Distributed Search Engines Biswas, Amitava May 2010 (has links) Searching for textual information has become an important activity on the web. To satisfy the rising demand and user expectations, search systems should be fast, scalable and deliver relevant results. To decide which objects should be retrieved, search systems should compare holistic meanings of queries and text document objects, as perceived by humans. Existing techniques do not enable correct comparison of composite holistic meanings like: "evidences on role of DR2 gene in development of diabetes in Caucasian population", which is composed of multiple elementary meanings: "evidence", "DR2 gene", etc. Thus these techniques cannot discern objects that have a common set of keywords but convey different meanings. Hence we need new methods to compare composite meanings for superior search quality. In distributed search engines, for scalability, speed and efficiency, index entries should be systematically distributed across multiple index-server nodes based on the meaning of the objects. Furthermore, queries should be selectively sent to those index nodes which have relevant entries. This requires an overlay Semantic Routed Network which will route messages based on meaning. This network will consist of fast response networking appliances called semantic routers. These appliances need to: (a) carry out sophisticated meaning comparison computations at high speed; and (b) have the right kind of behavior to automatically organize an optimal index system. This dissertation presents the following artifacts that enable the above requirements: (1) An algebraic theory, a design of a data structure and related techniques to efficiently compare composite meanings. (2) Algorithms and accelerator architectures for high speed meaning comparisons inside semantic routers and index-server nodes. (3) An overlay network to deliver search queries to the index nodes based on meanings. 
(4) Algorithms to construct a self-organizing, distributed meaning-based index system. The proposed techniques can compare composite meanings ~105 times faster than equivalent software code and existing hardware designs. In addition, the proposed index organization approach can lead to 33% savings in the number of servers and in power consumption in a model search engine with 700,000 servers. Therefore, using all these techniques, it is possible to design a Semantic Routed Network that has the potential to improve search results and response time while saving resources.
256	Clustering Multilingual Documents: A Latent Semantic Indexing Based Approach Lin, Chia-min 09 February 2006 (has links) Document clustering automatically organizes a document collection into distinct groups of similar documents on the basis of their contents. Most existing document clustering techniques deal with monolingual documents (i.e., documents written in one language). However, with the trend of globalization and advances in Internet technology, an organization or individual often generates or acquires, and subsequently archives, documents in different languages, thus creating the need for multilingual document clustering (MLDC). Motivated by this need, this study designs a Latent Semantic Indexing (LSI) based MLDC technique. Our empirical evaluation results show that the proposed LSI-based multilingual document clustering technique achieves satisfactory clustering effectiveness, measured by both cluster recall and cluster precision. Document clustering Latent semantic analysis Latent semantic indexing Text mining Document management Multilingual document clustering
257	Automatic Image Annotation By Ensemble Of Visual Descriptors Akbas, Emre 01 August 2006 (has links) (PDF) Automatic image annotation is the process of automatically producing words to describe the content for a given image. It provides us with a natural means of semantic indexing for content based image retrieval. In this thesis, two novel automatic image annotation systems targeting different types of annotated data are proposed. The first system, called Supervised Ensemble of Visual Descriptors (SEVD), is trained on a set of annotated images with predefined class labels. Then, the system automatically annotates an unknown sample depending on the classification results. The second system, called Unsupervised Ensemble of Visual Descriptors (UEVD), assumes no class labels. Therefore, the annotation of an unknown sample is accomplished by unsupervised learning based on the visual similarity of images. The available automatic annotation systems in the literature mostly use a single set of features to train a single learning architecture. On the other hand, the proposed annotation systems utilize a novel model of image representation in which an image is represented with a variety of feature sets, spanning an almost complete visual information comprising color, shape, and texture characteristics. In both systems, a separate learning entity is trained for each feature set and these entities are gathered under an ensemble learning approach. Empirical results show that both SEVD and UEVD outperform some of the state-of-the-art automatic image annotation systems in equivalent experimental setups. QA General 15707
258	An Ontology-based Retrieval System Using Semantic Indexing Kara, Soner 01 July 2010 (has links) (PDF) In this thesis, we present an ontology-based information extraction and retrieval system and its application to the soccer domain. In general, we deal with three issues in semantic search, namely, usability, scalability and retrieval performance. We propose a keyword-based semantic retrieval approach. The performance of the system is improved considerably using domain-specific information extraction, inference and rules. Scalability is achieved by adapting a semantic indexing approach. The system is implemented using state-of-the-art Semantic Web technologies, and its performance is evaluated against traditional systems as well as query expansion methods. Furthermore, a detailed evaluation is provided to observe the performance gain due to domain-specific information extraction and inference. Finally, we show how we use semantic indexing to solve simple structural ambiguities. QA Computer Software 76.75-76.765
259	Exploiting Information Extraction Techniques For Automatic Semantic Annotation And Retrieval Of News Videos In Turkish Kucuk, Dilek 01 February 2011 (has links) (PDF) Information extraction (IE) is known to be an effective technique for automatic semantic indexing of news texts. In this study, we propose a text-based fully automated system for the semantic annotation and retrieval of news videos in Turkish which exploits several IE techniques on the video texts. The IE techniques employed by the system include named entity recognition, automatic hyperlinking, person entity extraction with coreference resolution, and event extraction. The system utilizes the outputs of the components implementing these IE techniques as the semantic annotations for the underlying news video archives. Apart from the IE components, the proposed system comprises a news video database in addition to components for news story segmentation, sliding text recognition, and semantic video retrieval. We also propose a semi-automatic counterpart of the system where the only manual intervention takes place during text extraction. Both systems are executed on genuine video data sets consisting of videos broadcast by the Turkish Radio and Television Corporation. The current study is significant as it proposes the first fully automated system to facilitate semantic annotation and retrieval of news videos in Turkish, yet the proposed system and its semi-automated counterpart are quite generic and hence could be customized to build similar systems for video archives in other languages as well. Moreover, IE research on Turkish texts is known to be rare, and within the course of this study we have proposed and implemented novel techniques for several IE tasks on Turkish texts. As an application example, we have demonstrated the utilization of the implemented IE components to facilitate multilingual video retrieval. QA Computer Software 76.75-76.765
260	Design and Implementation of Query Processing Strategies for Video Data Yang, Wen-Haur 09 July 2002 (has links) Traditional database systems only support textual and numerical data. Video data stored in these database systems can only be retrieved through their video identifiers, titles or descriptions. In video data, frame-by-frame object change is one of the most obvious kinds of information. Each video contains temporal and spatial relationships between content objects. The temporal relationships can be specified between frame sequences and the spatial relationships can be specified by the relationships between objects in a single frame. The difficulty in designing a content-based video database system is how to store and describe the relationships between moving objects completely. Many studies of content-based video retrieval represented the content of video as a set of frames, but they either left out the temporal ordering of frames in the shot or only stored the relationships between objects in a single frame. According to these observations, we conclude that a content-based video database system requires video indexing, query processing and a convenient user interface to fit the requirements and characteristics of videos. In this thesis, we design and implement a query processing strategy for video data. In the proposed strategy, we consider three query types: the exact object match, the spatial-temporal object retrieval and the motion query, where an exact object match finds the video files which contain the specific objects, a spatial-temporal object retrieval retrieves the object pairs that satisfy some spatial-temporal relationships and a motion query finds the set of frames which contain the object movements. Moreover, we consider three design issues: the video indexing, the video query processing and the video query interface. 
When there are a large number of videos in a video database and each video contains many shots, frames and objects, the processing time for content retrieval is tremendous. Thus, we need a proper video indexing strategy to reduce the search time. In order to capture the spatial-temporal relationships of objects between different frames, we build indexes on both the spatial and temporal axes. In the temporal index file structure, we propose the shot-based B+-tree to index the temporal data. In the spatial index file structure, we use an R-tree to store not only the relationships between objects in one frame, but also the relationships of one object between its first and last appearances in the shot. Based on this strategy, we can describe the status of a moving object in detail. For query processing, we propose a signature file structure to filter out the videos that definitely cannot be the answer. After that, in order to determine whether the answer exists in the candidate videos, we use a multi-dimensional string, called binary string, to represent the spatial-temporal relationships between objects. Then, the video query processing problem becomes a binary string matching problem. Finally, we design and implement a user-friendly user interface. Our system runs on a Pentium III machine with one CPU at a clock rate of 550 MHz and 256 MB of main memory, under Windows 2000 Professional edition; it uses an Access 2000 database and is coded in Delphi 6 in about 10,000 lines. From our experience, we show that the proposed system can support efficient query processing, fast searching capabilities and a user-friendly user interface. Video Query Processing Video Data Spatial-Temporal Relationships Video Indexing shot-based B+-tree
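The signature-file filtering step described in this abstract can be sketched in Python as follows. The abstract does not give the actual signature design, so this sketch assumes a generic superimposed-coding scheme: the constants `SIGNATURE_BITS` and `BITS_PER_OBJECT`, the SHA-256 based hashing, and all function names are hypothetical. A video whose signature lacks any bit of the query signature cannot contain all queried objects and is discarded; the remaining candidates may still include false positives and need the exact matching step.

```python
import hashlib

SIGNATURE_BITS = 64    # assumed signature width
BITS_PER_OBJECT = 3    # assumed number of bits set per object name


def object_signature(name):
    """Map an object name to a bit mask with up to BITS_PER_OBJECT bits set."""
    sig = 0
    for i in range(BITS_PER_OBJECT):
        digest = hashlib.sha256(f"{i}:{name}".encode("utf-8")).digest()
        position = int.from_bytes(digest[:4], "big") % SIGNATURE_BITS
        sig |= 1 << position
    return sig


def video_signature(objects):
    """Superimpose (bitwise OR) the signatures of all objects in a video."""
    sig = 0
    for name in objects:
        sig |= object_signature(name)
    return sig


def candidate_videos(signatures, query_objects):
    """Keep only videos whose signature covers every bit of the query signature."""
    query_sig = video_signature(query_objects)
    return [vid for vid, sig in signatures.items() if sig & query_sig == query_sig]


videos = {
    "v1": ["car", "tree"],
    "v2": ["ball"],
    "v3": ["car", "ball", "dog"],
}
signatures = {vid: video_signature(objs) for vid, objs in videos.items()}
# Candidates still have to be verified against the stored video content.
candidates = candidate_videos(signatures, ["car", "ball"])
```

The filter never drops a true match, because every bit set by a queried object is also set in the signature of any video that contains that object.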

Search results