The rapid growth in the volume of electronic data has drawn increasing attention to text compression and indexing techniques. The two are usually treated as independent problems, but approaches that combine them have recently attracted researchers' interest. In this thesis, we review and test some of the more effective and some of the more theoretically interesting techniques. We present various compression and indexing techniques, along with two compressed text indices. Based on these techniques, we implement a compressed full-text index, so that compressed texts can be indexed to support fast queries without decompressing the whole text. The experiments show that our index is compact and supports fast search.
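The abstract does not say which structure the thesis implements. As an illustration only, the sketch below shows backward search over a Burrows-Wheeler transform, the idea behind one well-known family of compressed full-text indices. The function names are hypothetical, the suffix array is built naively, and the tables are stored uncompressed, so this toy shows the query mechanism rather than the space savings.

```python
def build_fm_index(text):
    """Build toy FM-index tables (illustrative; nothing here is compressed)."""
    text += "\0"  # sentinel: assumed absent from the text, sorts before all symbols
    # Naive suffix array; real indices use linear-time construction.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the symbol preceding each sorted suffix (text[-1] is the sentinel).
    bwt = "".join(text[i - 1] for i in sa)
    # C[c] = number of symbols in the text strictly smaller than c.
    C, total = {}, 0
    for c in sorted(set(text)):
        C[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in C}
    for i, ch in enumerate(bwt):
        for c in C:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return C, occ, len(bwt)


def count_occurrences(pattern, C, occ, n):
    """Count matches of pattern by backward search, without scanning the text."""
    lo, hi = 0, n  # half-open range of suffix-array rows that still match
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


C, occ, n = build_fm_index("abracadabra")
print(count_occurrences("abra", C, occ, n))
```

Each pattern symbol costs two table lookups, so the query time depends on the pattern length rather than on the text length. That property is what lets such an index answer queries without decompressing the whole text.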
Teo, Yong Meng, March, Verdi, Wang, Xianbing
This paper presents a DHT-based grid resource indexing and discovery (DGRID) approach. With DGRID, resource-information data is stored within its own administrative domain, and each domain, represented by an index server, is virtualized into several nodes (virtual servers) according to the number of resource types it has. All nodes are then arranged as a structured overlay network, or distributed hash table (DHT). Compared with existing grid resource indexing and discovery schemes, the benefits of DGRID include improved domain security, increased data availability, and the elimination of stale data. / Singapore-MIT Alliance (SMA)
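The virtualization step can be pictured with a small single-process simulation. This is a sketch under stated assumptions, not the paper's protocol: the `Ring` class, the `h32` helper, and the identifier layout (hash of the resource type in the high bits, hash of the domain in the low bits) are all hypothetical choices made for illustration.

```python
import bisect
import hashlib


def h32(s):
    """Hypothetical helper: 32-bit hash of a string."""
    return int.from_bytes(hashlib.sha1(s.encode()).digest()[:4], "big")


class Ring:
    """Toy identifier ring; a real DHT spreads this state over many hosts."""

    def __init__(self):
        self.ids = []    # sorted virtual-server identifiers
        self.nodes = {}  # identifier -> (domain, resource_type)

    def join(self, domain, resource_types):
        # One virtual server per resource type the domain offers.
        for rtype in resource_types:
            # Assumed layout: type hash in the high bits, so virtual
            # servers for the same type sit next to each other on the ring.
            node_id = (h32(rtype) << 32) | h32(domain)
            bisect.insort(self.ids, node_id)
            self.nodes[node_id] = (domain, rtype)

    def lookup(self, rtype):
        # All identifiers whose high 32 bits equal the hash of rtype.
        lo = h32(rtype) << 32
        hi = lo + (1 << 32)
        start = bisect.bisect_left(self.ids, lo)
        end = bisect.bisect_left(self.ids, hi)
        return [self.nodes[i][0] for i in self.ids[start:end]
                if self.nodes[i][1] == rtype]


ring = Ring()
ring.join("domA", ["cpu", "storage"])
ring.join("domB", ["cpu"])
print(sorted(ring.lookup("cpu")))
```

In this sketch a lookup returns only the domains that hold matching data, and the data itself never leaves those domains.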
14 February 2007
Text categorization, which automatically assigns documents to the appropriate predefined category or categories, is essential for retrieving desired documents efficiently and effectively from a huge text repository such as the World Wide Web. Most techniques, however, suffer from the feature selection problem and the vocabulary mismatch problem. A few studies have approached text categorization via text summarization, which reduces the size of documents and consequently the number of features to consider, while others have proposed using latent semantic indexing (LSI) to reveal the true meaning of a term via its association with other terms. Few works, however, have studied the joint effect of text summarization and semantic dimension reduction. The objective of this research is thus to propose a practical approach, SBDR, to deal with these difficulties in text categorization tasks. Two experiments are conducted to validate the proposed approach. In the first experiment, the results show that text summarization does improve categorization performance. In addition, to construct important sentences, the association terms of both noun-noun and noun-verb pairs should be considered. Results of the second experiment indicate slightly better performance when LSI is adopted exclusively (i.e., no summarization) than with SBDR (i.e., with summarization). Nonetheless, the minor reduction in accuracy is largely compensated for by the computational time saved when LSI is applied to summarized text. The feasibility of the SBDR approach is thus justified.
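For orientation, the standard formulation of LSI can be written as follows. The abstract gives no formulas, so the symbols here ($A$ for the term-document matrix, $k$ for the number of retained dimensions, $d$ for a document's term vector) are assumptions of this note rather than the authors' notation.

```latex
% Truncated singular value decomposition of the m-by-n term-document matrix A,
% keeping the k largest singular values:
A \approx A_k = U_k \Sigma_k V_k^{\top},
\qquad U_k \in \mathbb{R}^{m \times k},\;
\Sigma_k \in \mathbb{R}^{k \times k},\;
V_k \in \mathbb{R}^{n \times k}.

% Folding a document's term vector d \in \mathbb{R}^{m} into the k-dimensional space:
\hat{d} = \Sigma_k^{-1} U_k^{\top} d .
```

Summarizing documents before building $A$ leaves fewer distinct terms (a smaller $m$) and fewer nonzero entries, which is consistent with the computational saving the abstract reports.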
20 July 2010
Related problems of string indexing and sequence analysis have been widely studied for a long time. Recently, researchers have turned to extended versions of these problems, which provide more realistic applications. In this dissertation, we focus on three problems of recent interest: (1) the indexing problem for scaled strings, (2) the merged longest common subsequence problem and its variant with blocks, and (3) the sequence alignment problem with weighted constraints. The indexing problem for scaled strings asks one to preprocess a text string T so that the matched positions of a pattern string P in T, with some scale α applied to P, can be reported efficiently. We propose efficient algorithms for indexing real scaled strings, discretely scaled strings, and proportionally scaled strings. Our indexing algorithms either improve significantly on previous results or achieve the best known results. The merged longest common subsequence (merged LCS) problem aims to detect the interleaving relationship between sequences, which has important applications to genomic and signal comparison. We propose improved algorithms for finding the merged LCS, which are more efficient than previous results, especially for large alphabets. Finally, the sequence alignment problem with weighted constraints is newly proposed in this dissertation. For this new problem, we first propose an efficient solution, and then show that the concept of weighted constraints can be further used to solve many constraint-related problems on sequences. Our results therefore make significant contributions to the field of string indexing and sequence analysis.
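To make the merged LCS problem concrete, the sketch below uses the straightforward cubic-time dynamic program, not the improved algorithms the dissertation proposes. It assumes the usual definition: given a target T and two sequences A and B, find the longest subsequence of T that is also a subsequence of some interleaving of A and B. The function name is hypothetical.

```python
def merged_lcs_length(T, A, B):
    """Length of the merged LCS of T against interleavings of A and B.

    Baseline O(|T| * |A| * |B|) dynamic program, for illustration only.
    """
    n, p, q = len(T), len(A), len(B)
    # L[i][j][k] = merged LCS length for T[:i], A[:j], B[:k].
    L = [[[0] * (q + 1) for _ in range(p + 1)] for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(p + 1):
            for k in range(q + 1):
                best = L[i - 1][j][k]  # leave T[i-1] unmatched
                if j > 0:
                    best = max(best, L[i][j - 1][k])  # skip A[j-1]
                    if T[i - 1] == A[j - 1]:
                        # match T[i-1] with the next symbol taken from A
                        best = max(best, L[i - 1][j - 1][k] + 1)
                if k > 0:
                    best = max(best, L[i][j][k - 1])  # skip B[k-1]
                    if T[i - 1] == B[k - 1]:
                        # match T[i-1] with the next symbol taken from B
                        best = max(best, L[i - 1][j][k - 1] + 1)
                L[i][j][k] = best
    return L[n][p][q]


print(merged_lcs_length("abcd", "ac", "bd"))
```

Here "abcd" is itself an interleaving of "ac" and "bd", so the whole target is matched.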
Yick, (Winnie) Yuki B. Haungs, Michael L.
Thesis (M.S.)--California Polytechnic State University, 2009. / Mode of access: Internet. Title from PDF title page; viewed on Jan. 6, 2010. Major professor: Dr. Michael Haungs. "Presented to the faculty of California Polytechnic State University, San Luis Obispo." "In partial fulfillment of the requirements for the degree [of] Master of Science in Computer Science." "Aug 2009." Includes bibliographical references (p. 76-78).
A term co-occurrence based framework for understanding LSI [i.e. latent semantic indexing] : theory and practice / Kontostathis, April. January 2003
Thesis (Ph. D.)--Lehigh University, 2004. / Includes vita. Includes bibliographical references (leaves 94-103).
Wu, Man-kit, Edward.
Thesis (M. Phil.)--University of Hong Kong, 2010. / Includes bibliographical references (leaves 69-72). Also available in print.
Parmar, Sonal D.
Thesis (M.L.A.) -- University of Texas at Arlington, 2008.
Business magazines present and desirable index coverage : based on a study of commercial, financial, trade, and industrial papers selected through examination and use in the Cleveland Public Library : [a thesis submitted in partial fulfillment of the requirements toward a Master's degree in Library Science] / Hanson, Agnes O. January 1941
Thesis (M.L.S.)--University of Michigan, 1941.
Zhu, Weizhong. Allen, Robert B.
Thesis (Ph.D.)--Drexel University, 2009. / Includes abstract and vita. Includes bibliographical references (leaves 115-121).