20 July 2010
Related problems of string indexing and sequence analysis have been widely studied for a long time. Recently, researchers turn to consider extended versions of these problems, which provides more realistic applications. In this dissertation, we focus on three problems of recent interest, which are (1)the indexing problem for scaled strings, (2)the merged longest common subsequence problem and its variant with blocks, and (3)the sequence alignment problem with weighted constraints. The indexing problem for scaled strings asks one to preprocess a text string T, so that the matched positions of a pattern string P in T, with some scales £\ applied to P, can be reported efficiently. In this dissertation, we propose efficient algorithms for indexing real scaled strings, discretely scaled strings, and proportionally scaled strings. Our indexing algorithms achieve either significant improvements to previous results, or the best known results. The merged longest common subsequence (merged LCS) problem aims to detect the interleaving relationship between sequences, which has important applications to genomic and signal comparison. In this dissertation, we propose improved algorithms for finding the merged LCS. Our algorithms for finding the merged LCS are also more efficient than the previous results, especially for large alphabets. Finally, the sequence alignment problem with weighted constraints is a newly proposed problem in this dissertation. For this new problem, we first propose an efficient solution, and then show that the concept of weighted constraints can be further used to solve many constraint-related problems on sequences. Therefore, our results in this dissertation have significant contributions to the field of string indexing and sequence analysis.
Biswas, Subal C.
A 'Subject Indexing Language' (SIL) is an artificial language used for formulating names of subjects. Although classificationists have sought for universals in many fields of study such as, philosophy, biology, general systems theory, etc., the search for a deep structure of SILs formally began with Ranganathan's idea of 'absolute syntax' and was brought to the present by G. Bhattacharyya and D. Austin. Whereas Bhattacharyya's deep structure of SIL is primarily based on classificatory principles (parallel to 'absolute syntax'), the deep structure proposed by Austin has a linguistic connotation. The present study describes and compares two such deep structurebased SILs, viz., PRECIS (PREserved Context Index System) and DSIS (Deep Structure Indexing System), a recent computerized version of POPSI (POstulate-based Permuted Subject Indexing), developed by F. J. Devadason at Documentation Research and Training Centre, Bangalore, India. Both also belong to the category of SILs typified as 'string indexing' languages. The study involves: i) writing of a suitable DSIS index entry generation program, ii) using both PRECIS (in-house) and DSIS programs to index a collection of representative sample documents from the soft sciences, iii) analyzing and comparing their respective syntactic and semantic aspects in terms of both linguistic and classificatory principles, and iv) applying some measures of efficiency and effectiveness. It was realized that certain modifications in the existing DSIS string manipulation algorithms are necessary to make the program fully operational. Although, no attempts have been made to quantify the measures of effectiveness and efficiency as such, suggestions have been provided as to what these probably would be. Some indications of their searching difficulties for a prospective searcher have been put forward as well.
Page generated in 0.1188 seconds