Strings are simple yet widely applicable data structures. Their applications range from modelling DNA and protein sequences to information retrieval and web page search, among many others. Because of their simplicity, strings have few structural properties that can be exploited in the analysis of string algorithms and their auxiliary data structures. From the beginning, researchers have therefore paid close attention to the periodic properties of strings, such as runs, which are maximal fractional periodicities. The runs conjecture, which states that a string has fewer runs than its length, was first posed in 1999 by Kolpakov and Kucherov and was settled only in 2015 by Bannai et al. using specific Lyndon roots referred to as L-roots. Their method maps each run to the starting points of its L-roots, and these sets of starting points form mutually disjoint subsets of the indices of the string. This relationship between runs and the maximal Lyndon factors (substrings) of a string is not coincidental: Bannai et al. used the knowledge of all maximal Lyndon factors with respect to an order and its inverse to compute all runs in linear time. Computing all maximal Lyndon factors efficiently is therefore important.
In this thesis, we review the fundamental properties of Lyndon strings, including the famous Lyndon factorization and its linear-time solution due to Duval. In addition, we explore a new and conceptually simple data structure called the Lyndon array and its relationship to the suffix array.
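To make the two objects named above concrete, the following is a minimal sketch and is not taken from the thesis. It shows the textbook form of Duval's factorization and one well-known way to obtain the Lyndon array from suffix ranks: the maximal Lyndon factor starting at position i ends just before the next position whose suffix is lexicographically smaller. The function names `duval_factorization` and `lyndon_array` are illustrative assumptions, and the suffix ranks are computed here by naive sorting for brevity, so this version of `lyndon_array` is not linear.

```python
def duval_factorization(s):
    """Split s into a non-increasing sequence of Lyndon words (Duval's scan)."""
    n = len(s)
    factors = []
    i = 0
    while i < n:
        j, k = i + 1, i
        # Extend the window while it remains a power of a Lyndon word
        # (possibly followed by a proper prefix of that word).
        while j < n and s[k] <= s[j]:
            if s[k] < s[j]:
                k = i        # the whole window s[i..j] is a single Lyndon word
            else:
                k += 1       # still repeating the current period
            j += 1
        # Emit every full copy of the Lyndon word of length j - k.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def lyndon_array(s):
    """Length of the maximal Lyndon factor starting at each position of s.

    Assumption for brevity: suffix ranks come from naive suffix sorting,
    which is simple but far from linear time.
    """
    n = len(s)
    suffix_array = sorted(range(n), key=lambda i: s[i:])
    rank = [0] * n
    for r, i in enumerate(suffix_array):
        rank[i] = r
    lam = [0] * n
    stack = []  # positions still waiting for a smaller suffix to their right
    for j in range(n):
        while stack and rank[stack[-1]] > rank[j]:
            i = stack.pop()
            lam[i] = j - i
        stack.append(j)
    for i in stack:  # no smaller suffix to the right: factor runs to the end
        lam[i] = n - i
    return lam


print(duval_factorization("banana"))  # ['b', 'an', 'an', 'a']
print(lyndon_array("banana"))         # [1, 2, 1, 2, 1, 1]
```

In this sketch the first factor returned by `duval_factorization` has length equal to the first entry of the Lyndon array, since the first Lyndon factor of a string is its longest Lyndon prefix.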
Finally, we discuss Baier's 2015 algorithm for sorting suffixes, which identifies and sorts the maximal Lyndon factors in phase 2 in O(log Σ) steps for a string of length n over an alphabet Σ. We examine the fact that Baier's algorithm sorts the suffixes by sorting the maximal Lyndon factors, and present a different, potentially faster algorithm for phase 2. Our goal was to gather all the relevant well-known facts, and some unpublished ones, about Lyndon strings and their relationship to runs. In addition, we present a novel O(n log n) recursive algorithm for computing Lyndon arrays that may be competitive with Baier's for strings over large alphabets. / Thesis / Doctor of Philosophy (PhD)
Identifier | oai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/22749 |
Date | January 2018 |
Creators | Paracha, Asma |
Contributors | Franek, Frantisek, Computing and Software |
Source Sets | McMaster University |
Language | English |
Detected Language | English |
Type | Thesis |