<p> A string is a sequence of various simple elements. The most straightforward examples of strings are English words-concatenations of the 26 letters of the English alphabet. A repetition in a string x is a nonempty substring of the form x[i..j] = u^k, k ≥ 2. The study of repetitions in strings is as old as the study of strings themselves. Furthermore, the identification of repetitions in a given finite string still remains an important topic in a variety of contexts: pattern-matching, computational biology, data compression, cryptology, and many other areas. </p> <p> A run in a string xis a substring in the form x[i..j] = u^kv, k ≥ 2 where v is a prefix of u, u is not a repetition itself, and this substring x[i..j] is neither left-extendible nor right-extendible. The notion of runs thus captures the notion of leftmost maximal repetitions and allows for a succinct notation [M89]. The maximal number of runs over all strings of length n is denoted as p(n). To determine the properties of the function p(n) is an important aspect of the research in periodicities in strings. </p> <p> Prior to the asymptotic lower bound presented by Franek and myself in [FY06] (presented here in Chapter 2), there had been no known non-trivial lower bound for p(n), asymptotic or otherwise. A result suggesting a possible lower bound was presented by Franek, Simpson and Smyth in 2003, introducing a construction of a sequence of strings {xn}~=0, so that limn→∞ r(Xn)/[Xn] = 3/(1+√5) ≈ 0.927 [FSS03]. Theirmethod was extended to provide a true asymptotic lower bound in [FY06]. In the first part of Chapter 2, the recursive construction of the sequence of strings from [FSS03] is presented with all details not discussed in either [FSS03] or [FY06]. In the second part of Chapter 2, a construction of the lower bound is presented with all details. This part represents my original contribution to the research. </p> <p> I designed a new approach to generate strings that are "rich in runs" other then the one used in [FSS03] and [FY06]. A similar approach as in Chapter 2 is used to construct a lower bound for p(n) using the alternate construction of sequences of strings. This new construction method gives, interestingly, sequences with the same limit as in [FY06], thus giving some support to the conjecture that limn→∞ p(n)/n = 3/(1+√5) stated in [FSS03]. This method is presented in Chapter 3. The whole Chapter 3 thus represents another part of my original contribution to the research. </p> <p> It had been known since the 1980's that the number of repetitions in a string of length n is at most of the order O(n log n). A remarkable result by Kolpakov and Kucherov in 2000 showed that p(n) was in fact bounded by a function linear in n [KK00]. Their approach only. provided the existence of such a function, not the concrete values of its constants. Recently, Rytter improved the upper bound of p(n) to 5n. [R06). The paper by Rytter was published in a conference proceedings and as such lacked many details in some areas and was bit too vague. In Chapter 4 I present Rytter's proof with all relevant details filled in. Through a private communication I learned at the time of writing of this thesis that the upper bound had been improved by Rytter, and independently by Smyth, Simpson, and Puglisi to 3.5n. The latest upper bound is supposed to be now as low as 1.5n. However, none of the upper bounds better than 5n has been published yet. </p> <p> In the last chapter I discuss my conclusions and point out the directions for the future research. </p> / Thesis / Master of Science (MSc)
Identifer | oai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/21355 |
Date | January 2007 |
Creators | Yang, Qian |
Contributors | Franek, Frantisek, Computer Science |
Source Sets | McMaster University |
Language | English |
Detected Language | English |
Page generated in 0.0016 seconds