Return to search

On the number of distinct squares in strings

<p>We investigate the problem of the maximum number of distinct primitively rooted squares in a string. In comparison to considering general strings, the number of distinct symbols in the string is introduced as an additional parameter of the problem. Let S(d,n) = max {s(x) | x is a (d,n)-string}, where s(x) denotes the number of distinct primitively rooted squares in a string x and a (d,n)-string denotes a string of length n with exactly d distinct symbols.</p> <p>Inspired by the d-step approach which was instrumental in Santos' tackling of the Hirsch conjecture, we introduce a (d,n-d) table with entries S(d,n) where d is the index for the rows and n-d is the index for the columns. We examine the properties of the S(d,n) function in the context of (d,n-d) table and conjecture that the value of S(d,n) is no more than n-d. We present several equivalent properties with the conjecture. We discuss the significance of the main diagonal of the (d,n-d) table, i.e. the square-maximal (d, 2d)-strings for their relevance to the conjectured bound for all strings. We explore their structural properties under both assumptions, complying or not complying with the conjecture, with the intention to derive a contradiction. The result yields novel properties and statements equivalent with the conjecture with computational application to the determination of the values S(d,n).</p> <p>To further populate the (d,n-d) table, we design and implement an efficient computational framework for computing S(d,n). Instead of generating all possible (d,n)-strings as the brute-force approach needs to do, the computational effort is significantly reduced by narrowing down the search space for square-maximal strings. With an easily accessible lower bound obtained either from the previously computed values inductively or by an effective heuristic search, only a relatively small set of candidate strings that might possibly exceed the lower bound is generated. To this end, the notions of s-cover and the density of a string are introduced and utilized. In special circumstances, the computational efficiency can be further improved by starting the s-cover with a double square structure. In addition, we present an auxiliary algorithm that returns the required information including the number of distinct squares for each generated candidate string. This algorithm is a modified version of FJW algorithm, an implementation based on Crochemore's partition algorithm, developed by Franek, Jiang and Weng. As of writing of this thesis, we have been able to obtain the maximum number of distinct squares in binary strings till the length of 70.</p> / Doctor of Philosophy (PhD)

Identiferoai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/13945
Date04 1900
CreatorsJiang, Mei
ContributorsFranek, Frantisek, Deza, Antoine, Fred Hoppe, Nedialko Nedialkov, Computing and Software
Source SetsMcMaster University
Detected LanguageEnglish
Typethesis

Page generated in 0.0026 seconds