• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 1
  • Tagged with
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Efficient Implementation & Application of Maximal String Covering Algorithm / MAXIMAL COVER ALGORITHM IMPLEMENTATION

Koponen, Holly January 2022 (has links)
This thesis describes the development and application of the new software MAXCOVER that computes maximal covers and non-extendible repeats (a.k.a. “maximal repeats”). A string is a finite array x[1..n] of elements chosen from a set of totally ordered symbols called an alphabet. A repeat is a substring that occurs at least twice in x. A repeat is left/right extendible if every occurrence is preceded/followed by the same symbol; otherwise, it is non-left/non-right extendible (NLE/NRE). A non-extendible (NE) repeat is both NLE and NRE. A repeat covers a position i if x[i] lies within the repeat. A maximal cover (a.k.a. “optimal cover”) is a repeat that covers the most positions in x. For simplicity, we first describe a quadratic O(n2) implementation of MAXCOVER to compute all maximal covers of a given string based on the pseudocode given in [1]. Then, we consider the logarithmic O(n log n) pseudocode in [1], in which we identify several errors. We leave a complete correction and implementation for future work. Instead, we propose two improved quadratic algorithms that, shown through experiments, will execute in linear time for the average case. We perform a benchmark evaluation of MAXCOVER’s performance and demonstrate its value to biologists in the protein context [2]. To do so, we develop an extension of MAXCOVER for the closely related task of computing NE repeats. Then, we compare MAXCOVER to the repeat-match feature of the well-known MUMmer software [3] (600+ citations). We determine that MAXCOVER is an order-of-magnitude faster than MUMmer with much lower space requirements. We also show that MAXCOVER produces a more compact, exact, and user-friendly output that specifies the repeats. Availability: Open source code, binaries, and test data are available on Github at https://github.com/hollykoponen/MAXCOVER. Currently runs on Linux, untested on other OS. / Thesis / Master of Science (MSc) / This thesis deals with a simple yet essential data structure called a string, a sequence of symbols drawn from an alphabet. For example, a DNA sequence is a string comprised of four letters. We describe a new software called MAXCOVER that identifies maximal covers of a given string x (a repeating substring that ‘covers’ the most positions in x). This software is based on the algorithms in [1]. We propose two new algorithms that perform faster in practice. We also extended MAXCOVER for the closely related task of computing non-extendible repeats. We compare this extension to the well-known MUMmer software (600+ citations). We find that MAXCOVER is many times faster than MUMmer with much lower space requirements and produces a more compact, exact and user-friendly output.

Page generated in 0.2538 seconds