Return to search

Algorithms and Variations on the Positional Burrows-Wheeler Transform and Their Applications

In this dissertation, we develop algorithms and variations on the Positional Burrows-Wheeler Transform (PBWT). The PBWT is a data structure that stores M binary strings of length N while allowing efficient search. We develop the dynamic-PBWT (d-PBWT). The d-PBWT is a variation of the PBWT that allows its relevant algorithms to run with unchanged time complexity, but also allows efficient insertion and deletion of haplotypes. We provide insertion and deletion algorithms on the PBWT with average case O(N) time complexity. We also improve upon the query algorithms for the PBWT. Durbin described a set maximal match query algorithm on the PBWT and claimed O(N) time complexity. Naseri et al. described a long match query algorithm on the PBWT using additional data structures (LEAP arrays) in claimed O(N+c) time complexity, where c is the number of matches outputted. We showed these bounds to be incorrect in the worst case and provided set maximal match and long match query algorithms that do have these time complexities. Furthermore, we develop a new formulation of haplotype threading, the Minimal Positional Substring Cover (MPSC). We solve the MPSC in O(N) time. Then, we solve variants of the MPSC problem: leftmost MPSC, rightmost MPSC, and set maximal match only MPSC. Using these variants to bound the solution space, we are able to represent all possible MPSCs in efficiently. Then, we solve variants that may be more biologically useful: length maximal MPSC, h-MPSC, and L-MPSC. All the MPSC problems are solved in O(N) time given a PBWT of the reference panel. Finally, we show the biological usefulness of the MPSC formulation using an imputation benchmark.

Identiferoai:union.ndltd.org:ucf.edu/oai:stars.library.ucf.edu:etd2020-2650
Date01 January 2023
CreatorsSanaullah, Ahsan
PublisherSTARS
Source SetsUniversity of Central Florida
LanguageEnglish
Detected LanguageEnglish
Typetext
Formatapplication/pdf
SourceElectronic Theses and Dissertations, 2020-

Page generated in 0.0025 seconds