Return to search

Efficient recovery algorithms with restricted access to strings

We design efficient algorithms for computational problems over strings in several models where the algorithms have limited access to the input. These models, and algorithms developed respecting these constraints, are becoming increasingly relevant due to the rapidly increasing size of datasets in myriad applications.

Our first problem of interest is \emph{trace reconstruction}. This is an important problem in learning theory and coding theory, and has applications in computational biology. In this problem, the goal is to recover an unknown string given independent samples (\emph{traces}) of it generated via a probabilistic noise process called the deletion channel. We give state-of-the-art algorithms for this problem in several settings.

Then we consider the problem of estimating the \emph{longest increasing subsequence (LIS)} of a given string in sublinear time, given query access to the string. While the LIS of a string can be computed exactly in near-linear time, the optimal complexity of approximating the LIS length, especially when the LIS is much less than the string length, is still open. We significantly improve upon prior work in terms of both approximation and time complexity in this regime. The runtime of our algorithm essentially matches the trivial query complexity lower bound as a function of the length of the LIS.

Finally, we consider the problem of local decoding, or random access, on compressed strings. The Burrows-Wheeler Transform (BWT) is an important preprocessing step in lossless text compression that rearranges a string into runs of identical characters (by exploiting context regularities), resulting in highly compressible strings. However, the decoding process of the BWT is inherently sequential, and prevents fast random access to the original string. We design a succinct data structure for locally decoding short substrings (and answering several other queries) of a given string under its compressed BWT efficiently.

Identiferoai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/mjyw-9p25
Date January 2022
CreatorsSinha, Sandip
Source SetsColumbia University
LanguageEnglish
Detected LanguageEnglish
TypeTheses

Page generated in 0.0023 seconds