Better Selection of K-mers for Compression of DNA Sequences using Huffman Encoding

The genome of an organism contains all of its hereditary information, encoded in DNA. Genome databases are growing rapidly, and this growth creates a need to compress DNA data into less space, both for faster transmission and for research activities. General-purpose text compression algorithms do not exploit the specific characteristics of a DNA sequence. Various tools have been developed using different algorithms and approaches, and many of them implement Huffman encoding adapted to the characteristics of DNA sequences. These Huffman-based techniques center on selecting repeated sequences to form a skewed Huffman tree. They also involve constructing multiple Huffman trees when encoding. Such implementations have demonstrated better compression ratios than the standard Huffman tree. This research suggests a few improvements to one of these algorithms, specifically in how the repeat sequences are selected, to obtain better compression ratios.
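
To illustrate the general idea of Huffman coding over k-mers, here is a minimal Python sketch. It is not the algorithm developed in the thesis, whose repeat-selection method is not described in this record. It assumes fixed-length, non-overlapping k-mers and a single Huffman tree, and the function names (split_kmers, huffman_codes, encode, decode) are hypothetical. It also ignores the cost of storing the code table alongside the encoded bits.

```python
import heapq
from collections import Counter
from itertools import count


def split_kmers(seq, k):
    """Split seq into consecutive, non-overlapping chunks of length k.

    The last chunk may be shorter than k. (Assumption: fixed-length
    chunking, chosen only to keep the sketch simple.)
    """
    return [seq[i:i + k] for i in range(0, len(seq), k)]


def huffman_codes(freqs):
    """Build a {symbol: bitstring} table with the classic Huffman construction."""
    if len(freqs) == 1:
        # A single symbol still needs a non-empty code.
        return {next(iter(freqs)): "0"}
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Merge the two least frequent subtrees, prefixing their codes.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encode(seq, k):
    """Encode seq as a bitstring using Huffman codes over its k-mers."""
    symbols = split_kmers(seq, k)
    codes = huffman_codes(Counter(symbols))
    return "".join(codes[s] for s in symbols), codes


def decode(bits, codes):
    """Invert encode(); works because Huffman codes are prefix-free."""
    inverse = {c: s for s, c in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    seq = "ACGT" * 8 + "TTTTGGGG"  # one 4-mer repeats heavily
    bits, codes = encode(seq, 4)
    print(len(bits), "bits vs", 2 * len(seq), "bits at 2 bits per base")
    assert decode(bits, codes) == seq
```

In this toy input the heavily repeated 4-mer receives a one-bit code, which is the "skewed tree" effect the abstract refers to. A fair comparison on real data would also have to count the bits needed to store the code table.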

Identifier: oai:union.ndltd.org:siu.edu/oai:opensiuc.lib.siu.edu:theses-3873
Date: 01 September 2021
Creators: Agrahari, Manoj Kumar
Publisher: OpenSIUC
Source Sets: Southern Illinois University Carbondale
Detected Language: English
Type: text
Format: application/pdf
Source: Theses
