Global ETD Search

Better Selection of K-mers for Compression of DNA Sequences using Huffman Encoding

The genome of an organism contains all of its hereditary information, encoded in DNA. Genome databases are growing rapidly, and this growth creates a need to compress DNA data so that it occupies less space and can be transmitted faster for research. General-purpose text compression algorithms do not exploit the specific characteristics of a DNA sequence. Various tools have been developed using different algorithms and approaches, and many of them use Huffman encoding to incorporate the characteristics of DNA sequences. Huffman-based techniques center on selecting repeated sequences to form a skewed Huffman tree. These techniques also rely on constructing multiple Huffman trees when encoding. These implementations have demonstrated improved compression ratios compared to the standard Huffman tree. This research suggests a few improvements to the way one of these algorithms selects repeat sequences, in order to obtain better compression ratios.
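The general idea of selecting repeated k-mers and Huffman-coding them alongside single bases can be illustrated with a minimal Python sketch. This is not the algorithm from the thesis: the function names (`top_kmers`, `tokenize`, `huffman_codes`, `encode`, `decode`), the greedy left-to-right parse, and the defaults `k=4` and `n=2` are all assumptions chosen for illustration, and the sketch builds a single Huffman tree rather than several.

```python
import heapq
from collections import Counter
from itertools import count


def top_kmers(seq, k, n):
    """Return the n most frequent overlapping k-mers in seq (illustrative selection rule)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [kmer for kmer, _ in counts.most_common(n)]


def tokenize(seq, kmers, k):
    """Greedy parse: emit a selected k-mer if one starts here, else a single base."""
    chosen = set(kmers)
    tokens, i = [], 0
    while i < len(seq):
        piece = seq[i:i + k]
        if piece in chosen:
            tokens.append(piece)
            i += k
        else:
            tokens.append(seq[i])
            i += 1
    return tokens


def huffman_codes(tokens):
    """Build a prefix code from token frequencies using a min-heap."""
    freq = Counter(tokens)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    tie = count()  # unique tie-breaker so the dicts are never compared
    heap = [(f, next(tie), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encode(seq, k=4, n=2):
    """Return the bit string for seq and the code table needed to decode it."""
    tokens = tokenize(seq, top_kmers(seq, k, n), k)
    codes = huffman_codes(tokens)
    return "".join(codes[t] for t in tokens), codes


def decode(bits, codes):
    """Invert encode() by reading bits until a complete codeword is seen."""
    inverse = {c: s for s, c in codes.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return "".join(out)
```

Calling `encode("ACGTACGTACGTTTGA")` returns a bit string and a code table, and `decode` applied to both recovers the original sequence. Frequent k-mers tend to receive short codewords, which is what skews the tree; how the k-mers are chosen is the part the thesis aims to improve, and the frequency-only rule used here is only a placeholder.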

Identifier	oai:union.ndltd.org:siu.edu/oai:opensiuc.lib.siu.edu:theses-3873
Date	01 September 2021
Creators	Agrahari, Manoj Kumar
Publisher	OpenSIUC
Source Sets	Southern Illinois University Carbondale
Detected Language	English
Type	text
Format	application/pdf
Source	Theses

