The genome of an organism contains all of its hereditary information, encoded in DNA. Genome databases are growing rapidly, and this growth creates a need to compress DNA data into less space for faster transmission and for research activities. General-purpose text compression algorithms do not exploit the specific characteristics of a DNA sequence. Various tools have been developed using different algorithms and approaches, and many of them implement variants of Huffman encoding adapted to the characteristics of DNA sequences. Huffman-based techniques center on selecting repeated sequences to form a skewed Huffman tree, and they also rely on constructing multiple Huffman trees when encoding. These implementations have demonstrated improved compression ratios compared to the standard Huffman tree. This research suggests a few improvements to the way one of these algorithms selects repeat sequences, in order to obtain better compression ratios.
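To illustrate why treating a repeated sequence as a single symbol skews the Huffman tree, here is a minimal Python sketch. It is not the algorithm studied in the thesis: the function names (`tokenize`, `huffman_code`, `encoded_bits`), the greedy longest-match tokenization, and the hand-picked repeat list are all assumptions made for illustration, and the bit count ignores the cost of storing the code table.

```python
import heapq
from collections import Counter
from itertools import count


def tokenize(seq, repeats=()):
    """Split seq into tokens, greedily preferring the longest listed repeat.

    Hypothetical helper: any position not covered by a repeat becomes a
    single-base token.
    """
    ordered = sorted(repeats, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(seq):
        for r in ordered:
            if seq.startswith(r, i):
                tokens.append(r)
                i += len(r)
                break
        else:
            tokens.append(seq[i])
            i += 1
    return tokens


def huffman_code(tokens):
    """Build a standard Huffman code (token -> bit string) from token counts."""
    freq = Counter(tokens)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    tie = count()  # unique tie-breaker so the dicts are never compared
    heap = [(f, next(tie), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encoded_bits(seq, repeats=()):
    """Payload size in bits (code table excluded) for the given repeat list."""
    tokens = tokenize(seq, repeats)
    code = huffman_code(tokens)
    return sum(len(code[t]) for t in tokens)


if __name__ == "__main__":
    seq = "ACGT" * 6 + "GA"
    print(encoded_bits(seq))             # per-base Huffman tree
    print(encoded_bits(seq, ("ACGT",)))  # tree skewed by the repeat symbol
```

For the toy input above, the per-base tree needs 52 bits while the tree that includes the repeat `ACGT` as a symbol needs 10 bits. How much of that gain survives on real genomes depends on how the repeats are chosen, which is the question the abstract raises.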
Identifier | oai:union.ndltd.org:siu.edu/oai:opensiuc.lib.siu.edu:theses-3873 |
Date | 01 September 2021 |
Creators | Agrahari, Manoj Kumar |
Publisher | OpenSIUC |
Source Sets | Southern Illinois University Carbondale |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Theses |