
Kompresia biologických sekvencií / Compression of biological sequences

The volume of data produced by next-generation sequencing platforms is growing faster than the available capacity of storage media. Sequencers mainly produce short DNA reads, but their output also contains other information, such as information about read reliability (quality). This data must be archived even after a complete genome has been successfully assembled. The standard file format for this type of data is SAM (Sequence Alignment/Map) and its compressed binary version, BAM. In this thesis we describe the construction of a lossless compression scheme for files in the SAM/BAM format that achieves better compression ratios than BAM while retaining random access to data in the compressed file. The implementation is platform independent, allows simple configuration of the compression process, and is easily extensible. As a result, it can be adapted to changes in current sequencing platforms as well as to changes in the SAM format.

Identifier oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:306492
Date January 2012
Creators Šurín, Tomáš
Contributors Mráz, František, Dvořák, Tomáš
Source Sets Czech ETDs
Language Slovak
Detected Language English
Type info:eu-repo/semantics/masterThesis
Rights info:eu-repo/semantics/restrictedAccess
