
Information Theory for Biological Sequence Classification: A Novel Feature Extraction Technique Based on Tsallis Entropy

In recent years, there has been an exponential growth in sequencing projects due to
accelerated technological advances. This has led to a significant increase in the amount of data
and to new challenges for biological sequence analysis. Consequently, techniques capable of
analyzing large amounts of data, such as machine learning (ML) algorithms, have been explored.
ML algorithms are being used to analyze and classify biological sequences, despite the intrinsic
difficulty of finding feature extraction methods that represent biological sequences in a form
suitable for these algorithms. Extracting numerical features to represent sequences makes it
statistically feasible to apply universal concepts from Information Theory, such as Tsallis and
Shannon entropy. In this study, we propose a novel Tsallis entropy-based feature extractor that
provides useful information for classifying biological sequences. To assess its relevance, we
prepared five case studies: (1) an analysis of the entropic index q; (2) performance testing of the
best entropic indices on new datasets; (3) a comparison with Shannon entropy; (4) a comparison
with generalized entropies; and (5) an investigation of Tsallis entropy in the context of
dimensionality reduction. As a result, our proposal proved to be effective. It was superior to
Shannon entropy, robust in terms of generalization, and potentially representative for collecting
information in fewer dimensions compared with methods such as Singular Value Decomposition and
Uniform Manifold Approximation and Projection.
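The abstract does not spell out how a sequence becomes a set of entropy features. The sketch below is a minimal illustration under a stated assumption: for each k = 1..K it counts overlapping k-mers, converts the counts to relative frequencies p_i, and computes one Tsallis entropy value per k. It uses the standard definition S_q = (1 − Σ_i p_i^q) / (q − 1), which tends to the Shannon entropy −Σ_i p_i ln p_i as q → 1. The helper names (`tsallis_entropy`, `kmer_probabilities`, `tsallis_features`), the k-mer scheme, and the default q are hypothetical choices made for illustration. This is not the authors' published implementation.

```python
from collections import Counter
import math


def tsallis_entropy(probs, q):
    """Tsallis entropy S_q = (1 - sum p_i^q) / (q - 1).

    At q == 1 the limit is Shannon entropy with the natural log.
    An empty distribution returns 0.0 (an illustrative convention).
    """
    probs = [p for p in probs if p > 0]
    if not probs:
        return 0.0
    if q == 1:
        return -sum(p * math.log(p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


def kmer_probabilities(seq, k):
    """Hypothetical helper: relative frequencies of overlapping k-mers in seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(kmers)
    return [count / total for count in Counter(kmers).values()]


def tsallis_features(seq, q=2.0, max_k=5):
    """Hypothetical feature layout: one Tsallis entropy per k-mer size k = 1..max_k.

    q is a free parameter (the entropic index); 2.0 is an arbitrary default here.
    """
    return [tsallis_entropy(kmer_probabilities(seq, k), q) for k in range(1, max_k + 1)]
```

For example, `tsallis_features("ACGT", q=2.0, max_k=2)` yields `[0.75, 0.666...]`. At k = 1 the four nucleotides are equiprobable, and at k = 2 the three dinucleotides are equiprobable. The vector always has `max_k` entries, independent of sequence length. By comparison, a raw k-mer frequency vector for DNA at k = 5 already has 4^5 = 1024 dimensions. This compactness is one reason entropy summaries are of interest for dimensionality reduction.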

Identifier: oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:92910
Date: 05 August 2024
Creators: Bonidia, Robson P., Avila Santos, Anderson P., de Almeida, Breno L. S., Stadler, Peter F., Nunes da Rocha, Ulisses, Sanches, Danilo S., de Carvalho, André C. P. L. F.
Publisher: MDPI
Source Sets: Hochschulschriftenserver (HSSS) der SLUB Dresden
Language: English
Detected Language: English
Type: info:eu-repo/semantics/publishedVersion, doc-type:article, info:eu-repo/semantics/article, doc-type:Text
Rights: info:eu-repo/semantics/openAccess
Relation: 1099-4300, 10.3390/e24101398
