Global ETD Search

Return to search

An Extension of The Berry-Ravindran Algorithm for protein and DNA data

String matching algorithms are the algorithms used to search through different types of text in search of a certain pattern. Many of these algorithms achieve their impressive performance by analysing the pattern and saving that information. That information is then continuously used during the searching phase to know what parts of the text can be skipped. One such algorithm is the Berry-Ravindran. The Berry-Ravindran checks the two characters past the current try for a match and sees if those characters exist in the pattern. This thesis compares the Berry-Ravindran algorithm to new versions of itself that check three and four characters instead of two, along with the Boyer-Moore algorithm. Checking more characters improves the amount of the text that can be skipped by reducing the number of attempts needed but exponentially increases the pre-processing time. The improved performance in attempts does not necessarily mean a faster run-time because of the increased pre-processing time. The variable impacting the pre-processing time the biggest is the size of the alphabet that the text uses. This is researched by testing these algorithms with patterns ranging from 4 to 100 characters long on two different data sets. Protein data which has an alphabet size of 27 and DNA data which has an alphabet size of 4.

http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-201038

Berry-Ravindran

text matching

string matching

exact-matching string algorithms

Engineering and Technology

Teknik och teknologier

Identifer	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:umu-201038
Date	January 2022
Creators	Riekkola, Jesper
Publisher	Umeå universitet, Institutionen för datavetenskap
Source Sets	DiVA Archive at Upsalla University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	UMNAD ; 1365

Page generated in 0.0028 seconds

An Extension of The Berry-Ravindran Algorithm for protein and DNA data

Description

Links & Downloads

Tags

Additional Fields