
ANALYSIS AND COMPARISON OF USEARCH AND DNACLUST SOFTWARE PACKAGES

Over the past several years, new DNA sequencing technologies have led to a great
increase in the quantity of biological sequence data that can be generated. Typically there
may be millions or even billions of short read sequences, each a few hundred base pairs long,
that are to some degree redundant: the data fall naturally into clusters of sequences
that are highly similar to each other. To reduce the time required for analysis
of the data, it is therefore of interest to compute representatives of these clusters,
based on some definition of similarity.
In this thesis we examine two clustering software packages, USEARCH and DNACLUST,
that seek to perform this clustering task efficiently. We provide an overview of the techniques used by these two packages, compare and evaluate them from both a methodological and an experimental perspective, and draw conclusions about their effectiveness and utility.

Thesis / Master of Science (MSc)

Identifier: oai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/18471
Date: 11 1900
Creators: Shafqat, Raazia
Contributors: Smyth, W. F., Computer Science
Source Sets: McMaster University
Language: English
Detected Language: English
Type: Thesis
