ANALYSIS AND COMPARISON OF USEARCH AND DNACLUST SOFTWARE PACKAGES

Over the past several years, new DNA sequencing technologies have led to a great increase in the quantity of biological sequence data that can be generated. Typically there may be millions or even billions of short-read sequences, each a few hundred base pairs long, that are to some degree redundant: the data fall naturally into clusters of sequences that are highly similar to each other. To reduce the time required to analyze the data, it is therefore of interest to compute representatives of these clusters, based on some definition of similarity.
In this thesis we examine two clustering software packages, USEARCH and DNACLUST, that seek to perform this clustering task efficiently. We provide an overview of the techniques used by these two packages, compare and evaluate them from both a methodological and an experimental perspective, and draw conclusions about their effectiveness and utility. / Thesis / Master of Science (MSc)

http://hdl.handle.net/11375/18471

Short reads

Clusters

Similarity

Identifier	oai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/18471
Date	11 1900
Creators	Shafqat, Raazia
Contributors	Smyth, W. F., Computer Science
Source Sets	McMaster University
Language	English
Detected Language	English
Type	Thesis

