Phylogenetic tree reconstruction with protein linkage

Phylogenetic tree reconstruction for a set of species is an important problem for understanding the evolutionary history of the species. Existing algorithms usually represent each species as a binary string with each bit indicating whether a particular gene/protein exists in the species. Given the topology of a phylogenetic tree with each leaf representing a species (a binary string of equal length) and each internal node representing the hypothetical ancestor, the Fitch-Hartigan algorithm and the Sankoff algorithm are two polynomial-time algorithms which assign binary strings to internal nodes such that the total Hamming distance between adjacent nodes in the tree is minimized. However, these algorithms oversimplify the evolutionary process by considering only the number of protein insertions/deletions (Hamming distance) between two species and by assuming the evolutionary history of each protein is independent.
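As an illustration of the Hamming-only baseline described above, the following is a minimal sketch of Fitch's two-pass procedure on a rooted binary tree. The tree encoding (`children`, `leaves`) and the function name `fitch` are assumptions made for this example and do not come from the thesis; Hartigan's extension to non-binary trees is not covered.

```python
from typing import Dict, List, Optional, Set, Tuple


def fitch(children: Dict[str, Tuple[str, str]],
          leaves: Dict[str, str],
          root: str) -> Tuple[int, Dict[str, str]]:
    """Label internal nodes so the total Hamming distance over edges is minimal.

    children: hypothetical encoding, internal node -> (left child, right child)
    leaves:   leaf name -> binary string (all of equal length)
    """
    sets: Dict[str, List[Set[int]]] = {}
    cost = 0

    def up(node: str) -> None:
        # Bottom-up pass: per bit, intersect the child sets, or take the
        # union and count one change when the intersection is empty.
        nonlocal cost
        if node in leaves:
            sets[node] = [{int(c)} for c in leaves[node]]
            return
        left, right = children[node]
        up(left)
        up(right)
        node_sets = []
        for a, b in zip(sets[left], sets[right]):
            common = a & b
            if common:
                node_sets.append(common)
            else:
                node_sets.append(a | b)
                cost += 1
        sets[node] = node_sets

    labels: Dict[str, str] = {}

    def down(node: str, parent_bits: Optional[List[int]]) -> None:
        # Top-down pass: keep the parent's bit when allowed, else pick any
        # bit from the candidate set (here: the smallest).
        if node in leaves:
            labels[node] = leaves[node]
            return
        bits = []
        for i, candidates in enumerate(sets[node]):
            if parent_bits is not None and parent_bits[i] in candidates:
                bits.append(parent_bits[i])
            else:
                bits.append(min(candidates))
        labels[node] = "".join(str(b) for b in bits)
        for child in children[node]:
            down(child, bits)

    up(root)
    down(root, None)
    return cost, labels
```

Each bit is processed on its own, which is exactly the independence assumption that the abstract goes on to criticize.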
Since the function of a protein may depend on the existence of other proteins, the evolutionary histories of functionally dependent proteins should be similar, i.e. functionally dependent proteins should usually be present (or absent) in a species at the same time. Thus, in addition to the Hamming distance, a protein linkage distance is introduced for some pairs/sets of proteins, in three variants: whole block linkage distance, partial block linkage distance, and pairwise linkage distance. It is proved that the phylogenetic tree reconstruction problem of finding binary strings for the internal nodes of a phylogenetic tree that minimize the sum of the Hamming distance and the linkage distance is NP-hard.
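The objective described above can be written compactly. This is a sketch under stated assumptions: the notation (tree $T=(V,E)$, labelling $s$, distances $d_H$ and $d_L$) is introduced here for illustration, the linkage distance is assumed to be summed over tree edges in the same way as the Hamming distance, and the precise definition of $d_L$ for each variant is given in the thesis, not in this abstract.

```latex
% s assigns a length-m binary string to every node; leaf labels are fixed.
\min_{s}\; \sum_{(u,v)\in E} \Bigl[\, d_H(s_u, s_v) + d_L(s_u, s_v) \,\Bigr],
\qquad
d_H(x, y) = \sum_{i=1}^{m} \lvert x_i - y_i \rvert .
```

With $d_L \equiv 0$ the sum separates into one independent problem per bit position, which is why the Hamming-only case is solvable in polynomial time; a linkage term that couples bit positions removes that separability.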
In this thesis, a general algorithm for the phylogenetic tree reconstruction with protein linkage problem is introduced. It runs in O(4^m⋅n) time for whole/partial block linkage distance and O(4^m⋅(m+n)) time for pairwise linkage distance (compared to the straightforward O(4^m⋅m⋅n) or O(4^m⋅m^2⋅n) time algorithm), where n is the number of species and m is the length of the binary string (number of proteins). It is further shown, by experiments, that the algorithm using linkage information can construct more accurate trees (better matches with the trees constructed by biologists) than the algorithms using only Hamming distance.
published_or_final_version / Computer Science / Master / Master of Philosophy
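To make the 4^m factor in the abstract concrete, here is a hedged sketch of a brute-force Sankoff-style dynamic program that treats each whole m-bit string as one state. It is not the thesis's algorithm. Every edge compares 2^m parent states with 2^m child states, so the work is roughly 4^m cost evaluations per edge, which is plausibly where the "straightforward" bounds come from. The names `sankoff_whole_strings` and `make_cost`, the tree encoding, and in particular the pairwise penalty are hypothetical; the thesis's actual linkage distances are not defined in this abstract.

```python
from typing import Callable, Dict, List, Sequence, Tuple

INF = float("inf")


def hamming(u: int, v: int) -> int:
    """Number of differing bits between two states stored as integers."""
    return bin(u ^ v).count("1")


def make_cost(linked_pairs: Sequence[Tuple[int, int]],
              weight: int = 1) -> Callable[[int, int], int]:
    """Hypothetical edge cost: Hamming distance plus `weight` for every
    linked pair (i, j) in which exactly one of the two bits changes.
    Bit i means bit i of the integer (bit 0 is the least significant)."""
    def cost(u: int, v: int) -> int:
        changed = u ^ v
        penalty = sum(1 for i, j in linked_pairs
                      if ((changed >> i) & 1) != ((changed >> j) & 1))
        return hamming(u, v) + weight * penalty
    return cost


def sankoff_whole_strings(children: Dict[str, Tuple[str, ...]],
                          leaves: Dict[str, int],
                          root: str,
                          m: int,
                          cost: Callable[[int, int], int]):
    """Exact minimization over all 2^m states per node (exponential in m)."""
    states = range(1 << m)
    best: Dict[str, List[float]] = {}       # node -> subtree cost per state
    choice: Dict[Tuple[str, str], List[int]] = {}  # best child state per parent state

    def up(node: str) -> None:
        if node in leaves:
            best[node] = [0 if x == leaves[node] else INF for x in states]
            return
        total = [0.0] * (1 << m)
        for child in children[node]:
            up(child)
            pick = []
            for u in states:  # 2^m parent states times 2^m child states
                v_best = min(states, key=lambda v: best[child][v] + cost(u, v))
                pick.append(v_best)
                total[u] += best[child][v_best] + cost(u, v_best)
            choice[(node, child)] = pick
        best[node] = total

    labels: Dict[str, int] = {}

    def down(node: str, state: int) -> None:
        labels[node] = state
        for child in children.get(node, ()):
            down(child, choice[(node, child)][state])

    up(root)
    root_state = min(states, key=lambda u: best[root][u])
    down(root, root_state)
    return best[root][root_state], labels
```

Passing `hamming` as `cost` reproduces the Hamming-only optimum, while `make_cost([(0, 1)])` adds the illustrative penalty for one linked pair. The sketch is only practical for small m, which is consistent with the problem being NP-hard.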

Phylogeny.

Combinatorial analysis.

Identifier	oai:union.ndltd.org:HKU/oai:hub.hku.hk:10722/181488
Date	January 2012
Creators	Yu, Junjie., 于俊杰.
Contributors	Chin, FYL
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Source Sets	Hong Kong University Theses
Language	English
Detected Language	English
Type	PG_Thesis
Source	http://hub.hku.hk/bib/B49618167
Rights	The author retains all proprietary rights, (such as patent rights) and the right to use in future works., Creative Commons: Attribution 3.0 Hong Kong License
Relation	HKU Theses Online (HKUTO)