
Construction of Distributed Method for Analyzing a Large Number of Sequence Data: Using Influenza A Virus Protein Sequences as Examples

Abstract
Analyzing the eight genomic protein segments of influenza A virus could provide a better understanding of this virus. With the progress of computer technology, numerous influenza A virus protein sequences have become available in various online databases. However, analyzing a large number of protein sequences is cumbersome, so new algorithmic tools are needed. This study used a distributed method to develop protein sequence clustering analysis software in the Java programming language. The software splits a large number of protein sequences downloaded from NCBI into several files. Because these files are processed simultaneously, the time required for comparison and analysis is reduced. Finally, we used the PRIMER 5 program to analyze these individual files and produce MDS and UPGMA similarity diagrams. The similarity diagrams indicated high homology in the genomic protein segments of influenza A virus from 1997 to 2006. The analysis also showed that the genomic protein segments of influenza A virus are similar across Asian countries. However, the similarity between China and the other Asian countries is not significant. Analysis by host showed that the genomic protein segments of influenza A virus are highly similar across species such as birds, chickens, ducks, and pigs. Therefore, our data strongly support the possibility that influenza A viruses can cross species barriers to infect humans.
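
The abstract does not include the software's source, so the following is only a minimal Java sketch of the general idea it describes: split a large sequence set into several parts (standing in for the separate files) and process the parts at the same time. The class and method names (`ChunkedSimilarity`, `split`, `identity`, `similarityMatrix`, `analyze`) are hypothetical, and the position-by-position identity score is an assumed simplification rather than the comparison the thesis software actually performs; a real tool would normally align sequences before scoring them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedSimilarity {

    // Hypothetical score (assumption): fraction of identical residues over the
    // shorter sequence, compared position by position without alignment.
    static double identity(String a, String b) {
        int n = Math.min(a.length(), b.length());
        if (n == 0) return 0.0;
        int same = 0;
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) == b.charAt(i)) same++;
        }
        return (double) same / n;
    }

    // Split the sequence list into at most k contiguous chunks (k >= 1),
    // analogous to writing k separate input files.
    static List<List<String>> split(List<String> seqs, int k) {
        List<List<String>> chunks = new ArrayList<>();
        int size = (seqs.size() + k - 1) / k; // ceiling division
        for (int start = 0; start < seqs.size(); start += size) {
            chunks.add(seqs.subList(start, Math.min(start + size, seqs.size())));
        }
        return chunks;
    }

    // Symmetric pairwise similarity matrix for the sequences in one chunk.
    static double[][] similarityMatrix(List<String> chunk) {
        int m = chunk.size();
        double[][] s = new double[m][m];
        for (int i = 0; i < m; i++) {
            for (int j = i; j < m; j++) {
                double v = identity(chunk.get(i), chunk.get(j));
                s[i][j] = v;
                s[j][i] = v;
            }
        }
        return s;
    }

    // Compute every chunk's matrix concurrently on a fixed pool of k threads.
    static List<double[][]> analyze(List<String> seqs, int k) throws Exception {
        List<List<String>> chunks = split(seqs, k);
        ExecutorService pool = Executors.newFixedThreadPool(k);
        try {
            List<Future<double[][]>> futures = new ArrayList<>();
            for (List<String> chunk : chunks) {
                futures.add(pool.submit(() -> similarityMatrix(chunk)));
            }
            List<double[][]> results = new ArrayList<>();
            for (Future<double[][]> f : futures) results.add(f.get());
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> seqs = List.of("MSLLTEV", "MSLLTEV", "MSLLTEA", "MKAILVV", "MKAILVL");
        List<double[][]> matrices = analyze(seqs, 2);
        System.out.println(matrices.size());       // prints 2 (two chunks)
        System.out.println(matrices.get(0)[0][1]); // prints 1.0 (identical sequences)
    }
}
```

Each chunk's matrix depends only on its own sequences, so the chunks need no coordination and can run in parallel; the trade-off is that sequences in different chunks are never compared with each other, which is acceptable only if the split follows a meaningful grouping (for example by year, country, or host).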

Identifier: oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-1101110-223403
Date: 01 November 2010
Creators: Tu, Guo-Hua
Contributors: Chi-Hsin Hsu, Chan-Shing Lin, Jie-Min Kuo, Jong-Kang Liu
Publisher: NSYSU
Source Sets: NSYSU Electronic Thesis and Dissertation Archive
Language: Cholon
Detected Language: English
Type: text
Format: application/pdf
Source: http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-1101110-223403
Rights: unrestricted, Copyright information available at source archive
