
PEPdb Construction and Large-Scale Analysis of ESTs

The Protist EST Program (PEP) aims to explore the diversity of eukaryotic genomes in a systematic and comprehensive way. A central element of the PEP initiative is the Protist EST Database (PEPdb), the centerpiece of the PEP collaboration. The major functions of PEPdb are to manage the data generated by PEP, to analyze these data, and to make the collected sequence information accessible via the Internet to PEP members and other users. In this project, a consistent and easy-to-use relational database was implemented. All information about PEP members, publications, libraries and ESTs can be stored in the database system, and operations are performed through a user-friendly interface. The database stores about 10,000 records and is available for demonstration at "http://info.biology.mcmaster.ca/ling/estHome.html".

An analysis of ESTs from the ciliated protozoan Tetrahymena thermophila was also undertaken. A total of 3740 non-redundant gene assemblies and singletons from TIGR were analyzed. These sequences were compared against the NCBI non-redundant protein and nucleotide databases using BLASTX and BLASTN to identify putative genes. Of 850 highly significant matches (expect value cut-off of 10^-20), 35.5% represent genes previously cloned from T. thermophila, and 64.5% had significant similarity to genes from other organisms deposited in the NCBI databases. Twenty-six sequences (3.1%) matched signal transduction proteins, including Rac, Ras, MAPK, ERK1, PKC, cAMP and 14-3-3 (a protein involved in signal transduction, exocytosis and cell cycle regulation). This result suggests that T. thermophila encodes the MAPK/ERK signaling pathway. About 53 sequences (6.2%) matched cytoskeletal proteins, which fell into two groups. The first group matched genes coding for microtubule proteins, especially tubulin genes. The second group matched microfilament genes, including one actin, three actin-related proteins and one profilin. No sequences were similar to intermediate filament genes. Comparing the EST counts for individual genes provides absolute estimates of mRNA expression levels. The most abundantly represented genes are enolase, SerH3 and tubulin. Among the 850 highly significant similarities, 196 were restricted to the Ciliophora; GRL and SerH are ciliate-specific genes. There were 508 sequences with highly significant matches (expect value < 10^-20) to human genes. Approximately 189 of these had matches in humans but none in the completely sequenced Saccharomyces cerevisiae genome. Based on Venn diagram analysis, T. thermophila contains abundant eukaryote-specific proteins and many prokaryote-like genes, and some metabolic enzymes in T. thermophila are also present in plants. These results support the view that T. thermophila is an excellent unicellular model system for gene discovery and functional analysis. / Thesis / Master of Science (MS)
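The filtering step described in the abstract (keeping only matches below an expect value of 10^-20, then reporting category shares such as 26 of 850, about 3.1%) can be illustrated with a minimal Python sketch. This is not the thesis's actual pipeline: it assumes BLAST results in the standard 12-column tabular layout, where the expect value is the eleventh column, and the names Hit, parse_tabular, best_significant_hits and percent are hypothetical helpers introduced only for this example.

```python
from typing import Dict, Iterable, Iterator, NamedTuple

EVALUE_CUTOFF = 1e-20  # cut-off quoted in the abstract


class Hit(NamedTuple):
    query_id: str
    subject_id: str
    evalue: float


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse 12-column tab-separated BLAST output (assumed layout)."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        # Columns 1, 2 and 11 hold query id, subject id and expect value.
        yield Hit(fields[0], fields[1], float(fields[10]))


def best_significant_hits(hits: Iterable[Hit],
                          cutoff: float = EVALUE_CUTOFF) -> Dict[str, Hit]:
    """Keep, per query, the lowest-E-value hit strictly below the cutoff."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.evalue >= cutoff:
            continue  # not "highly significant" under this cutoff
        current = best.get(hit.query_id)
        if current is None or hit.evalue < current.evalue:
            best[hit.query_id] = hit
    return best


def percent(part: int, total: int) -> float:
    """Share of `part` in `total`, as a percentage rounded to one decimal."""
    return round(100.0 * part / total, 1)
```

With these helpers, percent(26, 850) reproduces the 3.1% reported for signal transduction matches and percent(53, 850) the 6.2% reported for cytoskeletal matches.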

Identifier: oai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/23355
Date: 07 1900
Creators: Shen, Ling
Contributors: Golding, G. B., Biology
Source Sets: McMaster University
Language: English
Detected Language: English
Type: Thesis
