1 |
Identification and annotation of full-length genes in Atlantic salmon (Salmo salar) / Leong, Jong S. 18 October 2011
Large-scale expressed sequence tags (ESTs) in Atlantic salmon (Salmo salar) are examined to answer questions regarding salmonid transcriptomes. ESTs represent raw and incomplete gene sequences that need to be read, assembled and analyzed with computer software. The goal of this thesis was to develop an automatically curated and publicly accessible set of annotated full-length genes, representing a near-complete transcript set for Salmo salar. In turn, these genes provide the framework for studies in gene expression, conservation, and molecular evolution. The work presented here also touches on the results of a molecular evolution study, as an example of how full-length gene identification can be used to answer biological questions.
Prior to this study, a limited number of Atlantic salmon cDNA libraries and ESTs were known. To further the goal of determining complete gene sequences, highly enriched full-length cDNA libraries and full-length libraries were created and sequenced, making it possible to identify a large number of full-length reference genes. Together, all libraries represent a diverse pool of transcriptome sequences for Salmo salar.
Producing an accurate, large-scale full-length gene set for a duplicated genome is not trivial, and complete systems for this objective do not readily exist. EST sequencing, EST assembly, and data storage are just a few of the initial computational issues that are addressed. Once these issues are resolved, the multi-step workflow of full-length gene determination is described. The final challenge, developing a concise and universally accessible system for visualization, is then discussed. The resulting computational framework is shown to handle the intricacies and the size of a duplicated salmonid genome. It is largely accepted that Atlantic salmon have undergone a recent genome duplication, and gene paralogs provide one source of evidence for this event. Analysis of paralogs revealed signatures of asymmetric evolution, possibly due to relaxation of selective pressure.
This thesis provides a complete bioinformatics pipeline to analyze and visualize a set of full-length reference genes for Atlantic salmon. Using full-length genes as a framework, the topic of molecular evolution was addressed to show evidence of asymmetric evolution among gene duplicates. The full-length reference genes, along with ESTs and all putative transcripts, have been made publicly available. These results serve as a valuable genomic resource for next-generation sequencing and for other salmonid research endeavours. / Graduate
|
2 |
Infidélité de transcription et carcinogénèse. Analyse bioinformatique et preuves de concept biologiques / Transcription infidelity and carcinogenesis. Bioinformatical analysis and biological proofs of principle / Brulliard, Marie 09 July 2009
One of the challenges in the fight against cancer lies in understanding the heterogeneity of the disease. The aim of our work was to explore the heterogeneity of cancer cells at the level of messenger RNA sequence. Human ESTs (Expressed Sequence Tags) were aligned to mRNA reference sequences. The alignments were used to measure the sequence variations of ESTs from tumor and non-tumor tissues at each position of each transcript. The statistical analysis identified the positions at which sequence variations (substitutions, insertions, and deletions) differ between ESTs of tumor origin and ESTs of non-tumor origin. The bioinformatic study first focused on 17 abundantly expressed transcripts before being extended to the whole transcriptome, and was then carried out on murine ESTs. The results show that transcripts are more heterogeneous in cancerous tissues than in healthy tissues; transcription infidelity is thus increased during carcinogenesis. This bioinformatic result was validated by several biological approaches. First, an RNA from a human lung tumor carrying a bioinformatically predicted deletion was cloned and sequenced, in the absence of any somatic mutation. Next, mass spectrometry identified a protein variant resulting from translation of an RNA whose stop codon was substituted by a coding triplet. Finally, the value of searching the serum of cancer patients for antibodies directed against proteins translated from infidel mRNAs was demonstrated. Transcription infidelity is thus a phenomenon that is increased in cancer and is responsible for part of the heterogeneity of cancer cells.
The significance of this discovery lies in the new perspectives it offers for understanding the mechanisms of carcinogenesis and for diagnosing the disease / One of the aims of the fight against cancer is to understand the heterogeneity of cancer cells. The goal of our work was to explore cancer cell mRNA heterogeneity. ESTs (Expressed Sequence Tags) extracted from normal and cancer tissues were aligned to mRNA reference sequences. This allowed identification of non-random sequence variations that occurred at statistically significantly increased rates in cancer libraries compared to normal libraries. This analysis first focused on 17 abundant transcripts and was then extended to the whole human transcriptome, as well as to that of Mus musculus. The results show an increase in transcription infidelity events in cancer tissues. Three types of events occur: base substitutions, deletions, and insertions. The bioinformatics results were validated through different biological methods. First, mRNA from a human lung cancer carrying a deletion at a bioinformatically predicted position, in the absence of somatic mutation, was cloned and sequenced. Then, mass spectrometry analysis confirmed the existence of protein variants resulting from translation of mRNA bypassing the stop codon. Finally, we showed that transcription infidelity peptides contain specific epitopes of immunoglobulins; detection of changes in immunoglobulins in patients with cancer opens a novel path toward early-stage cancer diagnosis. This increased transcription infidelity in cancer contributes to the heterogeneity of cancer cells. This finding opens novel perspectives and strategies toward understanding carcinogenesis and diagnosing the disease
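The per-position comparison described in this abstract can be illustrated with a small sketch. It is not the thesis's method: the abstract does not name its statistical test, so a two-proportion z statistic is swapped in here, only substitutions are counted (insertions and deletions are ignored), reads are assumed to be already aligned without gaps, and every function name and the `z_cutoff` value are hypothetical.

```python
import math


def mismatch_counts(reference, reads):
    """Per position: (number of covering reads, number that differ)."""
    covered = [0] * len(reference)
    differ = [0] * len(reference)
    for offset, bases in reads:  # each read is (start offset, gapless bases)
        for i, base in enumerate(bases):
            pos = offset + i
            if pos >= len(reference):
                break
            covered[pos] += 1
            if base != reference[pos]:
                differ[pos] += 1
    return covered, differ


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for the difference between proportions x1/n1 and x2/n2."""
    if n1 == 0 or n2 == 0:
        return 0.0
    pooled = (x1 + x2) / (n1 + n2)
    variance = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    if variance == 0:
        return 0.0
    return (x1 / n1 - x2 / n2) / math.sqrt(variance)


def divergent_positions(reference, tumor_reads, normal_reads, z_cutoff=3.0):
    """Positions whose mismatch rate differs between the two read sets."""
    t_cov, t_diff = mismatch_counts(reference, tumor_reads)
    n_cov, n_diff = mismatch_counts(reference, normal_reads)
    result = []
    for pos in range(len(reference)):
        z = two_proportion_z(t_diff[pos], t_cov[pos], n_diff[pos], n_cov[pos])
        if abs(z) >= z_cutoff:  # illustrative cutoff, no multiple-testing correction
            result.append((pos, z))
    return result
```

A real analysis over a whole transcriptome would also need a correction for the large number of positions tested, which this sketch leaves out.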
|
3 |
Identification and characterization of a novel human liver-specific organic anion transporter (SLC22A7). January 2000
Siu Shu Shun. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. / Includes bibliographical references (leaves 100-106). / Abstracts in English and Chinese.
Acknowledgements / Contents / Abstract (English and Chinese) / Abbreviations / List of figures / List of tables
Chapter 1: Introduction / 1.1 Human EST sequencing project, the role and goal / 1.2 Human liver cDNA sequencing / 1.3 The role of membrane-associated proteins in hepatocellular functions / 1.3.1 Outline of the liver function / 1.3.2 Basic structure of hepatocyte / 1.3.3 Category of membrane associated proteins / 1.4 Identification of human OAT2 gene / 1.5 The multispecific transporter family / 1.5.1 Classification / 1.5.2 The human OAT family / 1.6 The characteristics of rat multispecific OAT2 / 1.7 Clinical significance of organic anion transport proteins
Chapter 2: Materials and Methods / 2.1 Human liver EST sequencing project / 2.1.1 Plating out the adult human liver phage library / 2.1.2 PCR detection and amplification of the cDNA clone / 2.1.3 Automatic cDNA sequencing / 2.2 Cloning of hOAT2 gene into TA cloning vector pT-Adv / 2.2.1 Amplification of hOAT2 by PCR / 2.2.2 Ligation reaction / 2.2.3 Transformation of recombinant plasmid into competent cells / 2.3 Sequence analysis and structural prediction / 2.4 Cloning of the hOAT2 gene into the pQE30 expression vector / 2.4.1 PCR amplification and restriction endonuclease cutting / 2.4.2 Gene clean / 2.4.3 Preparation of bacterial competent cells / 2.5 Small scale synthesis of plasmid DNA / 2.6 Large scale synthesis of plasmid DNA / 2.7 Cloning of the hOAT2 gene into the pSecTag2B mammalian expression vector / 2.8 Cloning of the hOAT2 gene into the pEGFP-C2 fluorescent vector / 2.8.1 Tissue culture and transfection / 2.8.2 Fluorescence microscopy examination / 2.9 Chromosomal mapping of the hOAT2 gene / 2.9.1 Somatic cell hybrids mapping / 2.9.2 Radiation hybrids mapping / 2.10 Reverse Transcriptase Polymerase Chain Reaction (RT-PCR) / 2.11 Western hybridization / 2.11.1 Preparation of anti-hOAT2 antibodies / 2.11.1.1 Synthetic peptide conjugation / 2.11.1.2 Immunizing rabbit polyclonal antibodies for human OAT2 / 2.11.1.3 Purification of the rabbit polyclonal IgG antibodies / 2.11.2 Western blot analysis / 2.11.2.1 Protein isolation from rat liver / 2.11.2.2 Protein preparation from cell lysate / 2.11.2.3 Quantitation of total proteins by Bradford protein assay / 2.11.2.4 Blotting and hybridization
Chapter 3: Results / 3.1 Catalogue of the 500 liver ESTs / 3.2 Nomenclature of human NLT gene / 3.3 Cloning and characterization of the hOAT2 sequence / 3.3.1 Isolation of hOAT2 cDNA from human liver cDNA library / 3.3.2 The primary and secondary structural analysis of hOAT2 / 3.3.3 Motif search and prediction / 3.3.4 Homology alignment / 3.4 Chromosomal mapping of hOAT2 gene / 3.4.1 Somatic cell hybrid mapping of hOAT2 gene / 3.4.2 Radiation hybrid mapping of hOAT2 gene / 3.4.3 Identification of partial human genomic sequence / 3.5 Detection of the hOAT2 gene expression in human tissues by RT-PCR assay / 3.6 Detection of subcellular localization of hOAT2 protein by conjugating fluorescence protein / 3.7 Immunodetection of protein extracts from cultured cells
Chapter 4: Discussion / 4.1 Characterization of the hepatocellular ESTs / 4.1.1 Classification and frequency distribution of the 500 ESTs / 4.1.2 The expression pattern of membrane associated proteins / 4.2 Tissue distribution and expression profiles of hOAT2 / 4.3 hOAT2 in fetal development / 4.4 Predicting the topology of membrane proteins / 4.5 Chromosomal mapping of human OAT2 / 4.6 Possible functions of hOAT2 / 4.6.1 Hepato-renal relation / 4.6.2 Substrate diversity / 4.7 Fluorescence detection for subcellular localization / 4.8 Conclusion / 4.9 Further aspects
References / Appendix
|
4 |
Construction and analysis of high reproductive porcine oocyte cDNA library / Su, Yu-liang 27 July 2004
Progress in studying genes involved in the development and differentiation of early swine embryos has been delayed by a paucity of material. To identify porcine ESTs associated with improved breeding efficiency, a cDNA library and an EST database were established from oocytes of highly reproductive swine. Oocytes were obtained from Duroc swine by superovulation, performed by the Taiwan Livestock Research Institute, Council of Agriculture. Total RNA was isolated from 50 mature oocytes and reverse transcribed, followed by PCR-based amplification of the cDNA. The amplified cDNA ranged in size from 0.4 to 5 kb. The cDNA was ligated into a pCR2.1 vector, and the library has a complexity of about 5.26 × 10^4 independent clones. A total of 320 clones were picked and sequenced. By BLASTx analysis of the 123 sequences, 43.07% (53/123) matched mitochondrial proteins and 56.91% (70/123) were homologous to known transcripts from human, mouse, and Drosophila. At the nucleotide level, BLASTn matched 82.11% (101/123) to mitochondrial and ribosomal genes and 17.89% (22/123) to other homologous genes. PCR analysis of the oocyte library for specific genes revealed transcripts for homologous genes (2 pairs of high-abundance and 2 pairs of low-abundance genes), housekeeping genes (ACTβ and G3PDH), and developmental genes (NEK2 and ZP1). Novel swine genes, however, are proposed as the candidates for the highly productive phenotype. The library is a valuable resource for isolating clones that represent genes active at the early stage. The ability to construct a cDNA expression library from a few cells will allow gene expression analysis of oocyte biopsies and of material derived by nuclear transfer procedures.
|
5 |
Identification of genes encoding secreted proteins of schistosomes / Shah, Bindiya January 2000
No description available.
|
6 |
EST expression and proteomic studies on mycelia of Cordyceps militaris. January 2004
Chan Ching-Man. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. / Includes bibliographical references (leaves 109-123). / Abstracts in English and Chinese.
Thesis committee / Statement / Abstract / Acknowledgements / Abbreviations / Table of contents / List of figures / List of tables
Chapter 1. Introduction
Chapter 2. Literature review / 2.1 History / 2.2 The living environment and life cycles of Cordyceps / 2.3 Chemical constituents of Cordyceps / 2.3.1 Determination of active ingredients in Cordyceps / 2.4 Therapeutic Functions / 2.4.1 Cardiovascular and circulatory functions / 2.4.1.1 Effects on Cholesterol and lipid metabolism / 2.4.1.2 Dilation of vasculature and cerebolature / 2.4.2 Respiratory functions / 2.4.3 Renal functions / 2.4.3.1 Effects on chronic renal failure patients / 2.4.3.2 Protective effects on kidney toxicity / 2.4.4 Hepatic functions / 2.4.4.1 Effect on hepatitis B patients / 2.4.4.2 Energy state of liver / 2.4.5 Aging and senescence: Longevity enhancement / 2.4.5.1 Senescence / 2.4.5.2 Antioxidant effects / 2.4.6 Immune functions / 2.4.6.1 Enhancing immune system / 2.4.6.2 Anti-tumor effects / 2.4.7 Reproductive effects / 2.4.8 Hyperglycemic effects / 2.5 Cultivation / 2.5.1 Carbon and nitrogen sources / 2.5.2 Initial pH and temperature / 2.5.3 Bioelements / 2.5.4 Agitation intensity / 2.5.5 Aeration rate / 2.6 Fungal genetics / 2.6.1 EST approach / 2.7 Proteomic studies / 2.7.1 2D gel electrophoresis / 2.7.2 Mass spectrometry / 2.7.3 Limitations and improvements / 2.7.4 Multiple spots for the same proteins / 2.7.5 Fungal proteomics / 2.7.5.1 Extraction method / 2.7.5.2 Combined uses of EST sequences and amino acid sequences / 2.7.5.3 Glycosylation
Chapter 3. Materials and methods / 3.1 Genomic Studies / 3.1.1 Strains and growth conditions / 3.1.2 Total RNA Extraction / 3.1.3 Isolation of mRNA / 3.1.4 cDNA Library Construction / 3.1.5 PCR Screening / 3.1.6 EST sequencing / 3.1.7 EST assembling and annotation / 3.2 Proteomic Studies / 3.2.1 Sample Preparation / 3.2.2 Quantitation / 3.2.3 2-D PAGE / 3.2.4 In-gel digestion and peptide extraction / 3.2.5 MALDI-TOF MS Analysis / 3.3 Determination of adenosine using RP-HPLC
Chapter 4. Result / 4.1 Genomic studies / 4.1.1 cDNA library / 4.1.2 cDNA sequence analysis / 4.1.3 Functional annotation and analysis / 4.2 Proteomic / 4.2.1 2D analysis and resolution / 4.2.2 Protein identification and annotation / 4.2.3 Image analysis / 4.3 Presence of adenosine (HPLC)
Chapter 5. Discussion and conclusion / 5.1 Genomic studies / 5.2 Presence of adenosine / 5.3 Proteomic / 5.3.1 Protein with increasing expression level / 5.3.1.1 MEI5 (Spot 1084) / 5.3.1.2 Hsp70 and hsp60 (Spot 894 & 903) / 5.3.1.3 GRP 78 (Spot 1085) / 5.3.1.4 Ubiquitin (Spot 1071) / 5.3.1.5 Serine-tRNA ligase, glutaminyl-tRNA synthase (Spot 1037, 924) / 5.3.1.6 2-isopropylmalate synthase (Spot 862) / 5.3.1.7 Acyl-CoA oxidase 3 (Spot 882) / 5.3.1.8 ATP synthase beta chain (Spot 937) / 5.3.2 Proteins with decreasing expression / 5.3.2.1 14-3-3 protein (Spot 1080) / 5.3.2.2 Actin (Spot 945) / 5.3.2.3 GTP binding protein SPI1 (Spot 1031) / 5.3.2.4 Hoclp (Spot 972) / 5.3.2.5 Rchl8p (Spot 983) / 5.3.2.6 Formaldehyde dehydrogenase (Spot 958) / 5.3.2.7 V-type ATPase (Spot 961) / 5.3.2.8 Glyceraldehyde 3-phosphate dehydrogenase (GAPDH) (Spot 987) / 5.3.2.9 Idp3p (Spot 929)
References
|
7 |
Computational Mining and Survey of Simple Sequence Repeats (SSRs) in Expressed Sequence Tags (ESTs) of Dicotyledonous Plants / Kumpatla, Siva Prasad 07 1900
Submitted to the faculty of the School of Informatics in partial fulfillment of the requirements for the degree Master of Science in Bioinformatics in the School of Informatics, Indiana University, July 2004 / DNA markers have revolutionized the field of genetics by increasing the pace of genetic analysis. Simple sequence repeats (SSRs) are repetitions of nucleotide motifs of 1 to 5 bases and are currently the markers of choice in many plant and animal genomes because of their abundant distribution in the genomes, hypervariable nature, and suitability for high-throughput analysis. While SSRs, once developed, are extremely valuable, their development is time-consuming, laborious, and expensive. Sequences from many genomes are continuously made freely available in public databases, and mining these sources with computational approaches permits rapid and economical marker development. Expressed sequence tags (ESTs) are ideal candidates for mining SSRs, not only because they are available in large numbers but also because they represent expressed genes. Large-scale SSR mining efforts in plants have to date focused on monocotyledonous plants. In this project, an efficient SSR identification tool was developed and used to mine SSRs from more than 53 dicotyledonous species. A total of 92,648 non-redundant ESTs, or 6.0% of the 1.54 million dicotyledonous ESTs investigated in this study, were found to contain SSRs. The frequency of non-redundant ESTs containing SSRs among the species investigated ranged from 2.65% to 16.82%. More than 80% of the non-redundant ESTs with SSRs contained a single SSR, while the others contained 2 or more. An extensive analysis of the occurrence and frequencies of various SSR types revealed that the A/T mononucleotide, AG/GA/CT/TC dinucleotide, AAG/AGA/GAA/CTT/TTC/TCT trinucleotide, and TTTA and TTAA tetranucleotide repeats are the most abundant in dicotyledonous species. In addition, an analysis of the number of repeats across species revealed that the majority of the mononucleotide SSRs contained 15-25 repeats, while the majority of the di- and tri-nucleotide SSRs contained 5-10 repeats. By providing valuable information on the abundance of SSRs in ESTs of a large number of dicotyledonous species, this study demonstrates the potential of the computational mining approach for rapid discovery of SSRs toward the development of markers for genetic analysis and related applications.
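The kind of SSR search described above can be sketched with regular-expression backreferences. This is a minimal illustration, not the tool developed in the thesis: it finds only perfect repeats, the function name `find_ssrs` is hypothetical, and the minimum repeat counts in `MIN_REPEATS` are assumed values chosen for the example.

```python
import re

# Minimum repeat counts per motif length (1 to 5 bases). These thresholds
# are illustrative assumptions, not the ones used in the thesis.
MIN_REPEATS = {1: 15, 2: 5, 3: 5, 4: 4, 5: 4}


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) for each perfect SSR."""
    sequence = sequence.upper()
    hits = []
    for motif_len, min_count in sorted(min_repeats.items()):
        # One motif of motif_len bases followed by at least min_count - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_count - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit, such as "AA"
            # or "ATAT", so each SSR is reported once under its shortest motif.
            if any(motif == motif[:k] * (motif_len // k)
                   for k in range(1, motif_len) if motif_len % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)
```

Reporting each repeat under its shortest motif keeps a long poly-A run from also being counted as a di-, tri-, or tetranucleotide SSR.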
|
8 |
Uma abordagem para detecção e remoção de artefatos em sequencias ESTs / An approach to detect and remove artifacts in EST sequences / Baudet, Christian 12 January 2006
Advisor: Zanoni Dias / Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Computação
Previous issue date: 2006 / Summary: EST (Expressed Sequence Tag) sequencing [2] is a technique that works with cDNA libraries and aims to obtain a good approximation of the gene index, which is the list of genes present in the genome of the organism under study. Before being analyzed, the sequences obtained from EST sequencing must be processed to eliminate artifacts. Artifacts are stretches that do not belong to the organism or that have low quality or low complexity. Vector fragments, adapters, and poly-A tails are examples of artifacts. Artifacts must be removed so that the analysis of the sequences produced in the project is not impaired by this noise. For example, artifacts present in sequences frequently cause errors in clustering, because they can determine whether sequences are joined in the same cluster or separated into different clusters. Given the importance of a good sequence-cleaning process, the main objective of the work developed in this dissertation was to obtain an efficient set of procedures for detecting and removing artifacts. This set was produced from a new artifact-detection strategy. Normally, each sequencing project has its own set of procedures divided into several steps. These steps are, in general, linked to one another, and the result of one can influence the result of another. Our strategy aims to carry out these steps in a fully independent way. In addition to evaluating this new strategy, the work also carried out a more detailed study of two types of artifacts: low quality and slippage. For each of them, algorithms were proposed and validated through tests with sets of sequences produced in real sequencing projects.
The final set of procedures, based on the studies developed during the writing of this text, was tested with the sequences of the SUCEST project [100, 103, 113] and showed good results. The clustering produced with the sequences processed by our methods showed better internal and external consistency and lower redundancy rates than the project's original clustering / Abstract: Expressed Sequence Tag (EST) sequencing [2] is a technique that works with cDNA libraries. It aims to achieve a good approximation of the gene index of an organism. Before the sequences obtained by sequencing ESTs are analyzed, they must be processed to remove artifacts. An artifact is a sequence that does not belong to the studied organism or that has low quality or low complexity. Examples of artifacts include adapters, poly-A tails, and vectors. Artifacts must be removed because their presence can introduce noise into the analysis of the sequencing project's data. For example, an artifact can inappropriately join two sequences in the same cluster, or separate them into two different clusters when they should be together. Motivated by the importance of the sequence-cleaning process, our main objective in this work was to develop an efficient set of procedures to detect and remove sequence artifacts. Usually, each EST sequencing project has its own set of procedures divided into many steps. These steps are, in general, linked, and the result of a given step may influence the result of the next one. Our strategy was to perform each step independently, ensuring that any execution order of the steps leads to the same result. In addition to evaluating the new strategy, this work also studied two types of artifacts in detail: low quality and slippage. For each one, algorithms were proposed and validated through tests with sequences from real sequencing projects.
The final set of procedures developed in this work was evaluated using the sequences of the SUCEST project [100, 103, 113] and produced good results. The clustering resulting from our method has better external and internal consistency and a lower redundancy rate than the original SUCEST project clustering / Master's / Computer Science / Master in Computer Science
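The idea of running each detection step independently, so that the order of the steps cannot change the outcome, can be sketched as follows. This is an illustration under stated assumptions rather than the dissertation's algorithms: every detector inspects the original, untrimmed read and only reports intervals, and trimming happens once at the end. The function names, the poly-A length of 10, the quality threshold of 20, and the window of 5 are all hypothetical, and slippage detection is not covered.

```python
import re


def find_poly_a(seq, min_len=10):
    """Half-open intervals covered by runs of at least min_len A's or T's."""
    pattern = re.compile(r"A{%d,}|T{%d,}" % (min_len, min_len))
    return [m.span() for m in pattern.finditer(seq.upper())]


def find_low_quality(quals, threshold=20, window=5):
    """Half-open intervals of windows whose mean quality is below threshold."""
    spans = []
    for start in range(0, max(len(quals) - window + 1, 0)):
        chunk = quals[start:start + window]
        if sum(chunk) / window < threshold:
            spans.append((start, start + window))
    return spans


def longest_clean_region(length, spans):
    """Longest half-open interval of [0, length) not touched by any span."""
    best = (0, 0)
    cursor = 0  # end of the region already covered by artifacts
    for start, end in sorted(spans):
        if start - cursor > best[1] - best[0]:
            best = (cursor, start)
        cursor = max(cursor, end)
    if length - cursor > best[1] - best[0]:
        best = (cursor, length)
    return best


def clean_read(seq, quals):
    """Keep the longest stretch of the read that no detector flagged."""
    # Each detector sees the untouched read, so running them in a different
    # order yields the same set of intervals and the same final result.
    spans = find_poly_a(seq) + find_low_quality(quals)
    start, end = longest_clean_region(len(seq), spans)
    return seq[start:end]
```

Because the intervals are merged only after all detectors have run, adding another detector does not require reasoning about how it interacts with the existing ones.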
|
9 |
HYMENOPTERAN MOLECULAR PHYLOGENETICS: FROM APOCRITA TO BRACONIDAE (ICHNEUMONOIDEA) / Sharanowski, Barbara J. 01 January 2009
Two separate phylogenetic studies were performed for two different taxonomic levels within Hymenoptera. The first study examined the utility of expressed sequence tags for resolving relationships among hymenopteran superfamilies. Transcripts were assembled from 14,000 sequenced clones for 6 disparate Hymenopteran taxa, averaging over 660 unique contigs per species. Orthology and gene determination were performed using modifications to a previously developed computerized pipeline and compared against annotated insect genomes. Sequences from additional taxa were added from public databases with a final dataset of 24 genes for 16 taxa.
The concatenated dataset recovered a robust and well-supported topology; however, there was extreme incongruity among individual gene trees. Analyses of sequences indicated strong compositional and transition biases, particularly in the third codon positions. The use of filtered supernetworks aided visualization of the congruent phylogenetic signal that existed across the individual gene trees. Additionally, treeness triangle plots indicated a strong residual signal in several gene trees and across codon positions in the concatenated dataset. However, most analyses of the concatenated dataset recovered expected relationships known from other independent analyses. Thus, ESTs provide a powerful source of information for phylogenetic analysis, but results are sensitive to low taxonomic sampling and missing data.
The second study examined subfamilial relationships within the parasitoid family Braconidae, using over 4 kb of sequence data for 139 taxa. Bayesian inference of the concatenated dataset recovered a robust phylogeny, particularly for early divergences within the family. There was strong evidence supporting two independent lineages within the family: one leading to the noncyclostomes and one leading to the cyclostomes. Ancestral state reconstructions were performed to test the theory that ectoparasitism is the ancestral condition for all taxa within the family. Results indicated an endoparasitic ancestor for the family and for the noncyclostome lineage, with an early transition to ectoparasitism for the cyclostome lineage. However, reconstructions of some nodes were sensitive to outgroup coding and may change as biological knowledge increases.
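The compositional bias at third codon positions mentioned for the first study can be inspected with a simple per-position base count. The sketch below is only an illustration of that idea, not the analysis performed in the dissertation; the function name is hypothetical, and the input is assumed to be an in-frame coding sequence.

```python
from collections import Counter


def composition_by_codon_position(cds):
    """Base frequencies at codon positions 1, 2 and 3 of an in-frame CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    freqs = []
    for pos in range(3):
        counts = Counter(cds[pos:usable:3])
        total = sum(counts.values())
        # Ambiguous bases such as N count toward the total but are not reported.
        freqs.append({b: counts.get(b, 0) / total for b in "ACGT"} if total else {})
    return freqs
```

Comparing the three dictionaries across taxa shows whether one codon position departs from the others in base composition.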
|
10 |
Developing and using expressed sequence tags to study the predatory mite Phytoseiulus persimilis Athias-Henriot (Parasitiformes, Mesostigmata, Phytoseiidae) / Weng Huang, Ju Lin January 1900
Doctor of Philosophy / Department of Entomology / David C. Margolies / Yoonseong Park / The predatory mite Phytoseiulus persimilis (Acari, Phytoseiidae) is one of the most frequently released natural enemies for biological control of spider mites in greenhouse and outdoor crops. In this research, I used expressed sequence tags (ESTs), the most cost-effective approach for transcriptome exploration, to study three different aspects of this arachnid species, for which there is little genomic information. I combined two EST datasets from different whole-body cDNA libraries and analyzed them bioinformatically. Approximately 54% of 10,256 uniESTs were annotated based on homology to sequences in the National Center for Biotechnology Information (NCBI) database. A list of these uniESTs, sorted from most to least likely match based on the expect value from BLAST searches of public databases, was used to create tools for each of the three studies. First, I described sixty-one genes encoding products known to be important in pesticide metabolism and in endocrinology, including cytochrome P450s, glutathione-S-transferases, acetylcholinesterase homologs, neuropeptides, and neurohormones. Findings on arachnid-specific esterases and neuropeptides, and possible benefits to pest management programs, are discussed. Next, I inferred the divergence time for Acari and the point of divergence of two lineages within anactinotrichid mites, Ixodes scapularis and Phytoseiulus persimilis. I used expressed sequence tags from P. persimilis to retrieve 74 orthologous amino acid sequences from invertebrate species: nine insect species, Daphnia pulex, Ixodes scapularis, and Caenorhabditis elegans. I estimated an origin for Chelicerata (578.1 ± 38.2 to 482.2 ± 7.2 Mya) similar to that in other recent studies.
However, divergence dating using amino acid sequences suggested a Devonian origin of anactinotrichid mites (487.6 ± 32.2 to 410.1 ± 6.1 Mya) based on four reference dates (two fossil records and two molecular clocks) and four amino acid substitution methods; this estimate is much earlier than those in the current literature. The discrepancy in divergence times may be due to the use of a global clock. Finally, I developed molecular markers from the EST dataset to examine inheritance in the haplodiploid system of P. persimilis. Biparental contribution of chromosomes is required among the predatory mites, but the paternal chromosome set seems to be eliminated or lost (paternal genome loss, PGL) in male offspring. However, genetic studies in two other phytoseiid species suggested diploid males with PGL only in the germ cells. In the present study, haploid adult males of P. persimilis were observed using five independent EST-derived markers. Single mites derived from inter-population crosses were genotyped after whole genome amplification. This study supports a parahaploid genetic system in P. persimilis, in which both sexes arise from fertilized eggs but the paternal chromosome set is subsequently lost in males.
|