Global ETD Search

11	Measuring deviation from a deeply conserved consensus in protein multiple sequence alignments Mokin, Sergey. January 1900 Thesis (M.Sc.). / Written for the Dept. of Biology. Title from title page of PDF (viewed 2008/12/07). Includes bibliographical references.
12	Protein sequence constraints Lavelle, Daniel Thor. January 2009 Thesis (Ph. D.)--University of Virginia, 2009. / Title from title page. Includes bibliographical references. Also available online through Digital Dissertations.
13	Generalized pattern matching applied to genetic analysis. / 通用性模式匹配在基因序列分析中的應用 / CUHK electronic theses & dissertations collection / Digital dissertation consortium / Tong yong xing mo shi pi pei zai ji yin xu lie fen xi zhong de ying yong January 2011 The approximate pattern matching problem is: given a reference sequence T, a pattern (query) Q, and a maximum allowed error e, find all substrings of the reference whose edit distance to the pattern is at most e. Although it is a well-studied problem in computer science, it has seen a resurgence in bioinformatics in recent years, largely due to the emergence of next-generation high-throughput sequencing technologies. This thesis contributes a novel generalized pattern matching framework and applies it to pattern matching problems in general and to alternative splicing (AS) detection in particular. AS detection maps large amounts of next-generation sequencing short-read data to a reference human genome, which is the first and an important step in the further biological analysis of the sequenced data. The four parts of my research are as follows. / In the first part, we propose a novel deterministic pattern matching algorithm that applies Agrep, a well-known bit-parallel matching algorithm, to a truncated suffix array. Because Agrep runs in linear time, the cost of our approach is linear in the number of characters processed in the truncated suffix array. We analyze the matching cost theoretically and obtain empirical costs from experiments. We carry out experiments using both synthetic and real DNA sequence data (queries), searching them in chromosome X of a reference human genome. The experimental results show that our approach achieves a speed-up of several orders of magnitude over the standard Agrep algorithm.
/ In the second part, we define a novel generalized pattern (query) and a framework of generalized pattern matching, for which we propose a heuristic matching algorithm. Simply put, a generalized pattern is $Q_1 G_1 Q_2 \cdots Q_{c-1} G_{c-1} Q_c$, which consists of substrings $Q_i$ separated by gaps $G_i$. The prototypes of the generalized pattern come from several real biological problems that can all be modeled as generalized pattern matching problems. Based on the well-known seeding-and-extending heuristic, we propose a dual-seeding strategy, with which we solve the matching problem effectively and efficiently. We also develop a specialized matching tool called Gpattern-match. We carry out experiments using 10,000 generalized patterns, searching them in a reference human genome (hg18). Over 98.74% of them are recovered from the reference. Recovering a pattern takes 1 to 2 seconds on average, and peak memory usage is slightly above 1 GB.
/ In the third part, a natural extension of the second, we model a real biological problem, alternative splicing detection, as a generalized pattern matching problem and solve it with a proposed bi-directional seeding-and-extending algorithm. Unlike other tools, which depend on third-party tools, our mapping tool, ABMapper, is stand-alone and performs unbiased alignments. We carry out experiments using 427,786 real next-generation sequencing short reads (queries), aligning them back to a reference human genome (hg18). ABMapper achieves 98.92% accuracy and a 98.17% recall rate, well above other state-of-the-art tools: SpliceMap achieves 94.28% accuracy and a 78.13% recall rate, and TopHat 88.99% accuracy and a 76.33% recall rate. With the seed length set to 12, ABMapper's whole search and alignment process takes about 20 minutes, with peak memory usage slightly above 2 GB.
/ In the fourth part, we focus on seeding strategies for alternative splicing detection. We review the history of seeding-and-extending (SAE) and assess, both theoretically and empirically, the seeding strategies adopted in existing splicing detection tools, including Bowtie's heuristic seeding and ABMapper's exact seeding, against the novel complementary quad-seeding strategy we propose and the corresponding novel splice detection tool, CS4splice, which can handle inexact seeding (with errors) and all three types of errors: mismatch (substitution), insertion, and deletion. We carry out experiments using 105 bp short reads (queries) from several data sets with various levels of errors, aligning them back to a reference human genome (hg18). On average, CS4splice aligns 88.44% (recall rate) of 427,786 short reads perfectly back to the reference, while the other existing tools achieve much lower recall rates: SpliceMap 48.72%, MapSplice 58.41%, and ABMapper 51.39%. The accuracy of CS4splice is also the highest, or very close to the highest, in all experiments. However, because of its complementary quad-seeding, CS4splice requires more computational resources, about twice (or more) those of the other alternative splicing detection tools, which we consider a practical and worthwhile trade-off.
/ Ni, Bing. / Adviser: Kwong-Sak Leung. / Source: Dissertation Abstracts International, Volume: 73-06, Section: B, page: . / Thesis (Ph.D.)--Chinese University of Hong Kong, 2011. / Includes bibliographical references (leaves 151-161). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [201-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. Ann Arbor, MI : ProQuest Information and Learning Company, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese.
Combinatorial analysis Computational biology Computer algorithms DNA--Analysis--Data processing Genetics--Methodology Matching theory Proteins--Analysis--Data processing Computational Biology--methods Sequence Analysis, DNA Sequence Analysis, Protein
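The approximate matching problem defined at the start of the abstract above (report every substring of T within edit distance e of Q) is the problem Agrep solves with bit-parallelism. Below is a minimal Python sketch of the Wu-Manber bit-parallel recurrence that underlies Agrep, assuming a plain left-to-right scan of the reference rather than the truncated suffix array used in the thesis. The function name `agrep_end_positions` is hypothetical, and it reports only the end position of each match.

```python
def agrep_end_positions(text: str, pattern: str, e: int) -> list[int]:
    """Return the 0-based positions in `text` at which some substring ends whose
    edit distance to `pattern` is at most `e` (Wu-Manber bit-parallel recurrence)."""
    m = len(pattern)
    if m == 0 or not 0 <= e < m:
        raise ValueError("need a non-empty pattern and 0 <= e < len(pattern)")
    mask = (1 << m) - 1          # keep every state vector m bits wide
    accept = 1 << (m - 1)        # bit set once the whole pattern has been matched

    # Character masks: bit i of B[c] is set iff pattern[i] == c.
    B: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        B[ch] = B.get(ch, 0) | (1 << i)

    # R[j], bit i: pattern[:i+1] matches a suffix of the text read so far with
    # at most j edits. Initially the first j pattern characters can be deleted.
    R = [(1 << j) - 1 for j in range(e + 1)]
    ends: list[int] = []
    for pos, c in enumerate(text):
        bc = B.get(c, 0)
        prev_old = R[0]                              # R[0] before reading c
        R[0] = ((R[0] << 1) | 1) & bc & mask         # exact matching only
        for j in range(1, e + 1):
            old = R[j]
            R[j] = (
                (((old << 1) | 1) & bc)              # c matches the next pattern char
                | prev_old                           # c is an extra (inserted) text char
                | ((prev_old << 1) | 1)              # c substitutes the next pattern char
                | (R[j - 1] << 1)                    # a pattern char is deleted
            ) & mask
            prev_old = old                           # old R[j] feeds level j + 1
        if R[e] & accept:
            ends.append(pos)
    return ends
```

For example, `agrep_end_positions("AGGT", "ACGT", 1)` returns `[3]`, because the whole read is one substitution away from the pattern. When the pattern fits in a machine word, each text character costs O(e) word operations, which is why Agrep's cost grows linearly with the number of characters processed; Python integers are unbounded, so here `mask` simply keeps the state vectors m bits wide.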
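The generalized pattern $Q_1 G_1 Q_2 \cdots Q_{c-1} G_{c-1} Q_c$ from the second part of the abstract can be made concrete with a deliberately simplified sketch. Its assumptions are not from the thesis: each substring $Q_i$ must occur exactly, each gap $G_i$ is any run of between `lo` and `hi` characters, and Python's `re` engine performs the search in place of the thesis's dual-seeding heuristic. The helper names are hypothetical.

```python
import re


def generalized_pattern_regex(parts: list[str], gaps: list[tuple[int, int]]) -> re.Pattern:
    """Compile Q_1 G_1 Q_2 ... G_{c-1} Q_c into a regular expression.

    Assumption for this sketch: every Q_i must occur exactly, and every gap G_i
    is any run of lo..hi characters, given as gaps[i] = (lo, hi).
    """
    if not parts or len(gaps) != len(parts) - 1:
        raise ValueError("need c >= 1 substrings and exactly c - 1 gaps")
    regex = re.escape(parts[0])
    for (lo, hi), part in zip(gaps, parts[1:]):
        regex += f".{{{lo},{hi}}}" + re.escape(part)   # e.g. ".{0,5}"
    return re.compile(regex)


def find_generalized(text: str, parts: list[str], gaps: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return (start, end) spans of non-overlapping exact occurrences in text."""
    return [m.span() for m in generalized_pattern_regex(parts, gaps).finditer(text)]
```

For example, `find_generalized("xxACGTTTGGAxx", ["ACG", "GGA"], [(0, 5)])` returns `[(2, 11)]`: `ACG`, a three-character gap, then `GGA`. Because `finditer` reports only non-overlapping leftmost matches and the gap quantifier is greedy, a real tool would need more care to report every alignment; the sketch only shows the shape of the pattern.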
14	Graphical representation of biological sequences and its applications. / CUHK electronic theses & dissertations collection / Digital dissertation consortium January 2010 Among all existing alignment-free methods for comparing biological sequences, sequence graphical representation provides a simple approach to viewing, sorting, and comparing gene structures. The aim of graphical representation is to display DNA or protein sequences graphically so that one can easily see how similar or how different they are. Visual comparison alone, however, is not sufficient for follow-up research, which requires more accurate comparison. This leads us to develop applications of graphical representation for biological sequences. / This thesis makes two main contributions: (1) Using our proposed new graphical representation for protein sequences, we construct a protein map. Each protein sequence is represented as a point in this map, and proteins can be compared by cluster analysis of the points. This protein map can be used to quantify the similarity of two proteins mathematically and to predict the properties of an unknown protein from its amino acid sequence. (2) We construct a novel genome space with biological geometry, which is a subspace of $\mathbb{R}^N$. In this space, each point corresponds to a genome, and the natural distance between two points reflects the biological distance between the two genomes. Our genome space will provide a powerful new tool for analyzing the classification of genomes and their phylogenetic relationships. / Yu, Chenglong. / Adviser: Luk Hing Sun. / Source: Dissertation Abstracts International, Volume: 72-04, Section: B, page: . / Thesis (Ph.D.)--Chinese University of Hong Kong, 2010. / Includes bibliographical references (leaves 59-64). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader.
Available via World Wide Web. / Electronic reproduction. Ann Arbor, MI : ProQuest Information and Learning Company, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese.
Amino acid sequence--Mathematical models Computational biology Nucleotide sequence--Mathematical models Base Sequence Computational Biology Mathematics Sequence Alignment Sequence Analysis, DNA Sequence Analysis, Protein
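As a generic illustration of turning a sequence into a geometric object and comparing sequences as points, here is a toy 2D DNA walk. The step directions assigned to A, C, G and T are one convention assumed for this sketch. They are not the protein representation or the genome space constructed in the thesis, and the centroid descriptor is a deliberately crude stand-in for a richer numerical feature vector. All function names are hypothetical.

```python
import math

# Assumed step for each nucleotide (an illustrative convention, not the thesis's).
STEP = {"A": (-1, 0), "T": (1, 0), "C": (0, 1), "G": (0, -1)}


def dna_walk(seq: str) -> list[tuple[int, int]]:
    """Cumulative 2D walk: one point per nucleotide, starting from the origin.
    Raises KeyError for characters other than A, C, G, T."""
    x = y = 0
    points = []
    for base in seq.upper():
        dx, dy = STEP[base]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def walk_descriptor(seq: str) -> tuple[float, float]:
    """Centroid of the walk's points: a toy numerical summary of the curve.
    Assumes a non-empty sequence."""
    pts = dna_walk(seq)
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)


def walk_distance(a: str, b: str) -> float:
    """Euclidean distance between two sequences' descriptors."""
    (xa, ya), (xb, yb) = walk_descriptor(a), walk_descriptor(b)
    return math.hypot(xa - xb, ya - yb)
```

The descriptor is too coarse to separate some distinct sequences: `ATCG` and `CGAT` both have centroid (-0.25, 0.25), so their distance is 0. This is the limitation the abstract points to when it notes that visual or coarse comparison is not enough and a more discriminating representation is needed.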
15	Analyses of All Possible Point Mutations within a Protein Reveals Relationships between Function and Experimental Fitness: A Dissertation Roscoe, Benjamin P. 25 March 2014 The primary amino acid sequence of a protein governs its specific cellular functions. Since the cracking of the genetic code in the late 1950s, it has been possible to predict the amino acid sequence of a protein from the DNA sequence of its gene. Nevertheless, predicting a protein's function from its primary sequence remains a great challenge in biology. To address this problem, we combined recent advances in next-generation sequencing technologies with systematic mutagenesis strategies to assess the function of thousands of protein variants in a single experiment. Using this strategy, my dissertation describes the effects of most of the possible single point mutants of the multifunctional protein ubiquitin in yeast. We measured the effects of these mutants on the essential activation of ubiquitin by the ubiquitin-activating enzyme (E1, Uba1p), as well as their effects on overall yeast growth. Ubiquitin mutants defective in E1 activation were found to correlate with growth defects, although in a non-linear fashion. Further examination of selected point mutants indicated that E1 activation deficiencies predict downstream defects in ubiquitin function, resulting in the observed growth phenotypes. These results indicate that there may be selective pressure on the E1 enzyme to preferentially activate ubiquitin variants that do not cause downstream functional defects. Additionally, I describe the use of similar techniques to discover drug-resistant mutants of the oncogenic protein BRAF V600E in human melanoma cell lines, as an example of the broad applicability of our strategy for addressing the relationship between protein function and biological fitness.
Amino Acid Sequence Mutagenesis Point Mutation Protein Sequence Analysis Ubiquitin Dissertations, UMMS Sequence Analysis, Protein Biochemistry Cellular and Molecular Physiology Molecular Biology Molecular Genetics
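The size of an exhaustive single-point-mutant scan follows from simple counting: each of the L positions can be changed to any of 19 other standard amino acids, giving 19·L variants (1,444 for the 76-residue ubiquitin). The sketch below enumerates them with conventional labels such as `Q2A`. The `relative_fitness` score (log2 change in a variant's read frequency after selection, normalized to wild type) is a common style of deep-mutational-scanning score assumed here for illustration, not necessarily the calculation used in the dissertation. Both function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def single_point_mutants(wild_type: str) -> dict[str, str]:
    """Map labels like 'Q2A' (wild-type residue, 1-based position, new residue)
    to the full mutant sequence, for every single amino-acid substitution."""
    mutants = {}
    for i, wt in enumerate(wild_type):
        for aa in AMINO_ACIDS:
            if aa != wt:
                label = f"{wt}{i + 1}{aa}"
                mutants[label] = wild_type[:i] + aa + wild_type[i + 1:]
    return mutants


def relative_fitness(before: int, after: int, wt_before: int, wt_after: int) -> float:
    """Hypothetical score: log2 of a variant's enrichment (after/before read counts)
    divided by the wild type's enrichment. All counts must be positive."""
    return math.log2((after / before) / (wt_after / wt_before))
```

For instance, a variant whose reads halve during selection while wild-type reads stay constant scores -1.0. Real count data need pseudocounts or filtering, since a variant that drops out entirely (`after == 0`) makes the logarithm undefined.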
16	Identification, Characterization and Evolution of Membrane-bound Proteins / Höglund, Pär J., January 2008 Diss. (summary) Uppsala : Uppsala University, 2008. / Includes 6 papers.
