Global ETD Search

11	Two dimensional cellular automata and pseudorandom sequence generation Sh, Umer Khayyam 13 November 2019 Maximum-length linear feedback shift registers (LFSRs) based on primitive polynomials are commonly used to generate maximum length sequences (m-sequences). An m-sequence is a pseudorandom sequence that exhibits ideal randomness properties like balance, run and autocorrelation but has low linear complexity. One-dimensional Cellular Automata (1D CA) have been used to generate m-sequences and pseudorandom sequences that have high linear complexity and good randomness. This thesis considers the use of two-dimensional Cellular Automata (2D CA) to generate m-sequences and pseudorandom sequences that have high linear complexity and good randomness. The properties of these sequences are compared with those of the corresponding m-sequences and the best sequences generated by 1D CAs. / Graduate cellular automata 2D CA pseudorandom sequence random sequence m-sequence
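As an illustration of the m-sequence properties mentioned in the abstract above, here is a minimal sketch of a 4-bit Fibonacci LFSR. The polynomial x^4 + x + 1, the tap positions, the seed, and the function names are assumptions chosen for the example; they are not taken from the thesis, and the thesis's cellular automata constructions are not shown.

```python
def lfsr_sequence(taps, state, length):
    """Fibonacci LFSR: emit the last register bit, shift right,
    and feed back the XOR of the tapped positions into position 0."""
    state = list(state)
    out = []
    for _ in range(length):
        out.append(state[-1])
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = [feedback] + state[:-1]
    return out


def periodic_autocorrelation(bits, shift):
    """Periodic autocorrelation after mapping bits 0/1 to +1/-1."""
    n = len(bits)
    return sum((1 - 2 * bits[i]) * (1 - 2 * bits[(i + shift) % n])
               for i in range(n))


# Taps [2, 3] give the recurrence a[t+4] = a[t+1] XOR a[t],
# i.e. the primitive polynomial x^4 + x + 1 (assumed example).
period = lfsr_sequence([2, 3], [1, 0, 0, 0], 15)
print(period)                      # one full period of 2^4 - 1 = 15 bits
print(sum(period))                 # number of ones (balance property)
print([periodic_autocorrelation(period, s) for s in range(1, 15)])
```

For an m-sequence of length 15, one period should contain 8 ones and 7 zeros, and the off-peak autocorrelation should be constant.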
12	On multiple sequence alignment Wang, Shu, January 1900 Thesis (Ph. D.)--University of Texas at Austin, 2007. / Vita. Includes bibliographical references.
13	Novel RNA and protein sequences involved in dimerization and packaging of HIV-1 genomic RNA Russell, Rodney S. January 2004 No description available. Sequence Analysis, Protein. Base Sequence. HIV-1.
14	Contrasting sequence groups by emerging sequences Deng, Kang 11 1900 Group comparison per se is a fundamental task in many scientific endeavours but is also the basis of any classifier. Comparing groups of sequence data is a relevant task. To contrast sequence groups, we define Emerging Sequences (ESs) as subsequences that are frequent in sequences of one group and less frequent in another, and that thus distinguish sequences of different classes. There are two challenges in distinguishing sequence classes by ESs: the extraction of ESs is not trivially efficient, and only exact matches of sequences are considered. In our work we address those problems with a suffix tree-based framework and a sliding window matching mechanism. A classification model based on ESs is also proposed. Evaluated against several other learning algorithms, the experiments on two datasets show that our similar ESs-based classification model outperforms the baseline approaches. With the ESs' high discriminative power, our proposed model achieves satisfactory F-measures on classifying sequences. Emerging Sequences Sequence Classification Sequence Similarity
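The definition of Emerging Sequences in the abstract above can be sketched as follows. This is a brute-force illustration only: it enumerates contiguous substrings with exact matching, whereas the thesis describes a suffix tree-based framework and sliding window matching. The thresholds `min_pos` and `max_neg`, the length cap `max_len`, and all function names are hypothetical.

```python
def substrings(seq, max_len):
    """All contiguous substrings of seq with length 1..max_len."""
    return {seq[i:i + k]
            for k in range(1, max_len + 1)
            for i in range(len(seq) - k + 1)}


def support(pattern, group):
    """Fraction of sequences in the group that contain the pattern."""
    return sum(pattern in s for s in group) / len(group)


def emerging_sequences(pos, neg, max_len=3, min_pos=0.6, max_neg=0.2):
    """Patterns frequent in `pos` and infrequent in `neg`
    (thresholds are assumed values for illustration)."""
    candidates = set().union(*(substrings(s, max_len) for s in pos))
    result = {}
    for p in candidates:
        sp, sn = support(p, pos), support(p, neg)
        if sp >= min_pos and sn <= max_neg:
            result[p] = (sp, sn)
    return result


pos_group = ["abcd", "xabc", "abcz"]
neg_group = ["bcda", "dcba", "zzzz"]
for pattern, (sp, sn) in sorted(emerging_sequences(pos_group, neg_group).items()):
    print(pattern, sp, sn)
```

A pattern such as "bc" is rejected here because it also occurs in one of the three negative sequences, which exceeds the assumed `max_neg` threshold.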
15	Iterative de Bruijn graph assemblers for second-generation sequencing reads Peng, Yu, 彭煜 January 2012 The recent advance of second-generation sequencing technologies has made it possible to generate a vast amount of short read sequences from a DNA (cDNA) sample. Current short read assemblers make use of the de Bruijn graph, in which each vertex is a k-mer and each edge connecting vertex u and vertex v represents u and v appearing in a read consecutively, to produce contigs. There are three major problems for de Bruijn graph assemblers: (1) branch problem, due to errors and repeats; (2) gap problem, due to low or uneven sequencing depth; and (3) error problem, due to sequencing errors. A proper choice of k value is a crucial tradeoff in de Bruijn graph assemblers: a low k value leads to fewer gaps but more branches; a high k value leads to fewer branches but more gaps. In this thesis, I first analyze the fundamental genome assembly problem and then propose an iterative de Bruijn graph assembler (IDBA), which iterates from low to high k values, to construct a de Bruijn graph with fewer branches and fewer gaps than any other de Bruijn graph assembler using a fixed k value. Then, the second-generation sequencing data from metagenomic, single-cell and transcriptome samples is investigated. IDBA is then tailored with special treatments to handle the specific issues for each kind of data. For metagenomic sequencing data, a graph partition algorithm is proposed to separate the de Bruijn graph into dense components, which represent similar regions in subspecies from the same species, and multiple sequence alignment is used to produce a consensus for each component. For sequencing data with highly uneven depth such as single-cell and metagenomic sequencing data, a method called local assembly is designed to reconstruct missing k-mers in low-depth regions. 
Then, based on the observation that short and relatively low-depth contigs are more likely erroneous, progressive depth on contigs is used to remove errors in both low-depth and high-depth regions iteratively. For transcriptome sequencing data, a variant of the progressive depth method is adopted to decompose the de Bruijn graph into components corresponding to transcripts from the same gene, and then the transcripts are found in each component by considering the reads and paired-end reads support. Plenty of experiments on both simulated and real data show that IDBA assemblers outperform the existing assemblers by constructing longer contigs with higher completeness and similar or better accuracy. The running time of IDBA assemblers is comparable to existing algorithms, while the memory cost is usually less than the others. / published_or_final_version / Computer Science / Doctoral / Doctor of Philosophy Nucleotide sequence - Data processing. Sequence alignment (Bioinformatics)
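The de Bruijn graph definition and the k-value tradeoff described in the abstract above can be illustrated with a small sketch. It builds a graph for one fixed k and counts branching vertices; it is not IDBA and does not iterate over k. The example read and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Vertices are k-mers; an edge u -> v is added when u and v
    appear consecutively (overlapping by k-1 bases) in a read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            u, v = read[i:i + k], read[i + 1:i + k + 1]
            graph[u].add(v)
    return graph


def branch_count(graph):
    """Number of vertices with more than one distinct successor."""
    return sum(1 for successors in graph.values() if len(successors) > 1)


# Hypothetical read containing the repeat "ACG".
reads = ["ACGTACGG"]
for k in (3, 5):
    print(k, branch_count(de_bruijn(reads, k)))
```

With k = 3 the repeated k-mer "ACG" has two different successors and creates a branch; with k = 5 every k-mer in this read is unique and the branch disappears, at the cost of requiring longer overlaps between reads.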
16	CSA-X: Modularized Constrained Multiple Sequence Alignment 2015 October 1900 Imposing additional constraints on multiple sequence alignment (MSA) algorithms can often produce more biologically meaningful alignments. Hence, various constrained multiple sequence alignment (CMSA) algorithms have been developed in the literature, where researchers used anchor points, regular expressions, or context-free grammars to specify the constraints, wherein alignments produced are forced to align around segments that match the constraints. In this thesis, we propose CSA-X, a modularized program for constrained multiple sequence alignment that accepts constraints in the form of regular expressions. It uses an arbitrary underlying multiple sequence alignment program to generate alignments, and is therefore modular. The name CSA-X refers to our proposed program generally, where the letter X is substituted with the name of a (non-constrained) multiple sequence alignment algorithm which is used as the underlying MSA engine in the proposed program. We compare the accuracy of our program with another constrained multiple sequence alignment program called RE-MuSiC that similarly uses regular expressions for constraints. In addition, comparisons are also made to the underlying MSA programs (without constraints). The BAliBASE 3.0 benchmark database is used to assess the performance of the proposed program CSA-X, other MSA programs, and CMSA programs considered in this study. Based on the results presented herein, CSA-X outperforms RE-MuSiC, and scores well against the underlying alignment programs. The results also show that regular expression constraints, if chosen well (here, created from the least conserved regions of the correct alignments), improve alignment accuracy. In this study, ProbCons and T-Coffee are used as the underlying MSA programs in CSA-X, and the accuracy of the alignments is measured in terms of Q score and TC score. 
On average, CSA-X used with constraints identified from the least conserved regions of the correct alignments achieves results that are 17.65% more for Q score, and 23.7% more for TC score compared to RE-MuSiC. In fact, CSA-X with ProbCons (CSA-PC) achieves a higher score in over 97.9% of the cases for Q score, and over 96.4% of the cases for TC score. In addition, CSA-X with T-Coffee (CSA-TCOF) achieves a higher score in over 97.7% of the cases for Q score, and over 94.8% of the cases for TC score. Furthermore, CSA-X with regular expressions created from the least conserved regions of the correct alignments achieves higher accuracy scores compared to standalone ProbCons and T-Coffee. To measure the statistical significance of CSA-X results, the Wilcoxon rank-sum test and Wilcoxon signed-rank test are performed, and these tests show that CSA-X results for the least conserved regular expression constraint sets from the correct BAliBASE 3.0 alignments are significantly different than those from RE-MuSiC, ProbCons, and T-Coffee.
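The idea of a regular expression constraint, as described in the abstract above, can be sketched as locating the matching segment in each sequence so that the pieces before, at, and after the match could be aligned separately by any underlying MSA program. This is an illustrative assumption about how such a constraint might be applied, not the CSA-X implementation; the pattern, the sequences, and the function name are hypothetical, and the alignment step itself is omitted.

```python
import re


def split_on_constraint(sequences, pattern):
    """Split each sequence into (prefix, matched segment, suffix)
    around the first match of the regular expression constraint.
    Sequences without a match are returned whole as the prefix."""
    regex = re.compile(pattern)
    parts = []
    for seq in sequences:
        m = regex.search(seq)
        if m is None:
            parts.append((seq, "", ""))
        else:
            parts.append((seq[:m.start()], m.group(), seq[m.end():]))
    return parts


# Hypothetical protein-like sequences and constraint.
for prefix, segment, suffix in split_on_constraint(
        ["MKTCAHCL", "TTCGHCA", "AAAA"], r"C.HC"):
    print(repr(prefix), repr(segment), repr(suffix))
```

In a full pipeline, the prefixes and suffixes would each be passed to the chosen MSA engine and the matched segments kept in the same columns.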
17	Contrasting sequence groups by emerging sequences Deng, Kang Unknown Date No description available. Emerging Sequences Sequence Classification Sequence Similarity
18	Novel RNA and protein sequences involved in dimerization and packaging of HIV-1 genomic RNA Russell, Rodney S. January 2004 During HIV-1 assembly, the Gag structural protein specifically encapsidates two copies of viral genomic RNA in the form of a dimer. An RNA stem-loop structure (SL1) in the 5' untranslated region, known as the dimerization initiation site (DIS), is important for dimerization and packaging of HIV-1 genomic RNA; however, the mechanisms involved are not fully understood. The major goal of this PhD study was to further understand HIV-1 RNA dimerization, and to study the role of the Gag protein in the dimerization and packaging processes. Despite the known involvement of the DIS in RNA dimerization, DIS-mutated viruses still contain significant levels of dimerized RNA, and electron microscopy studies suggest that the RNA molecules are linked at the extreme 5' end. We show here that RNA sequences on both sides of the DIS are also required for HIV-1 genome dimerization, suggesting that multiple RNA elements are involved. We have also examined the contribution of specific amino acids within Gag to the dimerization and packaging processes. Previous work showed that partial deletion of the DIS impacted on viral replication capacity, but could largely be corrected by compensatory point mutations within Gag. To further elucidate the mechanism(s) of these compensatory mutations, we generated DIS mutants lacking the entire SL1, or only the SL1 loop sequences, and combined these deletions with various combinations of compensatory mutations. Analysis of virion-derived RNA showed that the relevant mutant viruses contained increased levels of spliced viral RNA compared to wild type, indicating that a defect in genome packaging specificity was present. However, this defect was corrected by our compensatory mutations, and a T12I substitution in p2 was shown to be solely responsible for this activity. 
These results suggest that the p2 spacer peptide plays a critical role in the specific packaging of viral genomic RNA. In summary, these findings provide new insig Base Sequence. Sequence Analysis, Protein. HIV-1.
19	Combinatorial optimization and application to DNA sequence analysis Gupta, Kapil. January 2008 Thesis (Ph.D)--Industrial and Systems Engineering, Georgia Institute of Technology, 2009. / Committee Chair: Lee, Eva K.; Committee Member: Barnes, Earl; Committee Member: Fan, Yuhong; Committee Member: Johnson, Ellis; Committee Member: Yuan, Ming. Part of the SMARTech Electronic Thesis and Dissertation Collection.
20	Computational analysis of protein identification using peptide mass fingerprinting approach / Ganapathy, Ashwin, January 2004 Thesis (M.S.)--University of Missouri-Columbia, 2004. / Typescript. Vita. Includes bibliographical references (leaves 63-65). Also available on the Internet.
