91 |
Origem de genes recentes, uma abordagem com PSSMs deterioradas e arquiteturas de domínio proteico / Origin of recent genes, an approach with deteriorated PSSMs and protein domain architectures
Diego Trindade de Souza 06 October 2016
The origin of new genes is an important process in the evolution of organisms because it provides unique sources of evolutionary innovation. Approaches that show how new genes arise and acquire new functions in the course of evolution are of fundamental importance: for example, they can help to correlate mutations with metabolic, physiological, and/or morphological changes, indicating which mutations are likely to be functionally important. Recently, a new approach, named phylostratigraphy, was developed to establish the evolutionary origin of genes. In this method, the emergence of novel sequences at a particular phylogenetic node in an ancestor-descendant lineage is usually inferred using similarity search algorithms. In the present work, we carried out a phylostratigraphic search of two protein domain databases, CATH and Pfam, for all human entries and described the origin of human domains and architectures. We also applied a new approach to refine the results with Male-PSI-BLAST, in a case study of prion and ADH domains. The analysis of the two databases showed three important periods in the appearance of human protein domains: the origins of cellular organisms, eukaryotes, and Euteleostomi, at which a high number of new genes emerged in the ancestor-descendant lineage leading to humans.
Architectures, by contrast, are clearly more recent than domains, although they may contain very ancient domains mixed with recent ones. We did not observe any bias toward adding new domains to architectures composed of either ancient or recent domains. To measure the degree of domain versatility, we used the weighted bigram frequency, where a bigram is defined as a specific combination of two adjacent domains. The Spearman correlation test showed a low negative correlation between the age of domains and their versatility indexes. In the case study, we demonstrated that it is possible to characterize evolutionary discontinuities in Male-PSI-BLAST results between domains that emerged from other domains. For the first time, the origin of all protein domains and architectures present in the databases studied was described, providing an evolutionary scenario that can be further explored with the approaches developed here.
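The phylostratigraphic assignment described in this abstract can be illustrated with a minimal sketch. It assumes the similarity search has already been run and that each hit has been mapped to the lineage node it shares with humans. The `LINEAGE` list is a shortened, illustrative lineage and `assign_phylostratum` is a hypothetical helper; neither is taken from the thesis.

```python
# Illustrative ancestor-descendant lineage leading to humans, oldest node first.
# The thesis uses a fuller lineage; this shortened list is an assumption.
LINEAGE = [
    "Cellular organisms",
    "Eukaryota",
    "Euteleostomi",
    "Mammalia",
    "Homo sapiens",
]


def assign_phylostratum(hit_nodes, lineage=LINEAGE):
    """Return the oldest lineage node at which a similarity hit was found.

    hit_nodes: set of lineage nodes, one per hit, each being the most recent
    ancestor shared by humans and the organism in which the hit occurred.
    """
    for node in lineage:  # scan from the oldest node to the youngest
        if node in hit_nodes:
            return node
    # No hit outside the focal species: assign the youngest stratum.
    return lineage[-1]


if __name__ == "__main__":
    # Hypothetical domain with hits in a fungus (shared node: Eukaryota)
    # and in a mouse (shared node: Mammalia).
    print(assign_phylostratum({"Eukaryota", "Mammalia"}))
```

The domain is dated to the oldest node with a hit, which is why younger hits do not change the result once an older one exists.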
|
92 |
KnotAli: informed energy minimization through the use of evolutionary information
Gray, Mateo 31 August 2021
Motivation:
Improving the prediction of RNA structures, especially those containing pseudoknots (structures with crossing base pairs), is an ongoing challenge. Current alignment-based prediction algorithms find only the consensus structure. Their input alignments can come from structure-based alignment algorithms, which are more reliable but come at an increased cost compared to sequence-based alignment algorithms.
The alignment step can be skipped; however, non-alignment-based algorithms neglect structural information that can be found within similar sequences.
Results:
We present a new method for prediction of RNA pseudoknotted secondary structures that combines the strengths of MFE prediction and alignment-based methods. KnotAli takes an RNA sequence alignment and uses covariation and thermodynamic energy minimization to predict secondary structures for each individual sequence in the alignment. We compared KnotAli's performance to that of three other alignment-based algorithms, on a large data set of 10 families with pseudoknotted and pseudoknot-free reference structures. We produced sequence alignments for each family using two well-known sequence aligners (MUSCLE and MAFFT).
We found KnotAli to be superior in 6 of the 10 families for MUSCLE and 7 of the 10 for MAFFT. We also found KnotAli's predictions to be less dependent on alignment quality. In particular, KnotAli produces more accurate predictions than other leading methods as alignment quality deteriorates.
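To make the idea of covariation in an alignment concrete, here is a minimal sketch that scores a pair of alignment columns by mutual information. The abstract does not say which covariation measure KnotAli uses, so mutual information is an assumption chosen because it is a common choice; `column_mutual_information` and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    alignment: list of equal-length aligned sequences; '-' marks a gap.
    Rows with a gap in either column are ignored.
    """
    pairs = [(s[i], s[j]) for s in alignment if s[i] != "-" and s[j] != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    pair_counts = Counter(pairs)
    left_counts = Counter(x for x, _ in pairs)
    right_counts = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in pair_counts.items():
        p_xy = count / n
        # p_xy / (p_x * p_y) simplifies to count * n / (left * right)
        mi += p_xy * math.log2(count * n / (left_counts[x] * right_counts[y]))
    return mi


if __name__ == "__main__":
    # Toy alignment: columns 0 and 4 change together (G-C, C-G, A-U, U-A),
    # while columns 1 and 2 are fully conserved.
    toy = ["GAAAC", "CAAAG", "AAAAU", "UAAAA"]
    print(column_mutual_information(toy, 0, 4))
    print(column_mutual_information(toy, 1, 2))
```

Columns that mutate in a coordinated way score high and are candidate base pairs, whereas fully conserved columns carry no covariation signal even if they are paired.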
Availability:
The algorithm can be found online on GitHub at https://github.com/mateog4712/KnotAli / Graduate / 2022-08-16
|
93 |
WRINKLED1, A Ubiquitous Regulator in Oil Accumulating Tissues from Arabidopsis Embryos to Oil Palm Mesocarp
Ma, Wei; Kong, Que; Arondel, Vincent; Kilaru, Aruna; Bates, Philip D.; Thrower, Nicholas A.; Benning, Christoph; Ohlrogge, John B. 26 July 2013
WRINKLED1 (AtWRI1) is a key transcription factor in the regulation of plant oil synthesis in seed and non-seed tissues. The structural features of WRI1 important for its function are not well understood. Comparison of WRI1 orthologs across many diverse plant species revealed a conserved 9 bp exon encoding the amino acids “VYL”. Site-directed mutagenesis of amino acids within the ‘VYL’ exon of AtWRI1 failed to restore the full oil content of wri1-1 seeds, providing direct evidence for an essential role of this small exon in AtWRI1 function. Arabidopsis WRI1 is predicted to have three alternative splice forms. To understand expression of these splice forms we performed RNASeq of Arabidopsis developing seeds and queried other EST and RNASeq databases from several tissues and plant species. In all cases, only one splice form was detected and VYL was observed in transcripts of all WRI1 orthologs investigated. We also characterized a phylogenetically distant WRI1 ortholog (EgWRI1) as an example of a non-seed isoform that is highly expressed in the mesocarp tissue of oil palm. The C-terminal region of EgWRI1 is over 90 amino acids shorter than AtWRI1 and has surprisingly low sequence conservation. Nevertheless, the EgWRI1 protein can restore multiple phenotypes of the Arabidopsis wri1-1 loss-of-function mutant, including reduced seed oil, the “wrinkled” seed coat, reduced seed germination, and impaired seedling establishment. Taken together, this study provides an example of combining phylogenetic analysis with mutagenesis, deep-sequencing technology and computational analysis to examine key elements of the structure and function of the WRI1 plant transcription factor.
|
94 |
Bit-parallel and SIMD alignment algorithms for biological sequence analysis
Loving, Joshua 21 November 2017
High-throughput next-generation sequencing techniques have hugely decreased the cost and increased the speed of sequencing, resulting in an explosion of sequencing data. This motivates the development of high-efficiency sequence alignment algorithms. In this thesis, I present multiple bit-parallel and Single Instruction Multiple Data (SIMD) algorithms that greatly accelerate the processing of biological sequences.
The first chapter describes the BitPAl bit-parallel algorithms for global alignment with general integer scoring, which assigns integer weights to matches, mismatches, and insertions/deletions. The bit-parallel approach represents individual cells in an alignment scoring matrix as bits in computer words and emulates the calculation of scores by a series of logic operations. Bit-parallelism has previously been applied to other pattern matching problems, producing fast algorithms. In timed tests, we show that BitPAl runs 7 to 25 times faster than a standard iterative algorithm.
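As an illustration of bit-parallelism applied to a related problem, the sketch below computes unit-cost edit distance in two ways: with a standard iterative dynamic program and with a Myers-style bit-vector recurrence (in the formulation commonly attributed to Hyyrö). This is not BitPAl, which handles general integer scoring; it only shows how a column of the scoring matrix can be encoded as bits and updated with logic operations. The function names are hypothetical.

```python
def edit_distance_dp(a, b):
    """Standard iterative dynamic programming, one matrix row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def edit_distance_bitparallel(pattern, text):
    """Bit-vector edit distance: one column update per text character."""
    m = len(pattern)
    if m == 0:
        return len(text)
    mask = (1 << m) - 1   # keeps Python's unbounded ints to m bits
    high = 1 << (m - 1)   # bit for the last pattern row
    peq = {}              # per-character match bit vectors
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)
    pv, mv, score = mask, 0, m  # vertical +1 / -1 deltas, current distance
    for ch in text:
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask  # horizontal +1 deltas
        mh = pv & xh                   # horizontal -1 deltas
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        ph = ((ph << 1) | 1) & mask    # top row of the matrix grows by 1
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
    return score


if __name__ == "__main__":
    print(edit_distance_dp("kitten", "sitting"))
    print(edit_distance_bitparallel("kitten", "sitting"))
```

In a compiled language the bit vectors fit in machine words, so each text character costs a handful of word operations instead of one update per matrix cell; Python integers are used here only to keep the sketch short.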
The second part involves two approaches to alignment with substitution scoring, which assigns a potentially different substitution weight to every pair of alphabet characters, better representing the relative rates of different mutations. The first approach extends the existing BitPAl method. The second approach is a new SIMD algorithm that uses partial sums of adjacent score differences. I present a simple partial sum method as well as one that uses parallel scan for additional acceleration. Results demonstrate that these algorithms are significantly faster than existing SIMD dynamic programming algorithms.
Finally, I describe two extensions to the partial sums algorithm. The first adds support for affine gap penalty scoring. Affine gap scoring reflects the biological observation that gaps are more likely to be contiguous than scattered throughout a region; it does so by introducing a gap opening penalty and a separate gap extension penalty. The second extension is an algorithm that uses the partial sums method to calculate the tandem alignment of a pattern against a text sequence using a single pattern copy.
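Affine gap scoring can be sketched with the classic three-matrix (Gotoh-style) dynamic program below. This is a plain scalar reference, not the partial sums SIMD algorithm from the thesis. The scoring values are hypothetical, and the convention that a gap of length k scores `gap_open + (k - 1) * gap_extend` is an assumption; other conventions exist.

```python
def affine_global_score(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Global alignment score with affine gaps (three-matrix recurrence).

    A gap of length k scores gap_open + (k - 1) * gap_extend (assumed convention).
    """
    neg = float("-inf")
    n, m = len(a), len(b)
    # sub: alignment ends with a[i-1] aligned to b[j-1]
    # ga:  alignment ends with a[i-1] aligned to a gap
    # gb:  alignment ends with b[j-1] aligned to a gap
    sub = [[neg] * (m + 1) for _ in range(n + 1)]
    ga = [[neg] * (m + 1) for _ in range(n + 1)]
    gb = [[neg] * (m + 1) for _ in range(n + 1)]
    sub[0][0] = 0
    for i in range(1, n + 1):
        ga[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        gb[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            sub[i][j] = max(sub[i - 1][j - 1], ga[i - 1][j - 1], gb[i - 1][j - 1]) + s
            # Extending an existing gap is cheaper than opening a new one.
            ga[i][j] = max(sub[i - 1][j] + gap_open,
                           ga[i - 1][j] + gap_extend,
                           gb[i - 1][j] + gap_open)
            gb[i][j] = max(sub[i][j - 1] + gap_open,
                           gb[i][j - 1] + gap_extend,
                           ga[i][j - 1] + gap_open)
    return max(sub[n][m], ga[n][m], gb[n][m])


if __name__ == "__main__":
    # One contiguous gap of length 2 is preferred over two separate gaps.
    print(affine_global_score("AAAA", "AA"))
```

With these values, aligning "AAAA" to "AA" using one gap of length 2 costs 4 in penalties, while two separate single gaps would cost 6, which is the preference for contiguous gaps that the abstract describes.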
Next generation sequencing data provides a wealth of information to researchers. Extracting that information in a timely manner increases the utility and practicality of sequence analysis algorithms. This thesis presents a family of algorithms which provide alignment scores in less time than previous algorithms.
|
95 |
Model Detection Based upon Amino Acid Properties
Menlove, Kit J. 09 August 2010
Similarity searches are an essential component of most bioinformatic applications. They form the basis of structural motif identification, gene identification, and insights into functional associations. With the rapid increase in the genetic data available through a wide variety of databases, similarity searches are an essential tool for accessing these data in an informative and productive way. In our chapter, we provide an overview of similarity searching approaches, related databases, and parameter options to achieve the best results for a variety of applications. We then provide a worked example and some notes for consideration. Homology detection is one of the most fundamental problems in bioinformatics. It is central to problems currently under intense investigation in protein structure prediction, phylogenetic analyses, and computational drug development. Currently, discriminative methods for homology detection, which are not readily interpretable, are substantially more powerful than their more interpretable counterparts, particularly when sequence identity is very low. Here I present a computational graph-based framework for homology inference using physicochemical amino acid properties, which aims both to reduce the gap in accuracy between discriminative and generative methods and to provide a framework for easily identifying the physicochemical basis for the structural similarity between proteins. The accuracy of my method slightly improves on that of PSI-BLAST, the most popular generative approach, and underscores the potential of this methodology given a more robust statistical foundation.
|
96 |
DEVELOPING TOOLS FOR RNA STRUCTURAL ALIGNMENT
Mokdad, Ali G. 28 March 2006
No description available.
|
97 |
USING PROGRAM SLICING AND SEQUENCE ALIGNMENT TO ANALYZE ORGANISMS OF AVIDA, A DIGITAL EVOLUTION PLATFORM
Hu, Hanqing 09 March 2012
No description available.
|
98 |
A PAIRWISE COMPARISON OF DNA SEQUENCE ALIGNMENT USING AN OPENMP IMPLEMENTATION OF THE SWAMP PARALLEL SMITH-WATERMAN ALGORITHM
Cuevas, Tristan Lee 22 April 2015
No description available.
|
99 |
DERIVING ACTIVITY PATTERNS FROM INDIVIDUAL TRAVEL DIARY DATA: A SPATIOTEMPORAL DATA MINING APPROACH
Ding, Guoxiang 31 August 2009
No description available.
|
100 |
Searching Biological Sequence Databases Using Distributed Adaptive Computing
Pappas, Nicholas Peter 06 February 2003
Genetic research projects can currently require enormous computing power to process the vast quantities of data available. Further, DNA sequencing projects are generating data at an exponential rate greater than that of the development of microprocessor technology; thus, new, faster methods and techniques for processing this data are needed. One common type of processing involves searching a sequence database for the most similar sequences. Here we present a distributed database search system that utilizes adaptive computing technologies. The search is performed using the Smith-Waterman algorithm, a common sequence comparison algorithm. To reduce the total search time, an initial search is performed using a version of the algorithm, implemented in adaptive computing hardware, that is designed to perform the initial search efficiently. A final search is performed using a complete version of the algorithm. This two-stage search, employing adaptive and distributed hardware, achieves a performance increase of several orders of magnitude over similar processor-based systems. / Master of Science
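The kind of database search described in this abstract can be sketched in software as follows. The sketch computes a score-only Smith-Waterman local alignment with a linear gap penalty and ranks database entries by that score. It is a single-stage, single-machine illustration; the hardware implementation and the reduced first-stage variant are not specified in the abstract and are not reproduced here. The function names, scoring values, and toy database are hypothetical.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (no traceback), linear gap penalty."""
    prev = [0] * (len(target) + 1)
    best = 0
    for qc in query:
        cur = [0]
        for j, tc in enumerate(target, 1):
            diag = prev[j - 1] + (match if qc == tc else mismatch)
            # Scores are floored at 0 so an alignment can start anywhere.
            cell = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


def search_database(query, database, top_n=3):
    """Rank database sequences by local alignment score against the query.

    database: dict mapping sequence name to sequence string.
    Returns a list of (score, name), best first; ties are broken by name.
    """
    scored = [(smith_waterman_score(query, seq), name)
              for name, seq in database.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top_n]


if __name__ == "__main__":
    toy_db = {"s1": "TTACGTTT", "s2": "CCCC", "s3": "ACGA"}
    print(search_database("ACGT", toy_db, top_n=2))
```

Because each database entry is scored independently, the outer loop in `search_database` is the natural place to distribute work across machines or to substitute a faster first-pass scorer.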
|