Global ETD Search

201	On multiple sequence alignment Wang, Shu, 1973- 29 August 2008 (has links) The tremendous increase in biological sequence data presents us with an opportunity to understand the molecular and cellular basis of cellular life. Comparative studies of these sequences have the potential, when applied with sufficient rigor, to decipher the structure, function, and evolution of cellular components. The accuracy and detail of these studies depend directly on the quality of the sequence alignments. Given the large number of sequences per family of interest, and the increasing number of families to study, improving the speed, accuracy, and scalability of multiple sequence alignment (MSA) is becoming an increasingly important task. In the past, most interest was in global MSA. In recent years, the focus has shifted from global MSA to local MSA, which is needed to align variable sequences from different families or species. In this dissertation, we developed two new algorithms for fast and scalable local MSA: a three-way-consistency-based MSA and a biclustering-based MSA. The first algorithm is a three-way Consistency-Based MSA (CBMSA). CBMSA applies alignment consistency heuristics to MSA in the form of a new three-way alignment. While the three-way consistency approach maintains the same time complexity as the traditional pairwise consistency approach, it provides more reliable consistency information and better alignment quality. We quantify the benefit of using three-way consistency as compared to pairwise consistency. We have also compared CBMSA to a suite of leading MSA programs, and CBMSA consistently performs favorably. We also developed another new MSA algorithm, a biclustering-based MSA. Biclustering is a clustering method that simultaneously clusters both the domain and range of a relation. A challenge in MSA is that the alignment of sequences is often intended to reveal groups of conserved functional subsequences.
At the same time, the grouping of the sequences can affect the alignment, which is precisely the kind of dual situation biclustering algorithms are intended to address. We define a representation of the MSA problem that enables the application of biclustering algorithms. We develop a computer program for local MSA, BlockMSA, that combines biclustering with divide-and-conquer. BlockMSA simultaneously finds groups of similar sequences and locally aligns subsequences within them. Further alignment is accomplished by dividing both the set of sequences and their contents. The net result is both a multiple sequence alignment and a hierarchical clustering of the sequences. BlockMSA was compared with a suite of leading MSA programs. With respect to quantitative measures of MSA, BlockMSA scores comparably to or better than the other leading MSA programs. With respect to biological validation of MSA, the other leading MSA programs lag behind BlockMSA in their ability to identify the most highly conserved regions. Nucleotide sequence--Data processing Amino acid sequence--Data processing Nucleotide sequence--Computer programs Bioinformatics Computer algorithms
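The abstract above judges alignments partly by how well they expose highly conserved regions. As a rough illustration of what "conserved" can mean for an alignment column, the sketch below scores each column by the fraction of rows sharing its most common non-gap residue. This is an assumed, simplified measure chosen for illustration, not the validation procedure used in the dissertation; the function name `column_conservation` and the `-` gap symbol are hypothetical choices.

```python
from collections import Counter


def column_conservation(alignment: list[str]) -> list[float]:
    """Per-column fraction of rows that share the most common non-gap residue.

    Hypothetical helper for illustration only; gaps are written as '-'.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned rows must have the same length")

    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        # Count of the most frequent residue, or 0 for an all-gap column.
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top / len(alignment))
    return scores
```

Columns scoring close to 1.0 are candidates for conserved positions; a run of such columns would correspond to a conserved block.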
202	Solving repeat problems in shotgun sequencing / Arner, Erik, January 2006 (has links) Diss. (summary) Stockholm: Karolinska Institutet, 2006. / Includes 3 papers.
203	Protein sequence constraints Lavelle, Daniel Thor. January 2009 (has links) Thesis (Ph. D.)--University of Virginia, 2009. / Title from title page. Includes bibliographical references. Also available online through Digital Dissertations.
204	Evaluation of Live Sequence Charts Using Play Engine Tool Gopidi, Vijay Kumar January 2005 (has links) Capturing requirements, whether system requirements or customer requirements, is a great challenge for software engineers in the initial stages of software development. Understanding a requirement and distinguishing what may happen from what must happen is difficult, especially in complex real-time systems. Live sequence charts are extensions of message sequence charts that can specify the liveness of requirements. The Play Engine tool is used to specify, validate, and analyze the scenarios of the requirements. This thesis evaluates live sequence charts using the Play Engine tool and examines whether the built-in model checkers can detect inconsistencies in the LSCs. / Requirements capture and analysis has always been the first step, and a main problem, for software engineers during software design and development. Natural language is very commonly used in industry for capturing requirements because of its ease of use. Graphical languages, for example UML, are used to represent the requirements, their behavior, and their scenarios visually. UML sequence diagrams are used in real-time software development to capture requirements; they specify the scenarios of system behavior and the interactions between objects graphically. Message sequence charts are another graphical language for representing scenarios and system behavior, especially in the telecommunication domain. But these two notations specify only one aspect of the behavior and are of little help in specifying the liveness of a requirement. Liveness can be defined as "something good will happen" [34] or "something must happen".
For this reason, live sequence charts were developed, which can specify the liveness of a requirement. Live sequence charts can specify scenarios of both what may happen and what must happen. This thesis evaluates live sequence charts using the Play Engine tool running on a Windows machine and also studies the built-in model checkers for formal verification. The thesis starts with the various types of graphical representation of requirements in software engineering, followed by the research methodology, a more detailed explanation of live sequence charts, and then the evaluation, results, conclusions, and lessons learned. / Live Sequence Charts Message Sequence Charts Sequence Diagrams Play In/Out Software Engineering
205	Analysing Message Sequence Graph Specifications Chakraborty, Joy 04 1900 (has links) Message Sequence Charts are a visual representation of a system specification that shows how all the participating processes interact with each other. Message Sequence Graphs provide modularity by allowing several Message Sequence Charts to be combined easily to show more complicated system behavior. Requirements modeled as Message Sequence Graphs give a global view of the system, because the interaction across all the participating processes can be viewed. Thus systems modeled as Message Sequence Graphs are like a sequential composition of parallel processes. This makes them very attractive during the requirements gathering and review phases, which need inter-working between different stakeholders with varied domain knowledge and expertise: requirements engineers, system designers, end customers, test professionals, etc. In this thesis we give a detailed construction of a finite-state transition system for a com-connected Message Sequence Graph. Though this result is fairly well known in the literature, there has been no precise description of such a transition system. Several analysis and verification problems concerning MSG specifications can be solved using this transition system. The transition system can be used to construct correct tools for problems like model checking and detecting implied scenarios in MSG specifications. There are several contributions of this thesis. Firstly, we have provided a detailed construction of a transition system exactly implementing the message sequence graph, together with detailed correctness arguments for this construction. Secondly, this construction works for general Message Sequence Graphs and is not limited to com-connected graphs alone, although we show that a finite model can be ensured only if the original graph is com-connected. Also, we show that the construction works for both synchronous and asynchronous messaging systems.
Thirdly, we show how to find implied scenarios using the transition model we have generated. We also discuss some of the flaws in the existing approaches. Fourthly, we provide an undecidability argument for non-com-connected MSGs with synchronous messaging. Data Transmission Modes Message Communication Message Sequence Charts Message Sequence Graphs (MSG) Computer Science
206	Multiple Biological Sequence Alignment: Scoring Functions, Algorithms, and Evaluations Nguyen, Ken D 14 December 2011 (has links) Aligning multiple biological sequences such as protein sequences or DNA/RNA sequences is a fundamental task in bioinformatics and sequence analysis. These alignments may contain invaluable information that scientists need to predict the sequences' structures, determine the evolutionary relationships between them, or discover drug-like compounds that can bind to the sequences. Unfortunately, multiple sequence alignment (MSA) is NP-Complete. In addition, the lack of a reliable scoring method makes it very hard to align the sequences reliably and to evaluate the alignment outcomes. In this dissertation, we have designed a new scoring method for use in multiple sequence alignment. Our scoring method encapsulates stereo-chemical properties of sequence residues and their substitution probabilities into a tree-structure scoring scheme. This new technique provides a reliable scoring scheme with low computational complexity. In addition to the new scoring scheme, we have designed an overlapping sequence clustering algorithm to use in our three new multiple sequence alignment algorithms. One of our alignment algorithms uses a dynamic weighted guidance tree to perform multiple sequence alignment in progressive fashion. The use of a dynamic weighted tree allows errors in the early alignment stages to be corrected in the subsequent stages. The other two algorithms utilize sequence knowledge bases and sequence consistency to produce biologically meaningful sequence alignments. To improve the speed of multiple sequence alignment, we have developed a parallel algorithm that can be deployed on reconfigurable computer models. Analytically, our parallel algorithm is the fastest progressive multiple sequence alignment algorithm. Multiple sequence alignments Algorithms Scoring functions Computer Sciences
207	Combinatorial optimization and application to DNA sequence analysis Gupta, Kapil 25 August 2008 (has links) With recent and continuing advances in bioinformatics, the volume of sequence data has increased tremendously. Along with this increase, there is a growing need to develop efficient algorithms to process such data in order to make useful and important discoveries. Careful analysis of genomic data will benefit science and society in numerous ways, including the understanding of protein sequence functions, early detection of diseases, and finding evolutionary relationships that exist among various organisms. Most sequence analysis problems arising from computational genomics and evolutionary biology fall into the class of NP-complete problems. Advances in exact and approximate algorithms to address these problems are critical. In this thesis, we investigate a novel graph theoretical model that deals with fundamental evolutionary problems. The model allows incorporation of the evolutionary operations "insertion", "deletion", and "substitution", and various parameters such as relative distances and weights. By varying appropriate parameters and weights within the model, several important combinatorial problems can be represented, including the weighted supersequence, weighted superstring, and weighted longest common sequence problems. Consequently, our model provides a general computational framework for solving a wide variety of important and difficult biological sequencing problems, including the multiple sequence alignment problem, and the problem of finding an evolutionary ancestor of multiple sequences. In this thesis, we develop large scale combinatorial optimization techniques to solve our graph theoretical model. In particular, we formulate the problem as two distinct but related models: a constrained network flow problem and a weighted node packing problem.
The integer programming models are solved in a branch and bound setting using simultaneous column and row generation. The methodology developed will also be useful to solve large scale integer programming problems arising in other areas such as transportation and logistics. DNA sequence analysis Integer programming Network flow Node packing Row generation Column generation Combinatorial optimization Nucleotide sequence Sequence alignment (Bioinformatics)
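Among the combinatorial problems the abstract above lists is a weighted longest common sequence problem. As a point of reference only, and assuming that phrase refers to the longest-common-subsequence family of problems, the classic unweighted two-sequence case can be solved with a short dynamic program. The sketch below is not the thesis's integer-programming method, and the function name `lcs_length` is hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Classic dynamic program using O(len(a) * len(b)) time and O(len(b)) space.
    Illustrative only: unweighted, and limited to two sequences.
    """
    # prev[j] holds the LCS length of the processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)  # extend a common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip one character
        prev = curr
    return prev[-1]
```

Extending this to many sequences makes the table grow exponentially with the number of sequences, which is one reason exact approaches turn to techniques such as the branch-and-bound setting the abstract describes.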
208	Uma abordagem para detecção e remoção de artefatos em sequencias ESTs / An approach to detect and remove artifacts in EST sequences Baudet, Christian 12 January 2006 (has links) Advisor: Zanoni Dias / Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Computação / Abstract: Expressed Sequence Tag (EST) sequencing [2] is a technique that works with cDNA libraries. It aims to achieve a good approximation of the gene index, the list of genes in the genome of the organism under study. Before the sequences obtained by EST sequencing are analyzed, they must be processed to remove artifacts. An artifact is a stretch of sequence that does not belong to the studied organism or that has low quality or low complexity; examples include vector fragments, adapters, and poly-A tails. Artifacts must be removed because their presence introduces noise into the data analysis of a sequencing project. For example, artifacts frequently cause errors in clustering, because they can determine whether sequences are joined in the same cluster or separated into different clusters. Motivated by the importance of the sequence-cleaning process, our main objective in this work was to develop an efficient set of procedures to detect and remove sequence artifacts. This set was produced from a new artifact-detection strategy. Usually, each EST sequencing project has its own set of procedures divided into many steps. These steps are, in general, linked, and the result of one step can influence the result of another. Our strategy was to perform each step independently, ensuring that any execution order of the steps leads to the same result. In addition to evaluating the new strategy, this work also studied two types of artifacts in detail: low quality and slippage. For each one, algorithms were proposed and validated through tests with sequences from real sequencing projects. The final set of procedures developed in this work was evaluated using the sequences of the SUCEST project [100, 103, 113] and produced good results. The clustering produced from sequences processed by our methods has better internal and external consistency and lower redundancy rates than the original clustering of the SUCEST project. / Master's / Computer Science / Master in Computer Science Nucleotide sequence DNA - Analysis Bioinformatics DNA sequencing Expressed Sequence Tags
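The abstract above names poly-A tails among the artifacts to be removed from EST sequences. A minimal sketch of that one cleaning step is given below, assuming an exact trailing run of `A` bases and a hypothetical minimum run length; it is not the algorithm proposed in the dissertation, and the name `trim_poly_a` and the default `min_run=10` are illustrative choices.

```python
def trim_poly_a(seq: str, min_run: int = 10) -> str:
    """Remove a trailing run of 'A' bases if it is at least min_run long.

    Illustrative only: matches exact runs of A/a at the 3' end and does not
    tolerate sequencing errors inside the tail.
    """
    stripped = seq.rstrip("Aa")
    run_length = len(seq) - len(stripped)
    # Short trailing runs are kept, since they may be genuine sequence.
    return stripped if run_length >= min_run else seq
```

For example, `trim_poly_a("ACGT" + "A" * 12)` returns `"ACGT"`, while a read ending in only three `A` bases is returned unchanged under the default threshold. Because this step reads only the raw sequence, it can run independently of other cleaning steps, in the spirit of the order-independent strategy the abstract describes.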
209	Novel Deep Learning Models for Spatiotemporal Predictive Tasks Le, Quang 23 November 2022 (has links) Spatiotemporal Predictive Learning (SPL) is an essential research topic involving many practical and real-world applications, e.g., motion detection, video generation, precipitation forecasting, and traffic flow prediction. The problems and challenges of this field come from numerous data characteristics in both time and space domains, and they vary depending on the specific task. For instance, spatial analysis refers to the study of spatial features, such as spatial location, latitude, elevation, longitude, the shape of objects, and other patterns. From the time domain perspective, temporal analysis generally describes the time steps and time intervals of data points in the sequence, also known as interval recording or time sampling. Typically, there are two types of time sampling in temporal analysis: regular time sampling (i.e., the time interval is assumed to be fixed) and irregular time sampling (i.e., the time interval is considered arbitrary), which is closely related to the continuous-time prediction task when data are in continuous space. Therefore, an efficient spatiotemporal predictive method has to model spatial features properly for the given time sampling types. In this thesis, by taking advantage of Machine Learning (ML) and Deep Learning (DL) methods, which have achieved promising performance in many complicated computational tasks, we propose three DL-based models for Spatiotemporal Sequence Prediction (SSP) with several types of time sampling. First, we design the Trajectory Gated Recurrent Unit Attention (TrajGRU-Attention) model with novel attention mechanisms, namely Motion-based Attention (MA), to improve the performance of standard Convolutional Recurrent Neural Networks (ConvRNNs) in SSP tasks.
In particular, the TrajGRU-Attention model can alleviate the impact of the vanishing gradient, which leads to the blurry effect in long-term predictions, and can handle both regularly sampled and irregularly sampled time series. Consequently, this model can work effectively with different scenarios of spatiotemporal sequential data, especially in the case of time series with missing time steps. Second, building on the idea of Neural Ordinary Differential Equations (NODEs), we propose the Trajectory Gated Recurrent Unit integrating Ordinary Differential Equation techniques (TrajGRU-ODE) as a continuous time-series model. With Ordinary Differential Equation (ODE) techniques and the TrajGRU neural network, this model can perform continuous-time spatiotemporal prediction tasks and generate output with high accuracy. Compared to TrajGRU-Attention, TrajGRU-ODE benefits from the development of efficient and accurate ODE solvers. Finally, we attempt to combine those two models to create TrajGRU-Attention-ODE. NODEs are still in an early stage of research, and recent ODE-based models were designed for relatively simple tasks. In this thesis, we train the models with several video datasets to verify the ability of the proposed models in practical applications. To evaluate the performance of the proposed models, we select four available spatiotemporal datasets based on their complexity level: MovingMNIST, MovingMNIST++, and two real-life datasets, the weather radar HKO-7 and KTH Action. With each dataset, we train, validate, and test with distinct types of time sampling to assess the prediction ability of our models. In summary, the experimental results on the four datasets indicate that the proposed models can generate predictions with high accuracy and sharpness. Significantly, the proposed models outperform state-of-the-art ODE-based approaches on SSP tasks under different conditions of interval recording.
spatiotemporal sequence prediction convolutional recurrent networks attention mechanisms neural ordinary differential equations
210	Molecular characterization and cytogenetic analysis of chicken repetitive DNA sequences 王曉飛, Wang, Xiaofei. January 1999 (has links) Zoology / Doctoral / Doctor of Philosophy Nucleotide sequence.
