Global ETD Search

81	The genease activity of mung bean nuclease: fact or fiction? Kula, Nothemba January 2004 The action of Mung Bean Nuclease (MBN) on DNA makes it possible to clone intact gene fragments from genes of the malaria parasite, Plasmodium. This “genease” activity has provided a foundation for further investigation of the coding elements of the Plasmodium genome. MBN has been reported to cleave genomic DNA of Plasmodium preferentially at positions before and after genes, but not within gene coding regions. This mechanism has overcome the difficulty encountered in obtaining genes with low expression levels, because the cleavage mechanism of the enzyme yields sequences of genes from genomic DNA rather than mRNA. However, as potentially useful as MBN may be, evidence to support its genease activity comes from analysis of a limited number of genes. It is not clear whether this mechanism is specific to certain genes or species of Plasmodia or whether it is a general cleavage mechanism for Plasmodium DNA. Some projects (Nomura et al., 2001; van Lin, Janse, and Waters, 2000) have also identified MBN-generated fragments which contain fragments of genes with both introns and exons, rather than the intact genes expected from MBN digestion of genomic DNA, which raises concerns about the efficiency of the MBN mechanism in generating complete genes. Using a large-scale, whole-genome mapping approach, 7242 MBN-generated genome survey sequences (GSSs) have been mapped to determine their position relative to coding sequences within the complete genome sequence of the human malaria parasite Plasmodium falciparum and the incomplete genome of a rodent malaria parasite, Plasmodium berghei. The location of MBN cleavage sites was determined with respect to coding regions in orthologous genes, non-coding/intergenic regions and exon-intron boundaries in these two species of Plasmodium. The survey illustrates that for P. falciparum 79% of GSSs had at least one terminal mapping within an ortholog coding sequence and 85% of GSSs which overlapped coding sequence boundaries mapped within 50 bp of the start or end of the gene. Similarly, despite the partial nature of P. berghei genome sequence information, 73% of P. berghei GSSs had at least one terminal mapping within an ortholog coding sequence and 37% of these mapped within 0-50 bp of the start or end of the gene. This indicates that a larger percentage of cleavage sites in both P. falciparum and P. berghei was found proximal to coding regions. Furthermore, 86% of P. falciparum GSSs had at least one terminal mapping within a coding exon and 85% of GSSs which overlapped exon-intron boundaries mapped within 50 bp of the exon start and end site. The fact that 11% of GSSs mapped completely to intronic regions suggests that some introns contain specific sites sensitive to cleavage, and also indicates that MBN cleavage of Plasmodium DNA does not always yield complete exons. Finally, the results presented herein were obtained from analysis of several thousand Plasmodium genes which have different coding sequences, in different locations on individual chromosomes/contigs in two different species of Plasmodium. Therefore it appears that the MBN mechanism is neither species specific nor limited to specific genes. Exon Genome survey sequence Mung Bean Nuclease Nuclease cleavage site Plasmodium falciparum Plasmodium berghei Sequence Alignment.
82	Multiple Versions and Overlap in Digital Text Desmond Schmidt Unknown Date This thesis is unusual in that it tries to solve a problem that exists between two widely separated disciplines: the humanities (and to some extent also linguistics) on the one hand and information science on the other. Chapter 1 explains why it is essential to strike a balance between study of the solution and problem domains. Chapter 2 surveys the various models of cultural heritage text, starting in the remote past, through the coming of the digital era to the present. It establishes why current models are outdated and need to be revised, and also what significance such a revision would have. Chapter 3 examines the history of markup in an attempt to trace how inadequacies of representation arose. It then examines two major problems in cultural heritage and linguistics digital texts: overlapping hierarchies and textual variation. It assesses previously proposed solutions to both problems and explains why they are all inadequate. It argues that the overlapping hierarchies problem is a subset of the textual variation problem, and explains why markup cannot be the solution to either problem. Chapter 4 develops a new data model for representing cultural heritage and linguistics texts, called a ‘variant graph’, which separates the natural overlapping structures from the content. It develops a simplified list-form of the graph that scales well as the number of versions increases. It also describes the main operations that need to be performed on the graph and explores their algorithmic complexities. Chapter 5 draws on research in bioinformatics and text processing to develop a greedy algorithm that aligns n versions with non-overlapping block transpositions in O(MN) time in the worst case, where M is the size of the graph and N is the length of the new version being added or updated. It shows how this algorithm can be applied to texts in corpus linguistics and the humanities, and tests an implementation of the algorithm on a variety of real-world texts. 08 Information and Computing Sciences overlapping hierarchies textual variation multiple sequence alignment textual criticism
83	Algorithms for building and evaluating multiple sequence alignments / Lassmann, Timo, January 2006 Diss. (summary) Stockholm: Karolinska Institutet, 2006. / Includes 6 papers.
84	Solving repeat problems in shotgun sequencing / Arner, Erik, January 2006 Diss. (summary) Stockholm: Karolinska Institutet, 2006. / Includes 3 papers.
85	Bioinformatics and Handwriting/Speech Recognition: Unconventional Applications of Similarity Search Tools Jensen, Kyle, Stephanopoulos, Gregory 01 1900 This work introduces two unconventional applications for sequence alignment algorithms outside the domain of bioinformatics: handwriting recognition and speech recognition. In each application we treat data samples, such as the path of a handwritten pen stroke, as protein sequences and use the FastA sequence alignment tool to classify unknown data samples, such as a written character. That is, we handle the handwriting and speech recognition problems like the protein annotation problem: given a sequence of unknown function, we annotate the sequence via sequence alignment. This approach achieves classification rates of 99.65% and 93.84% for handwriting and speech recognition, respectively. In addition, we provide a framework for applying sequence alignment to a variety of other non-traditional problems. / Singapore-MIT Alliance (SMA) Machine learning bioinformatics amino acids protein sequences sequence alignment FastA voice dynamic programming handwriting
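The classify-by-alignment idea in entry 85 can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract names the FastA tool, whereas this sketch swaps in a plain Needleman-Wunsch global alignment score, and the letter encoding of samples, the scoring values, and the function names `alignment_score` and `classify` are all assumptions made for illustration.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Stand-in for the FastA scoring mentioned in the abstract; the scoring
    values here are illustrative assumptions.
    """
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def classify(query, labelled):
    """Return the label of the reference sequence that aligns best with the query."""
    best_label, _ = max(
        ((label, alignment_score(query, seq)) for label, seq in labelled),
        key=lambda pair: pair[1],
    )
    return best_label


# Hypothetical encoded samples: each letter stands for one quantised feature.
references = [("x", "AACC"), ("y", "DDDD")]
predicted = classify("AACD", references)
```

The unknown sample receives the label of its best-scoring reference, which mirrors annotating a protein of unknown function by its closest alignment hit.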
86	Waveform Mapping and Time-Frequency Processing of Biological Sequences and Structures January 2011 Genomic and proteomic sequences, which are in the form of deoxyribonucleic acid (DNA) and amino acids respectively, play a vital role in the structure, function and diversity of every living cell. As a result, various genomic and proteomic sequence processing methods have been proposed from diverse disciplines, including biology, chemistry, physics, computer science and electrical engineering. In particular, signal processing techniques have been applied to the problems of sequence querying and alignment, which compare and classify regions of similarity in the sequences based on their composition. However, although current approaches obtain results that can be attributed to key biological properties, they require pre-processing and lack robustness to sequence repetitions. In addition, these approaches do not provide much support for efficiently querying sub-sequences, a process that is essential for tracking localized database matches. In this work, a query-based alignment method for biological sequences is first proposed; it maps sequences to time-domain waveforms before processing the waveforms for alignment in the time-frequency plane. The mapping uses waveforms, such as time-domain Gaussian functions, with unique sequence representations in the time-frequency plane. The proposed alignment method employs a robust querying algorithm that utilizes a time-frequency signal expansion whose basis function is matched to the basic waveform in the mapped sequences. The resulting WAVEQuery approach is demonstrated for both DNA and protein sequences using the matching pursuit decomposition as the signal basis expansion. The alignment localization of WAVEQuery is specifically evaluated over repetitive database segments, and the method is operable in real time without pre-processing. It is demonstrated that WAVEQuery significantly outperforms the biological sequence alignment method BLAST for DNA queries with repetitive segments. A generalized version of the WAVEQuery approach with the metaplectic transform is also described for protein sequence structure prediction. For protein alignment, it is often necessary to compare not only the one-dimensional (1-D) primary sequence structure but also the secondary and tertiary three-dimensional (3-D) space structures. This is done after considering the conformations in the 3-D space due to the degrees of freedom of these structures. As a result, a novel directionality-based 3-D waveform mapping for the 3-D protein structures is also proposed, and it is used to compare protein structures using a matched filter approach. By incorporating a 3-D time axis, a highly localized Gaussian-windowed chirp waveform is defined, and the amino acid information is mapped to the chirp parameters, which are then directly used to obtain directionality in the 3-D space. This mapping is unique in that additional characteristic protein information, such as hydrophobicity, which relates the sequence to the structure, can be added as another representation parameter. The additional parameter helps track similarities over local segments of the structure, thus enabling classification of distantly related proteins which have partial structural similarities. This approach is successfully tested for pairwise alignments over full-length structures, alignments over multiple structures to form phylogenetic trees, and alignments over local segments. Basic classification over protein structural classes using directional descriptors for the protein structure is also performed. / Dissertation/Thesis / Ph.D. Electrical Engineering 2011 Engineering Bioinformatics Molecular Biology chirp signal classification DNA sequence alignment Gaussian signal protein structure alignment querying
87	Aplicação de algoritmos genéticos multi-objetivo para alinhamento de seqüências biológicas. / Multi-objective genetic algorithms applied to protein sequence alignment. Waldo Gonzalo Cancino Ticona 26 February 2003 Biological sequence alignment is a basic operation in Bioinformatics, since it serves as a basis for other processes, e.g. determination of the three-dimensional structure of proteins. Due to the large amount of data involved, mathematical and computational methods are used to perform this task. Traditionally, the Biological Sequence Alignment Problem is formulated as a single-objective optimization problem: each solution has a score that reflects the similarity between the sequences under a given scoring scheme, and the optimization process looks for the alignment with the best score. Multi-Objective Optimization addresses problems with several criteria that must be met. For this kind of problem, there is a set of solutions that represents a trade-off between the objectives. Evolutionary Algorithms, which are inspired by Darwin's theory of evolution and work with a population of solutions that evolves until a convergence or stopping criterion is reached, have been applied with success to problems of this kind. This work formulates Biological Sequence Alignment as a Multi-Objective Optimization Problem in order to find a set of solutions that represents a trade-off between the extent and the quality of the solutions. Several models of Evolutionary Algorithms for Multi-Objective Optimization were applied, and the performance of each model was evaluated using performance metrics found in the literature. algoritmos evolutivos alinhamento de seqüências otimização multi-objetivo evolutionary algorithms multi-objective optimization sequence alignment
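The notion of a solution set that trades off two objectives, as described in entry 87, can be sketched with a small Pareto-dominance filter. This is an illustrative sketch only, not the evolutionary algorithms of the thesis: the candidate alignments are represented by hypothetical (extent, quality) pairs, both assumed to be maximised, and the names `dominates` and `pareto_front` are introduced here for illustration.

```python
def dominates(p, q):
    """True if p is at least as good as q in every objective and strictly better in one.

    Both objectives are assumed to be maximised.
    """
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidate alignments scored as (extent, quality).
candidates = [(10, 0.5), (8, 0.9), (7, 0.4), (10, 0.4)]
front = pareto_front(candidates)
```

In this made-up example the long but lower-quality alignment and the shorter but higher-quality one both survive, since neither is better in both objectives; a multi-objective evolutionary algorithm would aim to approximate such a front rather than return a single best score.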
88	Identificação e caracterização de grupos de indivíduos segundo padrões de seqüências de atividades multidimensionais. / Identification and characterization of groups of individuals according to patterns of multidimensional activity sequences. Ricardo Curvello Dalmaso 30 April 2009 The main objective of the dissertation is to identify homogeneous groups of individuals with regard to the daily activity/travel sequences performed on a weekday. Activities are characterized by multiple attributes, thus generating multidimensional sequences. In this study, the nature of the activity (travel purpose) and the starting period of engagement in the activity (ending time of a trip), both divided into categories, were the dimensions considered in the characterization of activities. Access mode to the activity (travel mode) was also considered as a third dimension, but the results led to the decision not to include it in the final analysis. Alternative categorizations of the activity nature dimension were also studied, which resulted in further disaggregation than adopted in previous analyses of the same data. The study used data from the 1997 Origin-Destination household survey of the São Paulo Metropolitan Area. The analysis considered all individuals aged 12 or over who made two or more trips on the survey day, with a trip sequence starting and ending at home and free of internal inconsistencies, resulting in a sample of 49,616 individuals. A sequence alignment method - OT-MDSAM (uni-dimensional Optimum Trajectories-based MultiDimensional Sequence Alignment Method) - was used to calculate a distance, or dissimilarity, between pairs of activity/travel sequences, based on the effort needed to make them equal. These distances were then fed into an agglomerative hierarchical clustering procedure using Ward's method to create classes, or groups, of activity/travel patterns. These groups were then analyzed according to the characteristics of the activity/travel sequences included and to the sociodemographic and economic characteristics of the individuals who performed these patterns. The data were then used to develop a decision tree model using CHAID (Chi-squared Automatic Interaction Detector), with the group of activity/travel sequences as the response variable and the characteristics of individuals and their families as independent variables. The results indicate that the groups formed through this procedure present a good degree of homogeneity regarding the activity patterns they represent and that they can be clearly associated with the characteristics of the individuals who perform these patterns. Planejamento de transportes Activity sequences multidimensional sequence alignment Transport planning Travel demand
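The first step of the pipeline in entry 88, a dissimilarity based on the effort needed to make two activity sequences equal, can be sketched as follows. This is a simplified stand-in and not OT-MDSAM itself: it uses a plain edit distance in which each activity is a single (purpose, period) tuple and every insertion, deletion or substitution costs 1. The sample sequences and the names `edit_distance` and `distance_matrix` are hypothetical.

```python
def edit_distance(a, b):
    """Unit-cost edit distance between two sequences of hashable items."""
    prev = list(range(len(b) + 1))
    for i, xa in enumerate(a, start=1):
        curr = [i]
        for j, xb in enumerate(b, start=1):
            cost = 0 if xa == xb else 1
            curr.append(min(prev[j] + 1,          # delete xa
                            curr[j - 1] + 1,      # insert xb
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def distance_matrix(seqs):
    """Symmetric matrix of pairwise edit distances."""
    n = len(seqs)
    return [[edit_distance(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]


# Hypothetical daily sequences of (purpose, period) activities.
home_am, work_am, school_am, home_pm = (
    ("home", "am"), ("work", "am"), ("school", "am"), ("home", "pm"))
sequences = [
    [home_am, work_am, home_pm],
    [home_am, school_am, home_pm],
    [home_am, work_am, school_am, home_pm],
]
matrix = distance_matrix(sequences)
```

A matrix of this kind is what a hierarchical clustering step would take as input; treating each multi-attribute activity as one symbol is a simplifying assumption, whereas the method named in the abstract handles the dimensions in its own way.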
89	The genease activity of mung bean nuclease: fact or fiction? Kula, Nothemba January 2004 Magister Scientiae - MSc / The action of Mung Bean Nuclease (MBN) on DNA makes it possible to clone intact gene fragments from genes of the malaria parasite, Plasmodium. This “genease” activity has provided a foundation for further investigation of the coding elements of the Plasmodium genome. MBN has been reported to cleave genomic DNA of Plasmodium preferentially at positions before and after genes, but not within gene coding regions. This mechanism has overcome the difficulty encountered in obtaining genes with low expression levels, because the cleavage mechanism of the enzyme yields sequences of genes from genomic DNA rather than mRNA. However, as potentially useful as MBN may be, evidence to support its genease activity comes from analysis of a limited number of genes. It is not clear whether this mechanism is specific to certain genes or species of Plasmodia or whether it is a general cleavage mechanism for Plasmodium DNA. Some projects (Nomura et al., 2001; van Lin, Janse, and Waters, 2000) have also identified MBN-generated fragments which contain fragments of genes with both introns and exons, rather than the intact genes expected from MBN digestion of genomic DNA, which raises concerns about the efficiency of the MBN mechanism in generating complete genes. Using a large-scale, whole-genome mapping approach, 7242 MBN-generated genome survey sequences (GSSs) have been mapped to determine their position relative to coding sequences within the complete genome sequence of the human malaria parasite Plasmodium falciparum and the incomplete genome of a rodent malaria parasite, Plasmodium berghei. The location of MBN cleavage sites was determined with respect to coding regions in orthologous genes, non-coding intergenic regions and exon-intron boundaries in these two species of Plasmodium. The survey illustrates that for P. falciparum 79% of GSSs had at least one terminal mapping within an ortholog coding sequence and 85% of GSSs which overlapped coding sequence boundaries mapped within 50 bp of the start or end of the gene. Similarly, despite the partial nature of P. berghei genome sequence information, 73% of P. berghei GSSs had at least one terminal mapping within an ortholog coding sequence and 37% of these mapped within 0-50 bp of the start or end of the gene. This indicates that a larger percentage of cleavage sites in both P. falciparum and P. berghei was found proximal to coding regions. Furthermore, 86% of P. falciparum GSSs had at least one terminal mapping within a coding exon and 85% of GSSs which overlapped exon-intron boundaries mapped within 50 bp of the exon start and end site. The fact that 11% of GSSs mapped completely to intronic regions suggests that some introns contain specific sites sensitive to cleavage, and also indicates that MBN cleavage of Plasmodium DNA does not always yield complete exons. Finally, the results presented herein were obtained from analysis of several thousand Plasmodium genes which have different coding sequences, in different locations on individual chromosomes/contigs in two different species of Plasmodium. Therefore it appears that the MBN mechanism is neither species specific nor limited to specific genes. / South Africa Exon Genome survey sequence Mung Bean Nuclease Nuclease cleavage site Plasmodium falciparum Plasmodium berghei Sequence Alignment
90	Multiple sequence alignment using particle swarm optimization Zablocki, Fabien Bernard Roman 16 January 2009 The recent advent of bioinformatics has given rise to the central and recurrent problem of optimally aligning biological sequences. Many techniques have been proposed in an attempt to solve this complex problem, with varying degrees of success. This thesis investigates the application of a computational intelligence technique known as particle swarm optimization (PSO) to the multiple sequence alignment (MSA) problem. Firstly, the performance of the standard PSO (S-PSO) and its characteristics are fully analyzed. Secondly, a scalability study is conducted that aims at expanding the S-PSO’s application to complex MSAs, as well as studying the behaviour of three other kinds of PSOs on the same problems. Experimental results show that the PSO is efficient in solving the MSA problem and compares favourably with the well-known CLUSTAL X and T-COFFEE. / Dissertation (MSc)--University of Pretoria, 2009. / Computer Science / Unrestricted Computational intelligence Particle swarm optimization (PSO) Bioinformatics Artificial intelligence Multi sequence alignment Deoxyribonucleic acid (DNA) UCTD
