1

Comparing XML Documents as Reference-aware Labeled Ordered Trees

Mikhaiel, Rimon A. E. Unknown Date
No description available.
2

Approaching intonational distance and change

Sullivan, Jennifer Niamh January 2011
The main aim of this thesis is to begin to extend phonetic distance measurements to the domain of intonation. Existing studies of segmental phonetic distance have strong associations with historical linguistic questions. I begin with this context and demonstrate problems with the use of feature systems in these segmental measures. Then I attempt to draw together strands from the disparate fields of quantitative historical linguistics and intonation. The intonation of Belfast and Glasgow English provides the central case study: previous work suggests that both varieties display nuclear rises on statements, yet they have never been formally compared. This thesis presents two main hypotheses on the source of these statement rises: the Alignment hypothesis and the Transfer hypothesis. The Alignment hypothesis posits that statement rises were originally more typical statement falls that have changed into rises over time through gradual phonetic change in the location of the pitch peak. The Transfer hypothesis holds that statement rises have come about through pragmatic transfer of rises onto a statement context, either from question rises or from continuation rises. I evaluate these hypotheses using the primary parameters of alignment and scaling as phonetic distance measurements. The main data set consists of recordings of 3 Belfast English and 3 Glasgow English speakers in a sentence-reading task and a map task. The results crucially indicate that the statement rises of Belfast and Glasgow English may have different origins: the Glasgow statement nuclear tones support the Alignment hypothesis, while the Belfast nuclear tones fit best with the Transfer hypothesis. The fundamental differences between Glasgow and Belfast are the earlier alignment of the peak (H) in Glasgow, and the presence of a final low (L) tonal target in Glasgow against a final high (H) target in Belfast. The scaling of the final H in Belfast statements suggests that the transfer may be from continuation rises rather than from question rises. I then present a proposal for an overall measure of intonational distance, highlighting the problems of parameter weighting, of comparing like with like, and of distinguishing chance resemblance from genuine historical connection. The thesis concludes with an assessment of the benefits that intonational analysis could bring to improving segmental phonetic distance measures.
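To make the two parameters concrete, here is a minimal sketch of a weighted distance over alignment and scaling. The representation of a nuclear accent as two numbers, the example values, and the weights are all illustrative assumptions, not the thesis's actual measure; indeed, the abstract flags parameter weighting as one of the open problems.

```python
# Hypothetical sketch: comparing nuclear pitch accents by the two parameters
# the thesis measures. Parameter values and weights are illustrative
# assumptions, not data or a measure from the thesis itself.

def intonational_distance(accent_a, accent_b, w_alignment=1.0, w_scaling=1.0):
    """Weighted distance between two (alignment, scaling) pairs.

    alignment: peak position as a proportion of the accented syllable (0..1)
    scaling:   peak height in semitones relative to a speaker baseline
    """
    da = abs(accent_a[0] - accent_b[0])
    ds = abs(accent_a[1] - accent_b[1])
    return w_alignment * da + w_scaling * ds

# Toy example: an earlier-peaked Glasgow-like rise vs. a later Belfast-like rise.
glasgow = (0.35, 4.0)   # earlier peak, final low (L) target
belfast = (0.70, 6.5)   # later peak, final high (H) target
print(intonational_distance(glasgow, belfast))
```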
3

PiaNote: A Sight-Reading Program That Algorithmically Generates Music Based on Human Performance

Schulz, Drew 01 June 2016
Sight-reading is the act of performing a piece of music at first sight. It is a difficult skill to master, because it requires extensive knowledge of music theory, practice, quick thinking, and, most importantly, a wide variety of musical material: a musician can only truly sight-read a piece that is new to them. This demands not only many resources, but also pieces that are challenging yet within the player's abilities. This thesis presents PiaNote, a sight-reading web application for pianists that algorithmically generates music based on human performance. PiaNote's goal is to alleviate some of the hassles pianists face when sight-reading. PiaNote presents musicians with algorithmically generated pieces, ensuring that a musician never sees the same piece of music twice. PiaNote also monitors player performances in order to intelligently present music that is challenging, but within the player's abilities. As a result, PiaNote offers a sight-reading experience that is tailored to the player. On a broader level, this thesis explores different methods of building an effective sight-reading application. We evaluate PiaNote with a user study involving novice piano players who actively practice with PiaNote over three fifteen-minute sessions. At the end of the study, users are asked whether PiaNote is an effective practice tool that improves both their confidence in sight-reading and their sight-reading abilities. Results suggest that PiaNote does improve users' sight-reading confidence and abilities, but further research must be conducted to clearly validate PiaNote's effectiveness. We conclude that PiaNote has the potential to become an effective sight-reading application with slight improvements and further research.
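The abstract describes a loop that adjusts generated material to player performance. The following sketch shows one simple way such a loop could work; the accuracy thresholds, difficulty steps, and stand-in generator are assumptions for illustration, not PiaNote's actual algorithm.

```python
# Hypothetical performance-driven difficulty loop in the spirit of the
# abstract. Thresholds, step sizes, and the toy generator are assumptions.
import random

def next_difficulty(current, recent_accuracy):
    """Raise difficulty after strong performances, lower it after weak ones."""
    if recent_accuracy > 0.90:
        return current + 1          # player is comfortable: push further
    if recent_accuracy < 0.60:
        return max(1, current - 1)  # player is struggling: ease off
    return current                  # challenging but attainable: stay put

def generate_piece(difficulty, length=8):
    """Stand-in for a real generator: wider pitch span at higher levels."""
    span = 2 + 2 * difficulty
    return [60 + random.randint(0, span) for _ in range(length)]  # MIDI notes

level = 3
for accuracy in (0.95, 0.55, 0.75):   # simulated session results
    level = next_difficulty(level, accuracy)
    print(level, generate_piece(level))
```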
4

Combinatorial aspects of genome rearrangements and haplotype networks/Aspects combinatoires des réarrangements génomiques et des réseaux d'haplotypes

Labarre, Anthony 12 September 2008
The dissertation covers two problems motivated by computational biology: genome rearrangements, and haplotype networks. Genome rearrangement problems are a particular case of edit distance problems, where one seeks to transform two given objects into one another using as few operations as possible, with the additional constraint that the set of allowed operations is fixed beforehand; we are also interested in computing the corresponding distances between those objects, i.e. merely computing the minimum number of operations rather than an optimal sequence. Genome rearrangement problems can often be formulated as sorting problems on permutations (viewed as linear orderings of {1,2,...,n}) using as few (allowed) operations as possible. In this thesis, we focus among other operations on "transpositions", which displace intervals of a permutation. Many questions related to sorting by transpositions are open, related in particular to its computational complexity. We use the disjoint cycle decomposition of permutations, rather than the "standard tools" used in genome rearrangements, to prove new upper bounds on the transposition distance, as well as formulae for computing the exact distance in polynomial time in many cases. This decomposition also allows us to solve a counting problem related to the "cycle graph" of Bafna and Pevzner, and to construct a general framework for obtaining lower bounds on any edit distance between permutations by recasting their computation as factorisation problems on related even permutations. Haplotype networks are graphs in which a subset of vertices is labelled, used in comparative genomics as an alternative to trees when trees are too restrictive, or when no single "best" topology can be chosen from a given set of trees. We formalise a new method due to Cassens, Mardulyn and Milinkovitch, which consists in building a graph that contains a given set of partially labelled trees and has as few edges as possible. We give exact algorithms for solving the problem on two graphs, with a running time that is exponential in the general case but polynomial when at least one of the graphs belongs to a particular class that we characterise.
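For readers unfamiliar with the objects involved, here is a minimal sketch of the disjoint cycle decomposition the abstract builds on, together with the transposition operation (exchanging two adjacent intervals of a permutation). This only illustrates the definitions; the thesis's bounds come from a deeper analysis not reproduced here.

```python
# Sketch of the objects in the abstract: disjoint cycle decomposition and the
# "transposition" operation on permutations. Illustrative only.

def disjoint_cycles(perm):
    """Decompose a permutation of 1..n (given as a list) into disjoint cycles."""
    n, seen, cycles = len(perm), set(), []
    for start in range(1, n + 1):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = perm[x - 1]          # follow the mapping i -> perm(i)
        cycles.append(tuple(cycle))
    return cycles

def transposition(perm, i, j, k):
    """Exchange the adjacent intervals perm[i:j] and perm[j:k] (0-based)."""
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]

p = [3, 1, 2, 5, 4]
print(disjoint_cycles(p))         # [(1, 3, 2), (4, 5)]
print(transposition(p, 0, 2, 4))  # [2, 5, 3, 1, 4]
```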
5

Efficient Algorithms for the Block Edit Distance and Related Problems

Ann, Hsing-Yen 18 May 2010
Computing the similarity of two strings or sequences is one of the most fundamental problems in computer science, and it has been widely studied for several decades. In the last decade it has attracted researchers' attention again, because of improvements in hardware computation ability and the presence of huge amounts of data in biotechnology. In this dissertation, we focus on computing the edit distance between two sequences where block-edit operations are involved in addition to character-edit operations. Previous research shows that this problem is NP-hard if recursive block moves are allowed. Since we are interested in solving editing problems with polynomial-time optimization algorithms, we consider simplified versions of the edit distance problem. We first focus on the longest common subsequence (LCS) of run-length encoded (RLE) strings, where the runs can be seen as a class of simplified blocks. Then, we apply constraints to the problem, i.e. we find the constrained LCS (CLCS) of RLE strings. Besides, we show that problems involving block-edit operations can still be solved by polynomial-time optimization algorithms if some restrictions are applied. Let X and Y be two sequences of lengths n and m, respectively, and let N and M be the numbers of runs in the corresponding RLE forms of X and Y, respectively. First, we propose a simple algorithm for computing the LCS of X and Y in O(NM + min{p_1, p_2}) time, where p_1 and p_2 denote the numbers of elements in the bottom and right boundaries of the matched blocks, respectively. This new algorithm improves the previously known time bound O(min{nM, Nm}) and outperforms the time bounds O(NM log NM) or O((N+M+q) log (N+M+q)) in some cases, where q denotes the number of matched blocks. Next, we give an efficient algorithm for solving the CLCS problem, which is to find a common subsequence Z of X and Y such that a given constrained sequence P is a subsequence of Z and the length of Z is maximized. Suppose X, Y and P are all in RLE format, with lengths n, m and r, respectively, and let N, M and R be the numbers of runs in X, Y and P, respectively. We show that, with RLE, the CLCS problem can be solved in O(NMr + min{q_1 r + q_4, q_2 r + q_5}) time, where q_1 and q_2 denote the numbers of elements in the south and east boundaries of the partially matched blocks on the first layer, respectively, and q_4 and q_5 denote the numbers of elements of the west and north pillars in the bottom boundaries of all fully matched cuboids in the DP lattice, respectively. When the input strings have good compression ratios, our work clearly outperforms the previously known DP algorithms and the Hunt-Szymanski-like algorithms. Finally, we consider variations of the block edit distance problem that involve character insertions, character deletions, block copies and block deletions for two given sequences X and Y. Three variations are defined with different measuring functions: P(EIS, C), P(EI, L) and P(EI, N). We show that, with some preprocessing, the minimum block edit distances of these three variations can be obtained by dynamic programming in O(nm), O(nm log m) and O(nm^2) time, respectively, where n and m are the lengths of X and Y.
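As a baseline for the objects the abstract discusses, the sketch below shows run-length encoding and the textbook O(nm) LCS dynamic program, which is the bound the dissertation's O(NM + min{p_1, p_2}) boundary-based algorithm improves on. The improved algorithm itself is more involved and is not reproduced here.

```python
# Baseline sketch: run-length encoding (RLE) and the classic O(nm) LCS
# dynamic program on the uncompressed strings. Illustrates the problem setup,
# not the dissertation's faster algorithm.
from itertools import groupby

def rle(s):
    """Run-length encode a string: 'aaabcc' -> [('a', 3), ('b', 1), ('c', 2)]."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]

def lcs_length(x, y):
    """Classic dynamic program for the length of the LCS of x and y."""
    n, m = len(x), len(y)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]

x, y = "aaabbbccc", "aabbbbcc"
print(rle(x))            # [('a', 3), ('b', 3), ('c', 3)]
print(lcs_length(x, y))  # 7 ("aabbbcc")
```

When the strings compress well, N and M are much smaller than n and m, which is exactly the regime where the dissertation's RLE-aware algorithms pay off.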
6

A Clustering Method For The Problem Of Protein Subcellular Localization

Bezek, Perit 01 December 2006
In this study, the focus is on predicting the subcellular localization of a protein, since subcellular localization is helpful in understanding a protein's functions. The function of a protein may be estimated from its sequence; motifs or conserved subsequences are strong indicators of function. In a given sample set of protein sequences known to perform the same function, a certain subsequence or group of subsequences should be common; that is, the occurrence (frequency) of common subsequences should be high. Our idea is to find the common subsequences through clustering and use these common groups (implicit motifs) to classify proteins. To calculate the distance between two subsequences, the traditional string edit distance is modified so that only replacement is allowed, and the cost of replacement is related to an amino acid substitution matrix. Based on the modified string edit distance, spectral clustering embeds the subsequences into a transformed space in which the clustering problem is expected to become easier to solve. For a given protein sequence, the distribution of its subsequences over the clusters is the feature vector, which is subsequently fed to a classifier. The most important aspect of this approach is the use of spectral clustering based on the modified string edit distance.
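A minimal sketch of the modified distance follows: with only replacement allowed, the two subsequences must have equal length, and the distance is the sum of per-position replacement costs drawn from a substitution matrix. The tiny cost table below is an illustrative stand-in, not a real BLOSUM or PAM matrix.

```python
# Sketch of the replacement-only edit distance from the abstract. The cost
# table is a hypothetical stand-in for an amino acid substitution matrix.

TOY_COSTS = {
    frozenset("IL"): 0.2,   # chemically similar residues: cheap to substitute
    frozenset("IV"): 0.3,
    frozenset("KR"): 0.3,
    frozenset("GW"): 1.0,   # dissimilar residues: expensive
}

def replacement_cost(a, b, default=0.8):
    if a == b:
        return 0.0
    return TOY_COSTS.get(frozenset((a, b)), default)

def modified_edit_distance(s, t):
    """Replacement-only distance between two equal-length subsequences."""
    if len(s) != len(t):
        raise ValueError("replacement-only distance needs equal lengths")
    return sum(replacement_cost(a, b) for a, b in zip(s, t))

print(modified_edit_distance("ILKG", "LVRG"))  # 0.2 + 0.8 + 0.3 + 0.0 = 1.3
```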
7

An information theoretic approach to structured high-dimensional problems

Das, Abhik Kumar 06 February 2014
A majority of the data transmitted and processed today has an inherently structured, high-dimensional nature, either because of encoding with high-dimensional codebooks that provide a systematic structure, or because the data depends on a large number of agents or variables. As a result, many problem setups associated with the transmission and processing of data have a structured high-dimensional aspect to them. This dissertation looks at two such problems from an information-theoretic perspective: communication over networks using network coding, and learning the structure of graphical representations such as Markov networks from observed data. Such an approach yields intuition about good coding architectures as well as the limitations imposed by the high-dimensional framework. The dissertation studies the problem of network coding for networks with multiple transmission sessions, i.e., multiple users communicating with each other at the same time. The connection between such networks and the information-theoretic interference channel is examined, and the concept of interference alignment, derived from the interference channel literature, is coupled with linear network coding to develop novel coding schemes offering good guarantees on achievable throughput. In particular, two setups are analyzed: the first where each user requires data from only one user (multiple unicasts), and the second where each user requires data from potentially multiple users (multiple multicasts). It is demonstrated that one can achieve a rate equalling a significant fraction of the maximal rate for each transmission session, provided certain constraints on the network topology are satisfied. The dissertation also analyzes the problem of learning the structure of Markov networks from observed samples: the learning problem is interpreted as a channel coding problem, and its achievability and converse aspects are examined. A rate-distortion theoretic approach is taken for the converse aspect, and information-theoretic lower bounds on the number of samples required for any algorithm to learn the Markov graph up to a pre-specified edit distance are derived for ensembles of discrete and Gaussian Markov networks based on degree-bounded graphs. The problem of accurately learning the structure of discrete Markov networks based on power-law graphs generated from the configuration model is also studied. The effect of the power-law exponent value on the hardness of the learning problem is deduced from the converse aspect: it is shown that discrete Markov networks on power-law graphs with smaller exponent values require more samples for any learning algorithm to ensure accurate recovery of their underlying graphs. For the achievability aspect, an efficient learning algorithm is designed for accurately reconstructing the structure of Ising models based on power-law graphs from the configuration model; it is demonstrated that the optimal number of samples suffices for recovering the exact graph under certain constraints on the Ising model potential values.
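For concreteness, here is a short sketch of the graph ensemble the abstract names: power-law degree sequences realized via the configuration model (using the networkx library). The exponent, size, and cleanup steps are illustrative; this shows only the ensemble, not the learning algorithms or sample-complexity bounds.

```python
# Sketch: sampling a power-law graph from the configuration model with
# networkx. Parameters are illustrative assumptions.
import random
import networkx as nx

def power_law_degree_sequence(n, exponent, kmin=1, kmax=None, seed=0):
    """Draw n degrees with P(k) proportional to k^(-exponent)."""
    rng = random.Random(seed)
    kmax = kmax or n - 1
    ks = list(range(kmin, kmax + 1))
    weights = [k ** (-exponent) for k in ks]
    seq = rng.choices(ks, weights=weights, k=n)
    if sum(seq) % 2:            # configuration model needs an even degree sum
        seq[0] += 1
    return seq

seq = power_law_degree_sequence(200, exponent=2.5)
g = nx.configuration_model(seq, seed=0)
g = nx.Graph(g)                 # collapse parallel edges
g.remove_edges_from(nx.selfloop_edges(g))
print(g.number_of_nodes(), g.number_of_edges())
```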
8

Automated Biomedical Text Fragmentation In Support Of Biomedical Sentence Fragment Classification

Salehi, Sara 29 September 2009
The past decade has seen tremendous growth in the amount of biomedical literature, specifically in the area of bioinformatics. As a result, biomedical text categorization has become a central task for providing researchers with literature appropriate for their specific information needs. Pan et al. have explored a method that automatically identifies information-bearing sentence fragments within scientific text. Their proposed method aims to automatically classify sentence fragments into sets of categories defined to satisfy specific types of information needs. The categories are grouped into five dimensions known as Focus, Polarity, Certainty, Evidence, and Trend. Fragments are used as the unit of classification because the class value along each of these dimensions can change mid-sentence. In order to automatically annotate sentence fragments along the five dimensions, automatically breaking sentences into fragments is a necessary step, and the performance of the classifier depends on the quality of those fragments. In this study, we investigate the problem of automatic fragmentation of biomedical sentences, which is a fundamental layer in multi-dimensional fragment classification. We also believe that our proposed fragmentation algorithm can be used in other domains, such as sentiment analysis. The goal of sentiment analysis is often to classify the polarity (positive or negative) of a given text. Sentiment classification can be conducted at different levels, such as the document, sentence, or phrase (fragment) level. Our proposed fragmentation algorithm can be used as a prerequisite for phrase-level sentiment categorization, which aims to automatically capture multiple sentiments within a sentence. / Thesis (Master, Computing) -- Queen's University, 2009-09-25.
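To illustrate the task (not the thesis's algorithm), here is a deliberately naive fragmenter that splits at punctuation and contrastive connectives, the points where a dimension such as Polarity can flip mid-sentence. The split pattern is an assumption for illustration only.

```python
# Naive illustration of sentence fragmentation: split at punctuation and
# contrastive connectives. Not the thesis's method; a sketch of the task.
import re

SPLIT_PATTERN = re.compile(r",|;|\b(?:but|however|whereas|although)\b", re.I)

def fragment(sentence):
    parts = SPLIT_PATTERN.split(sentence)
    return [p.strip() for p in parts if p and p.strip()]

s = "Drug A inhibited growth, but no effect was observed in vivo."
print(fragment(s))
# ['Drug A inhibited growth', 'no effect was observed in vivo.']
```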
9

Diagnostika chyb v počítačových sítích založená na překlepech / Diagnosing Errors inside Computer Networks Based on the Typo Errors

Bohuš, Michal January 2020
The goal of this diploma thesis is to create a system for diagnosing network data based on detecting and correcting spelling errors, intended for network administrators as an additional diagnostic tool. In contrast to the usual application of spelling detection and correction to ordinary text, these methods are applied here to network data supplied by the user. The system works with NetFlow data, pcap files, or log files. Context is modeled through a set of purpose-built data categories, and the correctness of words is verified against dictionaries, one per category. Because searching for a correction by edit distance alone yields many candidates, a heuristic for scoring candidates was proposed to select the right one. The system was tested for both functionality and performance.
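The sketch below shows the general correction strategy the abstract describes: candidates are dictionary words within a small edit distance, and a simple heuristic (here, word frequency) breaks the many ties that distance alone leaves. The dictionary, frequencies, and scoring rule are illustrative assumptions, not the thesis's system.

```python
# Sketch: edit-distance candidate search plus a tie-breaking heuristic.
# Dictionary contents and the frequency heuristic are assumptions.

def edit_distance(s, t):
    """Standard Levenshtein distance via dynamic programming."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        curr = [i]
        for j, b in enumerate(t, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (a != b)))  # substitution
        prev = curr
    return prev[-1]

# Hypothetical per-category dictionary with word frequencies.
DICTIONARY = {"hostname": 120, "hosting": 40, "postname": 2}

def best_correction(word, max_dist=2):
    candidates = [(edit_distance(word, w), -freq, w)
                  for w, freq in DICTIONARY.items()]
    candidates = [c for c in candidates if c[0] <= max_dist]
    return min(candidates)[2] if candidates else None

print(best_correction("hostnme"))  # 'hostname' (distance 1, highest frequency)
```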
10

Spell checker for a Java Application / Stavningskontroll till en Java-applikation

Viktorsson, Arvid, Kyrychenko, Illya January 2020
Many text-editor users depend on spellcheckers to correct their typographical errors, and the absence of one makes for a negative user experience; in today's technological environment, spellchecking is an expected feature. 2Consiliate Business Solutions owns a Java application whose text editor lacks a spellchecker. This project investigates and implements available techniques and algorithms for spellchecking and automated word correction. During implementation, the techniques were tested for performance and the best-performing solutions were chosen for the project. All techniques were gathered from prior literature on the topic and implemented in Java using the default Java libraries. Analysis of the results shows that it is possible to create a complete spellchecker by combining available techniques, and that the quality of a spellchecker largely depends on a well-defined dictionary.
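One of the standard techniques from the spellchecking literature such projects draw on is candidate generation: enumerate every string within one edit of a word and keep those that appear in the dictionary (the approach popularized by Peter Norvig). A minimal sketch follows; the tiny dictionary is an illustrative assumption, and this is not claimed to be the specific combination the thesis chose.

```python
# Sketch of edits-within-one candidate generation against a dictionary.
# The dictionary is an illustrative assumption.
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one insert, delete, transpose, or replace away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

DICTIONARY = {"spelling", "spilling", "selling"}

def known_corrections(word):
    return DICTIONARY & edits1(word)

print(known_corrections("speling"))  # {'spelling'}
```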
