31

Distância de edição para estruturas de dados / Edit distance for data structures

Silva Junior, Paulo Matias da January 2018 (has links)
Advisor: Prof. Dr. Rodrigo de Alencar Hausen / Co-advisor: Prof. Dr. Jerônimo Cordoni Pellegrini / Master's dissertation - Universidade Federal do ABC, Graduate Program in Computer Science, Santo André, 2018. / The general tree edit distance problem consists in comparing two rooted, labelled trees using edit operations such as node deletion and insertion, seeking the minimum cost of a sequence of operations that transforms one tree into the other. In this work we prove that finding the largest common subforest of two given trees by node deletion, called the LCS-forest, is a particular case of edit distance. For the problem of finding the maximum common subtree of two trees, Valiente [Val02] proved that it is a particular case of edit distance under a condition that strongly preserves ancestry between pairs of nodes in the compared trees; we give an alternative proof whose condition is the existence of paths between pairs of nodes. We also establish a hierarchy relating the distances obtained as solutions to these three problems, showing that the distance from the most general edit problem is a lower bound for the distance from the LCS-forest solution, which in turn is a lower bound for the distance from the maximum common subtree solution.
In the second part of the work, we describe data structures as rooted, labelled trees, which lets us apply the edit distance to analyse the cost of comparing a data structure with itself after a sequence of operations. To this end, we model the operation costs on each structure's tree using information such as the number of nodes in the tree and the level of the node the operation touches. For stacks, linked lists and binary search trees, the edit distances are related to the time complexities of operating on those structures; the operational costs are also adapted to tries and B-trees. We ran experiments computing the edit distance of a data structure to itself after random sequences of operations, to see how these distance measures behave on each structure. The tests show that the length of the sequence influences the final distance, and that operational costs based on the level of the operated node yield smaller distances than costs based on the size of the structure.
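To make the operations concrete, here is a minimal sketch of the tree edit distance for rooted, ordered, labelled trees, using the classic recursion on forests with unit costs. The tuple representation and unit costs are illustrative assumptions, not the thesis's cost models, and the naive recursion does not attain the complexity of dedicated algorithms such as Zhang–Shasha.

```python
from functools import lru_cache

# A tree is a (label, children) pair, children a tuple of trees;
# a forest is a tuple of trees. All tuples, so results can be cached.

def size(forest):
    return sum(1 + size(t[1]) for t in forest)

@lru_cache(maxsize=None)
def forest_dist(f, g):
    if not f and not g:
        return 0
    if not f:
        return size(g)                      # insert every node of g
    if not g:
        return size(f)                      # delete every node of f
    (l1, c1), (l2, c2) = f[-1], g[-1]       # rightmost trees of each forest
    return min(
        forest_dist(f[:-1] + c1, g) + 1,    # delete rightmost root of f
        forest_dist(f, g[:-1] + c2) + 1,    # insert rightmost root of g
        forest_dist(f[:-1], g[:-1])         # match the two roots...
        + forest_dist(c1, c2)               # ...and their subforests
        + (0 if l1 == l2 else 1),           # rename if labels differ
    )

def tree_edit_distance(t1, t2):
    return forest_dist((t1,), (t2,))

# e.g. distance 1: delete the single child "b"
assert tree_edit_distance(("a", (("b", ()),)), ("a", ())) == 1
```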
32

Approximate string matching distance for image classification / Distance d’édition entre chaines d’histogrammes pour la classification d’images

Nguyen, Hong-Thinh 29 August 2014 (has links)
The exponential increase in the number of images requires efficient ways to classify them based on their visual content. The most successful and popular approach is the Bag-of-visual-Words (BoW) representation, due to its simplicity and robustness. Unfortunately, this approach fails to capture the spatial image layout, which plays an important role in modelling image categories. Recently, Lazebnik et al. (2006) introduced the Spatial Pyramid Representation (SPR), which successfully incorporates spatial information into the BoW model: the image is split into a pyramidal grid and each grid cell is represented as a BoW. Assuming that images belonging to the same class have similar spatial distributions, a pairwise matching of cells can serve as a similarity measure. However, this rigid matching scheme prevents SPR from coping with image variations and transformations. The main objective of this dissertation is to study a more flexible string matching model. Keeping the idea of local BoW histograms, we introduce a new class of edit distances to compare strings of local histograms. Our first contribution is a string-based image representation model and a new edit distance (called SMD, for String Matching Distance) well suited to strings whose symbols are local BoWs. The new distance benefits from an efficient dynamic programming algorithm, and a corresponding edit kernel including both a weighting and a pyramidal scheme is also derived. The performance is evaluated on classification tasks and compared to the standard method and several related methods; the new method outperforms them thanks to its ability to detect and ignore identical successive regions inside images.
Our second contribution is an extended version of SMD that replaces insertion and deletion operations with merging operations between successive symbols, so that the number of sub-regions (i.e., grid divisions) may vary according to the visual content. We describe two algorithms to compute this merge-based distance: a greedy version, which is efficient but can produce a non-optimal edit script, and an optimal version, which requires fourth-degree polynomial complexity. All the proposed distances are evaluated on several datasets and are shown to outperform comparable existing methods.
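The first contribution lends itself to a small dynamic-programming sketch: an edit distance between two strings whose symbols are local BoW histograms. The flat insertion/deletion cost and the normalised-L1 substitution cost below are assumptions made for illustration; the thesis's SMD additionally includes the weighting and pyramidal kernel scheme.

```python
import numpy as np

def string_matching_distance(A, B, indel_cost=1.0):
    """Edit distance between two strings of local BoW histograms.

    A and B are arrays of shape (m, k) and (n, k); each row is one
    histogram. Costs here are illustrative: substitution pays the
    normalised L1 distance between histograms, insertion/deletion
    pays a flat `indel_cost`.
    """
    m, n = len(A), len(B)
    D = np.zeros((m + 1, n + 1))
    D[1:, 0] = indel_cost * np.arange(1, m + 1)
    D[0, 1:] = indel_cost * np.arange(1, n + 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.5 * np.abs(A[i - 1] - B[j - 1]).sum()
            D[i, j] = min(D[i - 1, j] + indel_cost,   # delete A[i-1]
                          D[i, j - 1] + indel_cost,   # insert B[j-1]
                          D[i - 1, j - 1] + sub)      # substitute
    return D[m, n]
```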
33

Spell checkers and correctors : a unified treatment

Liang, Hsuan Lorraine 25 June 2009 (has links)
The aim of this dissertation is to provide a unified treatment of various spell checkers and correctors. Firstly, the spell checking and correcting problems are formally described in mathematical terms in order to provide a better understanding of these tasks; the approach adopted is similar to the way in which denotational semantics is used to describe programming languages. Secondly, the various attributes of existing spell checking and correcting techniques are discussed, and extensive studies of selected spell checking/correcting algorithms and packages are performed. Lastly, an empirical investigation of various spell checking/correcting packages is presented, providing a comparison and suggesting a classification of these packages in terms of their functionality, implementation strategies, and performance. The investigation covered packages for spell checking and correcting in English as well as in Northern Sotho and Chinese. The classification provides a unified presentation of the strengths and weaknesses of the techniques studied, and the findings offer a better understanding of these techniques that can assist in improving existing spell checking/correcting applications and in designing and implementing future packages. / Dissertation (MSc)--University of Pretoria, 2009. / Computer Science / unrestricted
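The checking/correcting split the dissertation formalises can be illustrated with a textbook baseline: checking is lexicon lookup, correcting ranks lexicon entries by Levenshtein distance. This is only the simplest of the techniques the dissertation surveys, and the thresholds below are arbitrary placeholders.

```python
def levenshtein(a, b):
    # Standard O(|a|*|b|) edit-distance DP, keeping one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

def check(word, lexicon):
    return word in lexicon                 # spell checking: membership test

def correct(word, lexicon, max_dist=2, n_best=5):
    # Spell correcting: suggest the closest lexicon entries.
    scored = sorted((levenshtein(word, w), w) for w in lexicon)
    return [w for d, w in scored[:n_best] if d <= max_dist]
```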
34

Nestacionární pohyb tuhého tělesa v kapalině / Unsteady movement of a stiff body in a liquid

Kubo, Miroslav January 2011 (has links)
This diploma thesis deals with computing the effects exerted on a given rigid body by the flow of an inviscid liquid. Equations are derived for computing these effects during translational or torsional oscillation, followed by the calculation of the components of the corresponding tensors.
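For context, the kind of result such a derivation leads to in the standard ideal-fluid setting is the added-mass relation below. This generic form is quoted as background and is my assumption about the setting; the thesis derives the specific tensors for the assigned body.

```latex
% Unit-velocity potentials \phi_j satisfy \partial\phi_j/\partial n = n_j
% on the body surface S; the force exerted by an unbounded inviscid
% liquid on an accelerating rigid body is then linear in the
% acceleration, with the added-mass tensor as coefficient:
F_i = -A_{ij}\,\dot{U}_j ,
\qquad
A_{ij} = \rho \oint_S \phi_i \,\frac{\partial \phi_j}{\partial n}\, \mathrm{d}S .
```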
35

Mining for Frequent Community Structures using Approximate Graph Matching

Kolli, Lakshmi Priya 15 July 2021 (has links)
No description available.
36

A new traversal method for virtual reality: overcoming the drawbacks of common methods

Smink, Karl A 01 May 2020 (has links)
One of the biggest issues facing VR as a platform is the limitation of the user's physical space. Not everyone has a lab, an empty warehouse, or open space in their home or office, and even when they do, the hardware limits the physical space the user can take advantage of. Fitting the entirety of the environment within a few square meters is a strict limitation for many applications. A method of moving the user within a larger space is needed, but current methods come with drawbacks. Developing a new movement method that avoids these drawbacks will help ensure a better experience for the user.
37

Spelling Normalisation and Linguistic Analysis of Historical Text for Information Extraction

Pettersson, Eva January 2016 (has links)
Historical text constitutes a rich source of information for historians and other researchers in humanities. Many texts are however not available in an electronic format, and even if they are, there is a lack of NLP tools designed to handle historical text. In my thesis, I aim to provide a generic workflow for automatic linguistic analysis and information extraction from historical text, with spelling normalisation as a core component in the pipeline. In the spelling normalisation step, the historical input text is automatically normalised to a more modern spelling, enabling the use of existing taggers and parsers trained on modern language data in the succeeding linguistic analysis step. In the final information extraction step, certain linguistic structures are identified based on the annotation labels given by the NLP tools, and ranked in accordance with the specific information need expressed by the user. An important consideration in my implementation is that the pipeline should be applicable to different languages, time periods, genres, and information needs by simply substituting the language resources used in each module. Furthermore, the reuse of existing NLP tools developed for the modern language is crucial, considering the lack of linguistically annotated historical data combined with the high variability in historical text, making it hard to train NLP tools specifically aimed at analysing historical text. In my evaluation, I show that spelling normalisation can be a very useful technique for easy access to historical information content, even in cases where there is little (or no) annotated historical training data available. For the specific information extraction task of automatically identifying verb phrases describing work in Early Modern Swedish text, 91 out of the 100 top-ranked instances are true positives in the best setting.
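As a concrete (and deliberately simple) stand-in for the spelling normalisation step, the sketch below maps a historical token to its closest modern lexicon entry with stdlib fuzzy matching. The thesis evaluates more sophisticated techniques, so treat this as an illustration of the module's role in the pipeline, with placeholder names and threshold.

```python
import difflib

def normalise(token, modern_lexicon, cutoff=0.8):
    """Map a historical spelling to a modern one before tagging/parsing.

    Minimal sketch: exact matches pass through, otherwise pick the
    closest modern form by difflib similarity. `cutoff` is an
    arbitrary placeholder threshold.
    """
    if token in modern_lexicon:
        return token
    matches = difflib.get_close_matches(token, modern_lexicon,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else token  # leave unmatched tokens as-is

# e.g. normalise("effter", ["efter", "arbete"]) -> "efter"
```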
38

Supervised metric learning with generalization guarantees / Apprentissage supervisé de métriques avec garanties en généralisation

Bellet, Aurélien 11 December 2012 (has links)
In recent years, the crucial importance of metrics in machine learning algorithms has led to an increasing interest in optimizing distance and similarity functions, using knowledge from training data to make them suitable for the problem at hand. This area of research is known as metric learning. Existing methods typically aim at optimizing the parameters of a given metric with respect to local constraints over the training sample; the learned metrics are generally used in nearest-neighbor and clustering algorithms. When data consist of feature vectors, a large body of work has focused on learning a Mahalanobis distance, which is parameterized by a positive semi-definite matrix, and recent methods offer good scalability to large datasets. Less work has been devoted to metric learning from structured objects (such as strings or trees), because it often involves complex procedures; most of it has focused on optimizing a notion of edit distance, which measures (in terms of number of operations) the cost of turning one object into another. We identify two important limitations of current supervised metric learning approaches. First, they make it possible to improve the performance of local algorithms such as k-nearest neighbors, but metric learning for global algorithms (such as linear classifiers) has not really been studied so far. Second, and perhaps more importantly, the question of the generalization ability of metric learning methods has been largely ignored.
In this thesis, we propose theoretical and algorithmic contributions that address these limitations. Our first contribution is the derivation of a new kernel function built from learned edit probabilities; unlike other string kernels, it is guaranteed to be valid and is parameter-free. Our second contribution is a novel framework for learning string and tree edit similarities, inspired by the recent theory of (epsilon,gamma,tau)-good similarity functions and formulated as a convex optimization problem. Using uniform stability arguments, we establish theoretical guarantees for the learned similarity that bound the generalization error of a linear classifier built from that similarity. In our third contribution, we extend the same ideas to metric learning from feature vectors by proposing a bilinear similarity learning method that efficiently optimizes (epsilon,gamma,tau)-goodness. The similarity is learned under global constraints that are more appropriate to linear classification, and the guarantees derived for this approach show that our method minimizes a tighter bound on the classifier's generalization error. Our last contribution is a framework for establishing generalization bounds for a large class of existing metric learning algorithms, based on a simple adaptation of the notion of algorithmic robustness; it allows the derivation of bounds for various loss functions and regularizers.
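The third contribution can be sketched in a few lines: a bilinear similarity K_M(x, y) = xᵀMy, and the landmark-based linear classifier that (epsilon,gamma,tau)-good similarity theory builds from such a similarity. Learning M is the convex program described in the abstract and is omitted here; the function names and identity default are illustrative assumptions.

```python
import numpy as np

def bilinear_similarity(M):
    # K_M(x, y) = x^T M y; M = identity recovers the plain dot product.
    return lambda x, y: float(x @ M @ y)

def predict(x, K, landmarks, alphas):
    """Landmark classifier in the spirit of (epsilon,gamma,tau)-good
    similarity theory: h(x) = sign(sum_i alpha_i * K(x, landmark_i))."""
    return np.sign(sum(a * K(x, l) for a, l in zip(alphas, landmarks)))

# Usage sketch: K = bilinear_similarity(np.eye(3));
# predict(x, K, landmarks, alphas) classifies x from similarities alone.
```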
39

Approximation of OLAP queries on data warehouses

Cao, Phuong Thao 20 June 2013 (has links) (PDF)
We study approximate answers to OLAP queries on data warehouses. We consider relative answers to OLAP queries on a schema as distributions under the L1 distance, and approximate the answers without storing the entire data warehouse. We first introduce three specific methods: uniform sampling, measure-based sampling, and a statistical model. We also introduce an edit distance between data warehouses, with edit operations adapted to data warehouses. Then, in the OLAP data-exchange setting, we study how to sample each source and combine the samples to approximate any OLAP query. We next consider a streaming context, where a data warehouse is built from the streams of different sources; we show a lower bound on the size of the memory necessary to approximate queries, and in this case we approximate OLAP queries with a finite memory. We also describe a method to discover statistical dependencies, a new notion we introduce, searching for them with decision trees. We apply the method to two data warehouses: the first simulates sensor data providing weather parameters over time and location from different sources; the second is a collection of RSS feeds from web sites on the Internet.
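Of the three approximation methods listed, uniform sampling is the easiest to sketch: answer an aggregate query on a small uniform sample and rescale. The function name and the COUNT query are illustrative assumptions; the thesis also covers measure-based sampling and statistical models.

```python
import random

def approx_count(rows, predicate, sample_rate=0.01, seed=0):
    """Approximate a COUNT OLAP query by uniform sampling.

    Scans the source once, keeps each row with probability
    `sample_rate`, and scales the count back up, so the full
    data warehouse never needs to be stored.
    """
    rng = random.Random(seed)
    sample = [r for r in rows if rng.random() < sample_rate]
    hits = sum(1 for r in sample if predicate(r))
    return hits / sample_rate
```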
40

The final final final cut : Fan edits och hur de samverkar med filmindustrin / Fan edits and how they interact with the film industry

Pontén, Joon January 2011 (has links)
The term "fan edits" denotes films re-cut by fans who are dissatisfied with how an adaptation for the big screen was made. In this essay I aim to show how the interplay between fans and filmmakers/film studios has looked and looks today, and to clarify why copyright/fair use is so tricky to apply to this area.
