• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 27
  • 9
  • 7
  • 4
  • 3
  • 2
  • 1
  • 1
  • Tagged with
  • 63
  • 41
  • 12
  • 10
  • 9
  • 8
  • 7
  • 7
  • 7
  • 6
  • 6
  • 6
  • 5
  • 5
  • 5
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

Supervised metric learning with generalization guarantees

Bellet, Aurélien 11 December 2012 (has links) (PDF)
In recent years, the crucial importance of metrics in machine learningalgorithms has led to an increasing interest in optimizing distanceand similarity functions using knowledge from training data to make them suitable for the problem at hand.This area of research is known as metric learning. Existing methods typically aim at optimizing the parameters of a given metric with respect to some local constraints over the training sample. The learned metrics are generally used in nearest-neighbor and clustering algorithms.When data consist of feature vectors, a large body of work has focused on learning a Mahalanobis distance, which is parameterized by a positive semi-definite matrix. Recent methods offer good scalability to large datasets.Less work has been devoted to metric learning from structured objects (such as strings or trees), because it often involves complex procedures. Most of the work has focused on optimizing a notion of edit distance, which measures (in terms of number of operations) the cost of turning an object into another.We identify two important limitations of current supervised metric learning approaches. First, they allow to improve the performance of local algorithms such as k-nearest neighbors, but metric learning for global algorithms (such as linear classifiers) has not really been studied so far. Second, and perhaps more importantly, the question of the generalization ability of metric learning methods has been largely ignored.In this thesis, we propose theoretical and algorithmic contributions that address these limitations. Our first contribution is the derivation of a new kernel function built from learned edit probabilities. Unlike other string kernels, it is guaranteed to be valid and parameter-free. Our second contribution is a novel framework for learning string and tree edit similarities inspired by the recent theory of (epsilon,gamma,tau)-good similarity functions and formulated as a convex optimization problem. Using uniform stability arguments, we establish theoretical guarantees for the learned similarity that give a bound on the generalization error of a linear classifier built from that similarity. In our third contribution, we extend the same ideas to metric learning from feature vectors by proposing a bilinear similarity learning method that efficiently optimizes the (epsilon,gamma,tau)-goodness. The similarity is learned based on global constraints that are more appropriate to linear classification. Generalization guarantees are derived for our approach, highlighting that our method minimizes a tighter bound on the generalization error of the classifier. Our last contribution is a framework for establishing generalization bounds for a large class of existing metric learning algorithms. It is based on a simple adaptation of the notion of algorithmic robustness and allows the derivation of bounds for various loss functions and regularizers.
42

The mat sat on the cat : investigating structure in the evaluation of order in machine translation

McCaffery, Martin January 2017 (has links)
We present a multifaceted investigation into the relevance of word order in machine translation. We introduce two tools, DTED and DERP, each using dependency structure to detect differences between the structures of machine-produced translations and human-produced references. DTED applies the principle of Tree Edit Distance to calculate edit operations required to convert one structure into another. Four variants of DTED have been produced, differing in the importance they place on words which match between the two sentences. DERP represents a more detailed procedure, making use of the dependency relations between words when evaluating the disparities between paths connecting matching nodes. In order to empirically evaluate DTED and DERP, and as a standalone contribution, we have produced WOJ-DB, a database of human judgments. Containing scores relating to translation adequacy and more specifically to word order quality, this is intended to support investigations into a wide range of translation phenomena. We report an internal evaluation of the information in WOJ-DB, then use it to evaluate variants of DTED and DERP, both to determine their relative merit and their strength relative to third-party baselines. We present our conclusions about the importance of structure to the tools and their relevance to word order specifically, then propose further related avenues of research suggested or enabled by our work.
43

Arquiteturas em hardware para o alinhamento local de sequências biológicas / Hardware architectures for local biological sequence alignment

Mallmann, Rafael Mendes January 2010 (has links)
Bancos de dados biológicos utilizados para comparação e alinhamento local de sequências tem crescido de forma exponencial. Isso popularizou programas que realizam buscas nesses bancos. As implementações dos algoritmos de alinhamento de sequências Smith- Waterman e distância Levenshtein demonstraram ser computacionalmente intensivas e, portanto, propícias para aceleração em hardware. Este trabalho descreve arquiteturas em hardware dedicado prototipadas para FPGA e ASIC para acelerar os algoritmos Smith- Waterman e distância Levenshtein mantendo os mesmos resultados obtidos por softwares. Descrevemos uma nova e eficiente unidade de processamento para o cálculo do Smith- Waterman utilizando affine gap. Também projetamos uma arquitetura que permite particionar as sequências de entrada para a distância Levenshtein em um array sistólico de tamanho fixo. Nossa implementação em FPGA para o Smith-Waterman acelera de 275 a 494 vezes o algoritmo em relação a um computador com processador de propósito geral. Ainda é 52 a 113% mais rápida em relação, segundo nosso conhecimento, as mais rápidas arquiteturas recentemente publicadas. / Bioinformatics databases used for sequence comparison and local sequence alignment are growing exponentially. This has popularized programs that carry out database searches. Current implementations of sequence alignment methods based on Smith- Waterman and Levenshtein distance have proven to be computationally intensive and, hence, amenable for hardware acceleration. This Msc. Thesis describes an FPGA and ASIC based hardware implementation designed to accelerate the Smith-Waterman and Levenshtein distance maintaining the same results yielded by general softwares. We describe an new efficient Smith-Waterman affine gap process element and a new architecture to partitioning and maping the Levenshtein distance into fixed size systolic arrays. Our FPGA Smith-Waterman implementation delivers 275 to 494-fold speed-up over a standard desktop computer and is also about 52 to 113% faster, to the best of our knowledge, than the fastest implementation in a most recent family of accelerators.
44

Arquiteturas em hardware para o alinhamento local de sequências biológicas / Hardware architectures for local biological sequence alignment

Mallmann, Rafael Mendes January 2010 (has links)
Bancos de dados biológicos utilizados para comparação e alinhamento local de sequências tem crescido de forma exponencial. Isso popularizou programas que realizam buscas nesses bancos. As implementações dos algoritmos de alinhamento de sequências Smith- Waterman e distância Levenshtein demonstraram ser computacionalmente intensivas e, portanto, propícias para aceleração em hardware. Este trabalho descreve arquiteturas em hardware dedicado prototipadas para FPGA e ASIC para acelerar os algoritmos Smith- Waterman e distância Levenshtein mantendo os mesmos resultados obtidos por softwares. Descrevemos uma nova e eficiente unidade de processamento para o cálculo do Smith- Waterman utilizando affine gap. Também projetamos uma arquitetura que permite particionar as sequências de entrada para a distância Levenshtein em um array sistólico de tamanho fixo. Nossa implementação em FPGA para o Smith-Waterman acelera de 275 a 494 vezes o algoritmo em relação a um computador com processador de propósito geral. Ainda é 52 a 113% mais rápida em relação, segundo nosso conhecimento, as mais rápidas arquiteturas recentemente publicadas. / Bioinformatics databases used for sequence comparison and local sequence alignment are growing exponentially. This has popularized programs that carry out database searches. Current implementations of sequence alignment methods based on Smith- Waterman and Levenshtein distance have proven to be computationally intensive and, hence, amenable for hardware acceleration. This Msc. Thesis describes an FPGA and ASIC based hardware implementation designed to accelerate the Smith-Waterman and Levenshtein distance maintaining the same results yielded by general softwares. We describe an new efficient Smith-Waterman affine gap process element and a new architecture to partitioning and maping the Levenshtein distance into fixed size systolic arrays. Our FPGA Smith-Waterman implementation delivers 275 to 494-fold speed-up over a standard desktop computer and is also about 52 to 113% faster, to the best of our knowledge, than the fastest implementation in a most recent family of accelerators.
45

Arquiteturas em hardware para o alinhamento local de sequências biológicas / Hardware architectures for local biological sequence alignment

Mallmann, Rafael Mendes January 2010 (has links)
Bancos de dados biológicos utilizados para comparação e alinhamento local de sequências tem crescido de forma exponencial. Isso popularizou programas que realizam buscas nesses bancos. As implementações dos algoritmos de alinhamento de sequências Smith- Waterman e distância Levenshtein demonstraram ser computacionalmente intensivas e, portanto, propícias para aceleração em hardware. Este trabalho descreve arquiteturas em hardware dedicado prototipadas para FPGA e ASIC para acelerar os algoritmos Smith- Waterman e distância Levenshtein mantendo os mesmos resultados obtidos por softwares. Descrevemos uma nova e eficiente unidade de processamento para o cálculo do Smith- Waterman utilizando affine gap. Também projetamos uma arquitetura que permite particionar as sequências de entrada para a distância Levenshtein em um array sistólico de tamanho fixo. Nossa implementação em FPGA para o Smith-Waterman acelera de 275 a 494 vezes o algoritmo em relação a um computador com processador de propósito geral. Ainda é 52 a 113% mais rápida em relação, segundo nosso conhecimento, as mais rápidas arquiteturas recentemente publicadas. / Bioinformatics databases used for sequence comparison and local sequence alignment are growing exponentially. This has popularized programs that carry out database searches. Current implementations of sequence alignment methods based on Smith- Waterman and Levenshtein distance have proven to be computationally intensive and, hence, amenable for hardware acceleration. This Msc. Thesis describes an FPGA and ASIC based hardware implementation designed to accelerate the Smith-Waterman and Levenshtein distance maintaining the same results yielded by general softwares. We describe an new efficient Smith-Waterman affine gap process element and a new architecture to partitioning and maping the Levenshtein distance into fixed size systolic arrays. Our FPGA Smith-Waterman implementation delivers 275 to 494-fold speed-up over a standard desktop computer and is also about 52 to 113% faster, to the best of our knowledge, than the fastest implementation in a most recent family of accelerators.
46

Combinatorial aspects of genome rearrangements and haplotype networks / Aspects combinatoires des réarrangements génomiques et des réseaux d'haplotypes

Labarre, Anthony 12 September 2008 (has links)
The dissertation covers two problems motivated by computational biology: genome rearrangements, and haplotype networks.<p><p>Genome rearrangement problems are a particular case of edit distance problems, where one seeks to transform two given objects into one another using as few operations as possible, with the additional constraint that the set of allowed operations is fixed beforehand; we are also interested in computing the corresponding distances between those objects, i.e. merely computing the minimum number of operations rather than an optimal sequence. Genome rearrangement problems can often be formulated as sorting problems on permutations (viewed as linear orderings of {1,2,n}) using as few (allowed) operations as possible. In this thesis, we focus among other operations on ``transpositions', which displace intervals of a permutation. Many questions related to sorting by transpositions are open, related in particular to its computational complexity. We use the disjoint cycle decomposition of permutations, rather than the ``standard tools' used in genome rearrangements, to prove new upper bounds on the transposition distance, as well as formulae for computing the exact distance in polynomial time in many cases. This decomposition also allows us to solve a counting problem related to the ``cycle graph' of Bafna and Pevzner, and to construct a general framework for obtaining lower bounds on any edit distance between permutations by recasting their computation as factorisation problems on related even permutations.<p><p>Haplotype networks are graphs in which a subset of vertices is labelled, used in comparative genomics as an alternative to trees. We formalise a new method due to Cassens, Mardulyn and Milinkovitch, which consists in building a graph containing a given set of partially labelled trees and with as few edges as possible. We give exact algorithms for solving the problem on two graphs, with an exponential running time in the general case but with a polynomial running time if at least one of the graphs belong to a particular class.<p>/<p>La thèse couvre deux problèmes motivés par la biologie: l'étude des réarrangements génomiques, et celle des réseaux d'haplotypes.<p><p>Les problèmes de réarrangements génomiques sont un cas particulier des problèmes de distances d'édition, où l'on cherche à transformer un objet en un autre en utilisant le plus petit nombre possible d'opérations, les opérations autorisées étant fixées au préalable; on s'intéresse également à la distance entre les deux objets, c'est-à-dire au calcul du nombre d'opérations dans une séquence optimale plutôt qu'à la recherche d'une telle séquence. Les problèmes de réarrangements génomiques peuvent souvent s'exprimer comme des problèmes de tri de permutations (vues comme des arrangements linéaires de {1,2,n}) en utilisant le plus petit nombre d'opérations (autorisées) possible. Nous examinons en particulier les ``transpositions', qui déplacent un intervalle de la permutation. Beaucoup de problèmes liés au tri par transpositions sont ouverts, en particulier sa complexité algorithmique. Nous nous écartons des ``outils standards' utilisés dans le domaine des réarrangements génomiques, et utilisons la décomposition en cycles disjoints des permutations pour prouver de nouvelles majorations sur la distance des transpositions ainsi que des formules permettant de calculer cette distance en temps polynomial dans de nombreux cas. Cette décomposition nous sert également à résoudre un problème d'énumération concernant le ``graphe des cycles' de Bafna et Pevzner, et à construire une technique générale permettant d'obtenir de nouvelles minorations en reformulant tous les problèmes de distances d'édition sur les permutations en termes de factorisations de permutations paires associées.<p><p>Les réseaux d'haplotypes sont des graphes dont une partie des sommets porte des étiquettes, utilisés en génomique comparative quand les arbres sont trop restrictifs, ou quand l'on ne peut choisir une ``meilleure' topologie parmi un ensemble donné d'arbres. Nous formalisons une nouvelle méthode due à Cassens, Mardulyn et Milinkovitch, qui consiste à construire un graphe contenant tous les arbres partiellement étiquetés donnés et possédant le moins d'arêtes possible, et donnons des algorithmes résolvant le problème de manière optimale sur deux graphes, dont le temps d'exécution est exponentiel en général mais polynomial dans quelques cas que nous caractérisons.<p> / Doctorat en Sciences / info:eu-repo/semantics/nonPublished
47

Vision-Based Human Directed Robot Guidance

Arthur, Richard B. 11 October 2004 (has links) (PDF)
This paper describes methods to track a user-defined point in the vision of a robot as it drives forward. This tracking allows a robot to keep itself directed at that point while driving so that it can get to that user-defined point. I develop and present two new multi-scale algorithms for tracking arbitrary points between two frames of video, as well as through a video sequence. The multi-scale algorithms do not use the traditional pyramid image, but instead use a data structure called an integral image (also known as a summed area table). The first algorithm uses edge-detection to track the movement of the tracking point between frames of video. The second algorithm uses a modified version of the Moravec operator to track the movement of the tracking point between frames of video. Both of these algorithms can track the user-specified point very quickly. Implemented on a conventional desktop, tracking can proceed at a rate of at least 20 frames per second.
48

Last Word in Art Shades: The Textual State of James Joyce's Ulysses

Tully-Needler, Kelly Lynn 06 March 2008 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / James Joyce’s Ulysses is a work of art that engendered scandal in every stage of its production, dissemination, and reception. The work is now hailed as the prose monument of modernism, a twentieth-century masterpiece, and revolutionary in its stylistic technique, its foregrounding of language and psychological drama, and its ambiguity. Ulysses is, in truth, a simple tale, about a lifetime of one day, in a world of one place, in the lives of one people, played out on a stage of pages. The telling of the tale is far from simple—it is among the greatest literary artifacts of our cultural heritage. But the text of Ulysses continues to be entangled in the tension of its status as both a literary artifact, created by an artist, and a cultural artifact, influenced by the aspects of its currency. Among the many questions the novel begs is, who controls the meaning of a work of literary art? This thesis begins to answer that question. Chapter 1 surveys available materials and outlines four waves in the history of textual scholarship of Ulysses. This chapter reads like the prose version of a library catalogue. Sorry, it is a symptom of academese. Chapter 2 outlines the history of censorship and suppression of Ulysses. Chapter 3 gives a historical context to legalizing the work and discusses the implications of the ban upon the development and reliability of the text. Chapter 4 outlines the second scandal of Ulysses, at the close of the twentieth century, now commonly referred to as the Joyce Wars. Chapter 5 discusses the influences upon Gabler’s editorial method and the resultant text. Together, these chapters tell the story of the book's creation and life in print.
49

Approximation of OLAP queries on data warehouses / Approximation aux requêtes OLAP sur les entrepôts de données

Cao, Phuong Thao 20 June 2013 (has links)
Nous étudions les réponses proches à des requêtes OLAP sur les entrepôts de données. Nous considérons les réponses relatives aux requêtes OLAP sur un schéma, comme les distributions avec la distance L1 et rapprocher les réponses sans stocker totalement l'entrepôt de données. Nous présentons d'abord trois méthodes spécifiques: l'échantillonnage uniforme, l'échantillonnage basé sur la mesure et le modèle statistique. Nous introduisons également une distance d'édition entre les entrepôts de données avec des opérations d'édition adaptées aux entrepôts de données. Puis, dans l'échange de données OLAP, nous étudions comment échantillonner chaque source et combiner les échantillons pour rapprocher toutes requêtes OLAP. Nous examinons ensuite un contexte streaming, où un entrepôt de données est construit par les flux de différentes sources. Nous montrons une borne inférieure de la taille de la mémoire nécessaire aux requêtes approximatives. Dans ce cas, nous avons les réponses pour les requêtes OLAP avec une mémoire finie. Nous décrivons également une méthode pour découvrir les dépendances statistique, une nouvelle notion que nous introduisons. Nous recherchons ces dépendances en basant sur l'arbre de décision. Nous appliquons la méthode à deux entrepôts de données. Le premier simule les données de capteurs, qui fournissent des paramètres météorologiques au fil du temps et de l'emplacement à partir de différentes sources. Le deuxième est la collecte de RSS à partir des sites web sur Internet. / We study the approximate answers to OLAP queries on data warehouses. We consider the relative answers to OLAP queries on a schema, as distributions with the L1 distance and approximate the answers without storing the entire data warehouse. We first introduce three specific methods: the uniform sampling, the measure-based sampling and the statistical model. We introduce also an edit distance between data warehouses with edit operations adapted for data warehouses. Then, in the OLAP data exchange, we study how to sample each source and combine the samples to approximate any OLAP query. We next consider a streaming context, where a data warehouse is built by streams of different sources. We show a lower bound on the size of the memory necessary to approximate queries. In this case, we approximate OLAP queries with a finite memory. We describe also a method to discover the statistical dependencies, a new notion we introduce. We are looking for them based on the decision tree. We apply the method to two data warehouses. The first one simulates the data of sensors, which provide weather parameters over time and location from different sources. The second one is the collection of RSS from the web sites on Internet.
50

Inexact graph matching : application to 2D and 3D Pattern Recognition / Appariement inexact de graphes : application à la reconnaissance de formes 2D et 3D

Madi, Kamel 13 December 2016 (has links)
Les Graphes sont des structures mathématiques puissantes constituant un outil de modélisation universel utilisé dans différents domaines de l'informatique, notamment dans le domaine de la reconnaissance de formes. L'appariement de graphes est l'opération principale dans le processus de la reconnaissance de formes à base de graphes. Dans ce contexte, trouver des solutions d'appariement de graphes, garantissant l'optimalité en termes de précision et de temps de calcul est un problème de recherche difficile et d'actualité. Dans cette thèse, nous nous intéressons à la résolution de ce problème dans deux domaines : la reconnaissance de formes 2D et 3D. Premièrement, nous considérons le problème d'appariement de graphes géométriques et ses applications sur la reconnaissance de formes 2D. Dance cette première partie, la reconnaissance des Kites (structures archéologiques) est l'application principale considérée. Nous proposons un "framework" complet basé sur les graphes pour la reconnaissance des Kites dans des images satellites. Dans ce contexte, nous proposons deux contributions. La première est la proposition d'un processus automatique d'extraction et de transformation de Kites a partir d'images réelles en graphes et un processus de génération aléatoire de graphes de Kites synthétiques. En utilisant ces deux processus, nous avons généré un benchmark de graphes de Kites (réels et synthétiques) structuré en 3 niveaux de bruit. La deuxième contribution de cette première partie, est la proposition d'un nouvel algorithme d'appariement pour les graphes géométriques et par conséquent pour les Kites. L'approche proposée combine les invariants de graphes au calcul de l'édition de distance géométrique. Deuxièmement, nous considérons le problème de reconnaissance des formes 3D ou nous nous intéressons à la reconnaissance d'objets déformables représentés par des graphes c.à.d. des tessellations de triangles. Nous proposons une décomposition des tessellations de triangles en un ensemble de sous structures que nous appelons triangle-étoiles. En se basant sur cette décomposition, nous proposons un nouvel algorithme d'appariement de graphes pour mesurer la distance entre les tessellations de triangles. L'algorithme proposé assure un nombre minimum de structures disjointes, offre une meilleure mesure de similarité en couvrant un voisinage plus large et utilise un ensemble de descripteurs qui sont invariants ou au moins tolérants aux déformations les plus courantes. Finalement, nous proposons une approche plus générale de l'appariement de graphes. Cette approche est fondée sur une nouvelle formalisation basée sur le problème de mariage stable. L'approche proposée est optimale en terme de temps d'exécution, c.à.d. la complexité est quadratique O(n2), et flexible en terme d'applicabilité (2D et 3D). Cette approche se base sur une décomposition en sous structures suivie par un appariement de ces structures en utilisant l'algorithme de mariage stable. L'analyse de la complexité des algorithmes proposés et l'ensemble des expérimentations menées sur les bases de graphes des Kites (réelle et synthétique) et d'autres bases de données standards (2D et 3D) attestent l'efficacité, la haute performance et la précision des approches proposées et montrent qu'elles sont extensibles et générales / Graphs are powerful mathematical modeling tools used in various fields of computer science, in particular, in Pattern Recognition. Graph matching is the main operation in Pattern Recognition using graph-based approach. Finding solutions to the problem of graph matching that ensure optimality in terms of accuracy and time complexity is a difficult research challenge and a topical issue. In this thesis, we investigate the resolution of this problem in two fields: 2D and 3D Pattern Recognition. Firstly, we address the problem of geometric graphs matching and its applications on 2D Pattern Recognition. Kite (archaeological structures) recognition in satellite images is the main application considered in this first part. We present a complete graph based framework for Kite recognition on satellite images. We propose mainly two contributions. The first one is an automatic process transforming Kites from real images into graphs and a process of generating randomly synthetic Kite graphs. This allowing to construct a benchmark of Kite graphs (real and synthetic) structured in different level of deformations. The second contribution in this part, is the proposition of a new graph similarity measure adapted to geometric graphs and consequently for Kite graphs. The proposed approach combines graph invariants with a geometric graph edit distance computation. Secondly, we address the problem of deformable 3D objects recognition, represented by graphs, i.e., triangular tessellations. We propose a new decomposition of triangular tessellations into a set of substructures that we call triangle-stars. Based on this new decomposition, we propose a new algorithm of graph matching to measure the distance between triangular tessellations. The proposed algorithm offers a better measure by assuring a minimum number of triangle-stars covering a larger neighbourhood, and uses a set of descriptors which are invariant or at least oblivious under most common deformations. Finally, we propose a more general graph matching approach founded on a new formalization based on the stable marriage problem. The proposed approach is optimal in term of execution time, i.e. the time complexity is quadratic O(n2) and flexible in term of applicability (2D and 3D). The analyze of the time complexity of the proposed algorithms and the extensive experiments conducted on Kite graph data sets (real and synthetic) and standard data sets (2D and 3D) attest the effectiveness, the high performance and accuracy of the proposed approaches and show that the proposed approaches are extensible and quite general

Page generated in 0.0482 seconds