Spelling suggestions: "subject:"free dit 2distance"" "subject:"free dit 4distance""
1 |
Comparing XML Documents as Reference-aware Labeled Ordered TreesMikhaiel, Rimon A. E. Unknown Date
No description available.
|
2 |
Methods for Analyzing Tree-Structured Data and their Applications to Computational Biology / 木構造データの解析手法とその計算生物学への応用Mori, Tomoya 24 September 2015 (has links)
京都大学 / 0048 / 新制・課程博士 / 博士(情報学) / 甲第19336号 / 情博第588号 / 新制||情||103(附属図書館) / 32338 / 京都大学大学院情報学研究科知能情報学専攻 / (主査)教授 阿久津 達也, 教授 山本 章博, 教授 岡部 寿男 / 学位規則第4条第1項該当 / Doctor of Informatics / Kyoto University / DFAM
|
3 |
The mat sat on the cat : investigating structure in the evaluation of order in machine translationMcCaffery, Martin January 2017 (has links)
We present a multifaceted investigation into the relevance of word order in machine translation. We introduce two tools, DTED and DERP, each using dependency structure to detect differences between the structures of machine-produced translations and human-produced references. DTED applies the principle of Tree Edit Distance to calculate edit operations required to convert one structure into another. Four variants of DTED have been produced, differing in the importance they place on words which match between the two sentences. DERP represents a more detailed procedure, making use of the dependency relations between words when evaluating the disparities between paths connecting matching nodes. In order to empirically evaluate DTED and DERP, and as a standalone contribution, we have produced WOJ-DB, a database of human judgments. Containing scores relating to translation adequacy and more specifically to word order quality, this is intended to support investigations into a wide range of translation phenomena. We report an internal evaluation of the information in WOJ-DB, then use it to evaluate variants of DTED and DERP, both to determine their relative merit and their strength relative to third-party baselines. We present our conclusions about the importance of structure to the tools and their relevance to word order specifically, then propose further related avenues of research suggested or enabled by our work.
|
4 |
Extração automática de dados de páginas HTML utilizando alinhamento em dois níveisPedralho, André de Souza 28 July 2011 (has links)
Made available in DSpace on 2015-04-11T14:02:41Z (GMT). No. of bitstreams: 1
andre.pdf: 821975 bytes, checksum: 8b72d2493d068d6a827082e5eb108bf6 (MD5)
Previous issue date: 2011-07-28 / There is a huge amount of information in the World Wide Web in pages composed by similar objects. E-commerce Web sites and on-line catalogs, in general, are examples of such data repositories. Although this information usually occurs in semi-structured texts, it is designed to be interpreted and used by humans and not processed by machines. The identification of these objects inWeb pages is performed by external applications called extractors or wrappers. In this work we propose and evaluate an automatic approach to the problem of generating wrappers capable of extracting and structuring data records and the values of their attributes. It uses the Tree Alignment Algorithm to find in the Web page examples of objects of interest. Then, our method generates regular expressions for extracting objects similar to the examples given using the Multiple Sequence Alignment Algorithm. In a final step, the method decomposes the objects in sequences of text using the regular expression and common formats and delimiters, in order to identify the value of the attributes of the data records. Experiments using a collection
composed by 128 Web pages from different domains have demonstrated the feasibility of our extraction method. It is evaluated regarding the identification of blocks of HTML source code that contain data records and regarding record extraction and the value of its attributes. It reached a precision of 83% and a recall of 80% when extracting the value of attributes. These values mean a gain in precision of 43.37% and in recall of 68.75% when compared to similar proposals. / Existe uma grande quantidade de informação na World Wide Web em páginas compostas por objetos similares. Web sites de comércio eletrônico e catálogos online, em geral, são exemplos destes repositórios de dados. Apesar destes dados serem
apresentados em porções de texto semi-estruturados, são projetados para serem interpretados e utilizados por humanos e não processados por máquinas. A identificação destes objetos em páginas Web é feita por aplicações externas chamadas extratores ou wrappers. Neste trabalho propomos e avaliamos um método automático para o problema de extrair e estruturar registros e valores de seus atributos presentes em páginas Web
ricas em dados. O método utiliza um Algoritmo de Alinhamento de Árvores para encontrar nestas páginas exemplos de registros que correspondem a objetos de interesse. Em seguida, o método gera expressões regulares para extrair objetos similares aos exemplos dados usando o Algoritmo de Alinhamento de Múltiplas Sequências. Em um passo final, o método decompõe os registros em sequências de texto aplicando a expressão regular criada e formatações e delimitadores comuns, com o intuito de identificar os valores dos atributos dos registros. Experimentos utilizando uma coleção composta por 128 páginasWeb de diferentes domínios demonstram a viabilidade do nosso método de extração. O método foi avaliado em relação à identificação de blocos de código HTML que contêm os registros e quanto à extração dos registros e dos valores de seus atributos. Obtivemos precisão de 83% e revocação de 80% na
extração de valores de atributos. Estes valores significam um ganho na precisão de 43,37% e na revocação de 68,75%, em relação a propostas similares
|
5 |
Comparaison et évolution de schémas XML / Comparison and evolution of XML schemaAmavi, Joshua 28 November 2014 (has links)
XML est devenu le format standard d’échange de données. Nous souhaitons construire un environnement multi-système où des systèmes locaux travaillent en harmonie avec un système global, qui est une évolution conservatrice des systèmes locaux. Dans cet environnement, l’échange de données se fait dans les deux sens. Pour y parvenir nous avons besoin d’un mapping entre les schémas des systèmes. Le but du mapping est d’assurer l’évolution des schémas et de guider l’adaptation des documents entre les schémas concernés. Nous proposons des outils pour faciliter l’évolution de base de données XML. Ces outils permettent de : (i) calculer un mapping entre le schéma global et les schémas locaux, et d’adapter les documents ; (ii) calculer les contraintes d’intégrité du système global à partir de celles des systèmes locaux ; (iii) comparer les schémas de deux systèmes pour pouvoir remplacer un système par celui qui le contient ; (iv) corriger un nouveau document qui est invalide par rapport au schéma d’un système, afin de l’ajouter au système. Des expériences ont été menées sur des données synthétiques et réelles pour montrer l’efficacité de nos méthodes. / XML has become the de facto format for data exchange. We aim at establishing a multi-system environment where some local original systems work in harmony with a global integrated system, which is a conservative evolution of local ones. Data exchange is possible in both directions, allowing activities on both levels. For this purpose, we need schema mapping whose is to ensure schema evolution, and to guide the construction of a document translator, allowing automatic data adaptation wrt type evolution. We propose a set of tools to help dealing with XML database evolution. These tools are used : (i) to compute a mapping capable of obtaining a global schema which is a conservative extension of original local schemas, and to adapt XML documents ; (ii) to compute the set of integrity constraints for the global system on the basis of the local ones ; (iii) to compare XML types of two systems in order to replace a system by another one ; (iv) to correct a new document with respect to an XML schema. Experimental results are discussed, showing the efficiency of our methods in many situations.
|
Page generated in 0.0868 seconds