Global ETD Search

1	Methods and applications of text-driven toponym resolution with indirect supervision Speriosu, Michael Adrian 24 September 2013 This thesis addresses the problem of toponym resolution. Given an ambiguous placename like Springfield in some natural language context, the task is to automatically predict the location on the earth's surface the author is referring to. Many previous efforts use hand-built heuristics to attempt to solve this problem, looking for specific words in close proximity such as Springfield, Illinois, and disambiguating any remaining toponyms to possible locations close to those already resolved. Such approaches require the data to take a fairly specific form in order to perform well, so they often have low coverage. Some have applied machine learning to this task in an attempt to build more general resolvers, but acquiring large amounts of high quality hand-labeled training material is difficult. I discuss these and other approaches found in previous work before presenting several new toponym resolvers that rely neither on hand-labeled training material prepared explicitly for this task nor on particular co-occurrences of toponyms in close proximity in the data to be disambiguated. Some of the resolvers I develop reflect the intuition of many heuristic resolvers that toponyms nearby in text tend to (but do not always) refer to locations nearby on Earth, but do not require toponyms to occur in direct sequence with one another. I also introduce several resolvers that use the predictions of a document geolocation system (i.e. one that predicts a location for a piece of text of arbitrary length) to inform toponym disambiguation. Another resolver combines, in a probabilistic way, these document-level location predictions, knowledge of different administrative levels (country, state, city, etc.), and predictions from a logistic regression classifier trained on instances automatically extracted from Wikipedia. 
It takes advantage of all content words in each toponym's context (both local window and whole document) rather than only toponyms. One resolver I build that extracts training material for a machine learned classifier from Wikipedia, taking advantage of link structure and geographic coordinates on articles, resolves 83% of toponyms in a previously introduced corpus of news articles correctly, beating the strong but simplistic population baseline. I introduce a corpus of Civil War-related writings not previously used for this task on which the population baseline does poorly; combining a Wikipedia-informed resolver with an algorithm that seeks to minimize the geographic scope of all predicted locations in a document achieves 86% blind test set accuracy on this dataset. After providing these high-performing resolvers, I lay the groundwork for more flexible and complex approaches by transforming the problem of toponym resolution into the traveling purchaser problem, modeling the probability of a location given its toponym's textual context and the geographic distribution of all locations mentioned in a document as two components of an objective function to be minimized. As one solution to this incarnation of the traveling purchaser problem, I simulate properties of ants traveling the globe and disambiguating toponyms. The ants' preferences for various kinds of behavior evolve over time, revealing underlying patterns in the corpora that other disambiguation methods do not account for. I also introduce several automated visualizations of texts that have had their toponyms resolved. Given a resolved corpus, these visualizations summarize the areas of the globe mentioned and allow the user to refer back to specific passages in the text that mention a location of interest. One visualization presented automatically generates a dynamic tour of the corpus, showing changes in the area referred to by the text as it progresses. 
Such visualizations are an example of a practical application of work in toponym resolution, and could be used by scholars interested in the geographic connections in any collection of text on both broad and fine-grained levels. Toponym resolution Semi-supervised learning Computational linguistics
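The two ideas contrasted in the abstract above, the population baseline and a resolver that minimizes the geographic scope of a document's predicted locations, can be illustrated with a minimal sketch. This is not the thesis's implementation: the toy gazetteer, the function names, the brute-force search and the use of summed pairwise great-circle distance as the "scope" measure are all assumptions made for illustration, and the coordinates and populations are rough values only.

```python
import math
from itertools import product

# Hypothetical toy gazetteer: toponym -> candidates (label, lat, lon, population).
# Coordinates and populations are rough illustrative values, not thesis data.
GAZETTEER = {
    "Springfield": [
        ("Springfield, Illinois", 39.80, -89.64, 114_000),
        ("Springfield, Massachusetts", 42.10, -72.59, 155_000),
    ],
    "Chicago": [("Chicago, Illinois", 41.88, -87.63, 2_700_000)],
}


def haversine_km(a, b):
    """Great-circle distance in kilometres between two gazetteer candidates."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[1], a[2], b[1], b[2]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def population_baseline(toponyms):
    """Resolve each toponym to its most populous candidate, ignoring context."""
    return {t: max(GAZETTEER[t], key=lambda c: c[3])[0] for t in toponyms}


def min_scope(toponyms):
    """Pick the candidate combination with the smallest summed pairwise distance.

    Brute force over all combinations; only sensible for a handful of toponyms.
    """
    best, best_cost = None, float("inf")
    for combo in product(*(GAZETTEER[t] for t in toponyms)):
        cost = sum(haversine_km(a, b)
                   for i, a in enumerate(combo) for b in combo[i + 1:])
        if cost < best_cost:
            best, best_cost = combo, cost
    return {t: c[0] for t, c in zip(toponyms, best)}


doc_toponyms = ["Springfield", "Chicago"]
baseline = population_baseline(doc_toponyms)
scoped = min_scope(doc_toponyms)
```

With these toy values the baseline resolves Springfield to the more populous Massachusetts city, while the scope-minimizing sketch prefers the Illinois city because Chicago is mentioned in the same document.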
2	Russian and Ukrainian Adjectives Referring to Place-names: a Contrastive Analysis Phillips, Olena January 2010 This thesis examines linguistic similarities and differences between the Russian and Ukrainian languages regarding the word formation of adjectives referring to place names (toponyms). A contrastive analysis of a database of approximately 1500 shared toponyms reveals the derivational paradigms that apply. Tables are provided illustrating important characteristics of toponym stem-endings and the suffixes they take. This information leads to a better understanding of the proper use, within each language, of the 25 Russian and 18 Ukrainian suffixes used in the derivational models. Analyzing the derivational paradigms of the two languages, I found 15 similar and 7 different models resulting from the word formation process. This gives a clearer picture, for both languages, of how derivational paradigms are used in the proper formation of adjectives. adjective adjectonym place-name Russian toponym Ukrainian
3	Francouzské překlady pražských toponym / French translation of Prague toponyms PLATILOVÁ, Kateřina January 2016 This thesis deals with French translations of Prague toponyms. The first part contains a brief description of onomastics and toponomastics. Next, the term place names (toponyms) is defined. The remainder of this part is dedicated to the classification and translation of toponyms. The next part of the thesis focuses on the analysis of particular translations of Prague place names. First, they are divided into seven groups, and then the terms are analysed from two different perspectives: a translatological analysis and a cultural-historical analysis. The aim of the first analysis is to classify the translated terms according to six translation procedures. The second analysis examines the cultural-historical context of the names, which is also important in translation. A bilingual dictionary of the analysed terms is attached at the end of the thesis.
4	Identificação da cobertura espacial de documentos usando mineração de textos / Identification of spatial coverage documents with mining Vargas, Rosa Nathalie Portugal 08 August 2012 Users now commonly take the geographical location of documents into account in Information Retrieval, that is, they consider the geographic scope addressed in the context of the document. However, conventional information retrieval systems based on keyword matching do not consider that words can represent geographical entities that are spatially related to other entities in the documents. To solve this problem, it is necessary to enable the geo-referencing of texts, that is, to identify the geographical entities present in the text and associate them with their correct spatial location. The identification and disambiguation of geographical entities present major challenges, mainly from the linguistic point of view, since one toponym can have several types of associated ambiguity. This ambiguity causes noise in information retrieval, since the same term may have relevant or irrelevant information associated with it. Thus, the main strategy for overcoming ambiguity is to identify evidence that assists in the identification and disambiguation of locations in the texts. This study proposes a methodology, called SpatialCIM, that allows the spatial coverage of documents to be identified and determined. The SpatialCIM methodology aims to organize the toponym resolution process. The main objective of this study is therefore to evaluate and select disambiguation techniques that resolve toponym ambiguity in texts. To this end, we proposed and developed two approaches: (1) Disambiguation for Points and (2) Textual and Structural Disambiguation. These approaches exploit two different toponym disambiguation techniques, which generate and disambiguate the geographic paths associated with the toponyms recognized in each document. The hypothesis of this research is that toponym disambiguation techniques enable a better spatial localization of documents. The results demonstrate that the disambiguation techniques improve precision and recall in the spatial classification of documents. The positive effect of using a linguistic tool in the recognition of geographical entities was also demonstrated. The usefulness of the disambiguation process for obtaining the spatial coverage of documents was thus shown. Ambiguity problem Named entity recognition Problemas de ambiguidade Reconhecimento de entidades mencionadas Resolução de topônimos Toponym resolution
5	Der Name Leipzig als Hinweis auf Gegend mit Wasserreichtum / The name Leipzig as an indication of an area with abundant amounts of water Hengst, Karlheinz 20 August 2014 The article continues the discussion of the origins and history of the Saxon place name Leipzig. Several questions are under scrutiny. Starting from recent research that gives the oldest historical evidence of the place name Leipzig as Lib-, it deals with certain new doubts regarding explanations that try to date the origins of the place name to pre-monolingual times. The question of whether one can assume an original Slavonic form based on the Slavonic root *lib- is dealt with in detail. The results of this discussion are as follows: current research cannot satisfactorily show that the primary place name is derived from Slavonic. Furthermore, the hypothesis of an existing pre-monolingual form is re-evaluated. In this regard, the former geographical setting of the area around Leipzig is also considered as the deciding motive in naming the place. Onomastik Leipzig Toponym Ortsname Onomastics ddc:410 ddc:412 Onomastik Namenforschung Eigennamen Siedlungsname
6	Recherche intelligente d'informations géographiques à partir des toponymes, des métadonnées et d'une ontologie : application aux forêts du Bassin congolais / Smart search geographical information from names, metadata and ontologies : application to the forests of the Congo Massala, Marius 25 January 2013 Classified as tropical forests, the forests of the Congo Basin form an immense ecological sanctuary worthy of conservation, ranked second only to the Amazon in South America. State development financed with large budgets, combined with rapid urbanization and population growth, is accompanied by acute environmental problems. This is the context of our thesis work. Our objective was to propose a methodology for establishing a mechanism to search for information over the internet for the countries of this region. The use of metadata, toponyms and an ontology appeared to be one of the potential ways to help solve the problems encountered in tracking the dynamics of spatial objects and in providing efficient access to information resources. Unlike other models, the one we propose links the spatio-temporal dynamics of objects to that of their toponyms, and allows information resources to be described using keywords drawn from the ontology and the toponym index. Ontologie Métadonnée Toponyme Forêt Objet géographique Ontology Metadata Toponym Forest Geographical object
7	Toponym resolution in text Leidner, Jochen Lothar January 2007 Background. In the area of Geographic Information Systems (GIS), a shared discipline between informatics and geography, the term geo-parsing is used to describe the process of identifying names in text, which in computational linguistics is known as named entity recognition and classification (NERC). The term geo-coding is used for the task of mapping from implicitly geo-referenced datasets (such as structured address records) to explicitly geo-referenced representations (e.g., using latitude and longitude). However, present-day GIS systems provide no automatic geo-coding functionality for unstructured text. In Information Extraction (IE), processing of named entities in text has traditionally been seen as a two-step process comprising a flat text span recognition sub-task and an atomic classification sub-task; relating the text span to a model of the world has been ignored by evaluations such as MUC or ACE (Chinchor (1998); U.S. NIST (2003)). However, spatial and temporal expressions refer to events in space-time, and the grounding of events is a precondition for accurate reasoning. Thus, automatic grounding can improve many applications such as automatic map drawing (e.g. for choosing a focus) and question answering (e.g. for questions like How far is London from Edinburgh?, given a story in which both occur and can be resolved). Whereas temporal grounding has received considerable attention in the recent past (Mani and Wilson (2000); Setzer (2001)), robust spatial grounding has long been neglected. Concentrating on geographic names for populated places, I define the task of automatic Toponym Resolution (TR) as computing the mapping from occurrences of names for places as found in a text to a representation of the extensional semantics of the location referred to (its referent), such as a geographic latitude/longitude footprint. 
The task of mapping from names to locations is hard due to insufficient and noisy databases, and a large degree of ambiguity: common words need to be distinguished from proper names (geo/non-geo ambiguity), and the mapping between names and locations is ambiguous (London can refer to the capital of the UK or to London, Ontario, Canada, or to about forty other Londons on earth). In addition, names of places and the boundaries referred to change over time, and databases are incomplete. Objective. I investigate how referentially ambiguous spatial named entities can be grounded, or resolved, with respect to an extensional coordinate model robustly on open-domain news text. I begin by comparing the few algorithms proposed in the literature, and, comparing semiformal, reconstructed descriptions of them, I factor out a shared repertoire of linguistic heuristics (e.g. rules, patterns) and extra-linguistic knowledge sources (e.g. population sizes). I then investigate how to combine these sources of evidence to obtain a superior method. I also investigate the noise effect introduced by the named entity tagging step that toponym resolution relies on in a sequential system pipeline architecture. Scope. In this thesis, I investigate a present-day snapshot of terrestrial geography as represented in the gazetteer defined and, accordingly, a collection of present-day news text. I limit the investigation to populated places; geo-coding of artifact names (e.g. airports or bridges), compositional geographic descriptions (e.g. 40 miles SW of London, near Berlin), for instance, is not attempted. Historic change is a major factor affecting gazetteer construction and ultimately toponym resolution. However, this is beyond the scope of this thesis. Method. 
While a small number of previous attempts have been made to solve the toponym resolution problem, these were either not evaluated, or evaluation was done by manual inspection of system output instead of curating a reusable reference corpus. Since the relevant literature is scattered across several disciplines (GIS, digital libraries, information retrieval, natural language processing) and descriptions of algorithms are mostly given in informal prose, I attempt to systematically describe them and aim at a reconstruction in a uniform, semi-formal pseudo-code notation for easier re-implementation. A systematic comparison leads to an inventory of heuristics and other sources of evidence. In order to carry out a comparative evaluation procedure, an evaluation resource is required. Unfortunately, to date no gold standard has been curated in the research community. To this end, a reference gazetteer and an associated novel reference corpus with human-labeled referent annotation are created. These are subsequently used to benchmark a selection of the reconstructed algorithms and a novel re-combination of the heuristics catalogued in the inventory. I then compare the performance of the same TR algorithms under three different conditions, namely applying them to the output of (i) human named entity annotation, (ii) automatic annotation using an existing Maximum Entropy sequence tagging model, and (iii) a naïve toponym lookup procedure in a gazetteer. Evaluation. The algorithms implemented in this thesis are evaluated in an intrinsic or component evaluation. To this end, we define a task-specific matching criterion to be used with traditional Precision (P) and Recall (R) evaluation metrics. 
This matching criterion is lenient with respect to numerical gazetteer imprecision in situations where one toponym instance is marked up with different gazetteer entries in the gold standard and the test set, respectively, but where these refer to the same candidate referent, caused by multiple near-duplicate entries in the reference gazetteer. Main Contributions. The major contributions of this thesis are as follows:
• A new reference corpus in which instances of location named entities have been manually annotated with spatial grounding information for populated places, and an associated reference gazetteer, from which the assigned candidate referents are chosen. This reference gazetteer provides numerical latitude/longitude coordinates (such as 51° 32′ North, 0° 5′ West) as well as hierarchical path descriptions (such as London > UK) with respect to a worldwide-coverage geographic taxonomy constructed by combining several large, but noisy gazetteers. This corpus contains news stories and comprises two sub-corpora, a subset of the REUTERS RCV1 news corpus used for the CoNLL shared task (Tjong Kim Sang and De Meulder (2003)), and a subset of the Fourth Message Understanding Contest (MUC-4; Chinchor (1995)), both available pre-annotated with gold-standard. This corpus will be made available as a reference evaluation resource;
• a new method and implemented system to resolve toponyms that is capable of robustly processing unseen text (open-domain online newswire text) and grounding toponym instances in an extensional model using longitude and latitude coordinates and hierarchical path descriptions, using internal (textual) and external (gazetteer) evidence;
• an empirical analysis of the relative utility of various heuristic biases and other sources of evidence with respect to the toponym resolution task when analysing free news genre text;
• a comparison between a replicated method as described in the literature, which functions as a baseline, and a novel algorithm based on minimality heuristics; and
• several exemplary prototypical applications to show how the resulting toponym resolution methods can be used to create visual surrogates for news stories, a geographic exploration tool for news browsing, geographically-aware document retrieval and to answer spatial questions (How far...?) in an open-domain question answering system.
These applications only have demonstrative character, as a thorough quantitative, task-based (extrinsic) evaluation of the utility of automatic toponym resolution is beyond the scope of this thesis and left for future work. 621.382
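The evaluation idea in the abstract above, Precision and Recall computed with a matching criterion that tolerates small numerical differences between near-duplicate gazetteer entries, might be sketched as follows. The abstract does not specify the actual criterion, so the per-coordinate tolerance `tol_deg`, the function names and the toy data are all assumptions made purely for illustration.

```python
def lenient_match(pred, gold, tol_deg=0.1):
    """Treat two (lat, lon) pairs as the same referent if both coordinates
    differ by at most tol_deg degrees (tolerance value is an assumption)."""
    return (abs(pred[0] - gold[0]) <= tol_deg
            and abs(pred[1] - gold[1]) <= tol_deg)


def precision_recall(predicted, gold, tol_deg=0.1):
    """Precision and recall over toponym instances keyed by an instance id."""
    correct = sum(
        1 for key, coords in predicted.items()
        if key in gold and lenient_match(coords, gold[key], tol_deg)
    )
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy annotations: instance id -> (lat, lon).
gold = {
    "doc1:London": (51.50, -0.12),
    "doc1:Paris": (48.86, 2.35),
    "doc2:London": (42.98, -81.25),   # London, Ontario
}
predicted = {
    "doc1:London": (51.51, -0.13),    # near-duplicate entry, counted as correct
    "doc2:London": (51.50, -0.12),    # wrong referent
}
p, r = precision_recall(predicted, gold)
```

In this toy case one of two predictions matches, and one of three gold instances is recovered, so precision is 1/2 and recall is 1/3.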
8	Identificação da cobertura espacial de documentos usando mineração de textos / Identification of spatial coverage documents with mining Rosa Nathalie Portugal Vargas 08 August 2012 Users now commonly take the geographical location of documents into account in Information Retrieval, that is, they consider the geographic scope addressed in the context of the document. However, conventional information retrieval systems based on keyword matching do not consider that words can represent geographical entities that are spatially related to other entities in the documents. To solve this problem, it is necessary to enable the geo-referencing of texts, that is, to identify the geographical entities present in the text and associate them with their correct spatial location. The identification and disambiguation of geographical entities present major challenges, mainly from the linguistic point of view, since one toponym can have several types of associated ambiguity. This ambiguity causes noise in information retrieval, since the same term may have relevant or irrelevant information associated with it. Thus, the main strategy for overcoming ambiguity is to identify evidence that assists in the identification and disambiguation of locations in the texts. This study proposes a methodology, called SpatialCIM, that allows the spatial coverage of documents to be identified and determined. The SpatialCIM methodology aims to organize the toponym resolution process. The main objective of this study is therefore to evaluate and select disambiguation techniques that resolve toponym ambiguity in texts. To this end, we proposed and developed two approaches: (1) Disambiguation for Points and (2) Textual and Structural Disambiguation. These approaches exploit two different toponym disambiguation techniques, which generate and disambiguate the geographic paths associated with the toponyms recognized in each document. The hypothesis of this research is that toponym disambiguation techniques enable a better spatial localization of documents. The results demonstrate that the disambiguation techniques improve precision and recall in the spatial classification of documents. The positive effect of using a linguistic tool in the recognition of geographical entities was also demonstrated. The usefulness of the disambiguation process for obtaining the spatial coverage of documents was thus shown. Problemas de ambiguidade Reconhecimento de entidades mencionadas Resolução de topônimos Ambiguity problem Named entity recognition Toponym resolution
9	Entity-Centric Text Mining for Historical Documents Coll Ardanuy, Maria 07 July 2017 No description available. 510 digital humanities text mining toponym disambiguation person name disambiguation historical text mining Informatik (PPN619939052)
10	Toponym Disambiguation in Information Retrieval Buscaldi, Davide 12 November 2010 In recent years, geography has acquired great importance in the context of Information Retrieval (IR) and, in general, of the automated processing of information in text. Mobile devices that can surf the web and at the same time report their position are now a common reality, together with applications that exploit this data to provide users with locally customised information, such as directions or advertisements. Therefore, it is important to deal properly with the geographic information that is included in electronic texts. Most of this information takes the form of place names, or toponyms. Toponym ambiguity represents an important issue in Geographical Information Retrieval (GIR), due to the fact that queries are geographically constrained. There has been a struggle to find specific geographical IR methods that actually outperform traditional IR techniques. Toponym ambiguity may be a relevant factor in the inability of current GIR systems to take advantage of geographical knowledge. Recently, some Ph.D. theses have dealt with Toponym Disambiguation (TD) from different perspectives, from the development of resources for the evaluation of Toponym Disambiguation (Leidner (2007)) to the use of TD to improve geographical scope resolution (Andogah (2010)). The Ph.D. thesis presented here introduces a TD method based on WordNet and carries out a detailed study of the relationship of Toponym Disambiguation to some IR applications, such as GIR, Question Answering (QA) and Web retrieval. The work presented in this thesis starts with an introduction to the applications in which TD may prove useful, together with an analysis of the ambiguity of toponyms in news collections. 
It would not be possible to study the ambiguity of toponyms without studying the resources that are used as placename repositories; these resources are the equivalent of language dictionaries, which provide the different meanings of a given word. / Buscaldi, D. (2010). Toponym Disambiguation in Information Retrieval [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8912 Geographical information retrieval Toponym disambiguation Information retrieval LENGUAJES Y SISTEMAS INFORMATICOS
