Global ETD Search

351	Evaluation of the usability of constraint diagrams as a visual modelling language : theoretical and empirical investigations
Fetais, Noora January 2013
This research evaluates the constraint diagrams (CD) notation, a formal representation for program specification that shows promise for use by people who are not experts in software design. Multiple methods were adopted to provide triangulated evidence of the potential benefits of constraint diagrams compared with other notational systems. Three main approaches were adopted.
The first approach was a semantic and task analysis of the CD notation, conducted by applying the Cognitive Dimensions framework to examine the relative strengths and weaknesses of constraint diagrams and conventional notations in terms of how these representations facilitate or impede perception. This systematic analysis found that CD reduced the cognitive cost of exploratory design, modification, incrementation, searching, and transcription activities with regard to the cognitive dimensions: consistency, visibility, abstraction, closeness of mapping, secondary notation, premature commitment, role-expressiveness, progressive evaluation, diffuseness, provisionality, hidden dependency, viscosity, hard mental operations, and error-proneness.
The second approach was an empirical evaluation of the comprehension of CD compared with natural language (NL), carried out with computer science students. The experiment took the form of a web-based competition in which 33 participants were given instructions and training on either CD or the equivalent NL specification expressions; after each example, they answered three multiple-choice questions requiring the interpretation of expressions in their particular notation. Although the CD group had no prior experience of the CD notation, spent more time on the training, and had less confidence, they obtained interpretation scores comparable to those of the NL group and took less time to answer the questions.
The third approach was an experiment on the construction of CD. Twenty participants were given instructions and training on either CD or the equivalent NL specification expressions; after each example, they answered three questions requiring the construction of expressions in their particular notation. We built an editor for constructing expressions in the two notations, which automatically logged the participants' interactions. In general, when constructing program specifications, the CD group gave more accurate answers, spent less time in training, and returned to the training examples less often than the NL group. Overall, CD was found to be understandable, usable, intuitive, and expressive, with an unambiguous semantic notation.
004
352	The use of belief networks in natural language understanding and dialog modeling.
January 2001 / Wai, Chi Man Carmen. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references (leaves 129-136). / Abstracts in English and Chinese.
Chapter 1 --- Introduction / Chapter 1.1 --- Overview / Chapter 1.2 --- Natural Language Understanding / Chapter 1.3 --- BNs for Handling Speech Recognition Errors / Chapter 1.4 --- BNs for Dialog Modeling / Chapter 1.5 --- Thesis Goals / Chapter 1.6 --- Thesis Outline
Chapter 2 --- Background / Chapter 2.1 --- Natural Language Understanding / Chapter 2.1.1 --- Rule-based Approaches / Chapter 2.1.2 --- Stochastic Approaches / Chapter 2.1.3 --- Phrase-Spotting Approaches / Chapter 2.2 --- Handling Recognition Errors in Spoken Queries / Chapter 2.3 --- Spoken Dialog Systems / Chapter 2.3.1 --- Finite-State Networks / Chapter 2.3.2 --- The Form-based Approaches / Chapter 2.3.3 --- Sequential Decision Approaches / Chapter 2.3.4 --- Machine Learning Approaches / Chapter 2.4 --- Belief Networks / Chapter 2.4.1 --- Introduction / Chapter 2.4.2 --- Bayesian Inference / Chapter 2.4.3 --- Applications of the Belief Networks / Chapter 2.5 --- Chapter Summary
Chapter 3 --- Belief Networks for Natural Language Understanding / Chapter 3.1 --- The ATIS Domain / Chapter 3.2 --- Problem Formulation / Chapter 3.3 --- Semantic Tagging / Chapter 3.4 --- Belief Networks Development / Chapter 3.4.1 --- Concept Selection / Chapter 3.4.2 --- Bayesian Inferencing / Chapter 3.4.3 --- Thresholding / Chapter 3.4.4 --- Goal Identification / Chapter 3.5 --- Experiments on Natural Language Understanding / Chapter 3.5.1 --- Comparison between Mutual Information and Information Gain / Chapter 3.5.2 --- Varying the Input Dimensionality / Chapter 3.5.3 --- Multiple Goals and Rejection / Chapter 3.5.4 --- Comparing Grammars / Chapter 3.6 --- Benchmark with Decision Trees / Chapter 3.7 --- Performance on Natural Language Understanding / Chapter 3.8 --- Handling Speech Recognition Errors in Spoken Queries / Chapter 3.8.1 --- Corpus Preparation / Chapter 3.8.2 --- Enhanced Belief Network Topology / Chapter 3.8.3 --- BNs for Handling Speech Recognition Errors / Chapter 3.8.4 --- Experiments on Handling Speech Recognition Errors / Chapter 3.8.5 --- Significance Testing / Chapter 3.8.6 --- Error Analysis / Chapter 3.9 --- Chapter Summary
Chapter 4 --- Belief Networks for Mixed-Initiative Dialog Modeling / Chapter 4.1 --- The CU FOREX Domain / Chapter 4.1.1 --- Domain-Specific Constraints / Chapter 4.1.2 --- Two Interaction Modalities / Chapter 4.2 --- The Belief Networks / Chapter 4.2.1 --- Informational Goal Inference / Chapter 4.2.2 --- Detection of Missing/Spurious Concepts / Chapter 4.3 --- Integrating Two Interaction Modalities / Chapter 4.4 --- Incorporating Out-of-Vocabulary Words / Chapter 4.4.1 --- Natural Language Queries / Chapter 4.4.2 --- Directed Queries / Chapter 4.5 --- Evaluation of the BN-based Dialog Model / Chapter 4.6 --- Chapter Summary
Chapter 5 --- Scalability and Portability of Belief Network-based Dialog Model / Chapter 5.1 --- Migration to the ATIS Domain / Chapter 5.2 --- Scalability of the BN-based Dialog Model / Chapter 5.2.1 --- Informational Goal Inference / Chapter 5.2.2 --- Detection of Missing/Spurious Concepts / Chapter 5.2.3 --- Context Inheritance / Chapter 5.3 --- Portability of the BN-based Dialog Model / Chapter 5.3.1 --- General Principles for Probability Assignment / Chapter 5.3.2 --- Performance of the BN-based Dialog Model with Hand-Assigned Probabilities / Chapter 5.3.3 --- Error Analysis / Chapter 5.4 --- Enhancements for Discourse Query Understanding / Chapter 5.4.1 --- Combining Trained and Handcrafted Probabilities / Chapter 5.4.2 --- Handcrafted Topology for BNs / Chapter 5.4.3 --- Performance of the Enhanced BN-based Dialog Model / Chapter 5.5 --- Chapter Summary
Chapter 6 --- Conclusions / Chapter 6.1 --- Summary / Chapter 6.2 --- Contributions / Chapter 6.3 --- Future Work
Bibliography / Chapter A --- The Two Original SQL Query / Chapter B --- The Two Grammars, GH and GsA / Chapter C --- Probability Propagation in Belief Networks / Chapter C.1 --- Computing the a posteriori probability of P(G) based on input concepts / Chapter C.2 --- Computing the a posteriori probability of P(Cj) by backward inference / Chapter D --- Total 23 Concepts for the Handcrafted BN
Machine learning; Automatic speech recognition; Human-computer interaction
353	A robust unification-based parser for Chinese natural language processing.
January 2001 / Chan Shuen-ti Roy. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references (leaves 168-175). / Abstracts in English and Chinese.
Chapter 1. --- Introduction / Chapter 1.1. --- The nature of natural language processing / Chapter 1.2. --- Applications of natural language processing / Chapter 1.3. --- Purpose of study / Chapter 1.4. --- Organization of this thesis
Chapter 2. --- Organization and methods in natural language processing / Chapter 2.1. --- Organization of natural language processing system / Chapter 2.2. --- Methods employed / Chapter 2.3. --- Unification-based grammar processing / Chapter 2.3.1. --- Generalized Phrase Structure Grammar (GPSG) / Chapter 2.3.2. --- Head-driven Phrase Structure Grammar (HPSG) / Chapter 2.3.3. --- Common drawbacks of UBGs / Chapter 2.4. --- Corpus-based processing / Chapter 2.4.1. --- Drawback of corpus-based processing
Chapter 3. --- Difficulties in Chinese language processing and its related works / Chapter 3.1. --- A glance at the history / Chapter 3.2. --- Difficulties in syntactic analysis of Chinese / Chapter 3.2.1. --- Writing system of Chinese causes segmentation problem / Chapter 3.2.2. --- Words serving multiple grammatical functions without inflection / Chapter 3.2.3. --- Word order of Chinese / Chapter 3.2.4. --- The Chinese grammatical word / Chapter 3.3. --- Related works / Chapter 3.3.1. --- Unification grammar processing approach / Chapter 3.3.2. --- Corpus-based processing approach / Chapter 3.4. --- Restatement of goal
Chapter 4. --- SERUP: Statistical-Enhanced Robust Unification Parser
Chapter 5. --- Step One: automatic preprocessing / Chapter 5.1. --- Segmentation of lexical tokens / Chapter 5.2. --- Conversion of date, time and numerals / Chapter 5.3. --- Identification of new words / Chapter 5.3.1. --- Proper nouns - Chinese names / Chapter 5.3.2. --- Other proper nouns and multi-syllabic words / Chapter 5.4. --- Defining smallest parsing unit / Chapter 5.4.1. --- The Chinese sentence / Chapter 5.4.2. --- Breaking down the paragraphs / Chapter 5.4.3. --- Implementation
Chapter 6. --- Step Two: grammar construction / Chapter 6.1. --- Criteria in choosing a UBG model / Chapter 6.2. --- The grammar in details / Chapter 6.2.1. --- The PHON feature / Chapter 6.2.2. --- The SYN feature / Chapter 6.2.3. --- The SEM feature / Chapter 6.2.4. --- Grammar rules and features principles / Chapter 6.2.5. --- Verb phrases / Chapter 6.2.6. --- Noun phrases / Chapter 6.2.7. --- Prepositional phrases / Chapter 6.2.8. --- "Ba2" and "Bei4" constructions / Chapter 6.2.9. --- The terminal node S / Chapter 6.2.10. --- Summary of phrasal rules / Chapter 6.2.11. --- Morphological rules
Chapter 7. --- Step Three: resolving structural ambiguities / Chapter 7.1. --- Sources of ambiguities / Chapter 7.2. --- The traditional practices: an illustration / Chapter 7.3. --- Deficiency of current practices / Chapter 7.4. --- A new point of view: Wu (1999) / Chapter 7.5. --- Improvement over Wu (1999) / Chapter 7.6. --- Conclusion on semantic features
Chapter 8. --- Implementation, performance and evaluation / Chapter 8.1. --- Implementation / Chapter 8.2. --- Performance and evaluation / Chapter 8.2.1. --- The test set / Chapter 8.2.2. --- Segmentation of lexical tokens / Chapter 8.2.3. --- New word identification / Chapter 8.2.4. --- Parsing unit segmentation / Chapter 8.2.5. --- The grammar / Chapter 8.3. --- Overall performance of SERUP
Chapter 9. --- Conclusion / Chapter 9.1. --- Summary of this thesis / Chapter 9.2. --- Contribution of this thesis / Chapter 9.3. --- Future work
References / Appendix I / Appendix II / Appendix III
Chinese language--Data processing; Parsing (Computer grammar)
354	Automatic construction and adaptation of wrappers for semi-structured web documents.
January 2003 / Wong Tak Lam. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. / Includes bibliographical references (leaves 88-94). / Abstracts in English and Chinese.
Chapter 1 --- Introduction / Chapter 1.1 --- Wrapper Induction for Semi-structured Web Documents / Chapter 1.2 --- Adapting Wrappers to Unseen Web Sites / Chapter 1.3 --- Thesis Contributions / Chapter 1.4 --- Thesis Organization
Chapter 2 --- Related Work / Chapter 2.1 --- Related Work on Wrapper Induction / Chapter 2.2 --- Related Work on Wrapper Adaptation
Chapter 3 --- Automatic Construction of Hierarchical Wrappers / Chapter 3.1 --- Hierarchical Record Structure Inference / Chapter 3.2 --- Extraction Rule Induction / Chapter 3.3 --- Applying Hierarchical Wrappers
Chapter 4 --- Experimental Results for Wrapper Induction
Chapter 5 --- Adaptation of Wrappers for Unseen Web Sites / Chapter 5.1 --- Problem Definition / Chapter 5.2 --- Overview of Wrapper Adaptation Framework / Chapter 5.3 --- Potential Training Example Candidate Identification / Chapter 5.3.1 --- Useful Text Fragments / Chapter 5.3.2 --- Training Example Generation from the Unseen Web Site / Chapter 5.3.3 --- Modified Nearest Neighbour Classification / Chapter 5.4 --- Machine Annotated Training Example Discovery and New Wrapper Learning / Chapter 5.4.1 --- Text Fragment Classification / Chapter 5.4.2 --- New Wrapper Learning
Chapter 6 --- Case Study and Experimental Results for Wrapper Adaptation / Chapter 6.1 --- Case Study on Wrapper Adaptation / Chapter 6.2 --- Experimental Results / Chapter 6.2.1 --- Book Domain / Chapter 6.2.2 --- Consumer Electronic Appliance Domain
Chapter 7 --- Conclusions and Future Work
Bibliography / Chapter A --- Detailed Performance of Wrapper Induction for Book Domain / Chapter B --- Detailed Performance of Wrapper Induction for Consumer Electronic Appliance Domain
Text processing (Computer science); World Wide Web
355	Ontology Learning and Information Extraction for the Semantic Web
Kavalec, Martin January 2006
The work gives an overview of its three main topics: the semantic web, information extraction, and ontology learning. A method for identifying relevant information on web pages is described and experimentally tested on the pages of companies offering products and services. The method is based on an analysis of a sample of web pages and their position in the Open Directory catalogue. Furthermore, a modification of the association rule mining algorithm is proposed and experimentally tested. In addition to identifying a relation between ontology concepts, it suggests a possible name for the relation.
356	Investigação de métodos de desambiguação lexical de sentidos de verbos do português do Brasil / Research of word sense disambiguation methods for verbs in brazilian portuguese
Marco Antonio Sobrevilla Cabezudo 28 August 2015
Word Sense Disambiguation (WSD) aims to identify the most appropriate sense of a word in a given context, using a pre-specified sense repository. The task is important for other applications, such as machine translation. For English, WSD has been widely studied using different approaches and techniques, yet it remains a challenge for researchers in semantics. When the results of WSD methods are analyzed by part of speech, not all classes perform equally, and verbs yield the worst results. Studies point out that WSD methods rely on shallow information, whereas verbs need deeper information for their disambiguation, such as syntactic frames or selectional restrictions. For Portuguese, there is little work in this area, and general-purpose methods have been investigated only recently. In addition, lexical resources focused on verbs have been developed in recent years.
In this context, this master's work investigated WSD methods for verbs in texts written in Brazilian Portuguese. In particular, some traditional WSD methods were explored and, subsequently, linguistic knowledge from VerbNet.Br was incorporated into them. To support this research, the CSTNews corpus was annotated with verb senses using WordNet-Pr as the sense repository. The results showed that the WSD methods investigated did not outperform the strongest baseline, and that incorporating VerbNet.Br knowledge improved the methods, although the improvements were not statistically significant. The contributions of this work include a corpus annotated with verb senses, a tool that supports sense annotation, the investigation of WSD methods for verbs, and the use of verb-specific information (from VerbNet.Br) in the WSD of verbs.
Computational linguistics; Natural language processing; Word sense disambiguation
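The abstract for record 356 does not name the traditional WSD methods that were explored. Purely as an illustration of how a sense repository can drive disambiguation, the following minimal sketch implements a simplified Lesk-style gloss overlap, a well-known knowledge-based approach. The sense inventory, sense identifiers, glosses, and stopword list are hypothetical toy data, not taken from WordNet-Pr, VerbNet.Br, or the thesis.

```python
# Hypothetical toy sense repository: verb -> {sense id: gloss}.
INVENTORY = {
    "run": {
        "run.move": "move fast on foot",
        "run.operate": "operate or manage a machine program or business",
    },
}

# Hypothetical stopword list used to ignore uninformative overlaps.
STOPWORDS = frozenset({"a", "on", "or", "the", "will"})


def simplified_lesk(target, context_tokens, inventory, stopwords=STOPWORDS):
    """Return the sense whose gloss shares the most words with the context.

    Ties keep the first-listed sense, so ordering the senses by frequency
    would make the fallback behave like a most-frequent-sense baseline.
    """
    context = {t.lower() for t in context_tokens} - stopwords - {target}
    best_sense, best_overlap = None, -1
    for sense_id, gloss in inventory[target].items():
        gloss_words = set(gloss.lower().split()) - stopwords
        overlap = len(context & gloss_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense


sentence = "she will run the program on the machine".split()
chosen = simplified_lesk("run", sentence, INVENTORY)
```

Gloss overlap is exactly the kind of shallow evidence the abstract says is insufficient for verbs; deeper cues such as syntactic frames or selectional restrictions would have to be added as further features.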
357	Sumarização automática de opiniões baseada em aspectos / Automatic aspect-based opinion summarization
Roque Enrique López Condori 24 August 2015
Opinion summarization, also known as sentiment summarization, is the task of automatically generating summaries for a set of opinions about a specific entity. One of the main approaches is aspect-based opinion summarization, which produces summaries of the opinions on the main aspects of an entity. Entities are typically products, services, organizations, and the like, and aspects are attributes or components of those entities. In recent years, this task has gained importance because of the large amount of information available online and the growing interest in how users evaluate products, companies, people, and so on. Unfortunately, there has been little research in this area for Brazilian Portuguese.
In this scenario, this master's project investigated the development of several aspect-based opinion summarization methods. In particular, four classical methods from the literature, both extractive and abstractive, were implemented. These methods were analyzed in each of their phases and, as a result of this analysis, two new proposals for generating opinion summaries were produced. Both proposals attempt to exploit the main advantages of the classical methods in order to generate better summaries. To analyze the performance of the implemented methods, experiments were carried out using three traditional evaluation measures: informativeness, linguistic quality, and usefulness of the summary. The results show that the proposed methods are competitive with the methods from the literature and, in several cases, outperform them.
Aspect-based opinion summarization; Natural language processing
358	Método semi-automático de construção de ontologias parciais de domínio com base em textos. / Semi-automatic method for the construction of partial domain ontologies based on texts.
Luiz Carlos da Cruz Carvalheira 31 August 2007
Recent developments related to knowledge management, the semantic web, and the exchange of electronic information by means of agents have increased the need for ontologies that describe, in a formal way, shared conceptualizations of a wide variety of domains. For computers and people to work in cooperation, the information they use must have well-defined and shared meanings. Ontologies are enablers of that cooperation. However, ontology construction involves a long and complex process of knowledge acquisition, which has hindered the use of this kind of solution on a wider scale.
This work presents a method for the semi-automatic construction of ontologies that uses texts from any domain to extract the concepts and relations present in those texts. By comparing the relative frequency of the extracted terms with their frequency in typical writing in the language, and by extracting specific linguistic patterns, the method identifies candidate terms for concepts and the relations among them, presents them to an ontologist for validation, and finally makes the validated ontology available for publication and use by specifying it in the OWL language.
Artificial intelligence; Knowledge management; Natural language processing; Ontology
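The frequency comparison mentioned in the abstract for record 358 can be sketched as follows. This is a minimal illustration, not the thesis's actual procedure: the function names, the ratio threshold `min_ratio`, the smoothing `floor`, and the tiny reference text standing in for a general-language corpus are all assumptions.

```python
import re
from collections import Counter


def relative_frequencies(text):
    """Return each token's share of all tokens in the text."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def candidate_terms(domain_text, reference_text, min_ratio=2.0, floor=1e-6):
    """Rank terms that are relatively more frequent in the domain text
    than in a general-language reference text.

    `min_ratio` and `floor` are illustrative assumptions; `floor` avoids
    division by zero for terms absent from the reference text.
    """
    domain = relative_frequencies(domain_text)
    reference = relative_frequencies(reference_text)
    scored = []
    for term, freq in domain.items():
        ratio = freq / max(reference.get(term, 0.0), floor)
        if ratio >= min_ratio:
            scored.append((term, ratio))
    # Highest ratio first; alphabetical order breaks ties deterministically.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


domain_text = "the ontology describes the concept and the relation of the ontology"
reference_text = "the cat and the dog sat on the mat of the house"
terms = [term for term, _ in candidate_terms(domain_text, reference_text)]
```

In the method described above, the ranked candidates would then go to an ontologist for validation rather than being accepted automatically.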
359	A verb learning model driven by syntactic constructions / Um modelo de aquisição de verbos guiado por construções sintáticas
Machado, Mario Lúcio Mesquita January 2008
Since the second half of the last century, cognitive theories have offered interesting views on language learning. Applying these theories in computational models has a twofold benefit: on the one hand, computational implementations can serve as a form of validation of the theories; on the other hand, computational models can achieve improved performance by adopting cognitively plausible learning strategies. Syntactic structures are said to provide an important cue for the acquisition of verb meaning. Moreover, for a particular subset of very frequent and general verbs, the so-called light verbs, there is a strong link between the syntactic structures in which they appear and their meanings. In this work, we used a computational model to investigate these proposals further, treating the acquisition task as a mapping between an unknown verb and prototypical referents for verbal events, on the basis of the syntactic structure in which the verb appears. The experiments highlighted some requirements for successful learning, in terms of both the levels of information available to the learner and the learning strategies adopted.
Theory of computation; Natural language; Computational linguistics; Natural language processing; Cognitively based models; Mental lexicon
360	Toponym resolution in text Leidner, Jochen Lothar January 2007 (has links) Background. In the area of Geographic Information Systems (GIS), a shared discipline between informatics and geography, the term geo-parsing is used to describe the process of identifying names in text, which in computational linguistics is known as named entity recognition and classification (NERC). The term geo-coding is used for the task of mapping from implicitly geo-referenced datasets (such as structured address records) to explicitly geo-referenced representations (e.g., using latitude and longitude). However, present-day GIS systems provide no automatic geo-coding functionality for unstructured text. In Information Extraction (IE), processing of named entities in text has traditionally been seen as a two-step process comprising a flat text span recognition sub-task and an atomic classification sub-task; relating the text span to a model of the world has been ignored by evaluations such as MUC or ACE (Chinchor (1998); U.S. NIST (2003)). However, spatial and temporal expressions refer to events in space-time, and the grounding of events is a precondition for accurate reasoning. Thus, automatic grounding can improve many applications such as automatic map drawing (e.g. for choosing a focus) and question answering (e.g. for questions like How far is London from Edinburgh?, given a story in which both occur and can be resolved). Whereas temporal grounding has received considerable attention in the recent past (Mani and Wilson (2000); Setzer (2001)), robust spatial grounding has long been neglected. Concentrating on geographic names for populated places, I define the task of automatic Toponym Resolution (TR) as computing the mapping from occurrences of names for places as found in a text to a representation of the extensional semantics of the location referred to (its referent), such as a geographic latitude/longitude footprint. 
The task of mapping from names to locations is hard due to insufficient and noisy databases, and a large degree of ambiguity: common words need to be distinguished from proper names (geo/non-geo ambiguity), and the mapping between names and locations is ambiguous (London can refer to the capital of the UK or to London, Ontario, Canada, or to about forty other Londons on earth). In addition, names of places and the boundaries referred to change over time, and databases are incomplete. Objective. I investigate how referentially ambiguous spatial named entities can be grounded, or resolved, with respect to an extensional coordinate model robustly on open-domain news text. I begin by comparing the few algorithms proposed in the literature, and, comparing semiformal, reconstructed descriptions of them, I factor out a shared repertoire of linguistic heuristics (e.g. rules, patterns) and extra-linguistic knowledge sources (e.g. population sizes). I then investigate how to combine these sources of evidence to obtain a superior method. I also investigate the noise effect introduced by the named entity tagging step that toponym resolution relies on in a sequential system pipeline architecture. Scope. In this thesis, I investigate a present-day snapshot of terrestrial geography as represented in the gazetteer defined and, accordingly, a collection of present-day news text. I limit the investigation to populated places; geo-coding of artifact names (e.g. airports or bridges), compositional geographic descriptions (e.g. 40 miles SW of London, near Berlin), for instance, is not attempted. Historic change is a major factor affecting gazetteer construction and ultimately toponym resolution. However, this is beyond the scope of this thesis. Method. 
While a small number of previous attempts have been made to solve the toponym resolution problem, these were either not evaluated, or evaluation was done by manual inspection of system output instead of curating a reusable reference corpus. Since the relevant literature is scattered across several disciplines (GIS, digital libraries, information retrieval, natural language processing) and descriptions of algorithms are mostly given in informal prose, I attempt to systematically describe them and aim at a reconstruction in a uniform, semi-formal pseudo-code notation for easier re-implementation. A systematic comparison leads to an inventory of heuristics and other sources of evidence. In order to carry out a comparative evaluation procedure, an evaluation resource is required. Unfortunately, to date no gold standard has been curated in the research community. To this end, a reference gazetteer and an associated novel reference corpus with human-labeled referent annotation are created. These are subsequently used to benchmark a selection of the reconstructed algorithms and a novel re-combination of the heuristics catalogued in the inventory. I then compare the performance of the same TR algorithms under three different conditions, namely applying them to the output of (i) human named entity annotation, (ii) automatic annotation using an existing Maximum Entropy sequence tagging model, and (iii) a naïve toponym lookup procedure in a gazetteer. Evaluation. The algorithms implemented in this thesis are evaluated in an intrinsic or component evaluation. To this end, we define a task-specific matching criterion to be used with traditional Precision (P) and Recall (R) evaluation metrics. 
This matching criterion is lenient with respect to numerical gazetteer imprecision in situations where one toponym instance is marked up with different gazetteer entries in the gold standard and the test set, respectively, but where these refer to the same candidate referent, caused by multiple near-duplicate entries in the reference gazetteer. Main Contributions. The major contributions of this thesis are as follows: • A new reference corpus in which instances of location named entities have been manually annotated with spatial grounding information for populated places, and an associated reference gazetteer, from which the assigned candidate referents are chosen. This reference gazetteer provides numerical latitude/longitude coordinates (such as 51° 32′ 0″ North, 0° 5′ 0″ West) as well as hierarchical path descriptions (such as London > UK) with respect to a world wide-coverage, geographic taxonomy constructed by combining several large, but noisy gazetteers. This corpus contains news stories and comprises two sub-corpora, a subset of the REUTERS RCV1 news corpus used for the CoNLL shared task (Tjong Kim Sang and De Meulder (2003)), and a subset of the Fourth Message Understanding Contest (MUC-4; Chinchor (1995)), both available pre-annotated with gold-standard. 
This corpus will be made available as a reference evaluation resource; • a new method and implemented system to resolve toponyms that is capable of robustly processing unseen text (open-domain online newswire text) and grounding toponym instances in an extensional model using longitude and latitude coordinates and hierarchical path descriptions, using internal (textual) and external (gazetteer) evidence; • an empirical analysis of the relative utility of various heuristic biases and other sources of evidence with respect to the toponym resolution task when analysing free news genre text; • a comparison between a replicated method as described in the literature, which functions as a baseline, and a novel algorithm based on minimality heuristics; and • several exemplary prototypical applications to show how the resulting toponym resolution methods can be used to create visual surrogates for news stories, a geographic exploration tool for news browsing, geographically-aware document retrieval and to answer spatial questions (How far...?) in an open-domain question answering system. These applications only have demonstrative character, as a thorough quantitative, task-based (extrinsic) evaluation of the utility of automatic toponym resolution is beyond the scope of this thesis and left for future work. 621.382
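The lenient matching criterion used with Precision and Recall in the abstract above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the data layout (a dictionary from toponym instance to a latitude/longitude pair) and the 10 km tolerance are all assumptions made for illustration, and great-circle distance is only one possible way of treating near-duplicate gazetteer entries as the same referent.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def precision_recall(gold, predicted, tolerance_km=10.0):
    """Score predicted referents against gold referents.

    gold, predicted: dicts mapping a toponym instance key (hypothetical
    layout: (document id, token offset)) to a (lat, lon) pair.
    A prediction counts as correct when it lies within tolerance_km of the
    gold referent, so near-duplicate gazetteer entries still match.
    The tolerance value is an assumption, not taken from the thesis.
    """
    correct = sum(
        1
        for key, coords in predicted.items()
        if key in gold and haversine_km(coords, gold[key]) <= tolerance_km
    )
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Illustrative data only: two gold toponym instances, one system answer.
    gold = {("doc1", 0): (51.5074, -0.1278), ("doc1", 5): (55.9533, -3.1883)}
    predicted = {("doc1", 0): (51.53, -0.08)}  # slightly different London entry
    print(precision_recall(gold, predicted))
```

Unresolved toponyms are simply absent from `predicted`, so they lower recall without affecting precision, which matches the usual reading of the two metrics.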
