  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
341

Exponential Family Embeddings

Rudolph, Maja January 2018
Word embeddings are a powerful approach for capturing semantic similarity among terms in a vocabulary. Exponential family embeddings extend the idea of word embeddings to other types of high-dimensional data. Exponential family embeddings have three ingredients: embeddings as latent variables, a predefined conditioning set for each observation called the context, and a conditional likelihood from the exponential family. The embeddings are inferred with a scalable algorithm. This thesis highlights three advantages of the exponential family embeddings model class: (A) The approximations used for existing methods such as word2vec can be understood as a biased stochastic gradient procedure on a specific type of exponential family embedding model, the Bernoulli embedding. (B) By choosing different likelihoods from the exponential family we can generalize the task of learning distributed representations to different application domains. For example, we can learn embeddings of grocery items from shopping data, embeddings of movies from click data, or embeddings of neurons from recordings of zebrafish brains. On all three applications, we find exponential family embedding models to be more effective than other types of dimensionality reduction. They better reconstruct held-out data and find interesting qualitative structure. (C) Finally, the probabilistic modeling perspective allows us to incorporate structure and domain knowledge in the embedding space. We develop models for studying how language varies over time, how it differs between related groups of data, and how word usage differs between languages. Key to the success of these methods is that the embeddings share statistical information through hierarchical priors or neural networks. We demonstrate the benefits of this approach in empirical studies of Senate speeches, scientific abstracts, and shopping baskets.
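The three ingredients named in the abstract can be illustrated with a minimal sketch of a Bernoulli embedding likelihood. It assumes the common formulation in which the natural parameter is the inner product of the target item's embedding with the sum of the context vectors of the items in its context. The names `rho`, `alpha`, and `bernoulli_log_likelihood` are hypothetical and chosen for illustration; this is not the thesis's implementation.

```python
import math


def sigmoid(z):
    """Logistic function mapping a natural parameter to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def bernoulli_log_likelihood(rho, alpha, context_ids, target_id, observed):
    """Log-probability of one binary observation given its context.

    rho[v]   : embedding vector of item v (hypothetical name)
    alpha[v] : context vector of item v (hypothetical name)
    context_ids : indices of the items in the conditioning set
    observed : True if the target item occurred, False otherwise
    """
    dim = len(rho[target_id])
    # Sum the context vectors of all items in the conditioning set.
    context_sum = [sum(alpha[j][k] for j in context_ids) for k in range(dim)]
    # Natural parameter: inner product of target embedding and context sum.
    eta = sum(r * c for r, c in zip(rho[target_id], context_sum))
    p = sigmoid(eta)
    return math.log(p) if observed else math.log(1.0 - p)
```

Swapping the Bernoulli likelihood for another exponential family member (for example a Poisson likelihood for counts) would change only the last three lines of the function; the embedding and context structure stay the same.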
342

Cross-Lingual Transfer of Natural Language Processing Systems

Rasooli, Mohammad Sadegh January 2019
Accurate natural language processing systems rely heavily on annotated datasets. In the absence of such datasets, transfer methods can help to develop a model by transferring annotations from one or more rich-resource languages to the target language of interest. These methods are generally divided into two approaches: 1) annotation projection from translation data, also known as parallel data, using supervised models in rich-resource languages, and 2) direct model transfer from annotated datasets in rich-resource languages. In this thesis, we demonstrate different methods for the transfer of dependency parsers and sentiment analysis systems. We propose an annotation projection method that performs well in scenarios where a large amount of in-domain parallel data is available. We also propose a method that combines annotation projection and direct transfer and can leverage a minimal amount of information from a small out-of-domain parallel dataset to develop highly accurate transfer models. Furthermore, we propose an unsupervised syntactic reordering model to improve the accuracy of dependency parser transfer for non-European languages. Finally, we conduct a diverse set of experiments for the transfer of sentiment analysis systems in different data settings. A summary of our contributions is as follows:
* We develop accurate dependency parsers using parallel text in an annotation projection framework. We make use of the fact that the density of word alignments is a valuable indicator of reliability in annotation projection.
* We develop accurate dependency parsers in the absence of a large amount of parallel data. We use the Bible data, which is orders of magnitude smaller than a conventional parallel dataset, to provide minimal cues for creating cross-lingual word representations. Our model is also capable of boosting the performance of annotation projection with a large amount of parallel data. Our model develops cross-lingual word representations to go beyond the traditional delexicalized direct transfer methods. Moreover, we propose a simple but effective word translation approach that brings explicit lexical features from the target language into our direct transfer method.
* We develop different syntactic reordering models that can change the source treebanks in rich-resource languages, thus preventing a wrong model from being learned for an unrelated language. Our experimental results show substantial improvements for non-European languages.
* We develop transfer methods for sentiment analysis in different data availability scenarios. We show that we can leverage cross-lingual word embeddings to create accurate sentiment analysis systems in the absence of annotated data in the target language of interest.
We believe that the novelties introduced in this thesis indicate the usefulness of transfer methods. This is appealing in practice, especially since we suggest eliminating the requirement for annotating new datasets for low-resource languages, annotations that are expensive, if not impossible, to obtain.
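The abstract's remark that word-alignment density indicates projection reliability can be sketched as a simple filter. The sketch below assumes density means the fraction of target tokens linked to at least one source token; the function names, the dictionary keys, and the 0.8 threshold are hypothetical choices for illustration, not details taken from the thesis.

```python
def alignment_density(alignment, target_length):
    """Fraction of target tokens aligned to at least one source token.

    alignment : iterable of (source_index, target_index) pairs
    target_length : number of tokens in the target sentence
    """
    if target_length == 0:
        return 0.0
    aligned_targets = {tgt for _, tgt in alignment}
    return len(aligned_targets) / target_length


def filter_by_density(sentence_pairs, min_density=0.8):
    """Keep only sentence pairs dense enough to trust for projection.

    Each pair is a dict with hypothetical keys "alignment" and
    "target_length"; min_density is an assumed threshold.
    """
    return [
        pair
        for pair in sentence_pairs
        if alignment_density(pair["alignment"], pair["target_length"]) >= min_density
    ]
```

A filter of this kind trades corpus size for cleaner projected trees, which is why it suits the large in-domain parallel data setting the abstract describes.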
343

Forced Attention for Image Captioning

Hemanth Devarapalli (5930603) 17 January 2019
Automatic generation of captions for a given image is an active research area in Artificial Intelligence. Architectures have evolved from classical machine learning applied to image metadata to neural networks. Two styles of neural architecture have emerged for image captioning: the Encoder-Attention-Decoder architecture and the transformer architecture. This study attempts to modify the attention mechanism so that any object can be specified. An archetypical Encoder-Attention-Decoder architecture (Show, Attend, and Tell (Xu et al., 2015)) is employed as the baseline, and a modification of the Show, Attend, and Tell architecture is proposed. Both architectures are evaluated on the MSCOCO (Lin et al., 2014) dataset, and seven metrics are calculated: BLEU-1, 2, 3, 4 (Papineni, Roukos, Ward & Zhu, 2002), METEOR (Banerjee & Lavie, 2005), ROUGE-L (Lin, 2004), and CIDEr (Vedantam, Zitnick & Parikh, 2015). Finally, the statistical significance of the results is evaluated by performing paired t tests.
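The paired t test mentioned at the end of the abstract compares two systems on the same items. A minimal sketch of the test statistic follows, assuming each system yields one metric score per image; the function name and the scores in the usage example are invented for illustration and are not results from the thesis.

```python
import math
import statistics


def paired_t_statistic(baseline_scores, modified_scores):
    """t statistic for paired samples (same images scored by both models).

    Returns the statistic and the degrees of freedom (n - 1).
    """
    if len(baseline_scores) != len(modified_scores):
        raise ValueError("paired samples must have equal length")
    n = len(baseline_scores)
    if n < 2:
        raise ValueError("need at least two pairs")
    # Per-item differences remove the between-image variance.
    diffs = [m - b for b, m in zip(baseline_scores, modified_scores)]
    mean_diff = statistics.mean(diffs)
    std_diff = statistics.stdev(diffs)  # sample standard deviation
    if std_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean_diff / (std_diff / math.sqrt(n)), n - 1


# Illustrative per-image scores only.
t_value, dof = paired_t_statistic([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 7.0])
```

The statistic is then compared against a t distribution with `dof` degrees of freedom to obtain a p-value.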
344

Evaluation of the usability of constraint diagrams as a visual modelling language : theoretical and empirical investigations

Fetais, Noora January 2013
This research evaluates the constraint diagrams (CD) notation, which is a formal representation for program specification that has some promise to be used by people who are not expert in software design. Multiple methods were adopted in order to provide triangulated evidence of the potential benefits of constraint diagrams compared with other notational systems. Three main approaches were adopted for this research. The first approach was a semantic and task analysis of the CD notation. This was conducted by the application of the Cognitive Dimensions framework, which was used to examine the relative strengths and weaknesses of constraint diagrams and conventional notations in terms of the perceptive facilitation or impediments of these different representations. From this systematic analysis, we found that CD cognitively reduced the cost of exploratory design, modification, incrementation, searching, and transcription activities with regard to the cognitive dimensions: consistency, visibility, abstraction, closeness of mapping, secondary notation, premature commitment, role-expressiveness, progressive evaluation, diffuseness, provisionality, hidden dependency, viscosity, hard mental operations, and error-proneness. The second approach was an empirical evaluation of the comprehension of CD compared to natural language (NL) with computer science students. This experiment took the form of a web-based competition in which 33 participants were given instructions and training on either CD or the equivalent NL specification expressions, and then after each example, they responded to three multiple-choice questions requiring the interpretation of expressions in their particular notation. Although the CD group spent more time on the training and had less confidence, they obtained comparable interpretation scores to the NL group and took less time to answer the questions, although they had no prior experience of CD notation. 
The third approach was an experiment on the construction of CD. 20 participants were given instructions and training on either CD or the equivalent NL specification expressions, and then after each example, they responded to three questions requiring the construction of expressions in their particular notation. We built an editor to allow the construction of the two notations, which automatically logged their interactions. In general, for constructing program specification, the CD group had more accurate answers, they had spent less time in training, and their returns to the training examples were fewer than those of the NL group. Overall it was found that CD is understandable, usable, intuitive, and expressive with unambiguous semantic notation.
345

The use of belief networks in natural language understanding and dialog modeling.

January 2001
Wai, Chi Man Carmen. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references (leaves 129-136). / Abstracts in English and Chinese.
Chapter 1 --- Introduction
Chapter 1.1 --- Overview
Chapter 1.2 --- Natural Language Understanding
Chapter 1.3 --- BNs for Handling Speech Recognition Errors
Chapter 1.4 --- BNs for Dialog Modeling
Chapter 1.5 --- Thesis Goals
Chapter 1.6 --- Thesis Outline
Chapter 2 --- Background
Chapter 2.1 --- Natural Language Understanding
Chapter 2.1.1 --- Rule-based Approaches
Chapter 2.1.2 --- Stochastic Approaches
Chapter 2.1.3 --- Phrase-Spotting Approaches
Chapter 2.2 --- Handling Recognition Errors in Spoken Queries
Chapter 2.3 --- Spoken Dialog Systems
Chapter 2.3.1 --- Finite-State Networks
Chapter 2.3.2 --- The Form-based Approaches
Chapter 2.3.3 --- Sequential Decision Approaches
Chapter 2.3.4 --- Machine Learning Approaches
Chapter 2.4 --- Belief Networks
Chapter 2.4.1 --- Introduction
Chapter 2.4.2 --- Bayesian Inference
Chapter 2.4.3 --- Applications of the Belief Networks
Chapter 2.5 --- Chapter Summary
Chapter 3 --- Belief Networks for Natural Language Understanding
Chapter 3.1 --- The ATIS Domain
Chapter 3.2 --- Problem Formulation
Chapter 3.3 --- Semantic Tagging
Chapter 3.4 --- Belief Networks Development
Chapter 3.4.1 --- Concept Selection
Chapter 3.4.2 --- Bayesian Inferencing
Chapter 3.4.3 --- Thresholding
Chapter 3.4.4 --- Goal Identification
Chapter 3.5 --- Experiments on Natural Language Understanding
Chapter 3.5.1 --- Comparison between Mutual Information and Information Gain
Chapter 3.5.2 --- Varying the Input Dimensionality
Chapter 3.5.3 --- Multiple Goals and Rejection
Chapter 3.5.4 --- Comparing Grammars
Chapter 3.6 --- Benchmark with Decision Trees
Chapter 3.7 --- Performance on Natural Language Understanding
Chapter 3.8 --- Handling Speech Recognition Errors in Spoken Queries
Chapter 3.8.1 --- Corpus Preparation
Chapter 3.8.2 --- Enhanced Belief Network Topology
Chapter 3.8.3 --- BNs for Handling Speech Recognition Errors
Chapter 3.8.4 --- Experiments on Handling Speech Recognition Errors
Chapter 3.8.5 --- Significance Testing
Chapter 3.8.6 --- Error Analysis
Chapter 3.9 --- Chapter Summary
Chapter 4 --- Belief Networks for Mixed-Initiative Dialog Modeling
Chapter 4.1 --- The CU FOREX Domain
Chapter 4.1.1 --- Domain-Specific Constraints
Chapter 4.1.2 --- Two Interaction Modalities
Chapter 4.2 --- The Belief Networks
Chapter 4.2.1 --- Informational Goal Inference
Chapter 4.2.2 --- Detection of Missing / Spurious Concepts
Chapter 4.3 --- Integrating Two Interaction Modalities
Chapter 4.4 --- Incorporating Out-of-Vocabulary Words
Chapter 4.4.1 --- Natural Language Queries
Chapter 4.4.2 --- Directed Queries
Chapter 4.5 --- Evaluation of the BN-based Dialog Model
Chapter 4.6 --- Chapter Summary
Chapter 5 --- Scalability and Portability of Belief Network-based Dialog Model
Chapter 5.1 --- Migration to the ATIS Domain
Chapter 5.2 --- Scalability of the BN-based Dialog Model
Chapter 5.2.1 --- Informational Goal Inference
Chapter 5.2.2 --- Detection of Missing / Spurious Concepts
Chapter 5.2.3 --- Context Inheritance
Chapter 5.3 --- Portability of the BN-based Dialog Model
Chapter 5.3.1 --- General Principles for Probability Assignment
Chapter 5.3.2 --- Performance of the BN-based Dialog Model with Hand-Assigned Probabilities
Chapter 5.3.3 --- Error Analysis
Chapter 5.4 --- Enhancements for Discourse Query Understanding
Chapter 5.4.1 --- Combining Trained and Handcrafted Probabilities
Chapter 5.4.2 --- Handcrafted Topology for BNs
Chapter 5.4.3 --- Performance of the Enhanced BN-based Dialog Model
Chapter 5.5 --- Chapter Summary
Chapter 6 --- Conclusions
Chapter 6.1 --- Summary
Chapter 6.2 --- Contributions
Chapter 6.3 --- Future Work
Bibliography
Chapter A --- The Two Original SQL Query
Chapter B --- The Two Grammars, GH and GsA
Chapter C --- Probability Propagation in Belief Networks
Chapter C.1 --- Computing the aposteriori probability of P*(G) based on input concepts
Chapter C.2 --- Computing the aposteriori probability of P*(Cj) by backward inference
Chapter D --- Total 23 Concepts for the Handcrafted BN
346

A robust unification-based parser for Chinese natural language processing.

January 2001
Chan Shuen-ti Roy. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references (leaves 168-175). / Abstracts in English and Chinese.
Chapter 1. --- Introduction
Chapter 1.1. --- The nature of natural language processing
Chapter 1.2. --- Applications of natural language processing
Chapter 1.3. --- Purpose of study
Chapter 1.4. --- Organization of this thesis
Chapter 2. --- Organization and methods in natural language processing
Chapter 2.1. --- Organization of natural language processing system
Chapter 2.2. --- Methods employed
Chapter 2.3. --- Unification-based grammar processing
Chapter 2.3.1. --- Generalized Phrase Structure Grammar (GPSG)
Chapter 2.3.2. --- Head-driven Phrase Structure Grammar (HPSG)
Chapter 2.3.3. --- Common drawbacks of UBGs
Chapter 2.4. --- Corpus-based processing
Chapter 2.4.1. --- Drawback of corpus-based processing
Chapter 3. --- Difficulties in Chinese language processing and its related works
Chapter 3.1. --- A glance at the history
Chapter 3.2. --- Difficulties in syntactic analysis of Chinese
Chapter 3.2.1. --- Writing system of Chinese causes segmentation problem
Chapter 3.2.2. --- Words serving multiple grammatical functions without inflection
Chapter 3.2.3. --- Word order of Chinese
Chapter 3.2.4. --- The Chinese grammatical word
Chapter 3.3. --- Related works
Chapter 3.3.1. --- Unification grammar processing approach
Chapter 3.3.2. --- Corpus-based processing approach
Chapter 3.4. --- Restatement of goal
Chapter 4. --- SERUP: Statistical-Enhanced Robust Unification Parser
Chapter 5. --- Step One: automatic preprocessing
Chapter 5.1. --- Segmentation of lexical tokens
Chapter 5.2. --- Conversion of date, time and numerals
Chapter 5.3. --- Identification of new words
Chapter 5.3.1. --- Proper nouns: Chinese names
Chapter 5.3.2. --- Other proper nouns and multi-syllabic words
Chapter 5.4. --- Defining smallest parsing unit
Chapter 5.4.1. --- The Chinese sentence
Chapter 5.4.2. --- Breaking down the paragraphs
Chapter 5.4.3. --- Implementation
Chapter 6. --- Step Two: grammar construction
Chapter 6.1. --- Criteria in choosing a UBG model
Chapter 6.2. --- The grammar in details
Chapter 6.2.1. --- The PHON feature
Chapter 6.2.2. --- The SYN feature
Chapter 6.2.3. --- The SEM feature
Chapter 6.2.4. --- Grammar rules and features principles
Chapter 6.2.5. --- Verb phrases
Chapter 6.2.6. --- Noun phrases
Chapter 6.2.7. --- Prepositional phrases
Chapter 6.2.8. --- "Ba2" and "Bei4" constructions
Chapter 6.2.9. --- The terminal node S
Chapter 6.2.10. --- Summary of phrasal rules
Chapter 6.2.11. --- Morphological rules
Chapter 7. --- Step Three: resolving structural ambiguities
Chapter 7.1. --- Sources of ambiguities
Chapter 7.2. --- The traditional practices: an illustration
Chapter 7.3. --- Deficiency of current practices
Chapter 7.4. --- A new point of view: Wu (1999)
Chapter 7.5. --- Improvement over Wu (1999)
Chapter 7.6. --- Conclusion on semantic features
Chapter 8. --- Implementation, performance and evaluation
Chapter 8.1. --- Implementation
Chapter 8.2. --- Performance and evaluation
Chapter 8.2.1. --- The test set
Chapter 8.2.2. --- Segmentation of lexical tokens
Chapter 8.2.3. --- New word identification
Chapter 8.2.4. --- Parsing unit segmentation
Chapter 8.2.5. --- The grammar
Chapter 8.3. --- Overall performance of SERUP
Chapter 9. --- Conclusion
Chapter 9.1. --- Summary of this thesis
Chapter 9.2. --- Contribution of this thesis
Chapter 9.3. --- Future work
References
Appendix I
Appendix II
Appendix III
347

Automatic construction and adaptation of wrappers for semi-structured web documents.

January 2003
Wong Tak Lam. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. / Includes bibliographical references (leaves 88-94). / Abstracts in English and Chinese.
Chapter 1 --- Introduction
Chapter 1.1 --- Wrapper Induction for Semi-structured Web Documents
Chapter 1.2 --- Adapting Wrappers to Unseen Web Sites
Chapter 1.3 --- Thesis Contributions
Chapter 1.4 --- Thesis Organization
Chapter 2 --- Related Work
Chapter 2.1 --- Related Work on Wrapper Induction
Chapter 2.2 --- Related Work on Wrapper Adaptation
Chapter 3 --- Automatic Construction of Hierarchical Wrappers
Chapter 3.1 --- Hierarchical Record Structure Inference
Chapter 3.2 --- Extraction Rule Induction
Chapter 3.3 --- Applying Hierarchical Wrappers
Chapter 4 --- Experimental Results for Wrapper Induction
Chapter 5 --- Adaptation of Wrappers for Unseen Web Sites
Chapter 5.1 --- Problem Definition
Chapter 5.2 --- Overview of Wrapper Adaptation Framework
Chapter 5.3 --- Potential Training Example Candidate Identification
Chapter 5.3.1 --- Useful Text Fragments
Chapter 5.3.2 --- Training Example Generation from the Unseen Web Site
Chapter 5.3.3 --- Modified Nearest Neighbour Classification
Chapter 5.4 --- Machine Annotated Training Example Discovery and New Wrapper Learning
Chapter 5.4.1 --- Text Fragment Classification
Chapter 5.4.2 --- New Wrapper Learning
Chapter 6 --- Case Study and Experimental Results for Wrapper Adaptation
Chapter 6.1 --- Case Study on Wrapper Adaptation
Chapter 6.2 --- Experimental Results
Chapter 6.2.1 --- Book Domain
Chapter 6.2.2 --- Consumer Electronic Appliance Domain
Chapter 7 --- Conclusions and Future Work
Bibliography
Chapter A --- Detailed Performance of Wrapper Induction for Book Domain
Chapter B --- Detailed Performance of Wrapper Induction for Consumer Electronic Appliance Domain
348

Ontology Learning and Information Extraction for the Semantic Web

Kavalec, Martin January 2006
The work gives an overview of its three main topics: the semantic web, information extraction, and ontology learning. A method for identifying relevant information on web pages is described and experimentally tested on pages of companies offering products and services. The method is based on the analysis of a sample of web pages and their position in the Open Directory catalogue. Furthermore, a modification of the association rules mining algorithm is proposed and experimentally tested. In addition to identifying a relation between ontology concepts, it suggests a possible name for the relation.
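For readers unfamiliar with association rule mining, the two standard measures it builds on can be sketched as follows. This shows plain support and confidence over sets of co-occurring concepts; it does not reproduce the modification proposed in the thesis, and the function names and example data are hypothetical.

```python
def support(transactions, itemset):
    """Share of transactions that contain every item in itemset."""
    if not transactions:
        return 0.0
    wanted = set(itemset)
    hits = sum(1 for transaction in transactions if wanted <= set(transaction))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base


# Illustrative data: each set lists concepts found together in one sentence.
sentences = [
    {"company", "product"},
    {"company", "service"},
    {"company", "product", "price"},
    {"service"},
]
rule_confidence = confidence(sentences, {"company"}, {"product"})
```

A rule is usually kept only when both its support and its confidence exceed chosen thresholds.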
349

Investigação de métodos de desambiguação lexical de sentidos de verbos do português do Brasil / Research of word sense disambiguation methods for verbs in Brazilian Portuguese

Marco Antonio Sobrevilla Cabezudo 28 August 2015
Word Sense Disambiguation (WSD) aims to identify the most appropriate sense of a word in a given context, using a pre-specified sense repository. This task is important for other applications, such as machine translation. For English, WSD has been widely studied using different approaches and techniques; however, it remains a challenge for researchers in semantics. When the performance of different methods is analyzed by grammatical class, the classes do not all achieve the same results, and the worst results are obtained for verbs. Studies point out that WSD methods use shallow information, whereas verbs need deeper information for their disambiguation, such as syntactic frames or selectional restrictions. For Portuguese, there is little work in this area, and general-purpose methods have only recently been investigated. In addition, lexical resources focused on verbs have been developed in recent years. In this context, this master's work investigated WSD methods for verbs in texts written in Brazilian Portuguese. In particular, some traditional WSD methods were explored and, subsequently, linguistic knowledge from VerbNet.Br was incorporated into them. To support this research, the CSTNews corpus was annotated with verb senses using WordNet-Pr as the sense repository. The results showed that the WSD methods investigated did not outperform the strongest baseline, and that incorporating VerbNet.Br knowledge yielded improvements in the methods, although these improvements were not statistically significant. The contributions of this work include a corpus annotated with verb senses, a tool to support sense annotation, the investigation of WSD methods for verbs, and the use of verb-specific information (from VerbNet.Br) in the WSD of verbs.
350

Sumarização automática de opiniões baseada em aspectos / Automatic aspect-based opinion summarization

Roque Enrique López Condori 24 August 2015
Opinion summarization, also known as sentiment summarization, is the task of automatically generating summaries for a set of opinions about a specific entity. One of the main approaches to generating opinion summaries is aspect-based opinion summarization, which produces summaries of the opinions on the main aspects of an entity. Entities are typically products, services, organizations, and the like, and aspects are attributes or components of those entities. In recent years, this task has gained considerable importance because of the large amount of information available online and the growing interest in knowing how users evaluate products, companies, people, and other entities. Unfortunately, little research has been done in this area for Brazilian Portuguese. In this scenario, this master's project investigated the development of several aspect-based opinion summarization methods. In particular, four classical methods from the literature, both extractive and abstractive, were implemented. These methods were analyzed in each of their phases and, as a result of this analysis, two proposals for generating opinion summaries were produced. Both proposals attempt to combine the main advantages of the classical methods to generate better summaries. To analyze the performance of the implemented methods, experiments were carried out according to three traditional evaluation measures: informativeness, linguistic quality, and usefulness of the summary. The results show that the methods proposed in this work are competitive with the methods from the literature and, in several cases, outperform them.
