  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

Desambiguação lexical de revisões de itens aplicada em sistemas de recomendação / Word sense disambiguation of items revisions applied in recommendation systems

Marinho, Ronnie Shida 14 May 2018
To help users find relevant products, Web systems have integrated item recommendation modules that automatically select content according to each individual's interests. Although there are various approaches to computing recommendations from the interactions available in the system, most of them suffer from a lack of information for characterizing user preferences and item descriptions. Recent work on recommender systems has studied the possibility of using user reviews as a source of metadata, since users create them collaboratively. However, the literature still lacks studies on how to organize and structure such data semantically. This study therefore aims to develop techniques for constructing item representations based on collaborative descriptions for recommender systems. It also analyzes the impact of distinct word sense disambiguation methods on the precision of recommendations, evaluated in the rating prediction scenario. The results showed that this structuring makes it possible to characterize users and items more efficiently, favoring the computation of recommendations according to users' preferences.
33

Desambiguação automática de substantivos em corpus do português brasileiro / Word sense disambiguation in Brazilian Portuguese corpus

Viviane Santos da Silva 19 August 2016
Lexical ambiguity was the central topic of this research, especially the relations between the meanings of ambiguous graphic forms and the distribution patterns of the meanings of polysemous words, that is, words whose meanings are semantically related. This work sits at the interface between computational explorations of lexical ambiguity, specifically in natural language processing, and theoretical investigations of lexical meaning. We take polysemy and homonymy to correspond, respectively, to the case of one word with multiple related meanings and to the case of two (or more) words whose graphic forms coincide but whose meanings are synchronically unrelated. The ultimate goal of this study was to confirm whether the most polysemous words have meanings that are less evenly distributed in the corpus, with predominant meanings that occur more frequently. To examine these aspects, we implemented a word sense disambiguation algorithm, an adapted version of the Lesk algorithm (Lesk, 1986; Jurafsky & Martin, 2000), chosen on the basis of the language resources available for Portuguese. Starting from the hypothesis that the most frequent words in the language tend to be more polysemous, we selected from the corpus (Mac-Morpho) the words with the highest number of occurrences. Given our interest in content words and in ambiguity at the strictly semantic level, we ran the tests presented in this work only for nouns. The disambiguation algorithm we implemented surpassed the baseline method based on the most-frequent-sense heuristic: we obtained 63% accuracy against 50% for the baseline over all the disambiguated data. These results were obtained with the pseudoword disambiguation procedure (pseudowords formed at random), which is used when semantically annotated corpora are not available.
However, because this disambiguation method depends on fixed sense inventories taken from dictionaries, we searched for alternative ways of categorizing the meanings of a word. Based on the work of Sproat & VanSanten (2001), we implemented a method for assigning numerical values that indicate how far a word departs from monosemy within a given corpus. This measure, which the authors of the original work call the polysemy index, is based on grouping the words that co-occur with the target word according to their contextual similarities. We proposed the use of a second measure, mentioned by those authors only as an example of potential applications of the method still to be explored: clustering the co-occurring words based on the similarity of their contexts of use. This second measure makes it possible to assess the closeness between meanings and the number of meanings that a word displays in the corpus. Some aspects of the results indicate the potential of the clustering method: the clusters of co-occurring words are weighted, highlighting the most prominent groups of neighbors of the target word, and the clusters are related to one another by contextual similarity measures, which can help distinguish homonymic from polysemic tendencies. As an example, we obtained two clusters for the word produção (production): one related to literary production and the other to agricultural production. These two clusters were considerably distant from each other, within the range of what would be considered a case of polysemy, and both had significant weights, that is, they were composed of more relevant words.
We identified three main factors that limited the analyses: the political-journalistic bias of the corpus we used (Mac-Morpho); the need for further tests varying the selection parameters for co-occurring words, since the parameters we used should vary for other corpora; and, especially, the fact that we ran only a few tests to set the values of these parameters, which are decisive for the number of co-occurring words that are relevant to the contexts of use of the target word. Considering both the advantages and the limitations observed in the clustering results, we plan to design a synchronic (that is, not relying on the historical documentation of words) and computational method for distinguishing cases of polysemy and homonymy more systematically and over a larger amount of data. We believe that a method of this nature can be of great value for studies of meaning at the lexical level, allowing the establishment of an objective method based on language usage data that goes beyond isolated examples.
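The abstract above mentions an adapted Lesk algorithm without giving its details. As a rough illustration only, the following Python sketch shows the general idea of simplified Lesk: pick the sense whose dictionary gloss shares the most words with the context. The glosses, the stopword list and the function name are hypothetical toy values, not the thesis's resources or its adaptation.

```python
# Minimal sketch of a simplified Lesk-style disambiguator.
# GLOSSES and STOPWORDS are toy, hypothetical data for illustration only.
STOPWORDS = frozenset({"the", "a", "an", "of", "and", "or", "on", "that", "to"})

GLOSSES = {
    "bank_financial": "an institution that accepts deposits and lends money",
    "bank_river": "sloping land beside a river or lake",
}


def simplified_lesk(target, context, sense_glosses, stopwords=STOPWORDS):
    """Return the sense whose gloss overlaps most with the context words."""
    context_words = {w.lower() for w in context.split()} - stopwords - {target}
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        gloss_words = {w.lower() for w in gloss.split()} - stopwords
        overlap = len(context_words & gloss_words)
        # Ties keep the first sense seen, which acts as a crude default sense.
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


if __name__ == "__main__":
    sentence = "she sat on the bank of the river and watched the water"
    print(simplified_lesk("bank", sentence, GLOSSES))
```

Ordering the senses so that the most frequent one comes first would make ties fall back to the most-frequent-sense heuristic, which is the baseline the abstract compares against.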
34

A Hybrid Environment for Syntax-Semantic Tagging

Padró, Lluís 06 February 1998
The thesis describes the application of the relaxation labelling algorithm to NLP disambiguation. Language is modelled through context constraints inspired by Constraint Grammars. Each constraint carries a real value stating its "compatibility". The technique is applied to POS tagging, Shallow Parsing and Word Sense Disambiguation. Experiments and results are reported. The proposed approach enables the use of multi-feature constraint models, the simultaneous resolution of several NL disambiguation tasks, and the collaboration of linguistic and statistical models.
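To make the idea of relaxation labelling with real-valued compatibilities more concrete, here is a small Python sketch. It uses one common update rule (each label weight is multiplied by one plus its support and then renormalized); the abstract does not say which rule the thesis uses, and the bigram-style constraints, the `COMPAT` values and the function name are assumptions made for illustration.

```python
# Hypothetical compatibility values in [-1, 1] between the labels of
# adjacent words: (left_label, right_label) -> compatibility.
COMPAT = {
    ("DET", "NOUN"): 0.5,   # a determiner followed by a noun is favoured
    ("DET", "VERB"): -0.5,  # a determiner followed by a verb is penalized
}


def relaxation_labelling(weights, compat, iterations=20):
    """Iteratively re-weight label distributions using neighbour constraints.

    weights: list of dicts (one per word) mapping label -> weight, summing to 1.
    compat:  dict mapping (left_label, right_label) -> value in [-1, 1].
    """
    for _ in range(iterations):
        new_weights = []
        for i, dist in enumerate(weights):
            support = {}
            for label in dist:
                s = 0.0
                if i > 0:  # constraints with the word on the left
                    s += sum(p * compat.get((left, label), 0.0)
                             for left, p in weights[i - 1].items())
                if i + 1 < len(weights):  # constraints with the word on the right
                    s += sum(p * compat.get((label, right), 0.0)
                             for right, p in weights[i + 1].items())
                support[label] = max(-1.0, min(1.0, s))  # keep support in [-1, 1]
            total = sum(p * (1.0 + support[l]) for l, p in dist.items())
            if total > 0:
                new_weights.append({l: p * (1.0 + support[l]) / total
                                    for l, p in dist.items()})
            else:
                new_weights.append(dict(dist))  # no usable support: keep as is
        weights = new_weights  # synchronous update of all words
    return weights


if __name__ == "__main__":
    # "the can": the second word is ambiguous between noun and verb.
    start = [{"DET": 1.0}, {"NOUN": 0.5, "VERB": 0.5}]
    print(relaxation_labelling(start, COMPAT))
```

Because every word's weights are updated from its neighbours' current weights, the same loop can in principle mix constraints over POS tags, chunks and senses, which is the kind of simultaneous resolution the abstract describes.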
35

Μεθοδολογία αυτόματου σημασιολογικού σχολιασμού στο περιεχόμενο ιστοσελίδων / Methodology for automatic semantic annotation of web page content

Σπύρος, Γεώργιος 14 December 2009
Nowadays the use of the World Wide Web has evolved into a social phenomenon. Its spread is constant and exponentially increasing. In the years since it first appeared, users have gained a certain level of experience and have come to accept a number of things on the basis of that experience. In particular, they have understood that the web pages they interact with almost daily are the creations of other users. It has also become clear that every user can create their own web page and include in it references to another user's web page. These references, however, usually do not appear merely as hyperlinks. Most of the time they are accompanied by text that provides information about the content of the referenced page. In this diploma thesis we describe a methodology for the automatic semantic annotation of web page content. The tools and techniques described are based on two main hypotheses. First, people who create and maintain web pages describe other web pages inside them. Second, people connect their web pages to the page they describe via an anchor link that is clearly marked with a specific tag in the HTML code. The automatic semantic annotation we attempt for a web page amounts to finding a tag able to describe its content. Finding this tag is a process based on a methodology consisting of a fixed number of steps. Each step is implemented using various tools and techniques, and its output feeds the input of the next step. The basic idea behind the methodology is to collect as many anchor texts as possible for a web page, along with a window of words around them. This collection results from processing many web pages that contain hyperlinks to the page we want to annotate. The semantic tag for a web page is derived by applying various natural language processing techniques to the collection of texts that refer to it. This yields the final conclusion on the semantic annotation of the web page's content.
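The collection step described in this abstract (anchor texts plus a window of surrounding words) can be sketched with Python's standard `html.parser` module. This is only an assumed illustration of that one step: the class name, the function name, the exact-match comparison on `href` and the whitespace tokenization are choices made here, not details taken from the thesis.

```python
from html.parser import HTMLParser


class AnchorContextParser(HTMLParser):
    """Record visible words and the token spans of links to one target URL."""

    def __init__(self, target_url):
        super().__init__()
        self.target_url = target_url
        self.tokens = []    # visible words, in document order
        self.spans = []     # (start, end) token indices of matching anchor texts
        self._start = None  # start index of the anchor currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a" and dict(attrs).get("href") == self.target_url:
            self._start = len(self.tokens)

    def handle_endtag(self, tag):
        if tag == "a" and self._start is not None:
            self.spans.append((self._start, len(self.tokens)))
            self._start = None

    def handle_data(self, data):
        self.tokens.extend(data.split())


def anchor_contexts(html, target_url, window=3):
    """Return (left_words, anchor_words, right_words) for each link to target_url."""
    parser = AnchorContextParser(target_url)
    parser.feed(html)
    parser.close()
    contexts = []
    for start, end in parser.spans:
        left = parser.tokens[max(0, start - window):start]
        anchor = parser.tokens[start:end]
        right = parser.tokens[end:end + window]
        contexts.append((left, anchor, right))
    return contexts


if __name__ == "__main__":
    page = ('<p>Read the excellent <a href="http://example.org/wsd">word sense '
            'tutorial</a> for a gentle start.</p>')
    print(anchor_contexts(page, "http://example.org/wsd", window=2))
```

Running this over many referring pages and pooling the returned word lists would give the text collection on which the later language processing steps operate.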
36

The Rumble in the Disambiguation Jungle : Towards the comparison of a traditional word sense disambiguation system with a novel paraphrasing system

Smith, Kelly January 2011
Word sense disambiguation (WSD) is the process of computationally identifying and labeling polysemous words in context with their correct meaning, known as a sense. WSD is riddled with various obstacles that must be overcome in order to reach its full potential. One of these problems is the representation of word meaning. Traditional WSD algorithms assume that a word in a given context has only one meaning and therefore return only one discrete sense. On the other hand, a novel approach holds that a given word can have multiple senses. Studies on graded word sense assignment (Erk et al., 2009) as well as in cognitive science (Hampton, 2007; Murphy, 2002) support this theory. It has therefore been adopted in a novel paraphrasing system which performs word sense disambiguation by returning a probability distribution over potential paraphrases (in this case synonyms) of a given word. However, it is unknown how well this type of algorithm fares against the traditional one. The current study thus examines if and how it is possible to make a comparison of the two. A method of comparison is evaluated and subsequently rejected. Reasons for this as well as suggestions for a fair and accurate comparison are presented.
37

A Minimally Supervised Word Sense Disambiguation Algorithm Using Syntactic Dependencies and Semantic Generalizations

Faruque, Md. Ehsanul 12 1900
Natural language is inherently ambiguous. For example, the word "bank" can mean a financial institution or a river shore. Finding the correct meaning of a word in a particular context is a task known as word sense disambiguation (WSD), which is essential for many natural language processing applications such as machine translation, information retrieval, and others. While most current WSD methods try to disambiguate a small number of words for which enough annotated examples are available, the method proposed in this thesis attempts to address all words in unrestricted text. The method is based on constraints imposed by syntactic dependencies and concept generalizations drawn from an external dictionary. The method was tested on standard benchmarks as used during the SENSEVAL-2 and SENSEVAL-3 WSD international evaluation exercises, and was found to be competitive.
38

Unsupervised Knowledge-based Word Sense Disambiguation: Exploration & Evaluation of Semantic Subgraphs

Manion, Steve Lawrence January 2014
Hypothetically, if you were told "Apple uses the apple as its logo", you would immediately detect two different senses of the word apple: the company and the fruit, respectively. Making this distinction is the formidable challenge of Word Sense Disambiguation (WSD), which is a subtask of many Natural Language Processing (NLP) applications. This thesis is a multi-branched investigation into WSD that explores and evaluates unsupervised knowledge-based methods that exploit semantic subgraphs. The nature of the research covered by this thesis can be broken down into:
1. Mining data from the encyclopedic resource Wikipedia, to visually prove the existence of context embedded in semantic subgraphs
2. Achieving disambiguation in order to merge concepts that originate from heterogeneous semantic graphs
3. Participation in international evaluations of WSD across a range of languages
4. Treating WSD as a classification task that can be optimised through the iterative construction of semantic subgraphs
The contributions of each chapter vary, but can be summarised by what has been produced, learnt, and raised throughout the thesis. Furthermore, an API and several resources have been developed as a by-product of this research, all of which can be accessed by visiting the author's home page at http://www.stevemanion.com. This should enable researchers to replicate the results achieved in this thesis and build on them if they wish.
39

Uma abordagem híbrida relacional para a desambiguação lexical de sentido na tradução automática / A hybrid relational approach for word sense disambiguation in machine translation

Specia, Lucia 28 September 2007
Crosslingual communication has become an increasingly imperative task in the current scenario of widespread dissemination of information in several languages. In this context, machine translation systems, which facilitate such communication by providing automatic translations, are of great importance. Although research in Machine Translation dates back to the 1950s, the area still has many problems. One of the main problems is lexical ambiguity, that is, the need to choose a word in the target language to translate a source language word that has several translation options. This problem is even more complex when the translation options differ only in sense, a problem named "sense ambiguity". Several approaches have been proposed for word sense disambiguation, but they are in general monolingual (for English) and application-independent. Moreover, they have limitations regarding the types of knowledge sources that can be exploited. In particular, there is no significant research on word sense disambiguation involving Portuguese. The goal of this PhD work is to propose and develop a novel approach to word sense disambiguation that is specifically designed for machine translation, follows a hybrid methodology (knowledge- and corpus-based), and employs a relational formalism to represent various kinds of knowledge sources and disambiguation examples by means of Inductive Logic Programming. Several experiments have shown that the proposed approach outperforms alternative approaches in multilingual disambiguation and achieves results higher than or comparable to the state of the art in monolingual disambiguation. Additionally, the approach has proved effective in assisting lexical choice in a statistical machine translation system.
40

Klasifikátor pro sémantické vzory užívání anglických sloves / Classifier for semantic patterns of English verbs

Kríž, Vincent January 2012
The goal of the diploma thesis is to design, implement and evaluate classifiers for automatic classification of semantic patterns of English verbs according to a pattern lexicon that draws on the Corpus Pattern Analysis. We use a pilot collection of 30 sample English verbs as training and test data sets. We employ standard methods of machine learning. In our experiments we use decision trees, k-nearest neighbours (kNN), support vector machines (SVM) and AdaBoost algorithms. Among other things we concentrate on feature design and selection. We experiment with both morpho-syntactic and semantic features. Our results show that the morpho-syntactic features are the most important for statistically-driven semantic disambiguation. Nevertheless, for some verbs the use of semantic features plays an important role.
