51

A metáfora e a sua representação em sistemas de processamento automático de línguas naturais / Metaphor and its representation in natural language processing systems

Oliveira, Ana Eliza Barbosa de [UNESP] 14 March 2006 (has links) (PDF)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) / This MS thesis concerns (i) the study of metaphor per se (as opposed, for example, to an applied study of metaphor) from the linguistic point of view, that is, metaphor as an expression of natural language, and (ii) the investigation of a formal representation of metaphor for implementation in Natural Language Processing systems. The methodology, set in an interdisciplinary context, focuses on two domains: a Cognitive-Linguistic Domain, in which we investigate the linguistic expression and the cognitive grounding of metaphor, i.e., metaphor as a product of both linguistic and non-linguistic resources; and a Computational-Linguistic Domain, in which we investigate a formal representation of metaphor production and interpretation for computational purposes. The theoretical approaches that delimit the scope of this work are: rhetorical-philosophical, interactionist, semantic, pragmatic, cognitivist and computational.
52

VerbNet.Br: construção semiautomática de um léxico verbal online e independente de domínio para o português do Brasil / VerbNet.Br: semi-automatic construction of an online, domain-independent verb lexicon for Brazilian Portuguese

Carolina Evaristo Scarton 28 January 2013 (has links)
Building computational-linguistic base resources, such as computational lexical resources (CLRs), is one of the goals of Natural Language Processing (NLP). However, most existing computational lexicons are specific to English. One of the resources already developed for English is VerbNet, a domain-independent lexicon with semantic and syntactic information about English verbs, built on Levin's verb classes and mapped to Princeton's WordNet. Since there are few computational studies of Levin's classification for languages other than English, and given the lack of a Portuguese lexicon along the lines of the English VerbNet, the goal of this research was to create such a resource for Brazilian Portuguese, called VerbNet.Br. Building these resources manually is usually unfeasible because of the time required and the errors introduced by human authors, so there is a large effort in the field to support their creation with computational techniques. One widely used technique is machine learning over corpora to extract linguistic information. Another is the reuse of resources that already exist for other languages, usually English, to build a new, aligned resource, taking advantage of multilingual/cross-linguistic properties (as is the case with Levin's verb classification). The method proposed in this master's research for building VerbNet.Br is generic, in that it can be used to build similar resources for languages other than Brazilian Portuguese, and it allows the resource to be extended in the future through subclasses of concepts. The method comprises four steps: three automatic and one manual. Experiments were also carried out without the manual step, showing that it can be discarded without affecting the precision and recall of the results. The resource was evaluated intrinsically, both qualitatively and quantitatively. The qualitative evaluation consisted of: (a) manual analysis of some VerbNet classes, producing a gold standard for Brazilian Portuguese; (b) comparison of this gold standard with the VerbNet.Br results, with promising results of around 60% f-measure; and (c) comparison of the VerbNet.Br results with verb clustering results, showing that both methods produce similar output. The quantitative evaluation considered the acceptance rate of candidate members of VerbNet.Br classes, with around 90% of the members of each class accepted. One of the contributions of this research is the first version of VerbNet.Br, which still needs linguistic validation but already contains information usable in NLP tasks, with precision and recall of 44% and 92.89%, respectively.
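The cross-lingual reuse described above can be pictured with a short sketch (not the thesis's actual pipeline): a Brazilian Portuguese verb inherits the candidate VerbNet classes of its English translations. The translation dictionary below is a hypothetical stand-in for the alignment resources used in the work, and NLTK's VerbNet reader is assumed to be installed.

```python
# Minimal sketch of cross-lingual class assignment in the spirit of VerbNet.Br:
# a pt-BR verb receives the VerbNet class ids of its English translations.
import nltk
from nltk.corpus import verbnet

nltk.download("verbnet", quiet=True)

# Hypothetical translation pairs (pt-BR verb -> English verbs).
translations = {
    "dar": ["give"],
    "correr": ["run"],
    "quebrar": ["break", "shatter"],
}

def candidate_classes(pt_verb):
    """Collect the VerbNet class ids of every English translation of pt_verb."""
    classes = set()
    for en_verb in translations.get(pt_verb, []):
        classes.update(verbnet.classids(lemma=en_verb))
    return sorted(classes)

for verb in translations:
    print(verb, "->", candidate_classes(verb))
```

In the actual method, candidates like these would then pass through the automatic and (optional) manual filtering steps described above, which is where the reported acceptance rate, precision and recall come from.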
53

Databáze XML pro správu slovníkových dat / XML Databases for Dictionary Data Management

Samia, Michel January 2011 (has links)
This diploma thesis deals with the processing of dictionary data, especially data in XML-based formats. First, the reader is acquainted with the linguistic and lexicographical terms used in the work. Then the main types of lexicographical data formats, and specific formats, are introduced, and their advantages and disadvantages are discussed. According to previously set criteria, the LMF format was chosen for the design and implementation of a Python application, which focuses in particular on intelligently merging several dictionaries into one. After passing all unit tests, the application was used to process the LMF dictionaries located on the faculty server of the research group for natural language processing. Finally, the advantages and disadvantages of the application are discussed and ways of further use and extension are suggested.
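The merging step can be pictured with a short sketch (not the thesis's implementation), assuming a simplified LMF layout of Lexicon/LexicalEntry/Lemma/feat[@att="writtenForm"] elements; real LMF dictionaries are typically richer, and the merge policy here is the naive one of copying unseen lemmas.

```python
# Naive lemma-keyed merge of two LMF-style XML dictionaries (sketch only).
import xml.etree.ElementTree as ET

def lemma_of(entry):
    """Return the written form of a LexicalEntry, or None if it is missing."""
    feat = entry.find('./Lemma/feat[@att="writtenForm"]')
    return feat.get("val") if feat is not None else None

def merge_lmf(path_a, path_b, out_path):
    tree_a = ET.parse(path_a)
    lexicon_a = tree_a.getroot().find("Lexicon")
    seen = {lemma_of(e) for e in lexicon_a.findall("LexicalEntry")}

    lexicon_b = ET.parse(path_b).getroot().find("Lexicon")
    for entry in lexicon_b.findall("LexicalEntry"):
        lemma = lemma_of(entry)
        if lemma is not None and lemma not in seen:
            lexicon_a.append(entry)  # copy entries whose lemma is not yet present
            seen.add(lemma)
    tree_a.write(out_path, encoding="utf-8", xml_declaration=True)

# Example call (hypothetical file names):
# merge_lmf("dict_a.xml", "dict_b.xml", "merged.xml")
```

An "intelligent" merge, as targeted in the thesis, would additionally reconcile the senses and metadata of entries sharing a lemma rather than simply keeping the first occurrence.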
54

Dolování dat v prostředí sociálních sítí / Data Mining in Social Networks

Raška, Jiří January 2013 (has links)
This thesis deals with knowledge discovery from social media and focuses on feature-based opinion mining from user reviews. The theoretical part describes methods of opinion mining and natural language processing. The main part of the thesis is the design and implementation of a library for opinion mining based on the Stanford Parser and the WordNet lexicon. Dependency grammar was used for feature identification, implicit features were mined with the CoAR method, and opinions were classified with a supervised algorithm. Finally, experiments with the implemented library and examples of its usage are given.
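As a rough illustration of the lexicon side of such a pipeline (the thesis itself relies on Stanford Parser dependencies and a supervised classifier, neither of which is reproduced here), the sketch below scores an opinion word attached to a product feature after expanding a few hand-picked seed words through WordNet synonyms.

```python
# Lexicon-based polarity for (feature, opinion word) pairs; the seeds are hypothetical.
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)

SEEDS = {"good": 1, "nice": 1, "great": 1, "bad": -1, "poor": -1, "awful": -1}

def expand(seeds):
    """Propagate seed polarities to WordNet adjective synonyms (one hop)."""
    lexicon = dict(seeds)
    for word, polarity in seeds.items():
        for synset in wn.synsets(word, pos=wn.ADJ):
            for lemma in synset.lemmas():
                lexicon.setdefault(lemma.name().lower(), polarity)
    return lexicon

LEXICON = expand(SEEDS)

def score_pair(feature, opinion_word):
    """Polarity of an explicit feature/opinion pair extracted from a review."""
    return feature, LEXICON.get(opinion_word.lower(), 0)

print(score_pair("battery", "great"))     # positive
print(score_pair("screen", "terrible"))   # non-zero only if reached by expansion
```

In the full system, the (feature, opinion word) pairs themselves come from dependency relations rather than being supplied by hand.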
55

Clustering of Web Services Based on Semantic Similarity

Konduri, Aparna 12 May 2008 (has links)
No description available.
56

Effect of polysemy and homography on sentiment analysis / Effekten av polysemi och homografi på sentimentanalys

Ljung, Oskar January 2024 (has links)
This bachelor's thesis studied the difference in sentiment between different homographic or polysemous senses of individual words. It did this by training a linear regression model on a version of the British National Corpus that had been disambiguated along WordNet word senses (synsets) and by analysing sentiment data from SentiWordNet. Results were partial, but indicated that word senses differ somewhat in sentiment. In the course of the study, a new and improved version of the Lesk disambiguation algorithm, named Normalised Lesk, was also developed; its validation against the regular Lesk algorithm is presented as well.
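The Normalised Lesk variant is not reproduced here, but the basic pairing of sense disambiguation with sense-level sentiment can be sketched with NLTK's stock (simplified) Lesk implementation and SentiWordNet; the sentences are made-up examples.

```python
# Sense-level sentiment of a homograph: disambiguate with simplified Lesk,
# then look the chosen synset up in SentiWordNet.
import nltk
from nltk.wsd import lesk
from nltk.corpus import sentiwordnet as swn

for pkg in ("wordnet", "sentiwordnet"):
    nltk.download(pkg, quiet=True)

def sense_sentiment(sentence, word):
    synset = lesk(sentence.split(), word)     # pick a WordNet sense from context
    if synset is None:
        return None
    scores = swn.senti_synset(synset.name())  # sentiment scores for that sense
    return synset.name(), scores.pos_score() - scores.neg_score()

print(sense_sentiment("the river bank was muddy and cold", "bank"))
print(sense_sentiment("the bank approved the loan at last", "bank"))
```

Whether the two contexts actually receive different senses (and hence different sentiment) depends on how well the disambiguation works, which is the kind of weakness the thesis's Normalised Lesk is meant to address.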
57

L’acquisition et l’extraction de connaissances dans un contexte patrimoniale peu documenté / Knowledge acquisition and extraction in the context of poorly documented cultural heritage

Amad, Ashraf 06 December 2017 (has links)
The importance of cultural heritage documentation increases in parallel with the risks to which heritage is exposed, such as wars, uncontrolled urban development, natural disasters, neglect, and inappropriate conservation techniques or strategies. Moreover, documentation is a fundamental tool for the assessment, conservation, monitoring and management of cultural heritage, and it allows us to estimate the historical, scientific, social and economic value of that heritage. According to several international institutions dedicated to the conservation of cultural heritage, there is a real need to develop and adapt computing solutions capable of facilitating and supporting the documentation of poorly documented cultural heritage, especially in developing countries where resources are sorely lacking. Among these countries, Palestine represents a relevant case study for this lack of heritage documentation. To address this problem, we propose an approach for acquiring and extracting heritage knowledge in a poorly documented context. We take the Church of the Nativity in Palestine as a case study and put our theoretical approach into practice by developing a platform for the acquisition and extraction of heritage knowledge, supported by a framework for cultural heritage documentation. Our solution is based on semantic technologies, which give us, from the outset, a rich ontological description, better structuring of the information, a high level of interoperability and better automatic processing (machine readability) without additional effort. Moreover, our approach is evolutionary and reciprocal: the acquisition of knowledge (in structured form) improves the extraction of heritage knowledge from unstructured text, and vice versa. The interaction between the two components of our system, and the heritage knowledge itself, therefore develop and improve over time, especially as the system uses manual contributions and expert validation of the automatic results (in both components) to optimize its performance.
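The claim about semantic technologies (rich ontological description, interoperability, machine readability) can be made concrete with a tiny sketch; the namespace, class and property names below are hypothetical and do not come from the thesis's ontology.

```python
# A few machine-readable statements about the case-study monument, using rdflib.
from rdflib import Graph, Literal, Namespace, RDF, RDFS

HERITAGE = Namespace("http://example.org/heritage/")  # hypothetical namespace

g = Graph()
church = HERITAGE["ChurchOfTheNativity"]
g.add((church, RDF.type, HERITAGE.Monument))
g.add((church, RDFS.label, Literal("Church of the Nativity", lang="en")))
g.add((church, HERITAGE.locatedIn, HERITAGE["Bethlehem"]))
g.add((church, HERITAGE.constructionPeriod, Literal("4th century CE")))

# Triples like these stay queryable (SPARQL) and exchangeable across tools.
print(g.serialize(format="turtle"))
```

Knowledge acquired in this structured form is what, in the proposed approach, feeds back into the extraction of further knowledge from unstructured text.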
58

Contribution à l’analyse sémantique des textes arabes / A contribution to the semantic analysis of Arabic texts

Lebboss, Georges 08 July 2016 (has links)
Arabic is poor in electronic semantic resources. Arabic WordNet exists, but it is itself poor in words and relations. This thesis focuses on enriching Arabic WordNet with synsets (a synset is a set of synonymous words) extracted from a large general-purpose corpus. No such corpus exists for Arabic, so it had to be built and then subjected to a number of preprocessing steps. Gilles Bernard and I developed a word vectorization method, GraPaVec, that can be used for this purpose. I built a system that includes an Add2Corpus module, the preprocessing steps, and word vectorization based on automatically generated frequency patterns; it yields a data matrix whose rows are words and whose columns are patterns, each component representing the frequency of the word in the pattern. The word vectors are fed to a Self Organizing Map (SOM) neural model, and the classification produced by the SOM constructs synsets. For validation, a gold standard corpus had to be created from Arabic WordNet (none exists for Arabic in this area), and the GraPaVec method was then compared with Word2Vec and GloVe. The results show that GraPaVec gives the best results for this problem, with an F-measure 25% higher than the other two. The classes produced will be used to create new synsets to be integrated into Arabic WordNet.
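The pattern-generation step of GraPaVec is not reproduced here; as a crude stand-in, the sketch below builds a plain word-by-context count matrix from a toy corpus and groups the word vectors with a Self Organizing Map. MiniSom is a third-party package (pip install minisom) and is not necessarily the SOM implementation used in the thesis.

```python
# Word vectors from co-occurrence counts, clustered with a SOM (sketch only).
import numpy as np
from minisom import MiniSom

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat and a dog ran home",
]
vocab = sorted({w for line in corpus for w in line.split()})
index = {w: i for i, w in enumerate(vocab)}

# Co-occurrence counts within a +/-1 word window (stand-in for frequency patterns).
counts = np.zeros((len(vocab), len(vocab)))
for line in corpus:
    words = line.split()
    for i, w in enumerate(words):
        for j in (i - 1, i + 1):
            if 0 <= j < len(words):
                counts[index[w], index[words[j]]] += 1

som = MiniSom(3, 3, len(vocab), sigma=1.0, learning_rate=0.5, random_seed=42)
som.train(counts, 500)

# Words mapped to the same SOM cell are candidate members of one class (synset).
cells = {}
for w in vocab:
    cells.setdefault(som.winner(counts[index[w]]), []).append(w)
print(cells)
```

In the thesis, the columns of the matrix are automatically generated frequency patterns rather than raw neighbouring words, and the resulting classes are validated against a gold standard built from Arabic WordNet.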
59

Multi-Agent User-Centric Specialization and Collaboration for Information Retrieval

Mooman, Abdelniser January 2012 (has links)
The amount of information on the World Wide Web (WWW) is rapidly growing in pace and topic diversity. This has made it increasingly difficult, and often frustrating, for information seekers to retrieve the content they are looking for, as information retrieval systems (e.g., search engines) are unable to decipher the relevance of the retrieved information as it pertains to what they are searching for. This issue can be decomposed into two aspects: 1) Variability of information relevance as it pertains to an information seeker. In other words, different information seekers may enter the same search text, or keywords, but expect completely different results. It is therefore imperative that information retrieval systems be able to incorporate a model of the information seeker in order to estimate the relevance and context of use of information before presenting results. In this context, by a model we mean the capture of trends in the information seeker's search behaviour; this is what many researchers refer to as personalized search. 2) Information diversity. Information available on the World Wide Web today spans multitudes of inherently overlapping topics, and it is difficult for any information retrieval system to decide effectively on the relevance of the information retrieved in response to an information seeker's query. For example, an information seeker who wishes to use the WWW to learn about a cure for a certain illness would receive a more relevant answer if the search engine were optimized for such topic domains; this is what the WWW nomenclature refers to as 'specialized search'. This thesis maintains that the information seeker's search is not completely random and therefore tends to exhibit consistent patterns of behaviour. This behaviour, despite being consistent, can be quite complex to capture. To accomplish this, the thesis proposes Multi-Agent Personalized Information Retrieval with Specialization Ontology (MAPIRSO). MAPIRSO offers a complete learning framework that is able to model the end user's search behaviour and interests and to organize information into categorized domains so as to ensure maximum relevance of its responses to end user queries. Specialization and personalization are accomplished using a group of collaborative agents. Each agent employs a Reinforcement Learning (RL) strategy to capture the end user's behaviour and interests. Reinforcement learning allows the agents to evolve their knowledge of the end user's behaviour and interests as they serve him or her, and it allows each agent to adapt to changes in that behaviour and those interests. Specialization is the process by which new information domains are created based on existing information topics, allowing new kinds of content to be built exclusively for information seekers. One of the key characteristics of specialization domains is that they are seeker-centric, which allows intelligent agents to create new information based on the information seekers' feedback and behaviours. Specialized domains are created by intelligent agents that collect information on a specific domain topic. The task of these specialized agents is to map the user's query to a repository of specific domains in order to present users with relevant information.
Indeed, mapping users' queries to only relevant information is one of the fundamental challenges in Artificial Intelligence (AI) and machine learning research. Our approach employs intelligent cooperative agents that specialize in building personalized ontology information domains tailored to each information seeker's specific needs. Specializing and categorizing information into unique domains is a challenge that has been addressed before, and various proposed solutions have been evaluated and adopted to cope with growing information. However, categorizing information into unique domains does not satisfy each individual information seeker: seekers might search for similar topics, but each has different interests. For example, medical information from a specific medical domain has different importance to a doctor and to a patient. The thesis presents a novel solution to the growth and diversity of information by building seeker-centric specialized information domains that are personalized through the information seekers' feedback and behaviours. To address this challenge, the research examines the fundamental components that constitute the specialized agent: an intelligent machine learning system, user input queries, an intelligent agent, and information resources constructed through specialized domains. Experimental work is reported to demonstrate the efficiency of the proposed solution in addressing overlapping information growth. The experiments use extensive seeker-centric specialized domain topics and employ personalized, collaborative learning agents and ontology techniques, thereby enriching the user's queries and domains. The experiments and results show that building specialized ontology domains pertinent to the information seekers' needs is more precise and efficient than other information retrieval applications and existing search engines.
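The RL strategy is not specified at this level of detail in the abstract; the sketch below shows one simple way a single agent could learn, from simulated click feedback, which specialized domain a particular seeker tends to find relevant (an epsilon-greedy bandit, chosen here purely for illustration).

```python
# Epsilon-greedy learning of per-domain relevance from simulated seeker feedback.
import random

random.seed(0)
DOMAINS = ["medicine", "sports", "finance"]

class DomainAgent:
    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.value = {d: 0.0 for d in DOMAINS}  # estimated relevance per domain
        self.count = {d: 0 for d in DOMAINS}

    def choose(self):
        if random.random() < self.epsilon:       # explore occasionally
            return random.choice(DOMAINS)
        return max(DOMAINS, key=self.value.get)  # otherwise exploit the estimate

    def update(self, domain, reward):
        self.count[domain] += 1
        # incremental mean: the estimate drifts toward the observed feedback
        self.value[domain] += (reward - self.value[domain]) / self.count[domain]

def simulated_click(domain):
    """Stand-in for seeker feedback: this particular user mostly wants medicine."""
    return 1.0 if domain == "medicine" and random.random() < 0.8 else 0.0

agent = DomainAgent()
for _ in range(500):
    chosen = agent.choose()
    agent.update(chosen, simulated_click(chosen))
print(agent.value)  # the 'medicine' estimate should dominate after training
```

In MAPIRSO, such per-seeker estimates would be maintained collaboratively by multiple specialized agents rather than by a single bandit.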
