About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Traços linguísticos e culturais de Goiás no século XVIII: vertentes lexicais no diário de viagem do barão de Mossâmedes / Linguistic and cultural traits of Goiás in the eighteenth century: lexical aspects in the travel diary of the Baron of Mossâmedes

Assunção, Daniane da Silva 31 August 2016 (has links)
This study investigated and analyzed the lexical fields of the book Travel Diary of the Baron of Mossâmedes: 1771-1773, covering its geographic and military aspects, its descriptions of places, festivals and religiosity, and all other cultural aspects present in the corpus, which consists of 63 folios. The book, edited and annotated by Antônio César Caldas Pinheiro and Gustavo Neiva Coelho (2006), reports the two journeys made by the fourth governor of the Minas dos Goyazes, José de Almeida de Vasconcellos Soveral e Carvalho: the first from Rio de Janeiro to Vila Boa, then capital of the captaincy of Goyaz, and the second through the interior of the captaincy. The inventory and analysis of the lexias were carried out according to the theory of lexical fields developed by Eugenio Coseriu (1977) and Horst Geckeler (1971). Through this diary it is possible to study the cultural history of Goiás and to analyze the lexical fields most important to, and most characteristic of, the people, the culture and the political-administrative organization of the Captaincy of Goyaz at that time. The historical context of the period was studied in order to better describe the relationship between lexicon, culture and society. Lexicographical records of the inventoried lexias were prepared by consulting the dictionaries of Bluteau (1712-1728) and Silva (1813). Finally, a glossary was compiled to support a better understanding of the history described in the diary.
12

Structuration automatique de documents audio / Automatic structuring of audio documents

Bouchekif, Abdesselam 03 November 2016 (has links)
Topic structuring is an area that has attracted much attention in the Natural Language Processing community, as it is the starting point of several applications such as information retrieval, summarization and topic modeling. In this thesis, we propose a generic topic structuring system, i.e. one able to deal with any TV broadcast news programme. Our system consists of two modules: topic segmentation and title assignment. Topic segmentation splits the programme into thematically homogeneous segments; since these are generally identified only by anonymous labels, the title-assignment module then assigns a title to each segment. Several original contributions are proposed, such as the joint exploitation of the distributions of words and speakers (speech cohesion) and the use of diachronic semantic relations. After the segmentation step, each segment is paired with press articles from the same day; the title assigned to the segment is that of the thematically closest article. Finally, we propose two new evaluation metrics, one for topic segmentation and one for title assignment. The experiments are carried out on three corpora consisting of 168 automatically transcribed TV news programmes from 10 French channels, characterized by their richness and diversity.
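The lexical-cohesion intuition behind topic segmentation can be sketched in a few lines (a toy TextTiling-style example under assumed window and threshold values, not the system described in the thesis):

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def segment(sentences, window=4, threshold=0.12):
    """Place a topic boundary wherever lexical cohesion between the `window`
    sentences before and after a gap drops below `threshold`."""
    boundaries = []
    for gap in range(window, len(sentences) - window):
        left = Counter(w for s in sentences[gap - window:gap] for w in s.lower().split())
        right = Counter(w for s in sentences[gap:gap + window] for w in s.lower().split())
        if cosine(left, right) < threshold:
            boundaries.append(gap)
    return boundaries
```

A full system would also exploit speaker distributions, as the thesis does, but the cohesion-minimum idea is the same.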
13

Extraction de motifs dans la rédaction collaborative sur les Wikis / Pattern extraction in collaborative writing on wikis

Uwatowenimana, Jeanne d'Arc January 2008 (has links)
Thesis digitized by the Université de Montréal's Division de la gestion de documents et des archives.
14

Recherche de réponses précises à des questions médicales : le système de questions-réponses MEANS / Finding precise answers to medical questions : the question-answering system MEANS

Ben Abacha, Asma 28 June 2012 (has links)
With the dramatic growth of digital information, finding precise answers to natural language questions is increasingly essential for retrieving domain knowledge in real time. Much research has tackled answer retrieval for factual questions in the open domain; less work has addressed domain-specific question answering, in particular in the medical and biomedical domains. Compared to the open domain, the medical domain presents different conditions, such as specialized vocabularies and terminologies, specific types of questions, and particular domain entities and relations. Document characteristics also matter: clinical texts, for example, tend to use many technical abbreviations, while forum pages may use long “approximate” terms. We focus on finding precise answers to natural language questions in the medical field. A key step is to analyze the questions and the source documents semantically and to represent the resulting annotations in standard formalisms. To this end, we use hybrid methods for two main tasks: (i) medical entity recognition and (ii) semantic relation extraction. These methods combine manually built rules and patterns, domain knowledge, and statistical machine-learning techniques using different classifiers; experiments on several corpora show that they mitigate the drawbacks of each family of methods, namely the potentially limited coverage of rule-based methods and the dependence of statistical methods on annotated data. We then study the contribution of Semantic Web technologies to the portability and expressiveness of question-answering systems, using them first to annotate the extracted information and then to query those annotations semantically. Finally, we present our question-answering system, MEANS, which combines NLP techniques, domain knowledge and Semantic Web technologies to answer medical questions automatically.
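As an illustration of the Semantic Web side of such a pipeline, here is a minimal sketch using rdflib; the namespace, entities and the `treats` relation are invented for the example and are not MEANS's actual schema:

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/med#")  # hypothetical namespace
g = Graph()

# Annotate extracted relations as RDF triples, e.g. "aspirin treats headache".
g.add((EX.aspirin, EX.treats, EX.headache))
g.add((EX.ibuprofen, EX.treats, EX.headache))

# Answer "what treats headache?" with a SPARQL query over the annotations.
q = """
SELECT ?drug
WHERE { ?drug <http://example.org/med#treats> <http://example.org/med#headache> . }
"""
for row in g.query(q):
    print(row.drug)
```

Representing extractions as triples is what makes the annotations queryable with a standard formalism rather than ad hoc code.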
15

A coesão textual em narrativas de alunos do 7° ano do ensino fundamental / Textual cohesion in narratives by 7th-grade students

Matei, Maria Helena Corrêa da Silva 19 October 2012 (has links)
This dissertation is linked to the research line Leitura, Escrita e Ensino of the Graduate Program in Portuguese Language at the Pontifícia Universidade Católica de São Paulo. In light of the Learning Expectations set out in the Pedagogic Project of two public schools in the eastern metropolitan area of the city of São Paulo, we first examined how students build the semantic relations responsible for cohesion in their texts. We then identified, in these students' texts, the cohesive resources used inadequately, and finally we checked whether 7th-grade students at these schools establish cohesion in the form expected for the school years corresponding to their age group (11-13 years old). For the analysis we chose the short-story genre, since it has been taught in these schools since the first grade; the choice rests on the hypothesis that student writing is easier to analyze in texts whose structure is, or should be, familiar to their authors. We adopted the concept of text advocated by Beaugrande (1997), that of a communicative event in which linguistic, cognitive and social actions are related. Among those actions we highlighted the linguistic ones, prioritizing the concept of cohesion proposed by Fávero (2010), who places cohesion on a level distinct from that of coherence. The analysis showed that cohesion strategies are present in the students' texts; however, students from both schools display certain cohesive shortcomings relative to what is expected of them at the end of these school years, given that the Learning Expectations guide the teaching of Portuguese in the municipal schools of São Paulo.
16

Restoring the balance between stuff and things in scene understanding

Caesar, Holger January 2018 (has links)
Scene understanding is a central field in computer vision that attempts to detect objects in a scene and reason about their spatial, functional and semantic relations. While many works focus on things (objects with a well-defined shape), less attention has been given to stuff classes (amorphous background regions). However, stuff classes are important, as they help explain many aspects of an image, including the scene type, the thing classes likely to be present, and the physical attributes of all objects in the scene. The goal of this thesis is to restore the balance between stuff and things in scene understanding. In particular, we investigate how the recognition of stuff differs from that of things and develop methods suitable for dealing with both. We use stuff to find things and annotate a large-scale dataset to study stuff and things in context. First, we present two methods for semantic segmentation of stuff and things. Most methods require manual class weighting to counter imbalanced class-frequency distributions, particularly on datasets with stuff and thing classes. We develop a novel joint calibration technique that takes into account class imbalance, class competition and overlapping regions by calibrating for the pixel-level evaluation criterion. The second method shows how to unify the advantages of region-based approaches (accurately delineated object boundaries) and fully convolutional approaches (end-to-end training). Both are combined in a universal framework that is equally suitable for stuff and things. Second, we propose to help weakly supervised object localization for classes without location annotations by transferring things and stuff knowledge from a source set with available annotations. This is particularly important if we want to scale scene understanding to real-world applications with thousands of classes, without having to exhaustively annotate millions of images. Finally, we present COCO-Stuff, the largest existing dataset with dense stuff and thing annotations. Existing datasets are much smaller and were made with expensive polygon-based annotation. We use a very efficient stuff annotation protocol to densely annotate 164K images, provide a detailed analysis of the dataset, and visualize how stuff and things co-occur spatially in an image. We revisit the question of whether stuff or things are easier to detect and which is more important, based on visual and linguistic analysis.
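The class-weighting problem mentioned above can be made concrete with a small sketch (a generic inverse-frequency baseline, assumed for illustration; it is not the joint calibration technique the thesis proposes):

```python
import numpy as np

def inverse_frequency_weights(label_map: np.ndarray, n_classes: int) -> np.ndarray:
    """Common baseline: weight each class by the inverse of its pixel frequency,
    so rare thing classes are not drowned out by large stuff regions."""
    counts = np.bincount(label_map.ravel(), minlength=n_classes).astype(float)
    freq = counts / counts.sum()
    weights = np.where(freq > 0, 1.0 / (freq + 1e-12), 0.0)
    return weights / weights.max()  # normalize so the largest weight is 1
```

Such weights are typically plugged into the per-pixel training loss; the thesis argues that calibrating directly for the pixel-level evaluation criterion works better than hand-tuning them.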
17

Aide à l'identification de relations lexicales au moyen de la sémantique distributionnelle et son application à un corpus bilingue du domaine de l'environnement / Assisting the identification of lexical relations by means of distributional semantics, with an application to a bilingual corpus in the field of the environment

Bernier-Colborne, Gabriel 08 1900 (has links)
Identifying semantic relations is one of the main tasks involved in terminology work. This task, which aims to establish links between terms whose meanings are related, can be assisted by computational methods, including those based on distributional semantics. These methods estimate the semantic similarity of words based on corpus data, which can help terminologists identify semantic relations. The quality of the results produced by distributional methods depends on several decisions that must be made when applying them, such as choosing a model and selecting its parameters. In turn, these decisions depend on various factors related to the target application, such as the types of semantic relations one wishes to identify. These can include typical paradigmatic relations such as (near-)synonymy (e.g. preserve -> protect), but also other relations such as syntactic derivation (e.g. preserve -> preservation). This dissertation aims to further the development of a methodological framework based on distributional semantics for the identification of semantic relations using specialized corpora. To this end, we investigate how various aspects of terminology work must be accounted for when selecting a distributional semantic model and its parameters, as well as those of the method used to query the model. These aspects include the descriptive framework, the target relations, the part of speech of the terms being described, and the language (in this case, French or English). Our results show that two of the relations that distributional semantic models capture most accurately are (near-)synonymy and syntactic derivation. However, the models that produce the best results for these two relations are very different; thus, the target relations are an important factor to consider when choosing a model and tuning it to obtain the most accurate results. Another factor that should be considered is the part of speech of the terms being worked on: among other things, our results suggest that relations between verbs are not captured as accurately as those between nouns or adjectives. The descriptive framework used for a given project is also important. In this work, we compare two descriptive frameworks, one based on lexical semantics and the other on frame semantics. Our results show that terms evoking the same semantic frame are not captured as accurately as certain semantic relations, such as synonymy. We show that this is due to at least two reasons: a high percentage of frame-evoking terms are verbs, and the models that capture syntactic derivation most accurately are very different from those that work best for typical paradigmatic relations such as synonymy. In summary, we evaluate two different distributional semantic models, analyze the influence of their parameters, and investigate how this influence varies with respect to various aspects of terminology work. We show many examples of distributional neighbourhoods, which we explore using graphs, and discuss sources of noise. This dissertation thus provides important guidelines for the use of distributional semantic models in terminology work.
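The distributional idea can be illustrated with a toy count-based model (the window size and the count-based representation are simplifying assumptions; the thesis evaluates far richer models):

```python
import numpy as np

def cooccurrence_matrix(tokens, window=2):
    """Word-by-word co-occurrence counts within a fixed context window."""
    vocab = {w: i for i, w in enumerate(sorted(set(tokens)))}
    M = np.zeros((len(vocab), len(vocab)))
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                M[vocab[w], vocab[tokens[j]]] += 1
    return M, vocab

def nearest_neighbours(word, M, vocab, k=5):
    """Rank words by cosine similarity of their context vectors; words used in
    similar contexts (e.g. 'preserve'/'protect') surface as close neighbours."""
    unit = M / np.maximum(np.linalg.norm(M, axis=1, keepdims=True), 1e-12)
    sims = unit @ unit[vocab[word]]
    inv = {i: w for w, i in vocab.items()}
    return [(inv[i], float(sims[i])) for i in np.argsort(-sims) if inv[i] != word][:k]
```

The distributional neighbourhoods the thesis explores are, in essence, lists like the one `nearest_neighbours` returns, computed from specialized corpora with carefully chosen parameters.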
18

Personalised ontology learning and mining for web information gathering

Tao, Xiaohui January 2009 (has links)
Over the last decade, the rapid growth and adoption of the World Wide Web has further exacerbated user needs for efficient mechanisms for information and knowledge location, selection, and retrieval. How to gather useful and meaningful information from the Web becomes challenging to users. The capture of user information needs is key to delivering users' desired information, and user profiles can help to capture information needs. However, effectively acquiring user profiles is difficult. It is argued that if user background knowledge can be specified by ontologies, more accurate user profiles can be acquired and thus information needs can be captured effectively. Web users implicitly possess concept models that are obtained from their experience and education, and use the concept models in information gathering. Prior to this work, much research has attempted to use ontologies to specify user background knowledge and user concept models. However, these works have a drawback in that they cannot move beyond the subsumption of super- and sub-class structure to emphasising the specific semantic relations in a single computational model. This has also been a challenge for years in the knowledge engineering community. Thus, using ontologies to represent user concept models and to acquire user profiles remains an unsolved problem in personalised Web information gathering and knowledge engineering. In this thesis, an ontology learning and mining model is proposed to acquire user profiles for personalised Web information gathering. The proposed computational model emphasises the specific is-a and part-of semantic relations in one computational model. The world knowledge and users' Local Instance Repositories are used to attempt to discover and specify user background knowledge. From a world knowledge base, personalised ontologies are constructed by adopting automatic or semi-automatic techniques to extract user interest concepts, focusing on user information needs. A multidimensional ontology mining method, Specificity and Exhaustivity, is also introduced in this thesis for analysing the user background knowledge discovered and specified in user personalised ontologies. The ontology learning and mining model is evaluated by comparing with human-based and state-of-the-art computational models in experiments, using a large, standard data set. The experimental results are promising for evaluation. The proposed ontology learning and mining model in this thesis helps to develop a better understanding of user profile acquisition, thus providing better design of personalised Web information gathering systems. The contributions are increasingly significant, given both the rapid explosion of Web information in recent years and today's accessibility to the Internet and the full text world.
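To make the emphasis on is-a and part-of relations concrete, here is a toy sketch of a structure that holds both relation types in one model (purely illustrative, not the thesis's computational model):

```python
class Ontology:
    """Toy concept store keeping is-a and part-of edges side by side."""
    def __init__(self):
        self.is_a = {}      # concept -> set of superclasses
        self.part_of = {}   # concept -> set of wholes

    def add(self, concept, is_a=(), part_of=()):
        self.is_a.setdefault(concept, set()).update(is_a)
        self.part_of.setdefault(concept, set()).update(part_of)

    def ancestors(self, concept, relation):
        """Transitive closure over one relation ('is_a' or 'part_of')."""
        edges = getattr(self, relation)
        seen, stack = set(), list(edges.get(concept, ()))
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(edges.get(c, ()))
        return seen

onto = Ontology()
onto.add("engine", is_a=["machine"], part_of=["car"])
onto.add("car", is_a=["vehicle"])
print(onto.ancestors("engine", "is_a"))     # {'machine'}
print(onto.ancestors("engine", "part_of"))  # {'car'}
```

Keeping both relations in one model is what lets a profile distinguish "an engine is a kind of machine" from "an engine is a part of a car", rather than collapsing everything into subsumption.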
19

Aprendizado automático de relações semânticas entre tags de folksonomias / Automatic learning of semantic relations between folksonomy tags

RÊGO, Alex Sandro da Cunha. 05 June 2018 (has links)
Folksonomies have emerged as useful tools for the online management of digital content. Popular websites such as Delicious, Flickr and BibSonomy are now widespread, with thousands of users using them daily to upload digital content (e.g., webpages, photos, videos and bibliographic information) and tag it for later retrieval. The lack of semantic relations such as synonymy and hypernymy/hyponymy in the tag space may diminish users' ability to find relevant resources. Many research works in the literature employ similarity measures to detect synonymy and to build hierarchies of tags automatically by means of heuristic algorithms. In this thesis, the problems of synonym and subsumption detection between pairs of tags are cast as a pairwise classification problem. From the literature, several similarity measures that are good indicators of synonymy and subsumption were identified and used as learning features. Under this setting there is severe class imbalance and class overlapping, which motivated the investigation and use of class-imbalance techniques to overcome both problems. A comprehensive set of experiments was conducted on two large real-world datasets from the BibSonomy and Delicious systems, showing that the proposed approach, named CPDST, outperforms the best-performing heuristic-based baseline in the tasks of synonym and subsumption detection. CPDST is also applied in the context of tag-list generation, providing access to additional resources annotated with other semantically related tags. Besides the CPDST approach, two algorithms based on WordNet and ConceptNet accesses are proposed for capturing specifically synonyms and hyponyms. The outcome of a quantitative evaluation showed that CPDST yields relevant tag lists in relation to those produced by the compared methods.
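The pairwise-classification framing can be sketched as follows; the features and scikit-learn's built-in class weighting are illustrative stand-ins for the thesis's actual similarity measures and imbalance techniques:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each row describes one tag pair through similarity features
# (hypothetical values): [cosine_sim, jaccard_sim, cooccurrence_count]
X = np.array([
    [0.91, 0.75, 120.0],   # ("js", "javascript")
    [0.85, 0.60,  80.0],   # ("nyc", "newyork")
    [0.10, 0.02,   3.0],   # ("cat", "python")
    [0.15, 0.05,   5.0],   # ("jazz", "linux")
])
y = np.array([1, 1, 0, 0])  # 1 = synonym pair, 0 = unrelated

# class_weight='balanced' reweights examples inversely to class frequency,
# a simple stand-in for the imbalance-handling the thesis investigates.
clf = RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=0)
clf.fit(X, y)
print(clf.predict([[0.88, 0.70, 95.0]]))  # likely predicts synonym (1)
```

In a real setting the positive class (true synonym or hypernym pairs) is a tiny fraction of all candidate pairs, which is why the imbalance handling matters as much as the features themselves.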
