  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
81

Extraktion geographischer Entitäten zur Suche nutzergenerierter Inhalte für Nachrichtenereignisse / Extraction of geographic entities for searching user-generated content on news events

Katz, Philipp 27 November 2014
The influence of so-called user-generated content on the Web has grown steadily in recent years. On platforms such as blogs, social networks, and media portals, users continuously publish text messages, images, and videos. These platforms also spread content documenting current social events, such as the Euromaidan in Kiev. User-generated content therefore has the potential to provide additional background information about events directly from the scene. This thesis pursues the vision of a news platform that uses methods from information retrieval and information extraction to detect news events, automatically enrich them with relevant user-generated content, and present them to the reader. To search for user-generated content, the thesis relies chiefly on geographic entities, that is, place names. It presents several new methods for extracting these entities from given news documents. The entities are then used to generate targeted search queries. The thesis shows that geo-supported search is better suited to finding user-generated content than conventional keyword-based search.
82

Learning with Markov logic networks : transfer learning, structure learning, and an application to Web query disambiguation

Mihalkova, Lilyana Simeonova 18 March 2011
Traditionally, machine learning algorithms assume that training data is provided as a set of independent instances, each of which can be described as a feature vector. In contrast, many domains of interest are inherently multi-relational, consisting of entities connected by a rich set of relations. For example, the participants in a social network are linked by friendships, collaborations, and shared interests. Likewise, the users of a search engine are related by searches for similar items and clicks to shared sites. The ability to model and reason about such relations is essential not only because better predictive accuracy is achieved by exploiting this additional information, but also because frequently the goal is to predict whether a set of entities are related in a particular way. This thesis falls within the area of Statistical Relational Learning (SRL), which combines ideas from two traditions within artificial intelligence, first-order logic and probabilistic graphical models to address the challenge of learning from multi-relational data. We build on one particular SRL model, Markov logic networks (MLNs), which consist of a set of weighted first-order-logic formulae and provide a principled way of defining a probability distribution over possible worlds. We develop algorithms for learning of MLN structure both from scratch and by transferring a previously learned model, as well as an application of MLNs to the problem of Web query disambiguation. The ideas we present are unified by two main themes: the need to deal with limited training data and the use of bottom-up learning techniques. Structure learning, the task of automatically acquiring a set of dependencies among the relations in the domain, is a central problem in SRL. We introduce BUSL, an algorithm for learning MLN structure from scratch that proceeds in a more bottom-up fashion, breaking away from the tradition of top-down learning typical in SRL. 
Our approach first constructs a novel data structure called a Markov network template that is used to restrict the search space for clauses. Our experiments in three relational domains demonstrate that BUSL dramatically reduces the search space for clauses and attains a significantly higher accuracy than a structure learner that follows a top-down approach. Accurate and efficient structure learning can also be achieved by transferring a model obtained in a source domain related to the current target domain of interest. We view transfer as a revision task and present an algorithm that diagnoses a source MLN to determine which of its parts transfer directly to the target domain and which need to be updated. This analysis focuses the search for revisions on the incorrect portions of the source structure, thus speeding up learning. Transfer learning is particularly important when target-domain data is limited, such as when data on only a few individuals is available from domains with hundreds of entities connected by a variety of relations. We also address this challenging case and develop a general transfer learning approach that makes effective use of such limited target data in several social network domains. Finally, we develop an application of MLNs to the problem of Web query disambiguation in a more privacy-aware setting where the only information available about a user is that captured in a short search session of 5-6 previous queries on average. This setting contrasts with previous work that typically assumes the availability of long user-specific search histories. To compensate for the scarcity of user-specific information, our approach exploits the relations between users, search terms, and URLs. We demonstrate the effectiveness of our approach in the presence of noise and show that it outperforms several natural baselines on a large data set collected from the MSN search engine.
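The abstract describes an MLN as a set of weighted first-order formulae that defines a probability distribution over possible worlds. The sketch below illustrates the standard log-linear form of that distribution, P(x) = exp(sum_i w_i n_i(x)) / Z, on a toy domain. The atoms, the single formula, and its weight are hypothetical choices made for illustration; they are not taken from the thesis, and the brute-force enumeration is only practical for tiny domains.

```python
import itertools
import math


def world_probabilities(atoms, weighted_formulas):
    """Return {world: probability} for every truth assignment to `atoms`.

    `weighted_formulas` is a list of (weight, formula) pairs. Each formula is
    a function from a world (dict atom -> bool) to True/False. Every formula
    here is already ground, so its count n_i(x) is either 0 or 1.
    """
    worlds = [dict(zip(atoms, values))
              for values in itertools.product([False, True], repeat=len(atoms))]
    # Unnormalized score of a world: exp of the summed weights of satisfied formulas.
    scores = [math.exp(sum(w for w, f in weighted_formulas if f(world)))
              for world in worlds]
    z = sum(scores)  # partition function
    return {tuple(sorted(world.items())): s / z
            for world, s in zip(worlds, scores)}


# Hypothetical toy domain: one person A, one rule "Smokes(A) implies Cancer(A)".
atoms = ["Smokes(A)", "Cancer(A)"]
formulas = [(1.5, lambda w: (not w["Smokes(A)"]) or w["Cancer(A)"])]
probs = world_probabilities(atoms, formulas)

for world, p in probs.items():
    print(world, round(p, 3))
```

The world in which A smokes but does not have cancer violates the rule, so it receives less probability than the other three worlds, but not zero probability. That softening of hard logical constraints is the point of attaching weights to formulae.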
83

Αυτόματη επιλογή σημασιολογικά συγγενών όρων για την επαναδιατύπωση των ερωτημάτων σε μηχανές αναζήτησης πληροφορίας / Automatic selection of semantic related terms for reformulating a query into a search engine

Κοζανίδης, Ελευθέριος 14 September 2007
Η βελτίωση ερωτημάτων (Query refinement) είναι η διαδικασία πρότασης εναλλακτικών όρων στους χρήστες των μηχανών αναζήτησης του Διαδικτύου για την διατύπωση της πληροφοριακής τους ανάγκης. Παρόλο που εναλλακτικοί σχηματισμοί ερωτημάτων μπορούν να συνεισφέρουν στην βελτίωση των ανακτηθέντων αποτελεσμάτων, η χρησιμοποίησή τους από χρήστες του Διαδικτύου είναι ιδιαίτερα περιορισμένη καθώς οι όροι των βελτιωμένων ερωτημάτων δεν περιέχουν σχεδόν καθόλου πληροφορία αναφορικά με τον βαθμό ομοιότητάς τους με τους όρους του αρχικού ερωτήματος, ενώ συγχρόνως δεν καταδεικνύουν το βαθμό συσχέτισής τους με τα πληροφοριακά ενδιαφέροντα των χρηστών. Παραδοσιακά, οι εναλλακτικοί σχηματισμοί ερωτημάτων καθορίζονται κατ’ αποκλειστικότητα από τη σημασιολογική σχέση που επιδεικνύουν οι συμπληρωματικοί όροι με τους αρχικούς όρους του ερωτήματος, χωρίς να λαμβάνουν υπόψη τον επιδιωκόμενο στόχο της αναζήτησης που υπολανθάνει πίσω από ένα ερώτημα του χρήστη. Στην παρούσα εργασία θα παρουσιάσουμε μια πρότυπη τεχνική βελτίωσης ερωτημάτων η οποία χρησιμοποιεί μια λεξική οντολογία προκειμένου να εντοπίσει εναλλακτικούς σχηματισμούς ερωτημάτων οι οποίοι αφενός, θα περιγράφουν το αντικείμενο της αναζήτησης του χρήστη και αφετέρου θα σχετίζονται με τα ερωτήματα που υπέβαλε ο χρήστης. Το πιο πρωτοποριακό χαρακτηριστικό της τεχνικής μας είναι η οπτική αναπαράσταση του εναλλακτικού ερωτήματος με την μορφή ενός ιεραρχικά δομημένου γράφου. Η αναπαράσταση αυτή παρέχει σαφείς πληροφορίες για την σημασιολογική σχέση μεταξύ των όρων του βελτιωμένου ερωτήματος και των όρων που χρησιμοποίησε ο χρήστης για να εκφράσει την πληροφοριακή του ανάγκη ενώ παράλληλα παρέχει την δυνατότητα στον χρήστη να επιλέξει ποιοι από τους υποψήφιους όρους θα συμμετέχουν τελικά στην διαδικασία βελτιστοποίησης δημιουργώντας διαδραστικά το νέο ερώτημα. 
Τα αποτελέσματα των πειραμάτων που διενεργήσαμε για να αξιολογήσουμε την απόδοση της τεχνικής μας, είναι ιδιαίτερα ικανοποιητικά και μας οδηγούν στο συμπέρασμα ότι η μέθοδός μας μπορεί να βοηθήσει σημαντικά στη διευκόλυνση του χρήστη κατά τη διαδικασία επιλογής ερωτημάτων για την ανάκτηση πληροφορίας από τα δεδομένα του Παγκόσμιου Ιστού. / Query refinement is the process of providing Web information seekers with alternative wordings for expressing their information needs. Although alternative query formulations may improve retrieval results, Web users rarely adopt them, because the alternative wordings convey explicit information about neither the degree nor the type of their correlation to the user-issued queries. Moreover, alternative query formulations are determined from the semantics of the issued query alone and do not take into account the search intentions of the user issuing that query. In this paper, we introduce a novel query refinement technique that uses a lexical ontology to identify alternative query formulations that are both informative of the user’s interests and related to the user-selected queries. The most innovative feature of our technique is the visualization of the alternative query wordings in graphical form, which conveys explicit information about the refined queries' correlation to the user-issued requests and allows the user to select which terms participate in the refinement process. Experimental results demonstrate that our method has significant potential to improve the user search experience.
84

Context-aware semantic analysis of video metadata

Steinmetz, Nadine January 2013
Im Vergleich zu einer stichwortbasierten Suche ermöglicht die semantische Suche ein präziseres und anspruchsvolleres Durchsuchen von (Web)-Dokumenten, weil durch die explizite Semantik Mehrdeutigkeiten von natürlicher Sprache vermieden und semantische Beziehungen in das Suchergebnis einbezogen werden können. Eine semantische, Entitäten-basierte Suche geht von einer Anfrage mit festgelegter Bedeutung aus und liefert nur Dokumente, die mit dieser Entität annotiert sind als Suchergebnis. Die wichtigste Voraussetzung für eine Entitäten-zentrierte Suche stellt die Annotation der Dokumente im Archiv mit Entitäten und Kategorien dar. Textuelle Informationen werden analysiert und mit den entsprechenden Entitäten und Kategorien versehen, um den Inhalt semantisch erschließen zu können. Eine manuelle Annotation erfordert Domänenwissen und ist sehr zeitaufwendig. Die semantische Annotation von Videodokumenten erfordert besondere Aufmerksamkeit, da inhaltsbasierte Metadaten von Videos aus verschiedenen Quellen stammen, verschiedene Eigenschaften und Zuverlässigkeiten besitzen und daher nicht wie Fließtext behandelt werden können. Die vorliegende Arbeit stellt einen semantischen Analyseprozess für Video-Metadaten vor. Die Eigenschaften der verschiedenen Metadatentypen werden analysiert und ein Konfidenzwert ermittelt. Dieser Wert spiegelt die Korrektheit und die wahrscheinliche Mehrdeutigkeit eines Metadatums wieder. Beginnend mit dem Metadatum mit dem höchsten Konfidenzwert wird der Analyseprozess innerhalb eines Kontexts in absteigender Reihenfolge des Konfidenzwerts durchgeführt. Die bereits analysierten Metadaten dienen als Referenzpunkt für die weiteren Analysen. So kann eine möglichst korrekte Analyse der heterogen strukturierten Daten eines Kontexts sichergestellt werden. Am Ende der Analyse eines Metadatums wird die für den Kontext relevanteste Entität aus einer Liste von Kandidaten identifiziert - das Metadatum wird disambiguiert. 
Hierfür wurden verschiedene Disambiguierungsalgorithmen entwickelt, die Beschreibungstexte und semantische Beziehungen der Entitätenkandidaten zum gegebenen Kontext in Betracht ziehen. Der Kontext für die Disambiguierung wird für jedes Metadatum anhand der Eigenschaften und Konfidenzwerte zusammengestellt. Der vorgestellte Analyseprozess ist an zwei Hypothesen angelehnt: Um die Analyseergebnisse verbessern zu können, sollten die Metadaten eines Kontexts in absteigender Reihenfolge ihres Konfidenzwertes verarbeitet werden und die Kontextgrenzen von Videometadaten sollten durch Segmentgrenzen definiert werden, um möglichst Kontexte mit kohärentem Inhalt zu erhalten. Durch ausführliche Evaluationen konnten die gestellten Hypothesen bestätigt werden. Der Analyseprozess wurden gegen mehrere State-of-the-Art Methoden verglichen und erzielt verbesserte Ergebnisse in Bezug auf Recall und Precision, besonders für Metadaten, die aus weniger zuverlässigen Quellen stammen. Der Analyseprozess ist Teil eines Videoanalyse-Frameworks und wurde bereits erfolgreich in verschiedenen Projekten eingesetzt. / The Semantic Web provides information contained in the World Wide Web as machine-readable facts. In comparison to a keyword-based inquiry, semantic search enables a more sophisticated exploration of web documents. By clarifying the meaning behind entities, search results are more precise and the semantics simultaneously enable an exploration of semantic relationships. However, unlike keyword searches, a semantic entity-focused search requires that web documents are annotated with semantic representations of common words and named entities. Manual semantic annotation of (web) documents is time-consuming; in response, automatic annotation services have emerged in recent years. 
These annotation services take continuous text as input, detect important key terms and named entities and annotate them with semantic entities contained in widely used semantic knowledge bases, such as Freebase or DBpedia. Metadata of video documents require special attention. Semantic analysis approaches for continuous text cannot be applied, because the information in a video document's context originates from multiple sources with different reliabilities and characteristics. This thesis presents a semantic analysis approach consisting of a context model and a disambiguation algorithm for video metadata. The context model takes into account the characteristics of video metadata and derives a confidence value for each metadata item. The confidence value represents the level of correctness and ambiguity of the textual information of the metadata item. The lower the ambiguity and the higher the prospective correctness, the higher the confidence value. The metadata items derived from the video metadata are analyzed in a specific order from high to low confidence level. Previously analyzed metadata are used as reference points in the context for subsequent disambiguation. The contextually most relevant entity is identified by means of descriptive texts and semantic relationships to the context. The context is created dynamically for each metadata item, taking into account the confidence value and other characteristics. The proposed semantic analysis follows two hypotheses: metadata items of a context should be processed in descending order of their confidence value, and the metadata that pertains to a context should be limited by content-based segmentation boundaries. The evaluation results support the proposed hypotheses and show increased recall and precision for annotated entities, especially for metadata that originates from sources with low reliability. The algorithms have been evaluated against several state-of-the-art annotation approaches.
The presented semantic analysis process is integrated into a video analysis framework and has been successfully applied in several projects for the semantic exploration of videos.
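The processing order described in this abstract (highest-confidence metadata first, with already resolved items serving as reference points for later ones) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's algorithm: the `related` lookup table, the entity names, and the overlap-count score are all hypothetical stand-ins for the descriptive texts and semantic relationships the thesis uses.

```python
def disambiguate_by_confidence(items, related):
    """Resolve metadata items in descending order of confidence.

    items:   list of dicts with keys 'text', 'confidence', 'candidates'.
    related: dict mapping an entity to the set of entities it is linked to
             (a hypothetical stand-in for a knowledge base).
    """
    resolved = {}
    context = set()  # entities chosen so far, used as reference points
    for item in sorted(items, key=lambda it: it["confidence"], reverse=True):
        # Score each candidate by how many already-resolved entities it is
        # related to; ties fall back to the first candidate listed.
        best = max(item["candidates"],
                   key=lambda c: len(related.get(c, set()) & context))
        resolved[item["text"]] = best
        context.add(best)
    return resolved


# Hypothetical example: an ambiguous, low-confidence term ("Wall") is
# resolved after an unambiguous, high-confidence one ("Berlin").
items = [
    {"text": "Wall", "confidence": 0.4,
     "candidates": ["Wall_Street", "Berlin_Wall"]},
    {"text": "Berlin", "confidence": 0.9,
     "candidates": ["Berlin_(city)"]},
]
related = {
    "Berlin_Wall": {"Berlin_(city)"},
    "Wall_Street": {"New_York_City"},
}
result = disambiguate_by_confidence(items, related)
print(result)
```

If the items were processed in input order instead, "Wall" would be resolved against an empty context and the choice would fall back to the first candidate, which is the failure mode the confidence ordering is meant to avoid.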
85

Peer to peer English/Chinese cross-language information retrieval

Lu, Chengye January 2008
Peer to peer systems are widely used on the internet. However, most peer to peer information systems still lack important features, such as cross-language IR (Information Retrieval) and collection selection / fusion. Cross-language IR is a state-of-the-art research area in the IR research community. It has not been used in any real world IR systems yet. Cross-language IR lets a user issue a query in one language and receive documents in other languages. In a typical peer to peer environment, users come from multiple countries, and their collections are in multiple languages. Cross-language IR can help users find documents more easily. For example, many Chinese researchers search for research papers in both Chinese and English. With cross-language IR, they can issue one query in Chinese and get documents in both languages. The Out Of Vocabulary (OOV) problem is one of the key research areas in cross-language information retrieval. In recent years, web mining has been shown to be an effective approach to this problem. However, how to extract Multiword Lexical Units (MLUs) from web content and how to select the correct translations from the extracted candidate MLUs remain two difficult problems in web mining based automated translation approaches. Discovering resource descriptions and merging results obtained from remote search engines are two key issues in distributed information retrieval studies. In uncooperative environments, query-based sampling and normalized-score based merging strategies are well-known approaches to these problems. However, such approaches consider only the content of the remote database and not the retrieval performance of the remote search engine. This thesis presents research on building a peer to peer IR system with cross-language IR and an advanced collection profiling technique for fusion.
Particularly, this thesis first presents a new Chinese term measurement and a new Chinese MLU extraction process that works well on small corpora. An approach to selecting MLUs more accurately is also presented. After that, this thesis proposes a collection profiling strategy that can discover not only collection content but also the retrieval performance of the remote search engine. Based on collection profiling, a web-based query classification method and two collection fusion approaches are developed and presented in this thesis. Our experiments show that the proposed strategies are effective in merging results in uncooperative peer to peer environments. Here, an uncooperative environment is one in which each peer in the system is autonomous: peers are willing to share documents but do not share collection statistics. This is a typical peer to peer IR environment. Finally, all those approaches are combined to build a secure peer to peer multilingual IR system that cooperates through X.509 and an email system.
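The normalized-score merging mentioned in this abstract can be illustrated with a short sketch. It min-max normalizes each peer's raw scores so that lists from different engines become comparable, then optionally scales them by a per-peer weight. The weight is a hypothetical stand-in for a retrieval-performance estimate from a collection profile; the thesis's actual fusion approaches are not specified in the abstract, and the peer names and scores below are invented for illustration.

```python
def merge_results(result_lists, weights=None):
    """Merge ranked lists from several peers into one ranking.

    result_lists: dict peer -> list of (doc_id, raw_score).
    weights:      optional dict peer -> multiplier (assumed here to reflect
                  how well that peer's search engine performs).
    Returns a list of (doc_id, merged_score, peer), best first.
    """
    merged = []
    for peer, results in result_lists.items():
        scores = [score for _, score in results]
        if not scores:
            continue
        lo, hi = min(scores), max(scores)
        span = hi - lo
        weight = 1.0 if weights is None else weights.get(peer, 1.0)
        for doc_id, score in results:
            # Min-max normalization; a list with identical scores maps to 1.0.
            normalized = (score - lo) / span if span else 1.0
            merged.append((doc_id, weight * normalized, peer))
    return sorted(merged, key=lambda row: row[1], reverse=True)


# Hypothetical peers whose engines score on very different scales.
result_lists = {
    "peerA": [("a1", 12.0), ("a2", 6.0), ("a3", 3.0)],
    "peerB": [("b1", 0.9), ("b2", 0.5)],
}
ranking = merge_results(result_lists, weights={"peerA": 1.0, "peerB": 0.5})
print([doc_id for doc_id, _, _ in ranking])
```

Without the weights, the top document of every peer would tie at 1.0 regardless of how good that peer's engine is, which is the limitation of content-only merging that the abstract points out.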
86

Interaction à distance en environnement physique augmenté / Distant interaction in an augmented physical environment

Delamare, William 02 November 2015
Nous nous intéressons à l'interaction dans le contexte d'environnements physiques augmentés, plus précisément avec les objets physiques qui les composent. Bien que l'augmentation de ces objets offre de nouvelles possibilités d'interaction, notamment celle d'interagir à distance, le monde physique possède des caractéristiques propres rendant difficile l'adaptation de techniques d'interaction existantes en environnements virtuels. Il convient alors d'identifier ces caractéristiques afin de concevoir des techniques d'interaction à la fois efficaces et plaisantes dédiées à ces environnements physiques augmentés. Dans nos travaux, nous décomposons cette interaction à distance avec des objets physiques augmentés en deux étapes complémentaires : la sélection et le contrôle. Nous apportons deux contributions à chacun de ces champs de recherche. Ces contributions sont à la fois conceptuelles, avec la création d'espaces de conception, et pratiques, avec la conception, la réalisation logicielle et l'évaluation expérimentale de techniques d'interaction :- Pour l'étape de sélection, nous explorons la désambiguïsation potentielle après un geste de pointage à distance définissant un volume de sélection comme avec une télécommande infrarouge par exemple. En effet, bien que ce type de pointage sollicite moins de précision de la part de l'utilisateur, il peut néanmoins impliquer la sélection de plusieurs objets dans le volume de sélection et donc nécessiter une phase de désambiguïsation. Nous définissons et utilisons un espace de conception afin de concevoir et évaluer expérimentalement deux techniques de désambiguïsation visant à maintenir l'attention visuelle de l'utilisateur sur les objets physiques.- Pour l'étape de contrôle, nous explorons le guidage de gestes 3D lors d'une interaction gestuelle afin de spécifier des commandes à distance. Ce guidage est nécessaire afin d'indiquer à l'utilisateur les commandes disponibles ainsi que les gestes associés. 
Nous définissons un espace de conception capturant les caractéristiques comportementales d'un large ensemble de guides ainsi qu'un outil en ligne facilitant son utilisation. Nous explorons ensuite plusieurs options de conception afin d'étudier expérimentalement leurs impacts sur la qualité du guidage de gestes 3D. / We explore interaction with augmented physical objects within physical environments. Augmented physical objects allow new ways of interaction, including distant interaction. However, the physical world has specific characteristics that make it difficult to adapt interaction techniques that already exist for virtual environments. These characteristics need to be identified in order to design efficient and enjoyable interaction techniques dedicated to augmented physical environments. In our work, we split distant interaction into two complementary stages: the selection and the control of augmented physical objects. For each of these stages, our contribution is two-fold. These contributions are both theoretical, with the establishment of design spaces, and practical, with the design, implementation, and experimental evaluation of interaction techniques. For the selection stage, we study the disambiguation potentially needed after a distal pointing gesture that uses a selection volume, as with an infrared remote control. Although a selection volume makes aiming easier, several objects can fall into the selected volume, so users must disambiguate this coarse pointing selection. We define and use a design space in order to design and experimentally evaluate two disambiguation techniques that maintain the user's focus on the physical objects. For the control stage, we study the guidance of 3D hand gestures in order to trigger commands at a distance. Such guidance is essential in order to reveal available commands and the associated gestures. We define a design space capturing the characteristics of a wide range of guiding systems.
We also provide an online tool that eases the use of such a large design space. We then explore the impact of several design options on the quality of 3D gesture guidance.
87

LUDI: um framework para desambiguação lexical com base no enriquecimento da semântica de frames / LUDI: a framework for lexical disambiguation based on the enrichment of frame semantics

Matos, Ely Edison da Silva 27 June 2014
Enquanto no âmbito da Sintaxe, as técnicas, os algoritmos e as aplicações em Processamento da Língua Natural são bem estudados e já estão relativamente bem estabelecidos, no âmbito da Semântica não é possível observar ainda a mesma maturidade. Visando, então, contribuir para os estudos em Semântica Computacional, este trabalho busca maneiras de implementar algumas das ideias e dos insights propostos pela Linguística Cognitiva, que é, por si, uma alternativa à Linguística Gerativa. A tentativa é reunir algumas das ferramentas disponíveis, seja no viés computacional (Bancos de Dados, Teoria dos Grafos, Ontologias, Mecanismos de inferências, Modelos Conexionistas), seja no viés linguístico (Semântica de Frames e Teoria do Léxico Gerativo), seja no viés de aplicações (FrameNet e ontologia SIMPLE), a fim de abordar as questões semânticas de forma mais flexível. O objeto de estudo é o processo de desambiguação de Unidades Lexicais. O resultado da pesquisa realizada é corporificado na forma de uma aplicação computacional, chamada Framework LUDI (Lexical Unit Discovery through Inference), composta por algoritmos e estruturas de dados usados na desambiguação.
O framework é uma aplicação de Compreensão da Língua Natural, que pode ser integrada em ferramentas para recuperação de informação e sumarização, bem como em processos de Etiquetagem de Papéis Semânticos (SRL - Semantic Role Labeling). / While techniques, algorithms, and applications in Natural Language Processing are well known and relatively well established in the field of Syntax, the same does not hold for the field of Semantics. Aiming to contribute to the studies in Computational Semantics, this work implements ideas and insights offered by Cognitive Linguistics, which is itself an alternative to Generative Linguistics. We attempt to bring together contributions from the computational domain (Databases, Graph Theory, Ontologies, inference mechanisms, Connectionist Models), the linguistic domain (Frame Semantics and the Generative Lexicon), and the application domain (FrameNet and SIMPLE Ontology) in order to address semantic issues more flexibly. The object of study is the process of disambiguation of Lexical Units. The results of the research are embodied in a computer application, called Framework LUDI (Lexical Unit Discovery through Inference), composed of algorithms and data structures used for Lexical Unit disambiguation. The framework is an application of Natural Language Understanding, which can be integrated into information retrieval and summarization tools, as well as into processes of Semantic Role Labeling (SRL).
88

Um estudo comparativo entre abordagens supervisionadas para a resolução de referências a autores / A comparative study of supervised approaches for author reference resolution

Canuto, Sérgio Daniel Carvalho 25 August 2011
In this work we investigate two classes of solutions for the problem of author name disambiguation. We refer to the approaches of the first class as relational based on attributes (RBA) solutions. These approaches use similarity measures based on attributes of the two references being compared or on the attributes of other references connected to them by authorship. The other class of approaches uses information on semantic relationships among entities, in addition to attribute-based similarity measures, to decide whether two references refer to the same author. We refer to the approaches of this class as relational based on entities (RBE) solutions. We present a supervised version of RBE based on the work introduced by Bhattacharya and Getoor [7]. In the experiments we conducted, our RBE solution presented statistically significant gains in efficacy over all the other methods studied. However, the gains are only marginal over the RBA methods tested. On the other hand, the execution times of both the training and testing phases of the RBE methods are notably greater than those of the RBA methods. As far as we know, no similar study has been reported in the literature, and we consider the results reported here relevant because they encourage research on enhancing RBA solutions. / Neste trabalho investigamos duas classes de soluções supervisionadas para o problema de resolver se duas ou mais referências a autores (nomes de autores) correspondem à mesma pessoa. Denominamos abordagens relacionais baseadas em atributo (RBA) as abordagens da primeira classe. Nessas abordagens são utilizadas medidas de similaridades entre atributos textuais de duas referências ou de referências ligadas a elas por coautoria.
A outra classe de soluções estudada utiliza informações de relacionamento semântico entre entidades, em adição às similaridades por atributos, para decidir quando duas ou mais referências devem ser consideradas correferentes. Denominamos as abordagens dessa classe de relacionais baseadas em entidades (RBE). Apresentamos uma versão supervisionada de solução RBE que se baseia na proposta apresentada por Bhattacharya e Getoor [7]. Experimentos utilizando duas coleções reais e uma coleção artificial mostram que a solução RBE proposta neste trabalho apresenta ganhos de eficácia estatisticamente comprovados em relação a todos os métodos analisados. Entretanto, o ganho é apenas marginal em relação aos métodos da classe RBA analisados. Por outro lado, o custo computacional tanto de treino quanto de teste das abordagens RBE é consideravelmente maior que o custo dos métodos RBA. Consideramos que esse estudo comparativo é inédito e que as conclusões são importantes, pois incentivam pesquisas para o aprimoramento das soluções RBA.
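The attribute-based (RBA) idea described in this abstract, comparing two author references through the similarity of their attributes, can be sketched as below. This is a simplified illustration, not the method evaluated in the dissertation: the choice of attributes (`coauthors`, `title`, `venue`), the Jaccard measure, the hand-set weights, and the threshold are all assumptions. In a supervised setting such as the one studied, the weights would be learned from labeled pairs instead of being fixed by hand.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def attribute_features(ref1, ref2):
    """Similarity features between two author references (hypothetical schema)."""
    return {
        "coauthors": jaccard(ref1["coauthors"], ref2["coauthors"]),
        "title": jaccard(ref1["title"].lower().split(),
                         ref2["title"].lower().split()),
        "venue": 1.0 if ref1["venue"] == ref2["venue"] else 0.0,
    }


def same_author(ref1, ref2, weights, threshold=0.5):
    """Decide whether two references denote the same person.

    `weights` stands in for parameters a supervised learner would fit.
    """
    features = attribute_features(ref1, ref2)
    score = sum(weights[name] * value for name, value in features.items())
    return score >= threshold


# Invented references for illustration only.
r1 = {"coauthors": ["M. Silva", "J. Costa"],
      "title": "Entity resolution in digital libraries", "venue": "JCDL"}
r2 = {"coauthors": ["M. Silva"],
      "title": "Scalable entity resolution for libraries", "venue": "JCDL"}
r3 = {"coauthors": ["K. Tanaka"],
      "title": "Protein folding dynamics", "venue": "Bioinformatics"}
weights = {"coauthors": 0.4, "title": 0.3, "venue": 0.3}

print(same_author(r1, r2, weights))
print(same_author(r1, r3, weights))
```

An entity-based (RBE) approach would go further and let decisions about one pair of references influence decisions about related pairs, which is consistent with the higher training and testing cost the abstract reports.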
89

Análise de sentimento e desambiguação no contexto da tv social / Sentiment analysis and disambiguation in the context of social TV

Lima, Ana Carolina Espírito Santo 14 December 2012
Fundação de Amparo a Pesquisa do Estado de São Paulo / Social media have become a way of expressing collective interests. People are motivated by the sharing of information and the feedback from friends and colleagues. Among the many social media tools available, the Twitter microblog is gaining popularity as a platform for instantaneous communication. Millions of messages are generated daily, by over 100 million users, about the most varied subjects. As a rapid communication platform, this microblog spurred a phenomenon called television storytellers, in which internet users comment on what they watch on TV while the programs are being broadcast. Social TV emerged from this integration between social media and television. The volume of data generated about TV shows is rich material for data analysis. Broadcasters may use such information to improve their programs and increase interaction with their audience. Among the main challenges in social media data analysis are sentiment analysis (determining the polarity of a text, for instance, positive or negative) and sense disambiguation (determining the right context of polysemous words). This dissertation aims to use machine learning techniques to create a tool to support Social TV, contributing specifically to the automation of sentiment analysis and disambiguation of Twitter messages. / As mídias sociais são uma forma de expressão dos interesses coletivos, as pessoas gostam de compartilhar informações e sentem-se valorizadas por causa disso. Entre as mídias sociais o microblog Twitter vem ganhando popularidade como uma plataforma para comunicação instantânea. São milhões de mensagens geradas todos os dias, por cerca de 100 milhões de usuários, carregadas dos mais diversos assuntos.
Por ser uma plataforma de comunicação rápida esse microblog estimulou um fenômeno denominado narradores televisivos, em que os inter-nautas comentam sobre o que assistem na TV no momento em que é transmitido. Dessa inte-gração entre as mídias sociais e a televisão emergiu a TV Social. A quantidade de dados gera-dos sobre os programas de TV formam um rico material para análise de dados. Emissoras podem usar tais informações para aperfeiçoar seus programas e aumentar a interação com seu público. Dentre os principais desafios da análise de dados de mídias sociais encontram-se a análise de sentimento (determinação de polaridade em um texto, por exemplo, positivo ou negativo) e a desambiguação de sentido (determinação do contexto correto de palavras polis-sêmicas). Essa dissertação tem como objetivo usar técnicas de aprendizagem de máquina para a criação de uma ferramenta de apoio à TV Social com contribuições na automatização dos processos de análise de sentimento e desambiguação de sentido de mensagens postadas no Twitter.
90

Uma abordagem híbrida relacional para a desambiguação lexical de sentido na tradução automática / A hybrid relational approach for word sense disambiguation in machine translation

Lucia Specia 28 September 2007 (has links)
Cross-lingual communication has become an increasingly pressing task given the growing dissemination of information in several languages. In this context, machine translation systems, which facilitate such communication by providing automatic translations, are of great importance. Although research in Machine Translation dates back to the 1950s, the area still has many open problems. One of the main problems is lexical ambiguity, that is, the need for lexical choice when translating a source-language word that has several translation options in the target language. The problem is even more complex when the translation options differ only in sense, a case named "sense ambiguity". Several approaches have been proposed for word sense disambiguation, but they are in general monolingual (for English) and application-independent. Moreover, they are limited in the types of knowledge sources they can exploit. In particular, there is no significant research on word sense disambiguation involving Portuguese. The goal of this PhD work is to propose and develop a novel approach to word sense disambiguation that is specifically designed for machine translation, follows a hybrid methodology (knowledge-based and corpus-based), and employs a relational formalism to represent various kinds of knowledge sources and disambiguation examples, using Inductive Logic Programming. Several experiments have shown that the proposed approach outperforms alternative approaches in multilingual disambiguation and achieves results better than or comparable to the state of the art in monolingual disambiguation. Additionally, the approach has been shown to effectively assist lexical choice in a statistical machine translation system.
