Global ETD Search

1	Vyvozování v přirozeném jazyce s využitím obrazových dat / Grounding Natural Language Inference on Images Vu Trong, Hoa January 2018 (has links) Grounding Natural Language Inference on Images Hoa Trong VU July 20, 2018 Abstract Despite the surge of research interest in problems involving linguistic and vi- sual information, exploring multimodal data for Natural Language Inference remains unexplored. Natural Language Inference, regarded as the basic step towards Natural Language Understanding, is extremely challenging due to the natural complexity of human languages. However, we believe this issue can be alleviated by using multimodal data. Given an image and its description, our proposed task is to determined whether a natural language hypothesis contra- dicts, entails or is neutral with regards to the image and its description. To address this problem, we develop a multimodal framework based on the Bilat- eral Multi-perspective Matching framework. Data is collected by mapping the SNLI dataset with the image dataset Flickr30k. The result dataset, made pub- licly available, has more than 565k instances. Experiments on this dataset show that the multimodal model outperforms the state-of-the-art textual model. References 1
2	Textual entailment for modern standard Arabic Alabbas, Maytham Abualhail Shahed January 2013 (has links) This thesis explores a range of approaches to the task of recognising textual entailment (RTE), i.e. determining whether one text snippet entails another, for Arabic, where we are faced with an exceptional level of lexical and structural ambiguity. To the best of our knowledge, this is the first attempt to carry out this task for Arabic. Tree edit distance (TED) has been widely used as a component of natural language processing (NLP) systems that attempt to achieve the goal above, with the distance between pairs of dependency trees being taken as a measure of the likelihood that one entails the other. Such a technique relies on having accurate linguistic analyses. Obtaining such analyses for Arabic is notoriously difficult. To overcome these problems we have investigated strategies for improving tagging and parsing depending on system combination techniques. These strategies lead to substantially better performance than any of the contributing tools. We describe also a semi-automatic technique for creating a first dataset for RTE for Arabic using an extension of the ‘headline-lead paragraph’ technique because there are, again to the best of our knowledge, no such datasets available. We sketch the difficulties inherent in volunteer annotators-based judgment, and describe a regime to ameliorate some of these. The major contribution of this thesis is the introduction of two ways of improving the standard TED: (i) we present a novel approach, extended TED (ETED), for extending the standard TED algorithm for calculating the distance between two trees by allowing operations to apply to subtrees, rather than just to single nodes. This leads to useful improvements over the performance of the standard TED for determining entailment. The key here is that subtrees tend to correspond to single information units. By treating operations on subtrees as less costly than the corresponding set of individual node operations, ETED concentrates on entire information units, which are a more appropriate granularity than individual words for considering entailment relations; and (ii) we use the artificial bee colony (ABC) algorithm to automatically estimate the cost of edit operations for single nodes and subtrees and to determine thresholds, since assigning an appropriate cost to each edit operation manually can become a tricky task.The current findings are encouraging. These extensions can substantially affect the F-score and accuracy and achieve a better RTE model when compared with a number of string-based algorithms and the standard TED approaches. The relative performance of the standard techniques on our Arabic test set replicates the results reported for these techniques for English test sets. We have also applied ETED with ABC to the English RTE2 test set, where it again outperforms the standard TED. 006.3
3	Sentence Pair Modeling and Beyond Lan, Wuwei January 2021 (has links) No description available. Computer Science
4	Understanding the Importance of Entities and Roles in Natural Language Inference : A Model and Datasets January 2019 (has links) abstract: In this thesis, I present two new datasets and a modification to the existing models in the form of a novel attention mechanism for Natural Language Inference (NLI). The new datasets have been carefully synthesized from various existing corpora released for different tasks. The task of NLI is to determine the possibility of a sentence referred to as “Hypothesis” being true given that another sentence referred to as “Premise” is true. In other words, the task is to identify whether the “Premise” entails, contradicts or remains neutral with regards to the “Hypothesis”. NLI is a precursor to solving many Natural Language Processing (NLP) tasks such as Question Answering and Semantic Search. For example, in Question Answering systems, the question is paraphrased to form a declarative statement which is treated as the hypothesis. The options are treated as the premise. The option with the maximum entailment score is considered as the answer. Considering the applications of NLI, the importance of having a strong NLI system can't be stressed enough. Many large-scale datasets and models have been released in order to advance the field of NLI. While all of these models do get good accuracy on the test sets of the datasets they were trained on, they fail to capture the basic understanding of “Entities” and “Roles”. They often make the mistake of inferring that “John went to the market.” from “Peter went to the market.” failing to capture the notion of “Entities”. In other cases, these models don't understand the difference in the “Roles” played by the same entities in “Premise” and “Hypothesis” sentences and end up wrongly inferring that “Peter drove John to the stadium.” from “John drove Peter to the stadium.” The lack of understanding of “Roles” can be attributed to the lack of such examples in the various existing datasets. The reason for the existing model’s failure in capturing the notion of “Entities” is not just due to the lack of such examples in the existing NLI datasets. It can also be attributed to the strict use of vector similarity in the “word-to-word” attention mechanism being used in the existing architectures. To overcome these issues, I present two new datasets to help make the NLI systems capture the notion of “Entities” and “Roles”. The “NER Changed” (NC) dataset and the “Role-Switched” (RS) dataset contains examples of Premise-Hypothesis pairs that require the understanding of “Entities” and “Roles” respectively in order to be able to make correct inferences. This work shows how the existing architectures perform poorly on the “NER Changed” (NC) dataset even after being trained on the new datasets. In order to help the existing architectures, understand the notion of “Entities”, this work proposes a modification to the “word-to-word” attention mechanism. Instead of relying on vector similarity alone, the modified architectures learn to incorporate the “Symbolic Similarity” as well by using the Named-Entity features of the Premise and Hypothesis sentences. The new modified architectures not only perform significantly better than the unmodified architectures on the “NER Changed” (NC) dataset but also performs as well on the existing datasets. / Dissertation/Thesis / Masters Thesis Computer Science 2019 Artificial intelligence Computer science Information technology Artificial Intelligence Deep Learning Entailment Natural Language Inference Natural Language Processing Natural Language Understanding
5	DS-Fake : a data stream mining approach for fake news detection Mputu Boleilanga, Henri-Cedric 08 1900 (has links) L’avènement d’internet suivi des réseaux sociaux a permis un accès facile et une diffusion rapide de l’information par toute personne disposant d’une connexion internet. L’une des conséquences néfastes de cela est la propagation de fausses informations appelées «fake news». Les fake news représentent aujourd’hui un enjeu majeur au regard de ces conséquences. De nombreuses personnes affirment encore aujourd’hui que sans la diffusion massive de fake news sur Hillary Clinton lors de la campagne présidentielle de 2016, Donald Trump n’aurait peut-être pas été le vainqueur de cette élection. Le sujet de ce mémoire concerne donc la détection automatique des fake news. De nos jours, il existe un grand nombre de travaux à ce sujet. La majorité des approches présentées se basent soit sur l’exploitation du contenu du texte d’entrée, soit sur le contexte social du texte ou encore sur un mélange entre ces deux types d’approches. Néanmoins, il existe très peu d’outils ou de systèmes efficaces qui détecte une fausse information dans la vie réelle, tout en incluant l’évolution de l’information au cours du temps. De plus, il y a un manque criant de systèmes conçues dans le but d’aider les utilisateurs des réseaux sociaux à adopter un comportement qui leur permettrait de détecter les fausses nouvelles. Afin d’atténuer ce problème, nous proposons un système appelé DS-Fake. À notre connaissance, ce système est le premier à inclure l’exploration de flux de données. Un flux de données est une séquence infinie et dénombrable d’éléments et est utilisée pour représenter des données rendues disponibles au fil du temps. DS-Fake explore à la fois l’entrée et le contenu d’un flux de données. L’entrée est une publication sur Twitter donnée au système afin qu’il puisse déterminer si le tweet est digne de confiance. Le flux de données est extrait à l’aide de techniques d’extraction du contenu de sites Web. Le contenu reçu par ce flux est lié à l’entrée en termes de sujets ou d’entités nommées mentionnées dans le texte d’entrée. DS-Fake aide également les utilisateurs à développer de bons réflexes face à toute information qui se propage sur les réseaux sociaux. DS-Fake attribue un score de crédibilité aux utilisateurs des réseaux sociaux. Ce score décrit la probabilité qu’un utilisateur puisse publier de fausses informations. La plupart des systèmes utilisent des caractéristiques comme le nombre de followers, la localisation, l’emploi, etc. Seuls quelques systèmes utilisent l’historique des publications précédentes d’un utilisateur afin d’attribuer un score. Pour déterminer ce score, la majorité des systèmes utilisent la moyenne. DS-Fake renvoie un pourcentage de confiance qui détermine la probabilité que l’entrée soit fiable. Contrairement au petit nombre de systèmes qui utilisent l’historique des publications en ne prenant pas en compte que les tweets précédents d’un utilisateur, DS-Fake calcule le score de crédibilité sur la base des tweets précédents de tous les utilisateurs. Nous avons renommé le score de crédibilité par score de légitimité. Ce dernier est basé sur la technique de la moyenne Bayésienne. Cette façon de calculer le score permet d’atténuer l’impact des résultats des publications précédentes en fonction du nombre de publications dans l’historique. Un utilisateur donné ayant un plus grand nombre de tweets dans son historique qu’un autre utilisateur, même si les tweets des deux sont tous vrais, le premier utilisateur est plus crédible que le second. Son score de légitimité sera donc plus élevé. À notre connaissance, ce travail est le premier qui utilise la moyenne Bayésienne basée sur l’historique de tweets de toutes les sources pour attribuer un score à chaque source. De plus, les modules de DS-Fake ont la capacité d’encapsuler le résultat de deux tâches, à savoir la similarité de texte et l’inférence en langage naturel hl(en anglais Natural Language Inference). Ce type de modèle qui combine ces deux tâches de TAL est également nouveau pour la problématique de la détection des fake news. DS-Fake surpasse en termes de performance toutes les approches de l’état de l’art qui ont utilisé FakeNewsNet et qui se sont basées sur diverses métriques. Il y a très peu d’ensembles de données complets avec une variété d’attributs, ce qui constitue un des défis de la recherche sur les fausses nouvelles. Shu et al. ont introduit en 2018 l’ensemble de données FakeNewsNet pour résoudre ce problème. Le score de légitimité et les tweets récupérés ajoutent des attributs à l’ensemble de données FakeNewsNet. / The advent of the internet, followed by online social networks, has allowed easy access and rapid propagation of information by anyone with an internet connection. One of the harmful consequences of this is the spread of false information, which is well-known by the term "fake news". Fake news represent a major challenge due to their consequences. Some people still affirm that without the massive spread of fake news about Hillary Clinton during the 2016 presidential campaign, Donald Trump would not have been the winner of the 2016 United States presidential election. The subject of this thesis concerns the automatic detection of fake news. Nowadays, there is a lot of research on this subject. The vast majority of the approaches presented in these works are based either on the exploitation of the input text content or the social context of the text or even on a mixture of these two types of approaches. Nevertheless, there are only a few practical tools or systems that detect false information in real life, and that includes the evolution of information over time. Moreover, no system yet offers an explanation to help social network users adopt a behaviour that will allow them to detect fake news. In order to mitigate this problem, we propose a system called DS-Fake. To the best of our knowledge, this system is the first to include data stream mining. A data stream is a sequence of elements used to represent data elements over time. This system explores both the input and the contents of a data stream. The input is a post on Twitter given to the system that determines if the tweet can be trusted. The data stream is extracted using web scraping techniques. The content received by this flow is related to the input in terms of topics or named entities mentioned in the input text. This system also helps users develop good reflexes when faced with any information that spreads on social networks. DS-Fake assigns a credibility score to users of social networks. This score describes how likely a user can publish false information. Most of the systems use features like the number of followers, the localization, the job title, etc. Only a few systems use the history of a user’s previous publications to assign a score. To determine this score, most systems use the average. DS-Fake returns a percentage of confidence that determines how likely the input is reliable. Unlike the small number of systems that use the publication history by taking into account only the previous tweets of a user, DS-Fake calculates the credibility score based on the previous tweets of all users. We renamed the credibility score legitimacy score. The latter is based on the Bayesian averaging technique. This way of calculating the score allows attenuating the impact of the results from previous posts according to the number of posts in the history. A user who has more tweets in his history than another user, even if the tweets of both are all true, the first user is more credible than the second. His legitimacy score will therefore be higher. To our knowledge, this work is the first that uses the Bayesian average based on the post history of all sources to assign a score to each source. DS-Fake modules have the ability to encapsulate the output of two tasks, namely text similarity and natural language inference. This type of model that combines these two NLP tasks is also new for the problem of fake news detection. There are very few complete datasets with a variety of attributes, which is one of the challenges of fake news research. Shu et al. introduce in 2018 the FakeNewsNet dataset to tackle this issue. Our work uses and enriches this dataset. The legitimacy score and the retrieved tweets from named entities mentioned in the input texts add features to the FakeNewsNet dataset. DS-Fake outperforms all state-of-the-art approaches that have used FakeNewsNet and that are based on various metrics. Détection de fausses nouvelles Exploration de flux de données IA explicable score de légitimité Traitement Automatique du Langage Inférence du langage naturel Similarité de texte Reconnaissance d’entité nommée Réseaux de neurones Fake news detection Data stream mining Explainable AI Legitimacy score Natural Language Processing Natural Language Inference Text similarity Named Entity Recognition Neural Networks
6	Deep neural networks for natural language processing and its acceleration Lin, Zhouhan 08 1900 (has links) Cette thèse par article comprend quatre articles qui contribuent au domaine de l'apprentissage profond, en particulier à l'accélération de l’apprentissage par le biais de réseaux à faible précision et à l'application de réseaux de neurones profonds au traitement du langage naturel. Dans le premier article, nous étudions un schéma d’entraînement de réseau de neurones qui élimine la plupart des multiplications en virgule flottante. Cette approche consiste à binariser ou à ternariser les poids dans la propagation en avant et à quantifier les états cachés dans la propagation arrière, ce qui convertit les multiplications en changements de signe et en décalages binaires. Les résultats expérimentaux sur des jeux de données de petite à moyenne taille montrent que cette approche produit des performances encore meilleures que l’approche standard de descente de gradient stochastique, ouvrant la voie à un entraînement des réseaux de neurones rapide et efficace au niveau du matériel. Dans le deuxième article, nous avons proposé un mécanisme structuré d’auto-attention d’enchâssement de phrases qui extrait des représentations interprétables de phrases sous forme matricielle. Nous démontrons des améliorations dans 3 tâches différentes: le profilage de l'auteur, la classification des sentiments et l'implication textuelle. Les résultats expérimentaux montrent que notre modèle génère un gain en performance significatif par rapport aux autres méthodes d’enchâssement de phrases dans les 3 tâches. Dans le troisième article, nous proposons un modèle hiérarchique avec graphe de calcul dynamique, pour les données séquentielles, qui apprend à construire un arbre lors de la lecture de la séquence. Le modèle apprend à créer des connexions de saut adaptatives, ce qui facilitent l'apprentissage des dépendances à long terme en construisant des cellules récurrentes de manière récursive. L’entraînement du réseau peut être fait soit par entraînement supervisée en donnant des structures d’arbres dorés, soit par apprentissage par renforcement. Nous proposons des expériences préliminaires dans 3 tâches différentes: une nouvelle tâche d'évaluation de l'expression mathématique (MEE), une tâche bien connue de la logique propositionnelle et des tâches de modélisation du langage. Les résultats expérimentaux montrent le potentiel de l'approche proposée. Dans le quatrième article, nous proposons une nouvelle méthode d’analyse par circonscription utilisant les réseaux de neurones. Le modèle prédit la structure de l'arbre d'analyse en prédisant un scalaire à valeur réelle, soit la distance syntaxique, pour chaque position de division dans la phrase d'entrée. L'ordre des valeurs relatives de ces distances syntaxiques détermine ensuite la structure de l'arbre d'analyse en spécifiant l'ordre dans lequel les points de division seront sélectionnés, en partitionnant l'entrée de manière récursive et descendante. L’approche proposée obtient une performance compétitive sur le jeu de données Penn Treebank et réalise l’état de l’art sur le jeu de données Chinese Treebank. / This thesis by article consists of four articles which contribute to the field of deep learning, specifically in the acceleration of training through low-precision networks, and the application of deep neural networks on natural language processing. In the first article, we investigate a neural network training scheme that eliminates most of the floating-point multiplications. This approach consists of binarizing or ternarizing the weights in the forward propagation and quantizing the hidden states in the backward propagation, which converts multiplications to sign changes and binary shifts. Experimental results on datasets from small to medium size show that this approach result in even better performance than standard stochastic gradient descent training, paving the way to fast, hardware-friendly training of neural networks. In the second article, we proposed a structured self-attentive sentence embedding that extracts interpretable sentence representations in matrix form. We demonstrate improvements on 3 different tasks: author profiling, sentiment classification and textual entailment. Experimental results show that our model yields a significant performance gain compared to other sentence embedding methods in all of the 3 tasks. In the third article, we propose a hierarchical model with dynamical computation graph for sequential data that learns to construct a tree while reading the sequence. The model learns to create adaptive skip-connections that ease the learning of long-term dependencies through constructing recurrent cells in a recursive manner. The training of the network can either be supervised training by giving golden tree structures, or through reinforcement learning. We provide preliminary experiments in 3 different tasks: a novel Math Expression Evaluation (MEE) task, a well-known propositional logic task, and language modelling tasks. Experimental results show the potential of the proposed approach. In the fourth article, we propose a novel constituency parsing method with neural networks. The model predicts the parse tree structure by predicting a real valued scalar, named syntactic distance, for each split position in the input sentence. The order of the relative values of these syntactic distances then determine the parse tree structure by specifying the order in which the split points will be selected, recursively partitioning the input, in a top-down fashion. Our proposed approach was demonstrated with competitive performance on Penn Treebank dataset, and the state-of-the-art performance on Chinese Treebank dataset. Machine Learning Natural Language Processing Deep Learning Neural Networks Syntactic Parser Constituency Parsing Recursive Networks Recurrent Networks Dynamic Computational Graph Sentiment Analysis Natural Language Inference Self-Attention Sentence Embedding Binary Connect Ternary Connect Quantized Neural Networks Apprentissage Automatique Langage Naturel Traitement Apprentissage Profond Réseaux Neuronaux Analyseur Syntaxique Réseaux Récurrents Graphe de Calcul Dynamique Analyse des Sentiments Inférence en Langage Naturel Auto-Attention Enchâssement de Phrase Connexion Binaire Connexion Ternaire Réseaux Neuronaux Quantifiés
7	Neural approaches to dialog modeling Sankar, Chinnadhurai 08 1900 (has links) Cette thèse par article se compose de quatre articles qui contribuent au domaine de l’apprentissage profond, en particulier dans la compréhension et l’apprentissage des ap- proches neuronales des systèmes de dialogue. Le premier article fait un pas vers la compréhension si les architectures de dialogue neuronal couramment utilisées capturent efficacement les informations présentes dans l’historique des conversations. Grâce à une série d’expériences de perturbation sur des ensembles de données de dialogue populaires, nous constatons que les architectures de dialogue neuronal couramment utilisées comme les modèles seq2seq récurrents et basés sur des transformateurs sont rarement sensibles à la plupart des perturbations du contexte d’entrée telles que les énoncés manquants ou réorganisés, les mots mélangés, etc. Le deuxième article propose d’améliorer la qualité de génération de réponse dans les systèmes de dialogue de domaine ouvert en modélisant conjointement les énoncés avec les attributs de dialogue de chaque énoncé. Les attributs de dialogue d’un énoncé se réfèrent à des caractéristiques ou des aspects discrets associés à un énoncé comme les actes de dialogue, le sentiment, l’émotion, l’identité du locuteur, la personnalité du locuteur, etc. Le troisième article présente un moyen simple et économique de collecter des ensembles de données à grande échelle pour modéliser des systèmes de dialogue orientés tâche. Cette approche évite l’exigence d’un schéma d’annotation d’arguments complexes. La version initiale de l’ensemble de données comprend 13 215 dialogues basés sur des tâches comprenant six domaines et environ 8 000 entités nommées uniques, presque 8 fois plus que l’ensemble de données MultiWOZ populaire. / This thesis by article consists of four articles which contribute to the ﬁeld of deep learning, speciﬁcally in understanding and learning neural approaches to dialog systems. The ﬁrst article takes a step towards understanding if commonly used neural dialog architectures eﬀectively capture the information present in the conversation history. Through a series of perturbation experiments on popular dialog datasets, weﬁndthatcommonly used neural dialog architectures like recurrent and transformer-based seq2seq models are rarely sensitive to most input context perturbations such as missing or reordering utterances, shuﬄing words, etc. The second article introduces a simple and cost-eﬀective way to collect large scale datasets for modeling task-oriented dialog systems. This approach avoids the requirement of a com-plex argument annotation schema. The initial release of the dataset includes 13,215 task-based dialogs comprising six domains and around 8k unique named entities, almost 8 times more than the popular MultiWOZ dataset. The third article proposes to improve response generation quality in open domain dialog systems by jointly modeling the utterances with the dialog attributes of each utterance. Dialog attributes of an utterance refer to discrete features or aspects associated with an utterance like dialog-acts, sentiment, emotion, speaker identity, speaker personality, etc. The ﬁnal article introduces an embedding-free method to compute word representations on-the-ﬂy. This approach signiﬁcantly reduces the memory footprint which facilitates de-ployment in on-device (memory constraints) devices. Apart from being independent of the vocabulary size, we ﬁnd this approach to be inherently resilient to common misspellings. task-oriented dialog systems dialog-acts multiwoz locality sensitive hashing self-attention recurrent networks neural networks deep learning natural language processing reinforcement learning machine learning Actes de dialogue Hachage sensible àla localité Auto-attention Inférence en langage naturel Analyse dessentiments Graphique de calcul dynamique Réseaux récurrents Réseaux récursifs Réseaux de neurones Apprentissage profond Naturel traitement du langage Apprentissage par renforcement Apprentissage automatique Dynamic computational graph Recursive networks Wizard-of-oz Natural language inference Sentiment analysis

1

Page generated in 0.0772 seconds