Global ETD Search

171	Le repérage automatique des entités nommées dans la langue arabe : vers la création d'un système à base de règles Zaghouani, Wajdi January 2009 Thesis digitized by the Records Management and Archives Division of the Université de Montréal. Information extraction Text mining Named entity extraction Proper names Arabic language Natural Language Processing (NLP) Rule-based system Corpus development Evaluation
172	Analýza a získávání informací ze souboru dokumentů spojených do jednoho celku / Analysis and Data Extraction from a Set of Documents Merged Together Jarolím, Jordán January 2018 This thesis deals with extracting relevant information from documents and automatically splitting multiple documents that have been merged together. It describes the design and implementation of software for both tasks. The thesis covers methods for acquiring textual data from scanned documents, named entity recognition, document clustering, their supporting algorithms, and metrics for automatic document splitting. It then explains the algorithm used by the implemented software and describes the tools and techniques it relies on. Finally, the success rate of the implemented software is evaluated, and possible extensions and further development are discussed.
173	DS-Fake : a data stream mining approach for fake news detection Mputu Boleilanga, Henri-Cedric 08 1900 The advent of the internet, followed by online social networks, has allowed easy access to information and its rapid propagation by anyone with an internet connection. One harmful consequence is the spread of false information, commonly known as "fake news". Fake news represents a major challenge because of its consequences. Many people still maintain that without the massive spread of fake news about Hillary Clinton during the 2016 presidential campaign, Donald Trump might not have won the election. This thesis therefore addresses the automatic detection of fake news. There is now a large body of research on the subject. Most of the approaches presented rely on the content of the input text, on the social context of the text, or on a mixture of the two. Nevertheless, very few practical tools or systems detect false information in real-life settings while also taking into account how information evolves over time.
Moreover, no system yet offers explanations that help social network users adopt behaviour allowing them to detect fake news. To mitigate this problem, we propose a system called DS-Fake. To the best of our knowledge, it is the first to include data stream mining. A data stream is an infinite, countable sequence of elements used to represent data made available over time. DS-Fake explores both the input and the contents of a data stream. The input is a Twitter post given to the system so that it can determine whether the tweet can be trusted. The data stream is extracted using web scraping techniques. The content received through this stream is related to the input in terms of the topics or named entities mentioned in the input text. DS-Fake also helps users develop good reflexes when faced with any information spreading on social networks. DS-Fake assigns a credibility score to social network users. This score describes how likely a user is to publish false information. Most systems use features such as the number of followers, location, job title, etc. Only a few systems use the history of a user's previous posts to assign a score. To determine this score, most systems use the average. DS-Fake returns a confidence percentage that indicates how likely the input is to be reliable. Unlike the few systems that use the publication history but consider only a user's own previous tweets, DS-Fake computes the credibility score from the previous tweets of all users. We renamed the credibility score the legitimacy score. It is based on the Bayesian average. Computing the score this way attenuates the impact of the results of previous posts according to the number of posts in the history. If one user has more tweets in their history than another, then even if both users' tweets are all true, the first user is more credible than the second, and their legitimacy score will therefore be higher.
To our knowledge, this work is the first to use a Bayesian average based on the post history of all sources to assign a score to each source. In addition, DS-Fake's modules can encapsulate the output of two tasks, namely text similarity and natural language inference. A model that combines these two NLP tasks is also new for fake news detection. Very few complete datasets offer a variety of attributes, which is one of the challenges of fake news research. Shu et al. introduced the FakeNewsNet dataset in 2018 to tackle this issue. Our work uses and enriches this dataset: the legitimacy score and the tweets retrieved from named entities mentioned in the input texts add features to FakeNewsNet. DS-Fake outperforms all state-of-the-art approaches that have used FakeNewsNet and been evaluated on various metrics. Fake news detection Data stream mining Explainable AI Legitimacy score Natural Language Processing Natural Language Inference Text similarity Named Entity Recognition Neural Networks
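The DS-Fake abstract says the legitimacy score is a Bayesian average computed from the tweet history of all users, but it does not give the formula. The sketch below is therefore only an illustration using the textbook Bayesian average, not the thesis's implementation. The function names, the 0/1 labelling of past tweets, and the choice of prior weight (the average number of posts per user) are all assumptions made for this example.

```python
from typing import Dict, List, Optional


def bayesian_average(true_posts: int, total_posts: int,
                     prior_mean: float, prior_weight: float) -> float:
    """Shrink a user's observed share of true posts toward a global prior.

    prior_weight acts like a number of "virtual" posts at the global mean:
    the fewer posts a user has, the closer the score stays to prior_mean.
    """
    return (prior_weight * prior_mean + true_posts) / (prior_weight + total_posts)


def legitimacy_scores(history: Dict[str, List[int]],
                      prior_weight: Optional[float] = None) -> Dict[str, float]:
    """Score every user from the post history of all users.

    history maps a user name to a list of labels (1 = post judged true,
    0 = post judged false). This data layout is hypothetical.
    """
    all_labels = [label for labels in history.values() for label in labels]
    if not all_labels:
        raise ValueError("history must contain at least one labelled post")
    # Global share of true posts across all users (the prior mean).
    prior_mean = sum(all_labels) / len(all_labels)
    if prior_weight is None:
        # Assumed default: the average number of posts per user.
        prior_weight = len(all_labels) / len(history)
    return {
        user: bayesian_average(sum(labels), len(labels), prior_mean, prior_weight)
        for user, labels in history.items()
    }


if __name__ == "__main__":
    example_history = {
        "prolific_user": [1] * 20,      # twenty posts, all judged true
        "new_user": [1] * 2,            # two posts, both judged true
        "unreliable_user": [0, 0, 1, 0],
    }
    for user, score in legitimacy_scores(example_history).items():
        print(f"{user}: {score:.3f}")
```

In this sketch, two users whose posts are all true still receive different scores: the one with the longer history is pulled less toward the global mean and so ranks higher, provided the global mean is below 1. This matches the behaviour the abstract describes.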
174	Analysis of the main elements of the International Court of Justice Judgment in the maritime dispute (Peru v. Chile) in the light of the parties positions / Análisis de los principales elementos de la sentencia de la Corte Internacional de Justicia en el caso de la controversia marítima (Perú c. Chile) a la luz de las posiciones de las partes Moscoso de la Cuba, Pablo 10 April 2018 On January 27, 2014, the International Court of Justice (ICJ), the principal judicial organ of the United Nations, delivered its judgment in the maritime dispute case (Peru v. Chile), which Peru had brought before it in January 2008. During the proceedings, the parties presented fundamentally different positions on the existence of a maritime boundary between them and on how the Court should proceed to resolve the case. To reach its ruling, the Court had to evaluate the many legal arguments put forward by both States over the years. In particular, several of the legal arguments presented by Peru were accepted by the Court and adopted in the ruling: from the interpretation given to the 1947 proclamations of Peru and Chile, through Peru's arguments on the 1952 Santiago Declaration (which had been the core of Chile's case and was dismissed by the Court), to Peru's argument that the 1954 Special Maritime Frontier Zone Agreement did not create a zone of tolerance extending for 200 nautical miles. However, the Court considered that in the 1954 agreement the parties acknowledged the existence of a tacit agreement, a concept that neither party had argued before the Court but that has a legal basis in the ICJ's earlier jurisprudence. The Court then had to determine the extent of that tacit agreement, an extremely difficult task because the parties had neither contemplated its existence nor argued how far it extended.
After establishing that the tacit legal agreement extended for 80 nautical miles along a parallel of latitude, the Court proceeded to establish a maritime boundary by following exactly the rules and principles of maritime delimitation put forward by Peru, which, applied to the case, result in an equidistant line. As for the starting point of the maritime boundary, the Court did not use the point proposed by Peru but correctly made clear that the starting point of the maritime boundary and the starting point of the land boundary do not necessarily have to coincide. Finally, the way in which the Court established the maritime boundary recognizes beyond doubt that the area formerly called the "outer triangle" belongs exclusively to Peru, as Peru had argued and as Chile had repeatedly opposed over the years. In summary, the decision is consistent with international law and based on the evidence before the Court, which applied and confirmed several of the legal arguments presented by Peru during the proceedings, despite Chile's arguments to the contrary. Law International Court Of Justice Maritime Delimitation Maritime Boundary Maritime Border Supreme Decree 781 Of 1947 1952 Santiago Declaration Punto Concordia Point Named Concordia Boundary Marker Number 1 Starting Point Of The Maritime Boundary
175	Komponent pro sémantické obohacení / Semantic Enrichment Component Doležal, Jan January 2018 This master's thesis describes the Semantic Enrichment Component (SEC), which searches for entities (e.g., persons or places) in an input text document and returns information about them. The goals of the component are to provide a single interface for named entity recognition tools, to enable parallel document processing, to save memory when using the knowledge base, and to speed up access to its content. To achieve these goals, the output format of the named entity recognition tools was specified, a tool for storing the preprocessed knowledge base in shared memory was implemented, and a client-server scheme was used to build the component.
