• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 278
  • 31
  • 25
  • 22
  • 9
  • 8
  • 5
  • 3
  • 2
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • Tagged with
  • 432
  • 208
  • 162
  • 157
  • 150
  • 136
  • 112
  • 102
  • 92
  • 80
  • 77
  • 74
  • 73
  • 71
  • 62
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
191

Maskininlärning för dokumentklassificering av finansielladokument med fokus på fakturor / Machine Learning for Document Classification of FinancialDocuments with Focus on Invoices

Khalid Saeed, Nawar January 2022 (has links)
Automatiserad dokumentklassificering är en process eller metod som syftar till att bearbeta ochhantera dokument i digitala former. Många företag strävar efter en textklassificeringsmetodiksom kan lösa olika problem. Ett av dessa problem är att klassificera och organisera ett stort antaldokument baserat på en uppsättning av fördefinierade kategorier.Detta examensarbete syftar till att hjälpa Medius, vilket är ett företag som arbetar med fakturaarbetsflöde, att klassificera dokumenten som behandlas i deras fakturaarbetsflöde till fakturoroch icke-fakturor. Detta har åstadkommits genom att implementera och utvärdera olika klassificeringsmetoder för maskininlärning med avseende på deras noggrannhet och effektivitet för attklassificera finansiella dokument, där endast fakturor är av intresse.I denna avhandling har två dokumentrepresentationsmetoder "Term Frequency Inverse DocumentFrequency (TF-IDF) och Doc2Vec" använts för att representera dokumenten som vektorer. Representationen syftar till att minska komplexiteten i dokumenten och göra de lättare att hantera.Dessutom har tre klassificeringsmetoder använts för att automatisera dokumentklassificeringsprocessen för fakturor. Dessa metoder var Logistic Regression, Multinomial Naïve Bayes och SupportVector Machine.Resultaten från denna avhandling visade att alla klassificeringsmetoder som använde TF-IDF, föratt representera dokumenten som vektorer, gav goda resultat i from av prestanda och noggranhet.Noggrannheten för alla tre klassificeringsmetoderna var över 90%, vilket var kravet för att dennastudie skulle anses vara lyckad. Dessutom verkade Logistic Regression att ha det lättare att klassificera dokumenten jämfört med andra metoder. Ett test på riktiga data "dokument" som flödarin i Medius fakturaarbetsflöde visade att Logistic Regression lyckades att korrekt klassificeranästan 96% av dokumenten.Avslutningsvis, fastställdes Logistic Regression tillsammans med TF-IDF som de övergripandeoch mest lämpliga metoderna att klara av problmet om dokumentklassficering. Dessvärre, kundeDoc2Vec inte ge ett bra resultat p.g.a. datamängden inte var anpassad och tillräcklig för attmetoden skulle fungera bra. / Automated document classification is an essential technique that aims to process and managedocuments in digital forms. Many companies strive for a text classification methodology thatcan solve a plethora of problems. One of these problems is classifying and organizing a massiveamount of documents based on a set of predefined categories.This thesis aims to help Medius, a company that works with invoice workflow, to classify theirdocuments into invoices and non-invoices. This has been accomplished by implementing andevaluating various machine learning classification methods in terms of their accuracy and efficiencyfor the task of financial document classification, where only invoices are of interest. Furthermore,the necessary pre-processing steps for achieving good performance are considered when evaluatingthe mentioned classification methods.In this study, two document representation methods "Term Frequency Inverse Document Frequency (TF-IDF) and Doc2Vec" were used to represent the documents as fixed-length vectors.The representation aims to reduce the complexity of the documents and make them easier tohandle. In addition, three classification methods have been used to automate the document classification process for invoices. These methods were Logistic Regression, Multinomial Naïve Bayesand Support Vector Machine.The results from this thesis indicate that all classification methods used TF-IDF, to represent thedocuments as vectors, give high performance and accuracy. The accuracy of all three classificationmethods is over 90%, which is the prerequisite for the success of this study. Moreover, LogisticRegression appears to cope with this task very easily, since it classifies the documents moreefficiently compared to the other methods. A test of real data flowing into Medius’ invoiceworkflow shows that Logistic Regression is able to correctly classify up to 96% of the data.In conclusion, the Logistic Regression together with TF-IDF is determined to be the overall mostappropriate method out of the other tested methods. In addition, Doc2Vec suffers to providea good result because the data set is not customized and sufficient for the method to workwell.
192

Det inre dramat på den yttre scenen. : En studie om Dramapedagogik och Neurolingvistisk Programmering i kombination, med avsikt att bryta begränsande mönster.

Nilsson, Annacarin January 2016 (has links)
This paper is a qualitative, theory-based study with a phenomenological, hermeneutic approach and a didactic perspective. The aim of this study is to examine three educators experience of using both Drama in education (DP) and Neurolinguistic programming (NLP) in conjunction with each other, with the aim of changing limiting patterns in individuals and groups. The study examines the following research questions: 1. How does an educator describe their work, using the combination of DP and NLP, with the aim to change limiting patterns? 2. What similarities as well as differences do educators experience between the two methodologies DP and NLP? 3. Why did the educator choose this particular combination? As background there's a brief description of education from a historic point of view and the concept of DP and NLP are explained together with examples of comprehensive research in the field. Following this: learning as theory is presented from the perspective of neuroscience, DP and NLP. Finally the concept of "limiting" patterns is described in detail.Three educators, working with both methods in conjunction, were interviewed and these interviews were compiled, compared and analysed relative to the research questions and theory derived from neuroscience and brain research.The result showes that the educators who were interviewed experienced numerous benefits from using both the DP and NLP methods in conjunction with each other - thus achieving more effective results in changing limiting patterns in people's attitudes, behaviour and outlook. Whilst both methods have similarities and differences, it's noticeable that they bring very different elements to the process. DP is seen as a collective, open and interactive approach that, together with its creative educational aspect, is contributing with a dramatic frame for exploration and visualisation of a particular subject's underlying and dominant patterns. NLP – also frequently based on a dramatic frame - is focused on how these internal patterns are distributed neurologically throughout our senses. The educators believe that DP and NLP used in conjunction creates an effective collaborative formula for change of patterns. Furthermore, NLP offers insights on how the educational approach can be effectively deepened and reinforced. Finally, the study shows that the combination works equally well in both group and individual processes. / Denna studie är en kvalitativ teorinära studie med en fenomenologisk, hermeneutisk ansats och ett didaktiskt perspektiv. Dess syfte är att undersöka tre pedagogers upplevelse av Dramapedagogik (DP) och Neurolingvistisk Programmering (NLP) i kombination, med avsikt att bryta begränsande mönster. Undersökningsfrågorna är: 1. Hur beskriver en pedagog sitt arbete att bryta begränsande mönster med kombinationen DP och NLP? 2. Vad upplever pedagogerna att det finns för likheter och skillnader mellan DP och NLP? 3. Varför har pedagogen valt denna kombination? I bakgrunden redogörs kortfattat för pedagogik ur ett historiskt perspektiv, begreppet DP och NLP beskrivs samt övergripande forskning inom områdena. Därefter redogörs för lärande inom DP, NLP samt ur ett neurovetenskapligt perspektiv. Slutligen redogörs för begreppet ”begränsande mönster”. Intervjuer av tre pedagoger som arbetar med båda metoderna gjordes, sammanställdes och analyserades efter undersökningsfrågorna och teori hämtad från neurovetenskap och hjärnforskning. Resultatet visar att de intervjuade pedagogerna upplever att det finns många fördelar med att kombinera DP och NLP i syfte att bryta begränsande mönster. De har både likheter och skillnader men framför allt tillför de helt olika saker. DP ses som en kollektiv, öppen och interaktiv metod som med sin gestaltande pedagogik tillför former för utforskande och synliggörande av underliggande och styrande mönster. NLP som också många gånger utgår från gestaltande former, är mer fokuserad på hur dessa interna mönster är ordnade neurologiskt via våra sinnen. Pedagogerna anser att DP och NLP är en effektiv kombination i ett förändringsarbete. Vidare kan NLP erbjuda insikter om hur det pedagogiska förhållningssättet kan förstärkas. Slutligen visar studien att kombinationen fungerar väl på såväl grupper som på individuella processer.
193

AI i rekryteringsprocessen: En studie om användningen av AI för CV-analys / AI in the recruitment process: A study on the use of AI for CV-analysis

Al-Khamisi, Ardoan, El Khoury, Christian January 2024 (has links)
Studien undersöker vilka metoder som är mest lämpliga för rekryteringsprocesser genom att inkludera tre befintliga Artificiell intelligens (AI) verktyg samt en egenutvecklad prototyp. Tidigare studier har visat att AI kan förbättra rekryteringsprocessen genom att öka effektiviteten och minska fördomar, men också att det finns begränsningar i hur väl AI kan bedöma kandidaternas kompetenser. Målet är att bestämma de mest effektiva AI-lösningar för att matcha kvalificerade kandidater till ledande positioner. Identifierade möjligheter till förbättringar i hastighet, noggrannhet och kvalitet av rekryteringsprocessen. Fokuset för detta arbete ligger på analys av befintliga AI-lösningar parallellt med utvecklingen och testningen av en prototyp. Prototypen har designats för att hantera de brister som identifierats i de befintliga metoderna, såsom matchning av nyckelord mellan Curriculum Vitae (CV) och jobbannonsen. Denna metod har begränsningar i hur väl den kan identifiera kandidaters verkliga kompetenser och relevans för jobbet, vilket utforskas i denna studie. Resultatet från denna studie visar att AI för närvarande har en begränsad, men växande betydelse i rekryteringsprocesser. Detta pekar på en betydande potential för AI att erbjuda nya lösningar som kan leda till mer rättvisa och effektiva rekryteringsprocesser i framtiden. / The study examines which methods are most suitable for recruitment processes by including three existing artificial intelligence AI-tools as well as a custom-developed prototype. Previous studies have shown that AI can improve recruitment processes by increasing efficiency and reducing biases, but also that there are limitations in how well AI can assess candidate’s competencies. The goal is to determine the most effective AI solutions for matching qualified candidates to leading positions. Opportunities for improvement in speed, accuracy, and quality of the recruitment process have been identified. The focus of this work is on analyzing existing AI-solutions in parallel with the development and testing of a prototype. The prototype has been designed to address the deficiencies identified in existing methods, such as matching keywords between Curriculum Vitae (CV) and job advertisements. This method has limitations in how well it can identify candidate’s real competencies and relevance for the job, which is explored in this study. The results from this study show that AI currently has a limited, but growing significance in recruitment processes. This points to significant potential for AI to provide new solutions that can lead to fairer and more efficient recruitment processes in the future.
194

Stora språkmodeller för bedömning av applikationsrecensioner : Implementering och undersökning av stora språkmodeller för att sammanfatta, extrahera och analysera nyckelinformation från användarrecensioner / Large Language Models for application review data : Implementation survey of Large Language Models (LLM) to summarize, extract, and analyze key information from user reviews

von Reybekiel, Algot, Wennström, Emil January 2024 (has links)
Manuell granskning av användarrecensioner för att extrahera relevant informationkan vara en tidskrävande process. Denna rapport har undersökt om stora språkmodeller kan användas för att sammanfatta, extrahera och analysera nyckelinformation från recensioner, samt hur en sådan applikation kan konstrueras.  Det visade sig att olika modeller presterade olika bra beroende på mätvärden ochviktning mellan recall och precision. Vidare visade det sig att fine-tuning av språkmodeller som Llama 3 förbättrade prestationen vid klassifikation av användbara recensioner och ledde, enligt vissa mätvärden, till högre prestation än större språkmodeller som Chat-Bison. För engelskt översatta recensioner hade Llama 3:8b:Instruct, Chat-Bison samt den fine-tunade versionen av Llama 3:8b ett F4-makro-score på 0.89, 0.90 och 0.91 respektive. Ytterligare ett resultat är att de större modellerna Chat-Bison, Text-Bison och Gemini, presterade bättre i fallet för generering av sammanfattande texter, än de mindre modeller som testades vid inmatning av flertalet recensioner åt gången.  Generellt sett presterade språkmodellerna också bättre om recensioner först översattes till engelska innan bearbetning, snarare än då recensionerna var skrivna i originalspråk där de majoriteten av recensionerna var skrivna på svenska. En annan lärdom från förbearbetning av recensioner är att antal anrop till dessa språkmodeller kan minimeras genom att filtrera utifrån ordlängd och betyg.  Utöver språkmodeller visade resultaten att användningen av vektordatabaser och embeddings kan ge en större överblick över användbara recensioner genom vektordatabasers inbyggda förmåga att hitta semantiska likheter och samla liknande recensioner i kluster. / Manually reviewing user reviews to extract relevant information can be a time consuming process. This report investigates if large language models can be used to summarize, extract, and analyze key information from reviews, and how such anapplication can be constructed.  It was discovered that different models exhibit varying degrees of performance depending on the metrics and the weighting between recall and precision. Furthermore, fine-tuning of language models such as Llama 3 was found to improve performance in classifying useful reviews and, according to some metrics, led to higher performance than larger language models like Chat-bison. Specifically, for English translated reviews, Llama 3:8b:Instruct, Chat-bison, and Llama 3:8b fine-tuned had an F4 macro score 0.89, 0.90, 0.91 respectively. A further finding is that the larger models, Chat-Bison, Text-Bison, and Gemini performed better than the smaller models that was tested, when inputting multiple reviews at a time in the case of summary text generation.  In general, language models performed better if reviews were first translated into English before processing rather than when reviews were written in the original language where most reviews were written in Swedish. Additionally, another insight from the pre-processing phase, is that the number of API-calls to these language models can be minimized by filtering based on word length and rating. In addition to findings related to language models, the results also demonstrated that the use of vector databases and embeddings can provide a greater overview of reviews by leveraging the databases’ built-in ability to identify semantic similarities and cluster similar reviews together.
195

Dependency Syntax in the Automatic Detection of Irony and Stance

Cignarella, Alessandra Teresa 29 November 2021 (has links)
[ES] The present thesis is part of the broad panorama of studies of Natural Language Processing (NLP). In particular, it is a work of Computational Linguistics (CL) designed to study in depth the contribution of syntax in the field of sentiment analysis and, therefore, to study texts extracted from social media or, more generally, online content. Furthermore, given the recent interest of the scientific community in the Universal Dependencies (UD) project, which proposes a morphosyntactic annotation format aimed at creating a "universal" representation of the phenomena of morphology and syntax in a manifold of languages, in this work we made use of this format, thinking of a study in a multilingual perspective (Italian, English, French and Spanish). In this work we will provide an exhaustive presentation of the morphosyntactic annotation format of UD, in particular underlining the most relevant issues regarding their application to UGC. Two tasks will be presented, and used as case studies, in order to test the research hypotheses: the first case study will be in the field of automatic Irony Detection and the second in the area of Stance Detection. In both cases, historical notes will be provided that can serve as a context for the reader, an introduction to the problems faced will be outlined and the activities proposed in the computational linguistics community will be described. Furthermore, particular attention will be paid to the resources currently available as well as to those developed specifically for the study of the aforementioned phenomena. Finally, through the description of a series of experiments, both within evaluation campaigns and within independent studies, I will try to describe the contribution that syntax can provide to the resolution of such tasks. This thesis is a revised collection of my three-year PhD career and collocates within the growing trend of studies devoted to make Artificial Intelligence results more explainable, going beyond the achievement of highest scores in performing tasks, but rather making their motivations understandable and comprehensible for experts in the domain. The novel contribution of this work mainly consists in the exploitation of features that are based on morphology and dependency syntax, which were used in order to create vectorial representations of social media texts in various languages and for two different tasks. Such features have then been paired with a manifold of machine learning classifiers, with some neural networks and also with the language model BERT. Results suggest that fine-grained dependency-based syntactic information is highly informative for the detection of irony, and less informative for what concerns stance detection. Nonetheless, dependency syntax might still prove useful in the task of stance detection if firstly irony detection is considered as a preprocessing step. I also believe that the dependency syntax approach that I propose could shed some light on the explainability of a difficult pragmatic phenomenon such as irony. / [CA] La presente tesis se enmarca dentro del amplio panorama de estudios relacionados con el Procesamiento del Lenguaje Natural (NLP). En concreto, se trata de un trabajo de Lingüística Computacional (CL) cuyo objetivo principal es estudiar en profundidad la contribución de la sintaxis en el campo del análisis de sentimientos y, en concreto, aplicado a estudiar textos extraídos de las redes sociales o, más en general, de contenidos online. Además, dado el reciente interés de la comunidad científica por el proyecto Universal Dependencies (UD), en el que se propone un formato de anotación morfosintáctica destinado a crear una representación "universal" de la morfología y sintaxis aplicable a diferentes idiomas, en este trabajo se utiliza este formato con el propósito de realizar un estudio desde una perspectiva multilingüe (italiano, inglés, francés y español). En este trabajo se presenta una descripción exhaustiva del formato de anotación morfosintáctica de UD, en particular, subrayando las cuestiones más relevantes en cuanto a su aplicación a los UGC generados en las redes sociales. El objetivo final es analizar y comprobar si estas anotaciones morfosintácticas sirven para obtener información útil para los modelos de detección de la ironía y del stance o posicionamiento. Se presentarán dos tareas y se utilizarán como ejemplos de estudio para probar las hipótesis de la investigación: el primer caso se centra en el área de la detección automática de la ironía y el segundo en el área de la detección del stance o posicionamiento. En ambos casos, se proporcionan los antecendentes y trabajos relacionados notas históricas que pueden servir de contexto para el lector, se plantean los problemas encontrados y se describen las distintas actividades propuestas para resolver estos problemas en la comunidad de la lingüística computacional. Se presta especial atención a los recursos actualmente disponibles, así como a los desarrollados específicamente para el estudio de los fenómenos antes mencionados. Finalmente, a través de la descripción de una serie de experimentos, llevados a cabo tanto en campañas de evaluación como en estudios independientes, se describe la contribución que la sintaxis puede brindar a la resolución de esas tareas. Esta tesis es el resultado de toda la investigación que he llevado a cabo durante mi doctorado en una colección revisada de mi carrera de doctorado de los últimos tres años y medio, y se ubica dentro de la tendencia creciente de estudios dedicados a hacer que los resultados de la Inteligencia Artificial sean más explicables, yendo más allá del logro de puntajes más altos en la realización de tareas, sino más bien haciendo comprensibles sus motivaciones y qué los procesos sean más comprensibles para los expertos en el dominio. La contribución principal y más novedosa de este trabajo consiste en la explotación de características (o rasgos) basadas en la morfología y la sintaxis de dependencias, que se utilizaron para crear las representaciones vectoriales de textos procedentes de redes sociales en varios idiomas y para dos tareas diferentes. A continuación, estas características se han combinado con una variedad de clasificadores de aprendizaje automático, con algunas redes neuronales y también con el modelo de lenguaje BERT. Los resultados sugieren que la información sintáctica basada en dependencias utilizada es muy informativa para la detección de la ironía y menos informativa en lo que respecta a la detección del posicionamiento. No obstante, la sintaxis basada en dependencias podría resultar útil en la tarea de detección del posicionamiento si, en primer lugar, la detección de ironía se considera un paso previo al procesamiento en la detección del posicionamiento. También creo que el enfoque basado casi completamente en sintaxis de dependencias que propongo en esta tesis podría ayudar a explicar mejor un fenómeno prag / [EN] La present tesi s'emmarca dins de l'ampli panorama d'estudis relacionats amb el Processament del Llenguatge Natural (NLP). En concret, es tracta d'un treball de Lingüística Computacional (CL), l'objectiu principal del qual és estudiar en profunditat la contribució de la sintaxi en el camp de l'anàlisi de sentiments i, en concret, aplicat a l'estudi de textos extrets de les xarxes socials o, més en general, de continguts online. A més, el recent interès de la comunitat científica pel projecte Universal Dependències (UD), en el qual es proposa un format d'anotació morfosintàctica destinat a crear una representació "universal" de la morfologia i sintaxi aplicable a diferents idiomes, en aquest treball s'utilitza aquest format amb el propòsit de realitzar un estudi des d'una perspectiva multilingüe (italià, anglès, francès i espanyol). En aquest treball es presenta una descripció exhaustiva del format d'anotació morfosintàctica d'UD, en particular, posant més èmfasi en les qüestions més rellevants pel que fa a la seva aplicació als UGC generats a les xarxes socials. L'objectiu final és analitzar i comprovar si aquestes anotacions morfosintàctiques serveixen per obtenir informació útil per als sistemes de detecció de la ironia i del stance o posicionament. Es presentaran dues tasques i s'utilitzaran com a exemples d'estudi per provar les hipòtesis de la investigació: el primer cas se centra en l'àrea de la detecció automàtica de la ironia i el segon en l'àrea de la detecció del stance o posicionament. En tots dos casos es proporcionen els antecedents i treballs relacionats que poden servir de context per al lector, es plantegen els problemes trobats i es descriuen les diferents activitats proposades per resoldre aquests problemes en la comunitat de la lingüística computacional. Es fa especialment referència als recursos actualment disponibles, així com als desenvolupats específicament per a l'estudi dels fenòmens abans esmentats. Finalment, a través de la descripció d'una sèrie d'experiments, duts a terme tant en campanyes d'avaluació com en estudis independents, es descriu la contribució que la sintaxi pot oferir a la resolució d'aquestes tasques. Aquesta tesi és el resultat de tota la investigació que he dut a terme durant el meu doctorat els últims tres anys i mig, i se situa dins de la tendència creixent d'estudis dedicats a fer que els resultats de la Intel·ligència Artificial siguin més explicables, que vagin més enllà de l'assoliment de puntuacions més altes en la realització de tasques, sinó més aviat fent comprensibles les seves motivacions i què els processos siguin més comprensibles per als experts en el domini. La contribució principal i més nova d'aquest treball consisteix en l'explotació de característiques (o trets) basades en la morfologia i la sintaxi de dependències, que s'utilitzen per crear les representacions vectorials de textos procedents de xarxes socials en diversos idiomes i per a dues tasques diferents. A continuació, aquestes característiques s'han combinat amb una varietat de classificadors d'aprenentatge automàtic, amb algunes xarxes neuronals i també amb el model de llenguatge BERT. Els resultats suggereixen que la informació sintàctica utilitzada basada en dependències és molt informativa per a la detecció de la ironia i menys informativa pel que fa a la detecció del posicionament. Malgrat això, la sintaxi basada en dependències podria ser útil en la tasca de detecció del posicionament si, en primer lloc, la detecció d'ironia es considera un pas previ al processament en la detecció del posicionament. També crec que l'enfocament basat gairebé completament en sintaxi de dependències que proposo en aquesta tesi podria ajudar a explicar millor un fenomen pragmàtic tan difícil de detectar i d'interpretar com la ironia. / Cignarella, AT. (2021). Dependency Syntax in the Automatic Detection of Irony and Stance [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/177639
196

Knowledge acquisition from user reviews for interactive question answering

Konstantinova, Natalia January 2013 (has links)
Nowadays, the effective management of information is extremely important for all spheres of our lives and applications such as search engines and question answering systems help users to find the information that they need. However, even when assisted by these various applications, people sometimes struggle to find what they want. For example, when choosing a product customers can be confused by the need to consider many features before they can reach a decision. Interactive question answering (IQA) systems can help customers in this process, by answering questions about products and initiating a dialogue with the customers when their needs are not clearly defined. The focus of this thesis is how to design an interactive question answering system that will assist users in choosing a product they are looking for, in an optimal way, when a large number of similar products are available. Such an IQA system will be based on selecting a set of characteristics (also referred to as product features in this thesis), that describe the relevant product, and narrowing the search space. We believe that the order in which these characteristics are presented in terms of these IQA sessions is of high importance. Therefore, they need to be ranked in order to have a dialogue which selects the product in an efficient manner. The research question investigated in this thesis is whether product characteristics mentioned in user reviews are important for a person who is likely to purchase a product and can therefore be used when designing an IQA system. We focus our attention on products such as mobile phones; however, the proposed techniques can be adapted for other types of products if the data is available. Methods from natural language processing (NLP) fields such as coreference resolution, relation extraction and opinion mining are combined to produce various rankings of phone features. The research presented in this thesis employs two corpora which contain texts related to mobile phones specifically collected for this thesis: a corpus of Wikipedia articles about mobile phones and a corpus of mobile phone reviews published on the Epinions.com website. Parts of these corpora were manually annotated with coreference relations, mobile phone features and relations between mentions of the phone and its features. The annotation is used to develop a coreference resolution module as well as a machine learning-based relation extractor. Rule-based methods for identification of coreference chains describing the phone are designed and thoroughly evaluated against the annotated gold standard. Machine learning is used to find links between mentions of the phone (identified by coreference resolution) and phone features. It determines whether some phone feature belong to the phone mentioned in the same sentence or not. In order to find the best rankings, this thesis investigates several settings. One of the hypotheses tested here is that the relatively low results of the proposed baseline are caused by noise introduced by sentences which are not directly related to the phone and phone feature. To test this hypothesis, only sentences which contained mentions of the mobile phone and a phone feature linked to it were processed to produce rankings of the phones features. Selection of the relevant sentences is based on the results of coreference resolution and relation extraction. Another hypothesis is that opinionated sentences are a good source for ranking the phone features. In order to investigate this, a sentiment classification system is also employed to distinguish between features mentioned in positive and negative contexts. The detailed evaluation and error analysis of the methods proposed form an important part of this research and ensure that the results provided in this thesis are reliable.
197

User Modeling in Social Media: Gender and Age Detection

Daneshvar, Saman 21 August 2019 (has links)
Author profiling is a field within Natural Language Processing (NLP) that is concerned with identifying various characteristics and demographic factors of authors, such as gender, age, location, native language, political orientation, and personality by analyzing the style and content of their writings. There is a growing interest in author profiling, with applications in marketing and advertising, opinion mining, personalization, recommendation systems, forensics, security, and defense. In this work, we build several classification models using NLP, Deep Learning, and classical Machine Learning techniques that can identify the gender and age of a Twitter user based on the textual contents of their correspondence (tweets) on the platform. Our SVM gender classifier utilizes a combination of word and character n-grams as features, dimensionality reduction using Latent Semantic Analysis (LSA), and a Support Vector Machine (SVM) classifier with linear kernel. At the PAN 2018 author profiling shared task, this model achieved the highest performance with 82.21%, 82.00%, and 80.90% accuracy on the English, Spanish, and Arabic datasets, respectively. Our age classifier was trained on a dataset of 11,160 Twitter users, using the same approach, though the age classification experiments are preliminary. Our Deep Learning gender classifiers are trained and tested on English datasets. Our feedforward neural network consisting of a word embedding layer, flattening, and two densely-connected layers achieves 79.57% accuracy, and our bidirectional Long Short-Term Memory (LSTM) neural network achieves 76.85% accuracy on the gender classification task.
198

The effect of noise in the training of convolutional neural networks for text summarisation

Meechan-Maddon, Ailsa January 2019 (has links)
In this thesis, we work towards bridging the gap between two distinct areas: noisy text handling and text summarisation. The overall goal of the paper is to examine the effects of noise in the training of convolutional neural networks for text summarisation, with a view to understanding how to effectively create a noise-robust text-summarisation system. We look specifically at the problem of abstractive text summarisation of noisy data in the context of summarising error-containing documents from automatic speech recognition (ASR) output. We experiment with adding varying levels of noise (errors) to the 4 million-article Gigaword corpus and training an encoder-decoder CNN on it with the aim of producing a noise-robust text summarisation system. A total of six text summarisation models are trained, each with a different level of noise. We discover that the models with a high level of noise are indeed able to aptly summarise noisy data into clean summaries, despite a tendency for all models to overfit to the level of noise on which they were trained. Directions are given for future steps in order to create an even more noise-robust and flexible text summarisation system.
199

O discurso terapêutico de Milton Erickson: uma análise à luz dos padrões da programação neurolinguística

Azevedo, Regina Maria 19 June 2012 (has links)
Este estudo apresenta o trabalho do psicanalista e hipnoterapeuta americano Milton Hyland Erickson a partir de seus dados biográficos e de sua relevância para a chamada terapia estratégica, propondo, em consonância com sua experiência profissional, uma nova epistemologia para a mudança; propõe ainda uma comparação entre a trajetória de Freud e a de Erickson em relação à hipnose, bem como um apanhado histórico sobre essa técnica. Com base nessa recuperação teórica, os padrões ericksonianos de linguagem são investigados à luz do Metamodelo e do Modelo Milton, criações de Richard Bandler e John Grinder, tomando por base alguns conceitos da Programação Neurolinguística (PNL) tais como sistemas representacionais, filtros, modelagem, espelhamento e rapport. Empreende-se uma análise do discurso ericksoniano a partir de três casos selecionados dentre seus atendimentos clínicos, evidenciando os padrões de linguagem apresentados nas categorias e subcategorias do Metamodelo e do Modelo Milton, com o objetivo de validá-los tanto teórica quanto empiricamente / This study aimed at scrutinizing Milton Hyland Ericksons theoretical framework and therapeutic methodology with a view to the understanding of their relevance to the so-called strategic therapy. That aim was carried out through analyses of issues such as Ericksons professional experience, his shared points and differences with Freudian hypnosis and particularly his patterns of discourse as well as his very clinical technique. Those analyses were accomplished under the guidance of the Metamodel and the Milton Model as proposed by Richard Bandler and John Grinder within the references and concepts comprised in the Neuro-Linguistic Programming such as representational systems, filters, modeling, mirroring and rapport. In order to ground and illustrate the theoretical analyses, this work was enriched by the scrutiny of three Ericksonian clinical cases. This strategy proved to be effective since it has provided evidences about both Ericksons language patterns and empirical data for the observation of the categories and subcategories of the Metamodel and Miltons Model, as a kind of a quest for validation of his theory and methods. The results put into light Ericksons understanding of the therapeutic work
200

Automatic Reconstruction of Itineraries from Descriptive Texts / Reconstruction automatique d’itinéraires à partir de textes descriptifs

Moncla, Ludovic 03 December 2015 (has links)
Cette thèse s'inscrit dans le cadre du projet PERDIDO dont les objectifs sont l'extraction et la reconstruction d'itinéraires à partir de documents textuels. Ces travaux ont été réalisés en collaboration entre le laboratoire LIUPPA de l'université de Pau et des Pays de l'Adour (France), l'équipe IAAA de l'université de Saragosse (Espagne) et le laboratoire COGIT de l'IGN (France). Les objectifs de cette thèse sont de concevoir un système automatique permettant d'extraire, dans des récits de voyages ou des descriptions d’itinéraires, des déplacements, puis de les représenter sur une carte. Nous proposons une approche automatique pour la représentation d'un itinéraire décrit en langage naturel. Notre approche est composée de deux tâches principales. La première tâche a pour rôle d'identifier et d'extraire les informations qui décrivent l'itinéraire dans le texte, comme par exemple les entités nommées de lieux et les expressions de déplacement ou de perception. La seconde tâche a pour objectif la reconstruction de l'itinéraire. Notre proposition combine l'utilisation d'information extraites grâce au traitement automatique du langage ainsi que des données extraites de ressources géographiques externes (comme des gazetiers). L'étape d'annotation d'informations spatiales est réalisée par une approche qui combine l'étiquetage morpho-syntaxique et des patrons lexico-syntaxiques (cascade de transducteurs) afin d'annoter des entités nommées spatiales et des expressions de déplacement ou de perception. Une première contribution au sein de la première tâche est la désambiguïsation des toponymes, qui est un problème encore mal résolu en NER et essentiel en recherche d'information géographique. Nous proposons un algorithme non-supervisé de géo-référencement basé sur une technique de clustering capable de proposer une solution pour désambiguïser les toponymes trouvés dans les ressources géographiques externes, et dans le même temps proposer une estimation de la localisation des toponymes non référencés. Nous proposons un modèle de graphe générique pour la reconstruction automatique d'itinéraires, où chaque noeud représente un lieu et chaque segment représente un chemin reliant deux lieux. L'originalité de notre modèle est qu'en plus de tenir compte des éléments habituels (chemins et points de passage), il permet de représenter les autres éléments impliqués dans la description d'un itinéraire, comme par exemple les points de repères visuels. Un calcul d'arbre de recouvrement minimal à partir d'un graphe pondéré est utilisé pour obtenir automatiquement un itinéraire sous la forme d'un graphe. Chaque segment du graphe initial est pondéré en utilisant une méthode d'analyse multi-critère combinant des critères qualitatifs et des critères quantitatifs. La valeur des critères est déterminée à partir d'informations extraites du texte et d'informations provenant de ressources géographique externes. Par exemple, nous combinons les informations issues du traitement automatique de la langue comme les relations spatiales décrivant une orientation (ex: se diriger vers le sud) avec les coordonnées géographiques des lieux trouvés dans les ressources pour déterminer la valeur du critère "relation spatiale". De plus, à partir de la définition du concept d'itinéraire et des informations utilisées dans la langue pour décrire un itinéraire, nous avons modélisé un langage d'annotation d'information spatiale adapté à la description de déplacements, s'appuyant sur les recommendations du consortium TEI (Text Encoding and Interchange). Enfin, nous avons implémenté et évalué les différentes étapes de notre approche sur un corpus multilingue de descriptions de randonnées (Français, Espagnol et Italien). / This PhD thesis is part of the research project PERDIDO, which aims at extracting and retrieving displacements from textual documents. This work was conducted in collaboration with the LIUPPA laboratory of the university of Pau (France), the IAAA team of the university of Zaragoza (Spain) and the COGIT laboratory of IGN (France). The objective of this PhD is to propose a method for establishing a processing chain to support the geoparsing and geocoding of text documents describing events strongly linked with space. We propose an approach for the automatic geocoding of itineraries described in natural language. Our proposal is divided into two main tasks. The first task aims at identifying and extracting information describing the itinerary in texts such as spatial named entities and expressions of displacement or perception. The second task deal with the reconstruction of the itinerary. Our proposal combines local information extracted using natural language processing and physical features extracted from external geographical sources such as gazetteers or datasets providing digital elevation models. The geoparsing part is a Natural Language Processing approach which combines the use of part of speech and syntactico-semantic combined patterns (cascade of transducers) for the annotation of spatial named entities and expressions of displacement or perception. The main contribution in the first task of our approach is the toponym disambiguation which represents an important issue in Geographical Information Retrieval (GIR). We propose an unsupervised geocoding algorithm that takes profit of clustering techniques to provide a solution for disambiguating the toponyms found in gazetteers, and at the same time estimating the spatial footprint of those other fine-grain toponyms not found in gazetteers. We propose a generic graph-based model for the automatic reconstruction of itineraries from texts, where each vertex represents a location and each edge represents a path between locations. %, combining information extracted from texts and information extracted from geographical databases. Our model is original in that in addition to taking into account the classic elements (paths and waypoints), it allows to represent the other elements describing an itinerary, such as features seen or mentioned as landmarks. To build automatically this graph-based representation of the itinerary, our approach computes an informed spanning tree on a weighted graph. Each edge of the initial graph is weighted using a multi-criteria analysis approach combining qualitative and quantitative criteria. Criteria are based on information extracted from the text and information extracted from geographical sources. For instance, we compare information given in the text such as spatial relations describing orientation (e.g., going south) with the geographical coordinates of locations found in gazetteers. Finally, according to the definition of an itinerary and the information used in natural language to describe itineraries, we propose a markup langugage for encoding spatial and motion information based on the Text Encoding and Interchange guidelines (TEI) which defines a standard for the representation of texts in digital form. Additionally, the rationale of the proposed approach has been verified with a set of experiments on a corpus of multilingual hiking descriptions (French, Spanish and Italian).

Page generated in 0.061 seconds