71 |
NLP Based Automated Screening Tools for Alzheimer’s Disease
Erséus, August, Strömfelt, Ted January 2022
Severely life-impairing and often lethal dementia illnesses such as Alzheimer’s disease are of the greatest medical interest. While a cure might still be years away, there are immense benefits to be gained from detecting disease onset as early as possible, from both an individual and a societal perspective. In this study we explore new approaches to Alzheimer’s screening, utilizing recent advances in natural language processing and automated speech recognition. We propose a digital, mobile-application-based platform for psychometric data collection that can be used by patients and research participants in a non-clinical environment. In particular, we implement automated versions of two well-recognized psychometric tests for Alzheimer’s screening: the Verbal Learning Test and the Story Recall Task. We perform a qualitative evaluation of results from 46 sessions of these tests, as well as a semi-structured interview with a clinician, and find automated psychometric tools promising for future endeavors within Alzheimer’s screening, but also that the method has inherent difficulties that need to be counteracted. We also discuss the potential economic upsides of automating parts of the screening and diagnosis processes for dementia-related diseases, and conclude that there are massive savings to be made – up to 600 million SEK yearly in Sweden alone.
|
72 |
Multi-label Classification and Sentiment Analysis on Textual Records
Guo, Xintong January 2019
In this thesis we present effective approaches for two classic Natural Language Processing tasks, Multi-label Text Classification (MLTC) and Sentiment Analysis (SA), based on two datasets.
For MLTC, a robust deep learning approach based on a convolutional neural network (CNN) is introduced. It was applied to almost one million records with an associated label list consisting of 20 labels. We divided the data set into three parts: a training set, a validation set and a test set. Our CNN-based model achieved a strong result as measured by F1 score. For SA, the data set was more informative and better structured than the MLTC data. A traditional word embedding method, Word2Vec, was used to generate word vectors for each text record. Following that, we employed several classic deep learning models, such as Bi-LSTM, RCNN, an attention mechanism and CNN, to extract sentiment features. In the next step, a classification framework was designed for grading. Finally, the state-of-the-art language model BERT, which uses transfer learning, was employed.
In conclusion, we compare the performance of the RNN-based models, the CNN-based model and the pre-trained language model on the classification task and discuss their applicability. / Thesis / Master of Science in Electrical and Computer Engineering (MSECE) / This thesis proposes two deep learning solutions, one for the multi-label classification problem and one for the sentiment analysis problem.
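The abstract reports an F1 score for a 20-label task but does not say how it was averaged. As an illustration only, the sketch below assumes micro-averaging over all record/label pairs; the function name `micro_f1` and the toy labels are hypothetical and not taken from the thesis.

```python
from typing import List, Set


def micro_f1(true_labels: List[Set[str]], pred_labels: List[Set[str]]) -> float:
    """Micro-averaged F1 over a list of records, each with a set of labels."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, pred_labels):
        tp += len(truth & pred)   # labels predicted and correct
        fp += len(pred - truth)   # labels predicted but wrong
        fn += len(truth - pred)   # labels missed by the model
    if tp == 0:
        return 0.0                # no correct prediction at all
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example with two records (assumed data, for illustration).
truth = [{"a", "b"}, {"c"}]
pred = [{"a"}, {"c", "d"}]
score = micro_f1(truth, pred)
```

Macro-averaging (computing F1 per label and then taking the mean) would weight rare labels more heavily; which variant the thesis used is not stated in the abstract.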
|
73 |
Closing the Gap : Automatic Distractor Generation in Japanese Language Testing
Andersson, Tim January 2024
Recent advances in natural language processing have increased interest in automatic question generation, particularly in education (e.g. math, biology, law, medicine, and languages) due to its efficiency in assessing comprehension. Specifically, multiple-choice questions have become popular, especially in standardized language proficiency tests. However, manually creating high-quality tests is time-consuming and challenging. Distractor generation, a critical aspect of multiple-choice question creation, is often overlooked, yet it plays a crucial role in test quality. Generating appropriate distractors requires ensuring they are incorrect but related to the correct answer (semantically or contextually), are grammatically correct, and of similar length to the target word. While various languages have seen research in automatic distractor generation, Japanese has received limited attention. This Master’s thesis addresses this gap by automatically generating cloze tests, including distractors, for Japanese language proficiency tests, evaluating the generated questions’ quality, difficulty, and preferred distractor types, and comparing them to human-made questions through automatic and manual evaluations.
|
74 |
Root Cause Prediction from Log Data using Large Language Models
Mandakath Gopinath, Aswath January 2024
In manufacturing, uptime and system reliability are paramount, placing high demands on automation technologies such as robotic systems. Failures in these systems cause considerable disruptions and incur significant costs. Traditional troubleshooting methods require extensive manual analysis by experts of log files, system data, application information, and problem descriptions. This process is labor-intensive and time-consuming, often resulting in prolonged downtimes and increased customer dissatisfaction, leading to heavy financial losses for companies. This research explores the application of Large Language Models (LLMs) such as MistralLite and Mixtral-8x7B to automate root cause prediction from log data. We employed various fine-tuning methods, including full fine-tuning, Low-Rank Adaptation (LoRA), and Quantized Low-Rank Adaptation (QLoRA), on these decoder-only models. Beyond using perplexity as an evaluation metric, the study uses GPT-4 as a judge to assess model performance. Additionally, the research uses complex prompting techniques to aid in the extraction of root causes from problem descriptions using GPT-4 and utilizes vector embeddings to analyze the importance of features in root cause prediction. The findings demonstrate that LLMs, when fine-tuned, can assist in identifying root causes from log data, with the smaller MistralLite model showing superior performance compared to the larger Mixtral model, challenging the notion that larger models are inherently better. The results also indicate that different training adaptations yield varied effectiveness, with QLoRA adaptation performing best for MistralLite and full fine-tuning proving most effective for Mixtral. This suggests that a tailored approach to model adaptation is necessary for optimal performance. Additionally, GPT-4 with Chain of Thought (CoT) prompting proved capable of extracting reasonable root causes from solved issues. 
The analysis of feature vector embeddings provides insights into the significant features, enhancing our understanding of the underlying patterns and relationships in the data.
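The abstract names perplexity as one evaluation metric. As a minimal sketch of how that quantity is conventionally defined (the exponential of the average negative log-likelihood per token), assuming per-token natural-log probabilities are already available from a model, one could write the following. The function name and the toy inputs are hypothetical, not taken from the thesis.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity from natural-log probabilities assigned to each target token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Average negative log-likelihood per token.
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# Toy example: a model that gives every target token probability 0.25.
example = perplexity([math.log(0.25)] * 4)
```

Lower values mean the model assigns higher probability to the reference text; the metric says nothing by itself about whether a predicted root cause is correct, which is presumably why the study pairs it with a judge model.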
|
75 |
En utvärdering av tjänster för taligenkänning och textsammanfattning och möjligheter att skapa undertexter i filmer. / An evaluation of services for speech recognition and text summarization and the ability to create subtitles in movies.
Kjerrström, Linus, Pham Huy, Hoang January 2022
Creating subtitles for movies is today a time-consuming craft. The company Firstlight Media subtitles around 200 movies per week entirely manually, and each movie usually takes around 4–6 hours to finish. If steps in the subtitling process could be automated, resources could be saved. The work consisted of evaluating whether it is possible to automate parts of the process of subtitling movies. To investigate this, a literature study was carried out on previous work in the areas of automatic speech recognition and text summarization. After the study, a number of services for both speech recognition and text summarization were tested on three different movies to evaluate whether the services are suitable for use when subtitling movies. The testing led to an analysis of the results, which showed that text summarization was not suitable, whereas speech recognition was to some extent useful for automating the transcription of the spoken language in the movies.
|
76 |
Machine Learning Based Sentiment Classification of Text, with Application to Equity Research Reports / Maskininlärningsbaserad sentimentklassificering av text, med tillämpning på aktieanalysrapporter
Blomkvist, Oscar January 2019
In this thesis, we analyse the sentiment in equity research reports written by analysts at Skandinaviska Enskilda Banken (SEB). We provide a description of established statistical and machine learning methods for classifying the sentiment in text documents as positive or negative. Specifically, a form of recurrent neural network known as long short-term memory (LSTM) is of interest. We investigate two different labelling regimes for generating training data from the reports. Benchmark classification accuracies are obtained using logistic regression models. Finally, two different word embedding models and bidirectional LSTMs of varying network size are implemented and compared to the benchmark results. We find that the logistic regression works well for one of the labelling approaches, and that the best LSTM models outperform it slightly.
|
77 |
French AXA Insurance Word Embeddings : Effects of Fine-tuning BERT and Camembert on AXA France’s data
Zouari, Hend January 2020
In this study we explore the state-of-the-art Natural Language Processing technologies that allow textual data to be transformed into numerical representations. We go through the theory of the existing traditional methods as well as the most recent ones. This thesis focuses on the recent advances in Natural Language Processing being developed upon the Transfer model. One of the most relevant innovations was the release of a deep bidirectional encoder called BERT, which surpassed several state-of-the-art results. BERT utilises Transfer Learning to improve the modelling of language dependencies in text. BERT is used for several different languages, and other specialized models have been released, such as the French BERT, Camembert. This thesis compares the language models of these pre-trained models and their capability for domain adaptation. Using the multilingual and the French pre-trained versions of BERT and a dataset of AXA France’s emails, clients’ messages, legal documents and insurance documents containing over 60 million words, we fine-tuned the language models to adapt them to AXA’s French insurance context and create a French AXAInsurance BERT model. We evaluate this model on the language model’s ability to predict a masked token from its context. Without fine-tuning, BERT models AXA’s French insurance text better than Camembert. However, with this small amount of data, Camembert is more capable of adapting to this specific domain of insurance.
|
78 |
Detection, Extraction and Analysis of Vossian Antonomasia in Large Text Corpora Using Machine Learning
Schwab, Michel 02 July 2024
Stylistic devices, also known as figures of speech or rhetorical devices, have always been used in text to create imagery, engage readers, and emphasize key points. Among these devices, Vossian Antonomasia, which is closely related to metaphor and metonymy, is particularly popular for employing named entities as rhetorical elements. Defined more precisely, Vossian Antonomasia involves attributing a particular set of properties or attributes to an entity by naming another named entity that is generally well-known for the respective properties. Modifying phrases, which typically appear in combination with the latter entity, help contextualize these attributes. Despite its ubiquity in modern media, research on its identification, usage, and interpretation is scarce. This motivates the topic of this thesis: the automated detection, extraction and analysis of Vossian Antonomasia.
We present several methods for the automated detection of the phenomenon and create an annotated dataset.
The methods are mostly based on neural networks. Additionally, we introduce several approaches for extracting each chunk of the device in a sentence by modeling the problem as a sequence tagging task. Moreover, we introduce cross-lingual extraction models and refine detection methods for improved performance on unseen syntactic variations of the phenomenon by focusing solely on the key entity of the device. Furthermore, we tackle a distinct but complementary task, namely the extraction of the entity being described in an entire text paragraph.
For a deeper understanding of Vossian Antonomasia, we present an exploratory analysis of the developed dataset. We introduce two interactive visualizations that highlight the chunks of the phenomenon and their interplay to gain more insights.
|
79 |
Coaching och NLP : Vägledare på resa i coachingdjungeln / Coaching and NLP : Councellors on Expedition in the Coaching Jungle
Granqvist, Björn, Nilsson, Carola January 2009
Coaching has become a trend in our society, and this study can be seen as an expedition through the “coaching jungle”. The inquiry consists of structured telephone interviews combined with studies of related literature and articles, where the purpose was to find out to what extent NLP – Neuro-Linguistic Programming – occurs in the coaching of the unemployed and how it is used. The sample is drawn from 14 of Sweden’s 21 counties and consists of job coaches employed by municipalities, private companies and the state who work with individuals outside the labour market. The results show that NLP is not a common model in the coaching of the unemployed; instead, the solution-focused approach dominates. The study highlights that approaches, concepts, methods, attitudes towards clients and communicative tools are central to all coaching. The results show that there are similarities between the contents of different coaching and counselling models, even though they may go by different names within each model. The study also shows the pros and cons of coaching, as well as thoughts on education, professionalism and the job title “coach”. / Degree project in the advanced course in study and career guidance, Study and Career Guidance Programme
|
80 |
Lärstilar : Hur ska vi i skolan lära ut så att eleverna kan lära in ? / Learning styles : How can we teach in school so that the pupils learn?
Ringdahl, Monika January 2006
The purpose of this study was to find out how aware primary school teachers are of different learning styles and how they apply them in their teaching. I interviewed seven primary school teachers for this study. The teachers were relatively aware of different learning styles. This awareness means that they teach by telling (auditory), by showing (visual) and by letting the pupils do things (kinesthetic/tactile) in order to understand. As a result, the vast majority of pupils can benefit from the teaching. Despite this, the teachers felt that pupils who learn by “doing” often lose out in school because their learning style demands more work from the teacher.
|