Global ETD Search

61	Automatic Question Paraphrasing in Swedish with Deep Generative Models / Automatisk frågeparafrasering på svenska med djupa generativa modeller Lindqvist, Niklas January 2021
Paraphrase generation is the task of automatically generating a paraphrase, that is, another text with the same meaning, from an input sentence or text. It is a fundamental yet challenging natural language processing (NLP) task, used in applications such as question answering, information retrieval and conversational systems. In this study, we address paraphrase generation for questions in Swedish by evaluating two deep generative models that have shown promising results on paraphrase generation for questions in English. The first model is a Conditional Variational Autoencoder (C-VAE). The second extends the first with a discriminator network to form a Generative Adversarial Network (GAN) architecture. In addition, a method not based on machine learning was implemented as a baseline. The models were evaluated using both quantitative and qualitative measures, including grammatical correctness and equivalence to the source question. The results show that the deep generative models outperformed the baseline on all quantitative metrics. The qualitative evaluation showed that the deep generative models also outperformed the baseline at generating grammatically correct sentences, but that there was no noticeable difference between the models in equivalence to the source question.
Keywords: Paraphrase Generation, Variational Autoencoder, Generative Adversarial Networks, Natural Language Generation, Deep Learning, Word Embeddings
Subject: Computer and Information Sciences
62	Distributionella representationer av ord för effektiv informationssökning : Algoritmer för sökning i kundsupportforum / Distributional Representations of Words for Effective Information Retrieval : Information Retrieval in Customer Support Forums Lachmann, Tim; Sabel, Johan January 2017
As the abundance of information in society increases, so does the need for more sophisticated methods of information retrieval. Extracting relevant information from company-internal systems becomes more complex as larger amounts of information must be handled and more communication moves to digital platforms. In recent years, methods for word embedding in vector space have made great progress. In particular, in 2013 Google presented groundbreaking results with the Word2vec model, significantly outperforming earlier methods. We implement a search engine that uses word embeddings based on Word2vec and related models, intended for the IT company Kundo and its product Kundo Forum. The results show the potential to improve information retrieval recall by a significant margin without diminishing precision. Alongside the primary subject of information retrieval, we also analyse the implications of an improved search engine from a market and product development perspective.
Keywords: word2vec, fasttext, glove, LSI, LSA, word embeddings, information retrieval, search engine, machine learning, neural networks, natural language processing, NLP, distributional representations
Subject: Computer Sciences
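The retrieval idea in entry 62 can be illustrated with a minimal sketch. It assumes documents and queries are represented by the average of their word vectors and ranked by cosine similarity. The toy vectors, the function names and the averaging scheme are hypothetical and are not taken from the thesis, which uses trained Word2vec-style models.

```python
import math

# Hypothetical 3-dimensional word vectors; a real system would load
# vectors trained with Word2vec, fastText or GloVe.
VECTORS = {
    "password": [0.9, 0.1, 0.0],
    "login": [0.8, 0.2, 0.1],
    "reset": [0.7, 0.3, 0.0],
    "invoice": [0.0, 0.9, 0.2],
    "payment": [0.1, 0.8, 0.3],
}


def embed(text):
    """Average the vectors of the known words in text; None if no word is known."""
    known = [VECTORS[w] for w in text.lower().split() if w in VECTORS]
    if not known:
        return None
    return [sum(dim) / len(known) for dim in zip(*known)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query, documents):
    """Return the documents ordered from most to least similar to the query."""
    q = embed(query)
    if q is None:
        return []
    scored = []
    for doc in documents:
        d = embed(doc)
        if d is not None:
            scored.append((cosine(q, d), doc))
    return [doc for _, doc in sorted(scored, reverse=True)]


DOCS = ["reset login", "invoice payment"]
RESULTS = search("password", DOCS)
```

In this toy setup the query "password" shares no word with either document, yet "reset login" ranks first because its vectors point in a similar direction. This is the kind of recall gain over exact keyword matching that the abstract describes.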
63	Automatic Pronoun Resolution for Swedish / Automatisk pronomenbestämning på svenska Ahlenius, Camilla January 2020
This report describes a quantitative analysis comparing two methods for pronoun resolution in Swedish. The first method, an implementation of Mitkov's algorithm, is heuristic-based: the resolution is determined by a number of manually engineered rules that draw on both syntactic and semantic information. The second method is data-driven: a Support Vector Machine (SVM) using dependency trees and word embeddings as features. Both methods are evaluated on an annotated corpus of Swedish news articles that was created as part of this thesis. The SVM-based methods significantly outperformed the implementation of Mitkov's algorithm. The best-performing SVM model relies on tree kernels applied to dependency trees. It achieved an F1-score of 0.76 for the positive class and 0.9 for the negative class, where positives are pairs of pronoun and noun phrase that corefer, and negatives are pairs that do not corefer.
Keywords: Pronoun resolution, Mitkov's algorithm, Support Vector Machine, Supervised learning, SVM-Light-TK, Tree kernels, Dependency trees, Word embeddings
Subject: Computer and Information Sciences
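Entry 63 reports separate F1-scores for the positive (coreferring) and negative (non-coreferring) pairs. As a small worked example of how a per-class F1-score is computed, the sketch below uses made-up labels. The data and the function name are hypothetical and unrelated to the thesis corpus.

```python
def f1_per_class(gold, predicted, label):
    """F1-score for one class, treating `label` as the class of interest."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: True means the pronoun and noun phrase corefer.
GOLD = [True, True, False, False, False, False]
PRED = [True, False, True, False, False, False]

F1_POSITIVE = f1_per_class(GOLD, PRED, True)    # 0.5
F1_NEGATIVE = f1_per_class(GOLD, PRED, False)   # 0.75
```

Reporting both classes matters when, as is typical for candidate-pair classification, negative pairs far outnumber positive ones and a single accuracy figure would hide weak performance on the positive class.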
64	Discovering Implant Terms in Medical Records Jerdhaf, Oskar January 2021
Implant terms are terms like "pacemaker" which indicate the presence of artifacts in the body of a human. These terms are key to determining whether a patient can safely undergo Magnetic Resonance Imaging (MRI). However, identifying them in medical records is time-consuming, laborious and expensive, yet necessary for taking the correct precautions before an MRI scan. Automating this process is of great interest to radiologists, as it ideally saves time, prevents mistakes and as a result saves lives. The electronic medical records (EMR) contain the documented medical history of a patient, including any implants or objects that an individual has inside their body. Information about such objects and implants is of great interest when determining if and how a patient can be scanned using MRI. Unfortunately, this information is not easily extracted automatically: the sparse presence of implant terms and the unusual structure of medical records compared with most written text make the task difficult to automate by simple means. By leveraging recent advances in Artificial Intelligence (AI), this thesis explores the ability to identify and extract such terms automatically in Swedish EMRs. For the task of identifying implant terms, a generally trained Swedish Bidirectional Encoder Representations from Transformers (BERT) model is used, which is then fine-tuned on Swedish medical records. Using this model, a variety of approaches are explored, namely BERT-KDTree, BERT-BallTree, Cosine Brute Force and unsupervised NER. The results show that BERT-KDTree and BERT-BallTree are the most rewarding methods. Results from both methods have been evaluated by domain experts and appear promising for such an early stage, given the difficulty of the task. The evaluation of BERT-BallTree shows that multiple methods of extraction may be preferable, as they provide different but still useful terms. Cosine Brute Force is deemed an unrealistic approach because of its computational and memory requirements. The NER approach was deemed too impractical and laborious to justify for this study, yet it is potentially useful, if not more suitable, under a different set of conditions and goals. While there is much to be explored and improved, these experiments are a clear indication that automatic identification of implant terms is possible, as a large number of implant terms were successfully discovered by automated means.
Keywords: AI, Artificial Intelligence, Machine Learning, Medical Records, Patient Records, Electronic Records, Electronic Medical Records, EMR, BERT, Implant Terms, Implants, Terms, Term Discovery, Word Similarity, embeddings, word embeddings, transformers, KDTree, BallTree, NER
65	L’usage des codons régule la présentation des peptides associés aux molécules du CMH-I [Codon usage regulates the presentation of peptides associated with MHC-I molecules] Daouda, Tariq 01 1900
No description available.
Keywords: MHC-I associated peptides, Artificial neural networks, Cancer immunotherapy, Neoantigen prediction, Proteogenomics, Mass Spectrometry, Word embeddings, mRNA, Codon usage, Software development
66	Dynamic Student Embeddings for a Stable Time Dimension in Knowledge Tracing Tump, Clara January 2020
Knowledge tracing is concerned with tracking a student's knowledge as they engage with exercises in an (online) learning platform. A commonly used state-of-the-art knowledge tracing model is Deep Knowledge Tracing (DKT), which models the time dimension as a sequence of completed exercises per student by using a Long Short-Term Memory Neural Network (LSTM). However, a common problem in this sequence-based model is excessive instability in the time dimension of the modelled knowledge of a student. In other words, the student's knowledge of a skill changes too quickly and unreliably. We propose dynamic student embeddings as a stable method for encoding the time dimension of knowledge tracing systems. In this method the time dimension is encoded in time slices of a fixed size, while the model's loss function is designed to smoothly align subsequent time slices. We compare dynamic student embeddings to DKT on a large-scale real-world dataset and show that dynamic student embeddings provide more stable knowledge tracing while retaining good performance.
Keywords: Knowledge Tracing, Exercise Recommendation, Adaptive Learning, Machine Learning, Word Embeddings, Dynamic Embeddings, Recurrent Neural Networks, Long Short-Term Memory Neural Networks
Subject: Computer and Information Sciences
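Entry 66 states that the loss function is designed to smoothly align subsequent time slices, without giving its form. One common way to express such a constraint, shown here purely as an assumption and not as the thesis's actual loss, is to add a penalty on the squared distance between a student's embeddings in consecutive slices. The function names and the `weight` parameter are hypothetical.

```python
def smoothness_penalty(slices):
    """Sum of squared Euclidean distances between consecutive slice embeddings."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(prev, curr))
        for prev, curr in zip(slices, slices[1:])
    )


def total_loss(prediction_loss, slices, weight=0.1):
    """Prediction loss plus a weighted temporal smoothness term (assumed form)."""
    return prediction_loss + weight * smoothness_penalty(slices)


# Hypothetical 2-dimensional embeddings of one student over three time slices.
STUDENT_SLICES = [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]]
PENALTY = smoothness_penalty(STUDENT_SLICES)              # 1.0 + 4.0 = 5.0
LOSS = total_loss(1.0, STUDENT_SLICES, weight=0.5)        # 1.0 + 0.5 * 5.0 = 3.5
```

Under this assumed form, a larger `weight` trades some predictive fit for a student representation that changes more gradually over time, which matches the stability goal described in the abstract.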
