Global ETD Search

Optimering av en chattbot för det svenska språket / Optimization of a Chatbot for the Swedish Language

Chatbot developers at Softronic currently use the Rasa framework and its default components to process user input. This is problematic because the default components are not optimized for Swedish. An evaluation of all Rasa components was therefore requested, with the aim of identifying the components that maximize classification accuracy. In this thesis, several Rasa pipelines were built and compared using different components for tokenization, feature extraction and classification. For tokenization, Rasa's WhitespaceTokenizer outperformed both SpacyTokenizer and StanzaTokenizer. For feature extraction, the best-performing components were CountVectorsFeaturizer, LanguageModelFeaturizer (with the LaBSE model) and FastTextFeaturizer (with the official fastText model trained on Swedish Wikipedia). DIETClassifier was generally the best classifier, although SklearnIntentClassifier outperformed it in several cases. The work produced several pipelines that outperformed Rasa's default pipeline, two of which performed best. The first combined WhitespaceTokenizer, CountVectorsFeaturizer, FastTextFeaturizer (with the official fastText model trained on Swedish Wikipedia) and DIETClassifier, and reached an F1 score of 91%. The second combined WhitespaceTokenizer, LanguageModelFeaturizer (with the LaBSE model) and SklearnIntentClassifier, and reached an F1 score of 91.5%.
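To make the two best-performing pipelines concrete, the sketch below writes them out in the shape of a Rasa `config.yml`, using Python only to generate the text. It is not the authors' configuration: the abstract names the components but gives no hyperparameters, so every parameter value here is an assumption. In particular, FastTextFeaturizer is not part of Rasa's core component set; we assume it comes from the community package rasa-nlu-examples (hence the module path and the `cache_dir`/`file` keys). The cache directory `vectors` and the file name `sv_wikipedia.bin` are hypothetical placeholders, and loading LaBSE via `model_name: bert` with `model_weights: rasa/LaBSE` follows the pattern in the Rasa 2.x documentation as we recall it, so check both against the versions you install.

```python
"""Hypothetical sketch: the two best pipelines from the abstract as Rasa config.yml text.

Component names come from the abstract; every parameter value is an assumption.
"""

# Pipeline 1: sparse bag-of-words counts plus dense fastText word vectors, DIET classifier.
PIPELINE_FASTTEXT: list[dict[str, str]] = [
    {"name": "WhitespaceTokenizer"},
    {"name": "CountVectorsFeaturizer"},
    # Assumption: FastTextFeaturizer from the community package rasa-nlu-examples,
    # not Rasa core. "vectors" and "sv_wikipedia.bin" are hypothetical placeholders.
    {
        "name": "rasa_nlu_examples.featurizers.dense.FastTextFeaturizer",
        "cache_dir": "vectors",
        "file": "sv_wikipedia.bin",
    },
    {"name": "DIETClassifier"},
]

# Pipeline 2: dense LaBSE features fed to a scikit-learn intent classifier.
PIPELINE_LABSE: list[dict[str, str]] = [
    {"name": "WhitespaceTokenizer"},
    # Assumption: LaBSE loaded as a BERT-architecture model via these two keys.
    {"name": "LanguageModelFeaturizer", "model_name": "bert", "model_weights": "rasa/LaBSE"},
    {"name": "SklearnIntentClassifier"},
]


def to_config_yaml(pipeline: list[dict[str, str]], language: str = "sv") -> str:
    """Render a list of flat component dicts as config.yml text (no quoting or nesting)."""
    lines = [f"language: {language}", "pipeline:"]
    for component in pipeline:
        # Each component becomes a YAML list item; `name` is written first by convention.
        lines.append(f"  - name: {component['name']}")
        for key, value in component.items():
            if key != "name":
                lines.append(f"    {key}: {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_config_yaml(PIPELINE_LABSE), end="")
```

Saving the printed text as `config.yml` and running `rasa train nlu` followed by `rasa test nlu` would be one way to run a comparison of this kind. The generator deliberately handles only flat string parameters, which is all these two pipelines need.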

http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-296616

Chatbots

machine learning

natural language processing

tokenization

feature extraction

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-296616
Date	January 2021
Creators	Mutaliev, Mohammed, Almimar, Ibrahim
Publisher	KTH, Hälsoinformatik och logistik (Health Informatics and Logistics)
Source Sets	DiVA Archive at Uppsala University
Language	Swedish
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	TRITA-CBH-GRU ; 2021:38
