Global ETD Search

Return to search

Evaluation of methods for question answering data generation : Using large language models / Utvärdering av metoder för skapande av fråge-svar data

One of the largest challenges in the field of artificial intelligence and machine learning isthe acquisition of a large quantity of quality data to train models on.This thesis investigates and evaluates approaches to data generation in a telecom domain for the task of extractive QA. To do this a pipeline was built using a combination ofBERT-like models and T5 models for data generation. We then evaluated our generateddata using the downstream task of QA on a telecom domain data set. We measured theperformance using EM and F1-scores. We achieved results that are state of the art on thetelecom domain data set.We found that synthetic data generation is a viable approach to obtaining synthetictelecom QA data with the potential of improving model performance when used in addition to human-annotated data. We also found that using models from the general domainprovided results that are on par or better than domain-specific models for the generation, which provides possibilities to use a single generation pipeline for many differentdomains. Furthermore, we found that increasing the amount of synthetic data providedlittle benefit for our models on the downstream task, with diminishing returns setting inquickly. We were unable to pinpoint the reason for this. In short, our approach works butmuch more work remains to understand and optimize it for greater results

http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-187956

Machine Learning

NLP

Natural Language Processing

Datavetenskap (datalogi)

Other Computer and Information Science

Annan data- och informationsvetenskap

Identifer	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-187956
Date	January 2022
Creators	Bissessar, Daniel, Bois, Alexander
Publisher	Linköpings universitet, Institutionen för datavetenskap
Source Sets	DiVA Archive at Upsalla University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess

Page generated in 0.0018 seconds

Evaluation of methods for question answering data generation : Using large language models / Utvärdering av metoder för skapande av fråge-svar data

Description

Links & Downloads

Tags

Additional Fields