1

Question Classification in Question Answering Systems

Sundblad, Håkan January 2007 (has links)
Question answering systems can be seen as the next step in information retrieval, allowing users to pose questions in natural language and receive succinct answers. For a question answering system as a whole to be successful, research has shown that correct classification of questions with regard to the expected answer type is imperative. Question classification has two components: a taxonomy of answer types, and machinery for making the classifications.

This thesis focuses on five machine learning algorithms for the question classification task: k-nearest neighbours, naïve Bayes, decision tree learning, sparse network of winnows, and support vector machines. These algorithms were applied to two corpora, one of which has been used extensively in previous work and was constructed for a specific agenda; the other is drawn from a set of users' questions posed to a running online system. The results showed that the performance of the algorithms differs across the corpora, both in absolute terms and in their relative ranking. On the novel corpus, naïve Bayes, decision tree learning, and support vector machines perform on par with each other, while on the biased corpus there is a clear difference between them, with support vector machines the best and naïve Bayes the worst.

The thesis also presents an analysis of questions that are problematic for all the learning algorithms. The errors can roughly be attributed to categories with few members, variations in question formulation, the actual usage of the taxonomy, keyword errors, and spelling errors. A large portion of the errors were also hard to explain. / Report code: LiU-Tek-Lic-2007:29.
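
To make the comparison concrete, here is a minimal scikit-learn sketch of pitting the five algorithms against each other on a toy question corpus. The questions, labels, features, and settings are illustrative placeholders, not the thesis's corpora or setup; scikit-learn ships no sparse network of winnows, so a perceptron on sparse tf-idf features stands in for SNoW.

# A hedged sketch: five classifiers compared on a toy question corpus.
# Everything here (data, features, hyperparameters) is a placeholder.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Perceptron
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical (question, expected-answer-type) pairs, two per category
# so that 2-fold stratified cross-validation is possible.
questions = [
    "Who wrote Hamlet?",
    "Who painted the Mona Lisa?",
    "When did the Berlin Wall fall?",
    "When was the telephone invented?",
    "What is the capital of Sweden?",
    "Where is the Eiffel Tower?",
    "How far is the Moon from Earth?",
    "How long is the Nile?",
]
labels = ["person", "person", "date", "date",
          "location", "location", "distance", "distance"]

classifiers = {
    "k-nearest neighbours": KNeighborsClassifier(n_neighbors=3),
    "naive Bayes": MultinomialNB(),
    "decision tree": DecisionTreeClassifier(),
    "SNoW stand-in (perceptron)": Perceptron(),
    "support vector machine": LinearSVC(),
}

for name, clf in classifiers.items():
    # Bag-of-words tf-idf features; the thesis's actual features may differ.
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    scores = cross_val_score(pipeline, questions, labels, cv=2)
    print(f"{name}: mean accuracy {scores.mean():.2f}")

On a real corpus the same loop yields a per-corpus ranking of the algorithms, which is the kind of comparison the thesis runs on its two corpora.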
2

What the BERT? : Fine-tuning KB-BERT for Question Classification / Vad i BERT? : Finjustering av KB-BERT för frågeklassificering

Cervall, Jonatan January 2021 (has links)
This work explores the capabilities of KB-BERT on the downstream task of question classification. The TREC data set for question classification, with the Li and Roth taxonomy, was translated to Swedish by manually correcting the output of Google's neural machine translation, and 500 new data points were added. The fine-tuned model was compared with a similarly trained model based on Multilingual BERT, a human evaluation, and a simple rule-based baseline. Of the four methods in this work, the Swedish BERT model (SwEAT-BERT) performed best, achieving 91.2% accuracy on TREC-50 and 96.2% accuracy on TREC-6. The human evaluation performed worse than both BERT models, though it is doubtful how fair this comparison is. SwEAT-BERT's results are competitive even when compared to similar models based on English BERT. This furthers the notion that the only roadblock in training language models for smaller languages is the amount of readily available training data. / This work explores how well the Swedish BERT model, KB-BERT, performs on question classification. BERT is a transformer model that produces contextual, bidirectional word embeddings. The English question classification data set, TREC, was translated to Swedish and extended with 500 new data points. Two BERT models were fine-tuned on this new TREC data set: one based on KB-BERT and one based on Multilingual BERT, a multilingual variant of BERT trained on data from 104 languages, Swedish among them. A rule-based model was built as a lower bound on the problem, and a human classification study was carried out for comparison. The BERT model based on KB-BERT (SwEAT-BERT) achieved 96.2% accuracy on TREC with 6 categories and 91.2% accuracy on TREC with 50 categories. The human classification achieved worse results than both BERT models, but it is doubtful how fair this comparison is. SwEAT-BERT performed best of the methods tested in this study, and competitively compared with English BERT models fine-tuned on the English TREC data set. This result strengthens the view that access to training data is the only thing standing in the way of stronger language models for smaller languages.
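
As a rough sketch of the fine-tuning step described above, the following uses Hugging Face Transformers with KB-BERT's public checkpoint, KB/bert-base-swedish-cased. The training examples, label indices, epochs, and learning rate are illustrative placeholders rather than the thesis's translated data set or actual setup; only the six TREC coarse classes are real.

# A hedged sketch of fine-tuning KB-BERT for question classification.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# The six coarse TREC classes; the thesis also uses the 50-class version.
labels = ["ABBR", "DESC", "ENTY", "HUM", "LOC", "NUM"]

tokenizer = AutoTokenizer.from_pretrained("KB/bert-base-swedish-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "KB/bert-base-swedish-cased", num_labels=len(labels)
)

# Made-up Swedish questions with coarse-class indices, standing in for
# the translated TREC data set.
train = [
    ("Vem skrev Hamlet?", 3),      # HUM
    ("Var ligger Kiruna?", 4),     # LOC
    ("När föll Berlinmuren?", 5),  # NUM
]

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):
    for question, label in train:
        batch = tokenizer(question, return_tensors="pt", truncation=True)
        out = model(**batch, labels=torch.tensor([label]))
        out.loss.backward()  # cross-entropy over the answer-type classes
        optimizer.step()
        optimizer.zero_grad()

# Inference: the highest-scoring class is the predicted answer type.
model.eval()
with torch.no_grad():
    enc = tokenizer("Hur långt är det till månen?", return_tensors="pt")
    print(labels[model(**enc).logits.argmax(dim=-1).item()])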
