Question Classification in Question Answering Systems

Question answering systems can be seen as the next step in information retrieval, allowing users to pose questions in natural language and receive succinct answers. Research has shown that, for a question answering system as a whole to be successful, correct classification of questions with regard to the expected answer type is imperative. Question classification has two components: a taxonomy of answer types and a mechanism for making the classifications. This thesis focuses on five machine learning algorithms for the question classification task: k-nearest neighbours, naïve Bayes, decision tree learning, sparse network of winnows, and support vector machines. These algorithms have been applied to two different corpora. One has been used extensively in previous work and was constructed for a specific agenda; the other is drawn from a set of users' questions posed to a running online system. The results show that the performance of the algorithms differs between the corpora, both in absolute terms and in the relative ranking of the algorithms. On the novel corpus, naïve Bayes, decision tree learning, and support vector machines perform on par with each other, while on the biased corpus there is a clear difference between them, with support vector machines performing best and naïve Bayes worst. The thesis also presents an analysis of questions that are problematic for all learning algorithms. The errors can roughly be attributed to categories with few members, variations in question formulation, the actual usage of the taxonomy, keyword errors, and spelling errors. A large portion of the errors were also hard to explain.

Report code: LiU-Tek-Lic-2007:29.

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-9014
Date: January 2007
Creators: Sundblad, Håkan
Publisher: Linköpings universitet, NLPLAB - Laboratoriet för databehandling av naturligt språk, Linköpings universitet, Tekniska högskolan, Institutionen för datavetenskap
Source Sets: DiVA Archive at Uppsala University
Language: English
Detected Language: English
Type: Licentiate thesis, monograph, info:eu-repo/semantics/masterThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
Relation: Linköping Studies in Science and Technology. Thesis, 0280-7971 ; 1320