
Computational methods for identifying and classifying questions in online collaborative learning discourse of Hong Kong students

This study investigates automated question detection and classification methods to support teachers in monitoring the progression of discussion in the Computer-Supported Collaborative Learning (CSCL) discourse of Hong Kong students. Questioning is an important component of CSCL. By analysing the question types in CSCL discourse, teachers can get a general idea of how an inquiry is constructed. This study attempts to take on the time-consuming task of question classification using techniques from machine learning. In general, the performance of machine learning algorithms improves as the amount of empirical training data increases, so the amount of training data is a determining factor in their performance. Machine-learning-based question classification algorithms may not be able to detect question types for which only a small amount of training data is available. To avoid missing those questions, an extra step that detects the occurrence of questions of any type might be needed.

One Chinese dataset and one English dataset are collected from an online discussion platform. These datasets are selected to compare the performance of question detection and classification in the two languages, and the sentence is defined as the unit of analysis. Question detection is the process of distinguishing questions from other types of discourse act. A hybrid method is proposed that combines the rule-based question mark method with the machine-learning-based syntax method for question detection. This method achieves a 94.8% F1-score and 98.9% accuracy in English question detection, and a 94.8% F1-score and 93.9% accuracy in Chinese question detection. While question detection focuses on identifying questions, question classification concentrates on categorizing them. The literature shows that the tree kernel method is almost a standard method for question classification. Using the tree kernel method, the classification of English verification questions and of English reason questions both attain an F1-score above 80%. Although the precision of Chinese question classification under the same settings remains at a similar level, the recall drops greatly. This result indicates that the syntax-based tree kernel method may not be appropriate for classifying questions in Chinese. To improve the Chinese question classification result, Case-Based Reasoning (CBR) is introduced. CBR is a method that retrieves from a database the example case(s) sharing the highest percentage of similarity with the test case. In this study, similarity is measured by the lexemes that compose a question. Although the CBR method improves the recall, it also causes a large drop in precision. Considering the high precision of the tree kernel method and the wide coverage of the CBR method, a hybrid method is proposed to combine the two. The experimental results show that the F1-score of the hybrid method for multi-class classification surpasses those of the tree kernel and CBR methods. This indicates that the hybrid method can generally improve the result of Chinese question classification.

published_or_final_version / Education / Master / Master of Philosophy
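
As an illustration of the lexeme-based CBR retrieval and the hybrid idea described in the abstract, the following is a minimal sketch and not the thesis's implementation. It assumes that similarity is the Jaccard overlap between lexeme sets, that questions are already tokenized into lexemes (Chinese text would first need word segmentation), and that the hybrid simply prefers the label of a high-precision classifier and falls back to CBR when that classifier gives none. The function names, the `min_sim` threshold, and the toy case base are all hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Case = Tuple[List[str], str]  # (lexemes of a stored question, its question type)


def jaccard(a: set, b: set) -> float:
    """Share of lexemes common to both questions, between 0.0 and 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve_case(lexemes: Iterable[str], case_base: List[Case]) -> Tuple[Optional[str], float]:
    """Return the label and similarity of the most similar stored case."""
    query = set(lexemes)
    best_label, best_sim = None, 0.0
    for case_lexemes, label in case_base:
        sim = jaccard(query, set(case_lexemes))
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label, best_sim


def hybrid_classify(lexemes: Iterable[str], primary_label: Optional[str],
                    case_base: List[Case], min_sim: float = 0.3) -> Optional[str]:
    """Assumed combination: keep the precise classifier's label if it has one,
    otherwise fall back to the nearest case when it is similar enough."""
    if primary_label is not None:
        return primary_label
    label, sim = retrieve_case(lexemes, case_base)
    return label if sim >= min_sim else None


# Toy case base with two of the question types named in the abstract.
case_base: List[Case] = [
    (["why", "do", "plants", "need", "light"], "reason"),
    (["is", "this", "answer", "correct"], "verification"),
]
```

In this sketch, `primary_label` stands in for the output of a tree kernel classifier, which is not implemented here; passing `None` models the case where that classifier fails to recognize a question, which is where the wider coverage of CBR would help recall.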

Identifier: oai:union.ndltd.org:HKU/oai:hub.hku.hk:10722/188758
Date: January 2013
Creators: Wong, On-wing., 黃安穎.
Contributors: Law, NWY, Chan, KP
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Source Sets: Hong Kong University Theses
Language: English
Detected Language: English
Type: PG_Thesis
Source: http://hub.hku.hk/bib/B50605859
Rights: The author retains all proprietary rights, (such as patent rights) and the right to use in future works., Creative Commons: Attribution 3.0 Hong Kong License
Relation: HKU Theses Online (HKUTO)
