Global ETD Search

Learning for text mining : tackling the cost of feature and knowledge engineering

Over the last decade, the state of the art in text mining has moved towards adopting machine learning as its main paradigm. Despite significant advances, machine-learning-based text mining solutions remain costly to design, develop and maintain for real-world problems. An important component of this cost (feature engineering) concerns the effort required to understand which features or characteristics of the data can be successfully exploited in inducing a predictive model of the data. Another important component (knowledge engineering) concerns the effort of creating labelled data and of eliciting knowledge about the mining systems and the data itself. I present a series of approaches, methods and findings aimed at reducing the cost of creating and maintaining document classification and information extraction systems. They address the following questions: Which classes of features lead to improved classification accuracy in the document classification and entity extraction tasks? How can the number of labelled examples needed to train machine-learning-based document classification and information extraction systems be reduced, so as to relieve domain experts of this costly task? How can knowledge about these systems and the data they manipulate be represented effectively, in order to make systems interoperable and results replicable? I provide the reader with the background information necessary to understand the above questions and the contributions to the state of the art contained herein.
The contributions include: the identification of novel classes of features for the document classification task which exploit the multimedia nature of documents and lead to improved classification accuracy; a novel approach to domain adaptation for text categorization which outperforms standard supervised and semi-supervised methods while requiring considerably less supervision; and a well-founded formalism for declaratively specifying text and multimedia mining systems.

http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.577568

006.312

Identifier	oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:577568
Date	January 2013
Creators	Iria, José
Publisher	University of Sheffield
Source Sets	Ethos UK
Detected Language	English
Type	Electronic Thesis or Dissertation
Source	http://etheses.whiterose.ac.uk/14608/
