Entity discovery by exploiting contextual structures. / CUHK electronic theses & dissertations collection

In text mining, the ability to recognize and extract named entities, such as locations, persons, and organizations, is useful in many applications. This task is usually referred to as named entity recognition (NER). This thesis presents a cascaded framework for extracting named entities from text documents. Features are derived automatically from a set of documents using different feature templates. To avoid the high computational cost incurred by a single-phase approach, the named entity extraction task is divided into a segmentation task and a classification task, which reduces the computational cost by an order of magnitude. / To handle the cascaded errors that often occur in a sequence of tasks, we investigate and develop three models: the maximum-entropy margin-based (MEMB) model, the isomeric conditional random field (ICRF) model, and the online cascaded reranking (OCR) model. The MEMB model uses the concept of a margin when maximizing log-likelihood: parameters are trained to maximize the "margin" between the decision boundary and the nearest training data points. The ICRF model uses the concept of joint training: instead of training each model independently, the segmentation and classification models are designed so that they can be trained together efficiently under a soft constraint. The OCR model uses an online training method to maximize a margin without considering any probability measures, which greatly reduces the training time. It reranks all of the possible outputs from the previous stage by a total output score, and the output with the highest total score is the final output. / We report experimental evaluations on the GENIA Corpus from the BioNLP/NLPBA (2004) shared task and the Reuters Corpus from the CoNLL-2003 shared task, which demonstrate the state-of-the-art performance achieved by the proposed models. / Chan, Shing Kit. / Advisers: Wai Lam; Kai Pui Lam.
/ Source: Dissertation Abstracts International, Volume: 73-06, Section: B, page: . / Thesis (Ph.D.)--Chinese University of Hong Kong, 2011. / Includes bibliographical references (leaves 126-133). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [201-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese.
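
The reranking step described in the abstract (score every candidate output of the previous stage and keep the one with the highest total score) can be illustrated with a minimal sketch. This is not the thesis's OCR model: the function names `rerank` and `toy_classify`, the toy sentence, the scores, and the choice of total score as the segmentation score plus the sum of the classification scores are all assumptions made only for illustration.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # [start, end) token offsets of one candidate entity


def rerank(
    candidates: List[Tuple[List[Span], float]],
    classify: Callable[[Span], Tuple[str, float]],
) -> Tuple[List[Tuple[Span, str]], float]:
    """Label every candidate segmentation and keep the best total score.

    Assumed total score: segmentation score + sum of classification scores.
    """
    best_output: List[Tuple[Span, str]] = []
    best_total = float("-inf")
    for spans, seg_score in candidates:
        labelled = []
        total = seg_score
        for span in spans:
            label, cls_score = classify(span)
            labelled.append((span, label))
            total += cls_score
        if total > best_total:
            best_output, best_total = labelled, total
    return best_output, best_total


# Hypothetical toy input: two candidate segmentations from a first stage.
tokens = ["John", "Smith", "visited", "Hong", "Kong"]
candidates = [
    ([(0, 2), (3, 5)], 1.0),  # "John Smith", "Hong Kong"
    ([(0, 1), (3, 5)], 1.2),  # "John", "Hong Kong" (first stage prefers this)
]

# Hypothetical second-stage scores, hard-coded for the example.
TOY_SCORES = {
    "John Smith": ("Person", 2.0),
    "John": ("Person", 0.5),
    "Hong Kong": ("Location", 1.5),
}


def toy_classify(span: Span) -> Tuple[str, float]:
    text = " ".join(tokens[span[0]:span[1]])
    return TOY_SCORES.get(text, ("Other", 0.0))


best, total = rerank(candidates, toy_classify)
print(best, total)
```

In this toy input the first stage ranks the second segmentation higher (1.2 against 1.0), but the first segmentation wins once the classification scores are added, which is the kind of cascaded error a reranking stage is meant to correct.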

Identifier: oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_344972
Date: January 2011
Contributors: Chan, Shing Kit., Chinese University of Hong Kong Graduate School. Division of Systems Engineering and Engineering Management.
Source Sets: The Chinese University of Hong Kong
Language: English, Chinese
Detected Language: English
Type: Text, theses
Format: electronic resource, microform, microfiche, 1 online resource (xv, 133 leaves : ill.)
Rights: Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/)
