Global ETD Search

A Semi-Supervised Information Extraction Framework for Large Redundant Corpora

The vast majority of text freely available on the Internet is not in a form that computers can understand. There have been numerous attempts to extract information automatically from human-readable sources. The most successful rely on very large training sets. Others have succeeded only in extracting restricted subsets of the available information. These approaches are of limited use and require domain knowledge to be coded into the application. This thesis proposes a novel framework for Information Extraction. From large sets of documents, the system builds statistical models of the data the user wishes to query; these models generally avoid the limitations and complexity of most Information Extraction systems. The framework uses a semi-supervised approach to minimize human input. It also eliminates the need for external Named Entity Recognition systems by relying on freely available databases. The final result is a query-answering system that extracts information from large corpora with a high degree of accuracy.
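
The abstract does not say how the freely available databases take the place of a Named Entity Recognition system. One common way a database can stand in for NER is as a gazetteer: a list of known names and their types, matched directly against the text. The sketch below shows that general idea only, under that assumption; it is not the thesis's method. The `GAZETTEER` entries and the `tag_entities` helper are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical gazetteer. In a real system this would be loaded from a freely
# available database; these three entries are illustrative only.
GAZETTEER: Dict[str, str] = {
    "new orleans": "CITY",
    "louisiana": "STATE",
    "university of new orleans": "ORGANIZATION",
}


def tag_entities(text: str, gazetteer: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (phrase, type) pairs found by greedy longest-match lookup."""
    # Crude normalization: lowercase and treat commas/periods as spaces.
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    max_len = max(len(name.split()) for name in gazetteer)
    found: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, so that
        # "university of new orleans" wins over "new orleans".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in gazetteer:
                found.append((phrase, gazetteer[phrase]))
                i += n
                break
        else:
            # No gazetteer entry starts at this token; move on.
            i += 1
    return found


print(tag_entities("The University of New Orleans is in New Orleans, Louisiana.", GAZETTEER))
# → [('university of new orleans', 'ORGANIZATION'), ('new orleans', 'CITY'), ('louisiana', 'STATE')]
```

Exact lookup needs no trained model, which fits the abstract's aim of minimizing human input. It will, however, miss names absent from the database and cannot disambiguate names that have several possible types.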

Information Extraction

Natural Language Processing

Support Vector Machine

Machine Learning

Information Retrieval

unstructured text

Identifier	oai:union.ndltd.org:uno.edu/oai:scholarworks.uno.edu:td-1857
Date	19 December 2008
Creators	Normand, Eric
Publisher	ScholarWorks@UNO
Source Sets	University of New Orleans
Detected Language	English
Type	text
Format	application/pdf
Source	University of New Orleans Theses and Dissertations
