Information Extraction (IE) is the process of analyzing documents and identifying desired pieces of information within them. Many IE systems have been developed over the last couple of decades, but there is still room for improvement as IE remains an open problem for researchers. This work discusses the development of a hybrid IE system that attempts to combine the strengths of rule-based and statistical IE systems while avoiding their unique pitfalls in order to achieve high performance for any type of information on any type of document. Test results show that this system operates competitively in cases where target information belongs to a highly-structured data type and when critical contextual information is in close proximity to the target.
Identifer | oai:union.ndltd.org:CALPOLY/oai:digitalcommons.calpoly.edu:theses-2622 |
Date | 01 September 2015 |
Creators | Grap, Marie Belen |
Publisher | DigitalCommons@CalPoly |
Source Sets | California Polytechnic State University |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Master's Theses |
Page generated in 0.0224 seconds