
A Green Form-Based Information Extraction System for Historical Documents

Kim, Tae Woo. 01 May 2017
Many historical documents are rich in genealogical facts. Extracting these facts by hand is tedious and practically impossible given the hundreds of thousands of genealogically rich family-history books that are currently scanned and online. As one approach to making extraction feasible, we propose GreenFIE, a "Green" Form-based Information-Extraction tool. GreenFIE is "green" in the sense that it improves with use, with the goal of minimizing the cost of human labor while maintaining high extraction accuracy. Given a page in a historical document, the user's task is to fill out the given forms with every fact on the page that the forms call for (e.g., birth and death information, marriage information, and parent-child relationships for each person on the page). GreenFIE maintains a repository of extraction patterns that it applies to fill in the forms. The user checks the correctness of GreenFIE's form filling, adds any missed facts, and fixes any mistakes. GreenFIE learns from this feedback, adding new extraction rules to its repository. Ideally, GreenFIE improves as it proceeds so that it does most of the work, leaving the user little to do other than confirm that its extraction is correct. We evaluate GreenFIE on family-history books in terms of "greenness": how much human labor diminishes as form filling proceeds, while high accuracy is maintained.
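The fill, correct, and learn loop described in the abstract can be illustrated with a toy Python sketch. This is not GreenFIE's implementation. The names `PatternRepository`, `fill_form`, `learn`, and `process_page`, the use of regular expressions as extraction rules, and the rule-induction heuristic (the word just before a user-added value, plus a rough "shape" for the value) are all hypothetical assumptions. The per-page "labor" count (one unit per value the user deletes or adds) is likewise an assumed stand-in for the thesis's greenness measure.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PatternRepository:
    # Hypothetical rule store: form field name -> set of regexes,
    # each with exactly one capture group for the extracted value.
    rules: dict[str, set[str]] = field(default_factory=dict)

    def fill_form(self, page: str) -> dict[str, set[str]]:
        """Apply every stored pattern to the page and collect proposed values."""
        proposed: dict[str, set[str]] = {}
        for field_name, patterns in self.rules.items():
            for pattern in patterns:
                for match in re.finditer(pattern, page):
                    proposed.setdefault(field_name, set()).add(match.group(1))
        return proposed

    def learn(self, page: str, field_name: str, value: str) -> None:
        """Induce a naive rule from a user-added fact (assumed heuristic):
        the word immediately before the value, then a shape for the value."""
        idx = page.find(value)
        if idx < 0:
            return
        left_words = page[:idx].split()
        if not left_words:
            return
        cue = re.escape(left_words[-1])
        shape = r"(\d{4})" if value.isdigit() else r"([A-Z][a-z]+)"
        self.rules.setdefault(field_name, set()).add(cue + r"\s+" + shape)


def process_page(repo: PatternRepository, page: str,
                 truth: dict[str, set[str]]) -> int:
    """Simulate one page of form filling; return the user's edit count."""
    proposed = repo.fill_form(page)
    edits = 0
    for field_name in set(proposed) | set(truth):
        got = proposed.get(field_name, set())
        want = truth.get(field_name, set())
        edits += len(got - want)            # user deletes wrong values
        for value in want - got:            # user adds each missed value...
            edits += 1
            repo.learn(page, field_name, value)  # ...and the repo learns a rule
    return edits


repo = PatternRepository()
pages = [
    ("John Smith was born 1820 in Ohio.", {"BirthYear": {"1820"}}),
    ("Ann Lee died 1901 in Utah.", {"DeathYear": {"1901"}}),
    ("Mary Jones was born 1835 in Utah.", {"BirthYear": {"1835"}}),
    ("Tom Hill died 1877 in Ohio.", {"DeathYear": {"1877"}}),
]
labor = [process_page(repo, text, truth) for text, truth in pages]
print(labor)  # [1, 1, 0, 0]
```

Under these assumptions, each field costs the user one correction the first time it appears and nothing afterward, which is the kind of decline in human labor that a greenness evaluation would track. A real system would need far richer rule induction and would also have to cope with rules that over-generate, which this sketch only counts as deletions.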
