Global ETD Search

All Purpose Textual Data Information Extraction, Visualization and Querying

abstract: Since the advent of the internet, and even more so since the rise of social media platforms, the explosive growth and availability of textual data have made its analysis a tedious task. Information extraction systems exist, but they are generally too specific and often extract only the kinds of information they deem necessary and worth extracting. With data visualization theory and fast, interactive querying methods, leaving information out may not be necessary at all. This thesis explores textual data visualization techniques, intuitive querying, and a novel approach to all-purpose textual information extraction that encodes large text corpora to improve human understanding of the information they contain.

This thesis presents a modified traversal algorithm that runs on the dependency parse output of text to extract all subject-predicate-object triples while ensuring that no information is left out. To support full-scale, all-purpose information extraction from large text corpora, a data preprocessing pipeline is recommended before the extraction is run. The output format is designed to fit a node-edge-node model, so that the extracted triples form the building blocks of a network that makes understanding the text and querying information from the corpus quick and intuitive. The system aims to reduce reading time and enhance understanding of the text through an interactive graph and timeline.

Dissertation/Thesis: Masters Thesis, Software Engineering, 2018
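
The abstract does not spell out the traversal itself. As a rough, hypothetical illustration of the general idea (walking a dependency parse to pull out subject-predicate-object triples, then indexing them as node-edge-node records that can be queried), here is a minimal Python sketch. It uses a hand-written parse with spaCy-style labels (nsubj, dobj, prep, pobj, conj) so it runs without a parser. The `Token` type, the function names, the label sets, and the conjunct and preposition rules are all assumptions made for illustration. This is not the thesis's algorithm, and it makes no claim to extract all of the information in a sentence.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One token of a dependency parse (hypothetical structure, spaCy-style labels)."""
    i: int     # position in the sentence
    text: str
    dep: str   # dependency label, e.g. "nsubj", "dobj", "conj"
    head: int  # index of the head token; the root points to itself


SUBJ = {"nsubj", "nsubjpass"}  # assumed subject labels
OBJ = {"dobj", "attr"}         # assumed direct-object labels


def children(tokens: list[Token], i: int) -> list[Token]:
    """Tokens whose head is token i (the root's self-loop is excluded)."""
    return [t for t in tokens if t.head == i and t.i != i]


def with_conjuncts(tokens: list[Token], tok: Token) -> list[Token]:
    """tok plus everything coordinated with it through 'conj' arcs, e.g. 'Alice and Bob'."""
    result = [tok]
    for c in children(tokens, tok.i):
        if c.dep == "conj":
            result.extend(with_conjuncts(tokens, c))
    return result


def subjects_of(tokens: list[Token], verb: Token) -> list[Token]:
    """Subjects of a verb; a coordinated verb with no subject of its own inherits its head's."""
    subs = [s for k in children(tokens, verb.i) if k.dep in SUBJ
            for s in with_conjuncts(tokens, k)]
    if not subs and verb.dep == "conj":
        subs = subjects_of(tokens, tokens[verb.head])
    return subs


def extract_triples(tokens: list[Token]) -> list[tuple[str, str, str]]:
    """Visit every token that has a subject and emit (subject, predicate, object) triples."""
    triples = []
    for verb in tokens:
        subjects = subjects_of(tokens, verb)
        if not subjects:
            continue
        objects = []  # (predicate, object token) pairs
        for k in children(tokens, verb.i):
            if k.dep in OBJ:
                objects.extend((verb.text, o) for o in with_conjuncts(tokens, k))
            elif k.dep == "prep":
                # Fold the preposition into the predicate: "presented" + "at".
                for p in children(tokens, k.i):
                    if p.dep == "pobj":
                        objects.extend((f"{verb.text} {k.text}", o)
                                       for o in with_conjuncts(tokens, p))
        triples.extend((s.text, pred, o.text) for s in subjects for pred, o in objects)
    return triples


def build_graph(triples):
    """Node-edge-node index: subject -(predicate)-> object, stored in both directions."""
    out_edges, in_edges = defaultdict(list), defaultdict(list)
    for s, p, o in triples:
        out_edges[s].append((p, o))
        in_edges[o].append((s, p))
    return out_edges, in_edges


# Hand-written parse of "Alice and Bob wrote a thesis and presented it at the conference ."
EXAMPLE = [Token(i, text, dep, head) for i, (text, dep, head) in enumerate([
    ("Alice", "nsubj", 3), ("and", "cc", 0), ("Bob", "conj", 0), ("wrote", "ROOT", 3),
    ("a", "det", 5), ("thesis", "dobj", 3), ("and", "cc", 3), ("presented", "conj", 3),
    ("it", "dobj", 7), ("at", "prep", 7), ("the", "det", 11), ("conference", "pobj", 9),
    (".", "punct", 3),
])]

if __name__ == "__main__":
    triples = extract_triples(EXAMPLE)
    for t in triples:
        print(t)
    out_edges, in_edges = build_graph(triples)
    print(in_edges["thesis"])  # nodes linked to "thesis" by an incoming edge
```

When run as a script, it prints six triples, such as ('Alice', 'wrote', 'thesis') and ('Bob', 'presented at', 'conference'), and then the incoming edges of the "thesis" node. Inheriting subjects across coordinated verbs and folding prepositions into the predicate are two examples of traversal changes that stop facts from being dropped. Broader coverage would need many more rules, for instance for clausal complements, relative clauses, and negation.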

http://hdl.handle.net/2286/R.I.50530

Computer science

Natural Language Processing

Natural Language Understanding

Text Mining

Text Querying

Text Visualization

Identifier	oai:union.ndltd.org:asu.edu/item:50530
Date	January 2018
Contributors	Hashmi, Syed Usama (Author), Bansal, Ajay (Advisor), Bansal, Srividya (Committee member), Gonzalez Sanchez, Javier (Committee member), Arizona State University (Publisher)
Source Sets	Arizona State University
Language	English
Detected Language	English
Type	Masters Thesis
Format	148 pages
Rights	http://rightsstatements.org/vocab/InC/1.0/
