
Advanced Text Analytics and Machine Learning Approach for Document Classification

Text classification is used to extract and retrieve information from text. It has become an important step in managing the vast and growing number of records available in digital form. This thesis addresses the problem of classifying patent documents into fifteen categories (classes), some of which overlap with others for practical reasons. To build the classification model with machine learning techniques, useful features were extracted from the documents. These features are used both to classify patent documents and to generate useful tag-words. The overall objective of this work is to systematize NASA’s patent management. It does so by developing a set of automated tools that can help NASA manage and market its portfolio of intellectual property (IP), and that make it easier for users to discover relevant IP. We identified several applicable methods: k-Nearest Neighbors (kNN), two variants of the Support Vector Machine (SVM) algorithm, and two tree-based classification algorithms, Random Forest and J48. The main research steps are filtering techniques for variable selection, information gain and feature correlation analysis, and the training and testing of candidate models using effective classifiers. Finally, the problems caused by imbalanced data were mitigated by adding synthetic data where appropriate, which resulted in a superior SVM-based classifier model.
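The abstract names two preprocessing steps: information-gain filtering for variable selection, and adding synthetic data to offset class imbalance. The following is a minimal, standard-library sketch of both. It is not the thesis's implementation. It assumes each document is reduced to a set of tokens with binary presence features. The abstract does not say how the synthetic data were generated, so the oversampler is a SMOTE-style interpolation chosen for illustration. All function names (`information_gain`, `rank_terms`, `smote_like`) are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG of the binary feature 'document contains term' (assumed feature type).

    IG(C; T) = H(C) - sum over v in {present, absent} of P(T=v) * H(C | T=v)
    docs: list of token sets; labels: one class label per document.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) + (
        len(without_term) / n
    ) * entropy(without_term)
    return entropy(labels) - conditional


def rank_terms(docs, labels, top_k):
    """Filter-style selection: keep the top_k terms by IG (ties broken alphabetically)."""
    vocabulary = set().union(*docs)
    ranked = sorted(vocabulary, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:top_k]


def smote_like(minority, n_new, k=2, seed=0):
    """Hypothetical SMOTE-style oversampler (an assumption, not the thesis's method).

    Each synthetic vector lies on the segment between a random minority sample
    and one of its k nearest minority neighbours. Needs at least 2 samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest other minority samples by Euclidean distance
        neighbours = sorted(
            (m for m in minority if m is not base),
            key=lambda m: math.dist(base, m),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


if __name__ == "__main__":
    # Toy corpus (invented for illustration, not thesis data)
    docs = [{"rotor", "blade"}, {"rotor", "engine"}, {"sensor", "circuit"}, {"circuit", "chip"}]
    labels = ["propulsion", "propulsion", "electronics", "electronics"]
    print(rank_terms(docs, labels, top_k=2))  # → ['circuit', 'rotor']
    print(smote_like([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]], n_new=3))
```

In the toy corpus, "rotor" and "circuit" each separate the two classes perfectly, so each has an information gain of 1 bit and is ranked first. In a full pipeline, the selected terms would feed the classifiers the abstract lists (kNN, SVM, Random Forest, J48). Synthetic vectors would be added only to the training split, so that evaluation is not inflated.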

Computational Engineering

Computer and Systems Architecture

Identifier	oai:union.ndltd.org:uno.edu/oai:scholarworks.uno.edu:td-3466
Date	19 May 2017
Creators	Anne, Chaitanya
Publisher	ScholarWorks@UNO
Source Sets	University of New Orleans
Detected Language	English
Type	text
Format	application/pdf
Source	University of New Orleans Theses and Dissertations
