Deep learning models for scanned document image classification and form understanding have made significant progress in recent years. High accuracy can be achieved by a model with the help of copious amounts of labelled training data for closed-world classification. However, very little work has been done in the domain of fine-grained, head-tailed (class-imbalanced, with some classes having many data points and others very few) open-world classification for documents. Our proposed method achieves better classification results than the baseline on the head-tail-novel/open dataset. Our techniques include separating the head and tail classes and transferring knowledge from the head data to the tail data. This transfer of knowledge also improves the ability to recognize a novel category by 15% compared to the baseline.
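As a rough illustration of the approach described above, the sketch below (not the author's exact pipeline; all names, thresholds, and the max-softmax novelty test are assumptions) shows one way to split a labelled document set into head and tail classes by sample count, reuse a head-trained feature extractor for the tail classes, and flag low-confidence pages as a novel/open category.

```python
# Hypothetical sketch of head/tail splitting, head-to-tail transfer, and
# open-set detection; thresholds and model shapes are illustrative only.
from collections import Counter

import torch
import torch.nn as nn

HEAD_MIN_SAMPLES = 100        # assumed cut-off between head and tail classes
NOVEL_CONF_THRESHOLD = 0.5    # assumed confidence below which a page is "novel"


def split_head_tail(labels):
    """Return the sets of head and tail class ids based on sample counts."""
    counts = Counter(labels)
    head = {c for c, n in counts.items() if n >= HEAD_MIN_SAMPLES}
    tail = set(counts) - head
    return head, tail


class DocClassifier(nn.Module):
    """Toy classifier over pre-extracted document page features."""

    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        self.head = nn.Linear(256, num_classes)

    def forward(self, x):
        return self.head(self.backbone(x))


def transfer_to_tail(head_model, num_tail_classes):
    """Reuse the head-trained backbone; only a new classifier layer is learned."""
    tail_model = DocClassifier(head_model.backbone[0].in_features, num_tail_classes)
    tail_model.backbone.load_state_dict(head_model.backbone.state_dict())
    for p in tail_model.backbone.parameters():
        p.requires_grad = False          # freeze the transferred knowledge
    return tail_model


def is_novel(logits):
    """Flag a document as an open/novel category when max softmax prob is low."""
    return torch.softmax(logits, dim=-1).max().item() < NOVEL_CONF_THRESHOLD
```

Freezing the transferred backbone while learning only a new tail classifier is one common way to move knowledge from data-rich head classes to data-poor tail classes; the thesis's actual transfer and novelty-detection mechanisms may differ.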
Identifier | oai:union.ndltd.org:BGMYU2/oai:scholarsarchive.byu.edu:etd-11383 |
Date | 23 April 2024 |
Creators | Joshi, Chetan |
Publisher | BYU ScholarsArchive |
Source Sets | Brigham Young University |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Theses and Dissertations |
Rights | https://lib.byu.edu/about/copyright/ |