
The Convolutional Recurrent Structure in Computer Vision Applications

This dissertation combines convolutional neural networks (CNNs) and recurrent neural networks (RNNs) and applies the resulting convolutional recurrent structure to optical character recognition and image classification. The first part presents a novel end-to-end receipt recognition system for capturing effective information from receipts (CEIR). This part makes three main contributions. First, it develops a preprocessing method for receipt images. Second, it introduces a modified connectionist text proposal network for text detection. Third, the CEIR combines a convolutional recurrent neural network with connectionist temporal classification, using maximum entropy regularization in the loss function, to update the network weights and extract the characters from receipts. The CEIR system is validated on the scanned receipts optical character recognition and information extraction (SROIE) database. Furthermore, the CEIR system is robust and can be extended to a variety of scenarios beyond receipts. The second part applies the convolutional recurrent structure to land use image classification. It proposes a novel deep learning model, the convolutional recurrent land use classifier (CRLUC), which further improves the accuracy of classifying remote sensing land use images. In addition, a convolutional fully-connected neural network with a hard sample memory pool structure (CFMP) is developed for remote sensing land use image classification tasks. The CRLUC and CFMP algorithms are tested on popular datasets. Experimental studies show that the proposed algorithms classify images with higher accuracy and in fewer training episodes than popular image classification algorithms.
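
As a rough illustration of how a network trained with connectionist temporal classification yields text, the sketch below shows the standard greedy decoding step: take the best class per frame, merge consecutive repeats, and drop the blank symbol. It is not taken from the dissertation; the function name `ctc_greedy_decode`, the blank index 0, and the toy alphabet are assumptions made for this example only.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def ctc_greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Collapse per-frame class scores into a string (greedy CTC decoding).

    frame_scores: list of per-frame score lists, one score per class.
    alphabet: dict mapping non-blank class indices to characters.
    """
    # Pick the highest-scoring class index for every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Merge runs of identical consecutive labels into one label.
    collapsed = [label for label, _ in groupby(best_path)]
    # Remove blanks; a blank between two equal labels keeps both of them.
    return "".join(alphabet[label] for label in collapsed if label != blank)


# Toy usage: three classes (blank, "A", "B") over seven frames.
toy_alphabet = {1: "A", 2: "B"}
toy_scores = [
    [0.1, 0.8, 0.1],  # A
    [0.1, 0.8, 0.1],  # A (repeat, merged with the previous frame)
    [0.9, 0.0, 0.1],  # blank
    [0.1, 0.8, 0.1],  # A (kept, because a blank separates it)
    [0.1, 0.1, 0.8],  # B
    [0.1, 0.1, 0.8],  # B (repeat, merged)
    [0.9, 0.0, 0.1],  # blank
]
decoded = ctc_greedy_decode(toy_scores, toy_alphabet)
```

In a receipt recognition setting, each frame would correspond to one horizontal slice of a detected text line, and the alphabet would cover the characters expected on receipts.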

Identifier: oai:union.ndltd.org:unt.edu/info:ark/67531/metadc1873860
Date: 12 1900
Creators: Xie, Dong
Contributors: Bailey, Colleen, Namuduri, Kamesh, Guturu, Parthasarathy, Chamadia, Shubham
Publisher: University of North Texas
Source Sets: University of North Texas
Language: English
Detected Language: English
Type: Thesis or Dissertation
Format: x, 63 pages, Text
Rights: Public, Xie, Dong, Copyright, Copyright is held by the author, unless otherwise noted. All rights Reserved.
