Global organizations have adopted Industry 4.0 practices to stay viable, relying on information shared through billions of digital documents. The information in these documents is vital to the daily functioning of such organizations. Most critical information is laid out in tabular format to present it concisely. Extracting this critical data and providing access to the latest information can help institutions make evidence-based, data-driven decisions. Assembling such data for analysis can further enable organizations to automate certain processes, such as manufacturing. A generalized solution for table text extraction would have to handle variations in page content and table layouts in order to extract the text accurately. We hypothesize that a table text extraction pipeline can extract this data in three stages. The first stage identifies the images that contain tables and detects the table region. The second stage takes the detected table region and detects the rows and columns of the table. The last stage extracts the text from the cell locations generated by the intersecting lines of the detected rows and columns. For the first stage of the pipeline, we propose TableDet: a deep learning (artificial neural network) based methodology that solves table detection and table image classification in datasheet (document) images in a single inference. TableDet uses a Cascade R-CNN architecture with Complete IoU (CIoU) loss at each box head and a deformable convolution backbone to capture the variations of tables that appear at multiple scales and orientations. It also detects text and figures to enhance its table detection performance. We demonstrate the effectiveness of training TableDet with a dual-step transfer learning process and fine-tuning it with Table Aware Cutout (TAC) augmented images.
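The third stage described above takes cell locations from the intersections of detected rows and columns. The following is a minimal sketch of that idea, not the thesis implementation: it assumes axis-aligned boxes in `(x_min, y_min, x_max, y_max)` form and regular rows and columns only (merged cells would need extra handling). The function name `cell_boxes` and the box layout are illustrative assumptions.

```python
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def cell_boxes(rows: List[Box], cols: List[Box]) -> List[List[Box]]:
    """Intersect every row box with every column box to form a grid of cells.

    Hypothetical helper: assumes regular rows and columns (no merged cells).
    """
    rows_sorted = sorted(rows, key=lambda b: b[1])  # top to bottom
    cols_sorted = sorted(cols, key=lambda b: b[0])  # left to right
    grid: List[List[Box]] = []
    for rx0, ry0, rx1, ry1 in rows_sorted:
        row_cells: List[Box] = []
        for cx0, cy0, cx1, cy1 in cols_sorted:
            # The cell is the overlap of the row box and the column box.
            x0, y0 = max(rx0, cx0), max(ry0, cy0)
            x1, y1 = min(rx1, cx1), min(ry1, cy1)
            if x1 > x0 and y1 > y0:  # keep only non-empty overlaps
                row_cells.append((x0, y0, x1, y1))
        grid.append(row_cells)
    return grid


rows = [(0, 0, 100, 20), (0, 20, 100, 40)]
cols = [(0, 0, 50, 40), (50, 0, 100, 40)]
grid = cell_boxes(rows, cols)
```

Each entry of `grid` could then be cropped from the image and passed to a text recognizer of choice.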
TableDet achieves the highest F1 score for table detection against state-of-the-art solutions on ICDAR 2013 (complete set), ICDAR 2017 (test set) and ICDAR 2019 (test set), with 100%, 99.3% and 95.1% respectively. We show that the enhanced table detection performance can be used to address the table image classification task by adding a classification head comprising three conditions. For the table image classification task, TableDet achieves 100% recall and above 92% precision on three test sets. These classification results indicate that all images with tables, along with a significantly reduced number of images without tables, would be promoted to the next stage of the table text extraction pipeline. For the second stage, we propose TableStrDet, a deep learning (artificial neural network) based approach that recognizes the structure of the table regions detected in stage 1 by detecting and classifying rows and columns. TableStrDet comprises two Cascade R-CNN architectures, each with a deformable backbone and Complete IoU loss to improve detection performance. One architecture detects and classifies columns as regular columns (columns without a merged cell) and irregular columns (groups of regular columns that share a merged cell). The second architecture detects and classifies rows as regular rows (rows without a merged cell) and irregular rows (groups of regular rows that share a merged cell). Both architectures work in parallel to provide the results in a single inference. We show on two public test sets that using TableStrDet to detect four classes of objects enhances the quality of table structure detection by capturing table contents that may or may not have hierarchical layouts. On the TabStructDB test set, we achieve 72.7% and 78.5% weighted average F1 score for rows and columns respectively. On the ICDAR 2013 test set, we achieve 90.5% and 89.6% weighted average F1 score for rows and columns respectively.
Furthermore, we show that TableStrDet has a higher generalization potential on the available datasets.
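Both TableDet and TableStrDet are described as using Complete IoU loss. As a rough illustration of what that loss measures, the sketch below follows the commonly published CIoU definition (IoU term, normalized center distance, and an aspect-ratio consistency term) for a single pair of axis-aligned boxes. It is written in plain Python for readability and is an assumption about the general formulation, not the thesis code; the function name `ciou_loss` and the `eps` stabilizer are illustrative.

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """Complete IoU loss for two boxes given as (x_min, y_min, x_max, y_max).

    Illustrative sketch of the standard formulation:
    loss = 1 - IoU + rho^2 / c^2 + alpha * v
    """
    px0, py0, px1, py1 = pred
    tx0, ty0, tx1, ty1 = target
    pw, ph = px1 - px0, py1 - py0
    tw, th = tx1 - tx0, ty1 - ty0

    # Intersection over union.
    iw = max(0.0, min(px1, tx1) - max(px0, tx0))
    ih = max(0.0, min(py1, ty1) - max(py0, ty0))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Squared distance between the two box centers.
    rho2 = ((px0 + px1) / 2 - (tx0 + tx1) / 2) ** 2 \
        + ((py0 + py1) / 2 - (ty0 + ty1) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(px1, tx1) - min(px0, tx0)
    ch = max(py1, ty1) - min(py0, ty0)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (
        math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))
    ) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v


same = ciou_loss((0, 0, 10, 10), (0, 0, 10, 10))
apart = ciou_loss((0, 0, 10, 10), (20, 20, 30, 30))
```

Unlike plain IoU loss, the center-distance term still gives a useful signal when the boxes do not overlap, which is why `apart` exceeds 1 while `same` is close to 0.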
Identifier | oai:union.ndltd.org:uottawa.ca/oai:ruor.uottawa.ca:10393/43374 |
Date | 11 March 2022 |
Creators | Fernandes, Johan |
Contributors | Kantarci, Burak |
Publisher | Université d'Ottawa / University of Ottawa |
Source Sets | Université d’Ottawa |
Language | English |
Detected Language | English |
Type | Thesis |
Format | application/pdf |