1 |
High Precision Deep Learning-Based Tabular Data Extraction
Jiang, Ji Chu (21 January 2021)
Advances in AI methodologies and computing power enable automation and propel the Industry 4.0 phenomenon. Information and data are digitized more than ever, and millions of documents are processed every day, fueled by the growth of institutions, organizations, and their supply chains. Processing documents is a time-consuming, laborious task. Automating data processing is therefore highly important for optimizing supply chain efficiency across all industries. Document analysis for data extraction is an impactful field, and this thesis addresses the vital steps in an ideal data extraction pipeline. Data is often stored in tables because a table is a structured format in which the user can easily associate values and attributes. Tables can contain vital information such as specifications, dimensions, and cost. Focusing on table analysis and recognition in documents is therefore a cornerstone of data extraction.
This thesis applies deep learning methodologies to automate the two main problems within table analysis for data extraction: table detection and table structure detection. Table detection is identifying and localizing the boundaries of the table. The output of the table detection model is fed into the table structure detection model for structure format analysis. The output of the table detection model must therefore have high localization performance, otherwise it would affect the rest of the data extraction pipeline. Our table detection model improves bounding box localization by incorporating a Kullback–Leibler loss function that calculates the divergence between the probability distributions of the ground-truth and predicted bounding boxes, and by adding a voting procedure to the non-maximum suppression step to produce better-localized merged bounding box proposals. This model improved the precision of table detection by 1.2% while achieving the same recall as other state-of-the-art models on the public ICDAR2013 dataset. It also achieved state-of-the-art results of 99.8% precision on the ICDAR2017 dataset. Furthermore, our model showed large improvements especially at higher intersection over union (IoU) thresholds: at 95% IoU, an improvement of 10.9% can be seen on the ICDAR2013 dataset and an improvement of 8.4% on the ICDAR2017 dataset.
Table structure detection is recognizing the internal layout of a table. Researchers often approach this by detecting the rows and columns. However, to correctly map the location of each individual cell in the semantic extraction step, the rows and columns have to be combined into a matrix, which introduces additional sources of error. Alternatively, we propose a model that directly detects each individual cell. Our model is an ensemble of state-of-the-art models: Hybrid Task Cascade as the detector and dual ResNeXt101 backbones arranged in a CBNet architecture. Quality labeled data for table cell structure detection is lacking, so we hand-labeled the ICDAR2013 dataset, and we wish to establish a strong baseline for this dataset. Our model was compared with other state-of-the-art models that excelled at table or table structure detection. Our model yielded a precision of 89.2% and a recall of 98.7% on the ICDAR2013 cell structure dataset.
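The voting step in non-maximum suppression mentioned for the table detection model can be illustrated with a small sketch. It is not the thesis implementation: the abstract does not say how votes are weighted, so the version below assumes score-weighted averaging of overlapping boxes, and the names `iou` and `nms_with_voting` are hypothetical. The Kullback–Leibler loss itself is not shown.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_with_voting(
    boxes: List[Box], scores: List[float], iou_thr: float = 0.5
) -> List[Tuple[Box, float]]:
    """Greedy NMS in which suppressed boxes vote on the kept box.

    Assumption: votes are weighted by detection score. Other weightings
    (for example by predicted localization variance) are possible.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    suppressed = [False] * len(boxes)
    results: List[Tuple[Box, float]] = []
    for i in order:
        if suppressed[i]:
            continue
        # The kept box plus every remaining box that overlaps it enough.
        group = [i] + [
            j for j in order
            if j != i and not suppressed[j] and iou(boxes[i], boxes[j]) >= iou_thr
        ]
        for j in group:
            suppressed[j] = True
        total = sum(scores[j] for j in group)
        if total > 0:
            merged = tuple(
                sum(scores[j] * boxes[j][k] for j in group) / total
                for k in range(4)
            )
        else:
            merged = boxes[i]  # no usable weights, keep the top box unchanged
        results.append((merged, scores[i]))
    return results
```

Calling `nms_with_voting` on two heavily overlapping table proposals returns one merged box whose coordinates lie between the two inputs, instead of simply discarding the lower-scored one.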
|
2 |
End-to-End Tabular Information Extraction in Datasheets with Deep Learning
Kara, Ertugrul (09 July 2019)
The advent of the Industry 4.0 phenomenon has been transforming information management regarding the specifications of electronic components.
This change affects many organizations, including global supply chains that optimize many product chains, such as raw materials or electronic components.
Supply chains consist of thousands of manufacturers, connect them to other organizations and end users, and include billions of distinct components.
The digitization of critical information has to be carried out automatically since there are millions of documents.
Although the documents vary greatly in shape and style, the essential information is usually presented in the tables in a condensed format.
Extracting the structured information from tables is done by human operators, which costs human effort, time and corporate resources.
Based on the motivation that AI-based solutions are automating many processes, this thesis proposes to use deep learning-based solutions for three main problems: (i) table detection, (ii) table internal structure detection and (iii) End-to-End (E2E) tabular structure detection.
To this end, deep learning models are trained mostly on public datasets, together with a private dataset provided by our industry partner, for which we labelled more than 2000 documents.
To achieve accurate table detection, we propose a method based on the successful Mask-Region-Based Convolutional Neural Network (Mask-RCNN) instance segmentation model.
With some modifications to the training set labels, we have achieved state-of-the-art detection rates with 99% AP and 100% recall.
We use the PASCAL Visual Object Classes (VOC) 11-point Average Precision (AP) metric to compare the evaluated deep learning-based methods.
Detecting tables is the initial step towards semantic modelling of e-components. Therefore, the structure should also be detected in order to extract information.
With this in mind, we introduce another method based on the Mask-RCNN model, which is able to detect structure with around 96% AP.
Combining these two networks, or developing a new model is a necessity.
To this end, inspired by the success of Mask-RCNN models, we introduce the following Mask-RCNN based models to realize E2E tabular structure detection:
The stitched E2E model, built by bridging the output of the table detection model into the structure detection model, attained more than 77% AP on the difficult public UNLV dataset, with various post-processing steps applied when bridging the two networks. Single-pass E2E detection networks attained a higher AP of 86%, but with lower recall.
This thesis concludes that deep learning-based object detection and instance segmentation networks can accomplish state-of-the-art performance.
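The PASCAL VOC 11-point Average Precision metric used in this abstract can be sketched as follows. This is an illustrative reading of the standard definition, not code from the thesis: precision is interpolated as the maximum precision at any recall at or above each of the 11 recall levels 0.0, 0.1, ..., 1.0, and the results are averaged. The function name `voc_11_point_ap` and the input format (paired recall and precision points from a ranked detection list) are assumptions.

```python
from typing import Sequence


def voc_11_point_ap(recalls: Sequence[float], precisions: Sequence[float]) -> float:
    """11-point interpolated average precision.

    recalls[i] and precisions[i] are assumed to be one point on the
    precision-recall curve of a ranked list of detections.
    """
    if len(recalls) != len(precisions):
        raise ValueError("recalls and precisions must have the same length")
    ap = 0.0
    for k in range(11):
        level = k / 10  # recall levels 0.0, 0.1, ..., 1.0
        # Interpolated precision: best precision at recall >= level.
        candidates = [p for r, p in zip(recalls, precisions) if r >= level]
        ap += max(candidates) if candidates else 0.0
    return ap / 11
```

With this definition a detector reaches 100% AP only if its precision is 1.0 all the way up to full recall; a curve that never reaches a given recall level contributes zero at that level.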
|
3 |
Artificial Neural Networks-Driven High Precision Tabular Information Extraction from Datasheets
Fernandes, Johan (11 March 2022)
Global organizations have adopted Industry 4.0 practices to stay viable through the information shared through billions of digital documents. The information in such documents is vital to the daily functioning of such organizations. Most critical information is laid out in tabular format in order to provide the information in a concise manner. Extracting this critical data and providing access to the latest information can help institutions to make evidence-based and data-driven decisions. Assembling such data for analysis can further enable organizations to automate certain processes such as manufacturing. A generalized solution for table text extraction would have to handle the variations in the page content and table layouts in order to accurately extract the text. We hypothesize that a table text extraction pipeline can extract this data in three stages. The first stage would involve identifying the images that contain tables and detecting the table region. The second stage would consider the detected table region and detect the rows and columns of the table. The last stage would involve extracting the text from the cell locations generated by the intersecting lines of the detected rows and columns. For the first stage of the pipeline, we propose TableDet: a deep learning (artificial neural network) based methodology to solve table detection and table image classification in datasheet (document) images in a single inference. TableDet utilizes a Cascade R-CNN architecture with Complete IOU (CIOU) loss at each box head and a deformable convolution backbone to capture the variations of tables that appear at multiple scales and orientations. It also detects text and figures to enhance its table detection performance. We demonstrate the effectiveness of training TableDet with a dual-step transfer learning process and fine-tuning it with Table Aware Cutout (TAC) augmented images.
TableDet achieves the highest F1 score for table detection against state-of-the-art solutions on ICDAR 2013 (complete set), ICDAR 2017 (test set) and ICDAR 2019 (test set) with 100%, 99.3% and 95.1% respectively. We show that the enhanced table detection performance can be utilized to address the table image classification task with the addition of a classification head which comprises three conditions. For the table image classification task TableDet achieves 100% recall and above 92% precision on three test sets. These classification results indicate that all images with tables, along with a significantly reduced number of images without tables, would be promoted to the next stage of the table text extraction pipeline. For the second stage we propose TableStrDet, a deep learning (artificial neural network) based approach to recognize the structure of the detected table regions from stage 1 by detecting and classifying rows and columns. TableStrDet comprises two Cascade R-CNN architectures, each with a deformable backbone and Complete IOU loss to improve their detection performance. One architecture detects and classifies columns as regular columns (column without a merged cell) and irregular columns (group of regular columns that share a merged cell). The second architecture detects and classifies rows as regular rows (row without a merged cell) and irregular rows (group of regular rows that share a merged cell). Both architectures work in parallel to provide the results in a single inference. We show that utilizing TableStrDet to detect four classes of objects enhances the quality of table structure detection by capturing table contents that may or may not have hierarchical layouts on two public test sets. On the TabStructDB test set we achieve 72.7% and 78.5% weighted average F1 score for rows and columns respectively. On the ICDAR 2013 test set we achieve 90.5% and 89.6% weighted average F1 score for rows and columns respectively.
Furthermore, we show that TableStrDet has a higher generalization potential on the available datasets.
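The last stage described in this abstract takes cell locations from the intersections of detected rows and columns. A minimal sketch of that geometric step, assuming rows and columns arrive as axis-aligned `(x1, y1, x2, y2)` boxes, is shown below. The function name `cells_from_rows_and_columns` is hypothetical, and the sketch covers only regular rows and columns; merged cells (the "irregular" classes) would need extra handling that is not attempted here.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def cells_from_rows_and_columns(
    rows: List[Box], cols: List[Box]
) -> List[List[Optional[Box]]]:
    """Build a row-major grid of cell boxes from row and column boxes.

    Rows are ordered top to bottom and columns left to right. A grid
    entry is None when a row and a column do not overlap.
    """
    rows_sorted = sorted(rows, key=lambda b: b[1])  # sort by top edge
    cols_sorted = sorted(cols, key=lambda b: b[0])  # sort by left edge
    grid: List[List[Optional[Box]]] = []
    for r in rows_sorted:
        row_cells: List[Optional[Box]] = []
        for c in cols_sorted:
            # The cell is the rectangle shared by the row and the column.
            x1, y1 = max(r[0], c[0]), max(r[1], c[1])
            x2, y2 = min(r[2], c[2]), min(r[3], c[3])
            row_cells.append((x1, y1, x2, y2) if x2 > x1 and y2 > y1 else None)
        grid.append(row_cells)
    return grid
```

Each non-empty grid entry could then be cropped and passed to a text recognizer, with the grid indices giving the row and column of the extracted value.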
|
4 |
Table Understanding for Information Retrieval
Pande, Ashwini K. (03 September 2002)
This thesis proposes a novel approach for finding tables in text files containing a mixture of unstructured and structured text. Tables may be arbitrarily complex because the data in the tables may themselves be tables and because the grouping of data elements displayed in a table may be very complex. Although investigators have proposed competence models to explain the structure of tables, there are no computationally feasible performance models for detecting and parsing general structures in real data. Our emphasis is placed on the investigation of a new statistical procedure for detecting basic tables in plain text documents. The main task here is defining and testing this theory in the context of the Odessa Digital Library. / Master of Science
|
5 |
Intégration holistique et entreposage automatique des données ouvertes / Holistic integration and automatic warehousing of open data
Megdiche Bousarsar, Imen (10 December 2015)
Statistical Open Data provide useful information to feed a decision-making system. Their integration and storage within such systems is achieved through ETL processes. These processes must be automated to make them accessible to non-experts. They must also cope with the lack of schemas and with the structural and semantic heterogeneity that characterize Open Data. To address these issues, we propose a new graph-based ETL approach. For extraction, we propose automatic detection and annotation activities that derive a graph from a table. For transformation, we propose a linear program that solves the holistic matching of structural data from several graphs. This model yields an optimal and unique solution. For loading, we propose a progressive process for defining the multidimensional schema and augmenting the integrated graph. Finally, we present a prototype and the experimental results.
|
6 |
Tabular Information Extraction from Datasheets with Deep Learning for Semantic Modeling
Akkaya, Yakup (22 March 2022)
The growing popularity of artificial intelligence and machine learning has led to the adoption of the automation vision in the industry by many other institutions and organizations. Many corporations have made it their primary objective to deliver goods and services and to manufacture more efficiently with minimal human intervention. Automated document processing and analysis is also a critical component of this cycle for many organizations that contribute to the supply chain. The massive volume and diversity of data created in this rapidly evolving environment make this a highly desired step. Despite this diversity, important information in the documents is provided in tables. As a result, extracting tabular data is a crucial aspect of document processing.
This thesis applies deep learning methodologies to detect table structure elements for the extraction of data and preparation for semantic modelling. To find the optimal structure definition, we analyzed the performance of deep learning models on different formats such as row/column and cell. The combined row and column detection models perform poorly compared to the other models because rows and columns overlap heavily. Separate row and column detection models seem to achieve the best average F1-scores, with 78.5% and 79.1%, respectively. However, determining cell elements from the row and column detections for semantic modelling is a complicated task due to spanning rows and columns. Considering these facts, we propose a new method for setting the ground-truth information, called content-focused annotation, to define table elements better. Our content-focused method handles ambiguities caused by large white spaces and missing boundary lines in table structures; hence, it provides higher accuracy.
Prior works have addressed the table analysis problem under table detection and table structure detection tasks. However, the impact of dataset structures on table structure detection has not been investigated. We provide a comparison of table structure detection performance with cropped and uncropped datasets. The cropped set consists only of table images that are cropped from documents, assuming tables are detected perfectly. The uncropped set consists of regular document images. Experiments show that deep learning models can improve the detection performance by up to 9% in average precision and average recall on the cropped versions. Furthermore, the impact of cropped images is negligible at Intersection over Union (IoU) thresholds of 50%-70% when compared to the uncropped versions. However, beyond 70% IoU thresholds, cropped datasets provide significantly higher detection performance.
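The dependence of detection performance on the IoU threshold can be made concrete with a small evaluation sketch. It is an illustration under stated assumptions, not the evaluation protocol of the thesis: predictions are assumed to be sorted by descending confidence, each ground-truth box can be matched at most once, and the names `iou` and `precision_recall_at_iou` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall_at_iou(
    preds: List[Box], gts: List[Box], thr: float
) -> Tuple[float, float]:
    """Greedy one-to-one matching of predictions to ground truth.

    preds is assumed to be ordered by descending confidence. A prediction
    is a true positive if its best unmatched ground-truth box has
    IoU >= thr.
    """
    matched = set()
    true_positives = 0
    for p in preds:
        best_j, best_iou = -1, 0.0
        for j, g in enumerate(gts):
            if j in matched:
                continue
            value = iou(p, g)
            if value > best_iou:
                best_j, best_iou = j, value
        if best_j >= 0 and best_iou >= thr:
            matched.add(best_j)
            true_positives += 1
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(gts) if gts else 0.0
    return precision, recall
```

A prediction that overlaps its ground-truth box with an IoU of 0.9 counts as correct at a 70% threshold but as a miss at 95%, which is why loosely localized detections look fine at low thresholds and degrade sharply at high ones.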
|
7 |
Détection de tableaux dans des documents : une étude de TableBank / Table detection in documents: a study of TableBank
Yockell, Eugénie (04 1900)
Extracting information from documents is a necessity, especially today, when it is common to use a cell phone to photograph documents or invoices. PDF documents are also in widespread use, which requires processing a large number of digital documents. By their nature, the data in PDF documents are complex to retrieve and need to be analyzed as images. In this research, we focus on one particular kind of information to extract: tables. The tables found in documents are a significant entity, as they contain decisive information. Using neural networks to perform automatic extraction saves considerable time and effort.
In this thesis, we define the metrics, models and datasets used for the table detection task. In particular, we study the TableBank and PubLayNet datasets, highlighting the annotation problems present in TableBank. We find that different combinations of training sets built from TableBank and PubLayNet, as well as data augmentation methods, appear to improve the performance of the Faster R-CNN model. We also compare the Faster R-CNN model with the CascadeTabNet model for table detection, where the former remains superior.
In addition, we raise an issue that is rarely discussed in the object detection task, namely that there are too many metrics. This makes model comparison difficult. We therefore report model results under several metrics to show that different metrics generally select different winning models, that is, the model with the best performance. We also recommend the most relevant metrics to observe for table detection: APmedium/APmedium, Pascal AP85 or COCO AP85, and the TableBank metric.
|