Global ETD Search

1	Mathematical Expression Detection and Segmentation in Document Images Bruce, Jacob Robert 19 March 2014 (has links) Various document layout analysis techniques are employed in order to enhance the accuracy of optical character recognition (OCR) in document images. Type-specific document layout analysis involves localizing and segmenting specific zones in an image so that they may be recognized by specialized OCR modules. Zones of interest include titles, headers/footers, paragraphs, images, mathematical expressions, chemical equations, musical notations, tables, circuit diagrams, among others. False positive/negative detections, oversegmentations, and undersegmentations made during the detection and segmentation stage will confuse a specialized OCR system and thus may result in garbled, incoherent output. In this work a mathematical expression detection and segmentation (MEDS) module is implemented and then thoroughly evaluated. The module is fully integrated with the open source OCR software, Tesseract, and is designed to function as a component of it. Evaluation is carried out on freely available public domain images so that future and existing techniques may be objectively compared. / Master of Science document layout analysis optical character recognition document image type-specific layout analysis
2	GENERATIVE LARGE-SCALE URBAN LAYOUT ANALYSIS AND SYNTHESIS Liu He (20376051) 05 December 2024 (has links) <p dir="ltr">A building layout consists of a set of buildings in city blocks defined by a network of roads. Modeling and generating large-scale urban building layouts is of significant interest in computer vision, computer graphics, and urban applications. Researchers seek to obtain building features (e.g. building shapes, counts, and areas) at large scales. However, data quality and data equality challenge the generation and extraction of building features. Blurriness, occlusions, and noise from prevailing satellite images severely hinders performance of image segmentation, super-resolution, or deep-learning based translation networks. Moreover, large-scale urban layout generation struggles with complex and arbitrary shapes of building layouts, and context-sensitive nature of the city morphology, which prior approaches have not considered. Facing the challenges of data quality, generation robustness, and context-sensitivity of urban layout generation, In this thesis, we first address the data quality problem by combing globally-available satellite images and spatial geometric feature datasets, in order to create a generative modeling framework that enables obtaining significantly improved accuracy in per-building feature estimation as well as generation of visually plausible building footprints. Secondly, for generation robustness, We observe that building layouts are discrete structures, consisting of multiple rows of buildings of various shapes, and are amenable to skeletonization for mapping arbitrary city block shapes to a canonical form. In that, we propose a fully automatic approach to building layout generation using graph attention networks. The method generates realistic urban layouts given arbitrary road networks, and enables conditional generation based on learned priors. Nevertheless, we propose the approach addresses context-sensitivity by leveraging a canonical graph representation for the entire city, which facilitates scalability and captures the multi-layer semantics inherent in urban layouts. We introduce a novel graph-based masked autoencoder (GMAE) for city-scale urban layout generation. The method encodes attributed buildings, city blocks, communities and cities into a unified graph structure, enabling self-supervised masked training for graph autoencoder. Additionally, we employ scheduled iterative sampling for 2.5D layout generation, prioritizing the generation of important city blocks and buildings. Our method has proven its robustness by large-scale prototypical experiments covering heterogeneous scenarios from dense urban to sparse rural. It achieves good realism, semantic consistency, and correctness across the heterogeneous urban styles in 330 US cities. </p> Computer vision Computer graphics generative network Layout Analysis Layout Synthesis
3	Enhancing Layout Understanding via Human-in-the-Loop: A User Study on PDF-to-HTML Conversion for Long Documents Mao, Chenyu 24 March 2025 (has links) Document layout understanding often utilizes object detection to locate and parse document elements, enabling systems that convert documents into searchable and editable formats to enhance accessibility and usability. Nevertheless, the recognition results often contain errors that require manual correction due to small training dataset size, limitations of models, and defects in training annotations. However, many of these problems can be addressed via human review to improve correctness. We first improved our system by combining the previous Electronic Thesis/Dissertation (ETD) parsing tool and AI-aided annotation tool, providing instant and accurate file output. Then we used our new pipeline to investigate the effectiveness and efficiency of manual correction strategies in improving object detection accuracy through user studies, including 8 participants, comprising a balanced number of four STEM and four non-STEM researchers, all with some background in ETDs. Each participant was assigned correction tasks on a set of ETDs from both STEM and non-STEM disciplines to ensure comprehensive evaluation across different document types. We collected quantitative metrics, such as completion times, accuracy rates, number of wrong labels, and feedback through our post-survey, to assess the usability and performance of the manual correction process and to examine their relationship with users' academic backgrounds. Results demonstrate that manual adjustment significantly enhanced the accuracy of document element identification and classification, with experienced participants achieving superior correction precision. Furthermore, usability feedback revealed a strong correlation between user satisfaction and system design, providing valuable insights for future system enhancement and development. / Master of Science / With the development of technology, there is an increasing demand to make printed and scanned documents more accessible. Organizations such as universities and libraries have millions of valuable documents, including theses, dissertations, and research papers, which exist only in PDF, often as a scanned format. While these works contain valuable knowledge, they can be challenging to search through or access, especially for those with low vision. To solve this problem, we need computer systems that automatically recognize and convert different parts of these documents --- like titles, headings, paragraphs, and figures --- into more usable forms. Our research focuses on improving how these document recognition systems work by combining computer automation with human expertise. While computers can process documents quickly, they sometimes need more training data for complex document layouts. We developed a web-based tool allowing people to review the computer's work and correct errors, such as mislabeled sections or missed elements. We conducted a detailed study with 8 participants who used our correction tool, to understand how effective this human-computer collaboration could be. We carefully measured several aspects of their experience: how many pages they annotated in a fixed amount of time, how accurate their corrections were, and how they felt about using the tool. We also used a post-survey to gather feedback about their experience with the tool. The results were very encouraging. When humans reviewed and corrected the computer's work, the accuracy of document recognition improved significantly. We found that participants could effectively identify and fix errors in the computer's output, especially when the tool was easy to use. Higher user satisfaction was strongly linked to how intuitive and straightforward participants found the correction process. One useful finding was that this process creates a positive feedback loop. Every correction a person makes helps expand the training data available to the computer system, which means the system can learn from these corrections and gradually become better at recognizing similar elements in future documents, reducing the number of errors that need to be corrected over time. Our research offers insights into building advanced object detection systems incorporating computational efficiency with human review. The results boost the formulation of optimal strategies for developing user-centric interfaces and effective document repair operations. This work has practical implications for making academic and research documents more accessible to everyone, including those relying on screen readers or other assistive technologies. This research represents a step forward in making the vast knowledge of digital documents more accessible, searchable, and usable for all readers. By showing how humans and computers can work together effectively, we are helping to build better systems for preserving and sharing knowledge in the digital age. ETD deep learning object detection document layout analysis
4	A User Centered Design and Prototype of a Mobile Reading Device for the Visually Impaired Keefer, Robert B. 10 June 2011 (has links) No description available. Computer Science Document Image Processing Document Image Layout Analysis Voice User Interface User Interface Modeling
5	An algorithm to evaluate plant layout alternatives using the manufacturing process as a criterion Imam, Altaf S. January 1995 (has links) No description available. Engineering, Industrial Computer Algorithms
6	Analýza rozložení textu v historických dokumentech / Text Layout Analysis in Historical Documents Palacková, Bianca January 2021 (has links) The goal of this thesis is to design and implement algorithm for text layout analysis in historical documents. Neural network was used to solve this problem, specifically architecture Faster-RCNN. Dataset of 6 135 images with historical newspaper was used for training and testing. For purpose of the thesis four models of neural networks were trained: model for detection of words, headings, text regions and model for words detection based on position in line. Outputs from these models were processed in order to determine text layout in input image. A modified F-score metric was used for the evaluation. Based on this metric, the algorithm reached an accuracy almost 80 %.
7	Complex Document Parsing with Vision Language Models Yifei Hu (9193709) 17 December 2024 (has links) <p dir="ltr">This thesis explores the application of vision language models (VLMs) on document layout analysis (DLA) and optical character recognition (OCR). For document layout analysis, we found that VLMs excel at detecting text areas by leveraging their understanding of textual content, rather than relying solely on visual features. This approach proves more robust than traditional object detection methods, particularly for text-rich images typical in document analysis tasks. In addressing OCR challenges, we identified a critical bottleneck: the lack of high-quality, document-level OCR datasets. To overcome this limitation, we developed a novel synthetic data generation pipeline. This pipeline utilizes Large Language Models to create OCR training data by rendering markdown source text into images. Our experiments show that VLMs trained on this synthetic data outperform models trained on conventional datasets. This research highlights the potential of VLMs in document understanding tasks and introduces an innovative approach to generating training data for OCR. Our findings suggest that leveraging the dual image-text understanding capabilities of VLMs, combined with strategically generated synthetic data, can significantly advance the state of the art in document layout analysis and OCR.</p> Natural language processing Parsing Information Optical Character Recognition (OCR) layout analysis
8	Comparative study of table layout analysis : Layout analysis solutions study for Swedish historical hand-written document Liang, Xusheng January 2019 (has links) Background. Nowadays, information retrieval system become more and more popular, it helps people retrieve information more efficiently and accelerates daily task. Within this context, Image processing technology play an important role that help transcribing content in printed or handwritten documents into digital data in information retrieval system. This transcribing procedure is called document digitization. In this transcribing procedure, image processing technique such as layout analysis and word recognition are employed to segment the document content and transcribe the image content into words. At this point, a Swedish company (ArkivDigital® AB) has a demand to transcribe their document data into digital data. Objectives. In this study, the aim is to find out effective solution to extract document layout regard to the Swedish handwritten historical documents, which are featured by their tabular forms containing the handwritten content. In this case, outcome of application of OCRopus, OCRfeeder, traditional image processing techniques, machine learning techniques on Swedish historical hand-written document is compared and studied. Methods. Implementation and experiment are used to develop three comparative solutions in this study. One is Hessian filtering with mask operation; another one is Gabor filtering with morphological open operation; the last one is Gabor filtering with machine learning classification. In the last solution, different alternatives were explored to build up document layout extraction pipeline. Hessian filter and Gabor filter are evaluated; Secondly, filter images with the better filter evaluated at previous stage, then refine the filtered image with Hough line transform method. Third, extract transfer learning feature and custom feature. Fourth, feed classifier with previous extracted features and analyze the result. After implementing all the solutions, sample set of the Swedish historical handwritten document is applied with these solutions and compare their performance with survey. Results. Both open source OCR system OCRopus and OCRfeeder fail to deliver the outcome due to these systems are designed to handle general document layout instead of table layout. Traditional image processing solutions work in more than a half of the cases, but it does not work well. Combining traditional image process technique and machine leaning technique give the best result, but with great time cost. Conclusions. Results shows that existing OCR system cannot carry layout analysis task in our Swedish historical handwritten document. Traditional image processing techniques are capable to extract the general table layout in these documents. By introducing machine learning technique, better and more accurate table layout can be extracted, but comes with a bigger time cost. / Scalable resource-efficient systems for big data analytics layout analysis pattern recognition document digitalization table layout extraction transfer learning machine learning image processing Hessian filter Gabor filter Computer Sciences Datavetenskap (datalogi)
9	Semantic Segmentation of Historical Document Images Using Recurrent Neural Networks Ahrneteg, Jakob, Kulenovic, Dean January 2019 (has links) Background. This thesis focuses on the task of historical document semantic segmentation with recurrent neural networks. Document semantic segmentation involves the segmentation of a page into different meaningful regions and is an important prerequisite step of automated document analysis and digitisation with optical character recognition. At the time of writing, convolutional neural network based solutions are the state-of-the-art for analyzing document images while the use of recurrent neural networks in document semantic segmentation has not yet been studied. Considering the nature of a recurrent neural network and the recent success of recurrent neural networks in document image binarization, it should be possible to employ a recurrent neural network for document semantic segmentation and further achieve high performance results. Objectives. The main objective of this thesis is to investigate if recurrent neural networks are a viable alternative to convolutional neural networks in document semantic segmentation. By using a combination of a convolutional neural network and a recurrent neural network, another objective is also to determine if the performance of the combination can improve upon the existing case of only using the recurrent neural network. Methods. To investigate the impact of recurrent neural networks in document semantic segmentation, three different recurrent neural network architectures are implemented and trained while their performance are further evaluated with Intersection over Union. Afterwards their segmentation result are compared to a convolutional neural network. By performing pre-processing on training images and multi-class labeling, prediction images are ultimately produced by the employed models. Results. The results from the gathered performance data shows a 2.7% performance difference between the best recurrent neural network model and the convolutional neural network. Notably, it can be observed that this recurrent neural network model has a more consistent performance than the convolutional neural network but comparable performance results overall. For the other recurrent neural network architectures lower performance results are observed which is connected to the complexity of these models. Furthermore, by analyzing the performance results of a model using a combination of a convolutional neural network and a recurrent neural network, it can be noticed that the combination performs significantly better with a 4.9% performance increase compared to the case with only using the recurrent neural network. Conclusions. This thesis concludes that recurrent neural networks are likely a viable alternative to convolutional neural networks in document semantic segmentation but that further investigation is required. Furthermore, by combining a convolutional neural network with a recurrent neural network it is concluded that the performance of a recurrent neural network model is significantly increased. / Bakgrund. Detta arbete handlar om semantisk segmentering av historiska dokument med recurrent neural network. Semantisk segmentering av dokument inbegriper att dela in ett dokument i olika regioner, något som är viktigt för att i efterhand kunna utföra automatisk dokument analys och digitalisering med optisk teckenläsning. Vidare är convolutional neural network det främsta alternativet för bearbetning av dokument bilder medan recurrent neural network aldrig har använts för semantisk segmentering av dokument. Detta är intressant eftersom om vi tar hänsyn till hur ett recurrent neural network fungerar och att recurrent neural network har uppnått mycket bra resultat inom binär bearbetning av dokument, borde det likväl vara möjligt att använda ett recurrent neural network för semantisk segmentering av dokument och även här uppnå bra resultat. Syfte. Syftet med arbetet är att undersöka om ett recurrent neural network kan uppnå ett likvärdigt resultat jämfört med ett convolutional neural network för semantisk segmentering av dokument. Vidare är syftet även att undersöka om en kombination av ett convolutional neural network och ett recurrent neural network kan ge ett bättre resultat än att bara endast använda ett recurrent neural network. Metod. För att kunna avgöra om ett recurrent neural network är ett lämpligt alternativ för semantisk segmentering av dokument utvärderas prestanda resultatet för tre olika modeller av recurrent neural network. Därefter jämförs dessa resultat med prestanda resultatet för ett convolutional neural network. Vidare utförs förbehandling av bilder och multi klassificering för att modellerna i slutändan ska kunna producera mätbara resultat av uppskattnings bilder. Resultat. Genom att utvärdera prestanda resultaten för modellerna kan vi i en jämförelse med den bästa modellen och ett convolutional neural network uppmäta en prestanda skillnad på 2.7%. Noterbart i det här fallet är att den bästa modellen uppvisar en jämnare fördelning av prestanda. För de två modellerna som uppvisade en lägre prestanda kan slutsatsen dras att deras utfall beror på en lägre modell komplexitet. Vidare vid en jämförelse av dessa två modeller, där den ena har en kombination av ett convolutional neural network och ett recurrent neural network medan den andra endast har ett recurrent neural network uppmäts en prestanda skillnad på 4.9%. Slutsatser. Resultatet antyder att ett recurrent neural network förmodligen är ett lämpligt alternativ till ett convolutional neural network för semantisk segmentering av dokument. Vidare dras slutsatsen att en kombination av de båda varianterna bidrar till ett bättre prestanda resultat. semantic segmentation page segmentation recurrent neural network layout analysis semantisk segmentering dokument segmentering recurrent neural network layout analys Computer Sciences Datavetenskap (datalogi)
10	Layout Analysis on modern Newspapers using the Object Detection model Faster R-CNN Funkquist, Mikaela January 2022 (has links) As society is becoming more and more digitized the amount of digital data is increasing rapidly. Newspapers are one example of this, that many Libraries around the world are storing as digital images. This enables a great opportunity for research on Newspapers, and a particular research area is Document Layout Analysis where one divides the document into different segments and classifies them. In this thesis modern Newspaper pages, provided by KBLab, were used to investigate how well a Deep Learning model developed for General Object Detection performs in this area. In particular the Faster R-CNN Object detection model was trained on manually annotated newspaper pages from two different Swedish publishers, namely Dagens Nyheter and Aftonbladet. All newspaper pages were taken from editions published between 2010 and 2020, meaning only modern newspapers were considered. The methodology in this thesis involved sampling editions from the given publishers and time periods and then manually annotating these by marking out the desired layout elements with bounding boxes. The classes considered were: headlines, subheadlines, decks, charts/infographics, photographs, pull quotes, cartoons, fact boxes, bylines/credits, captions, tableaus and tables. Given the annotated data, a Faster R-CNN with a ResNet-50-FPN backbone was trained on both the Dagens Nyheter and Aftonbladet train sets and then evaluated on different test set. Results such as a mAP0.5:0.95 of 0.6 were achieved for all classes, while class-wise evaluation indicate precisions around 0.8 for some classes such as tableaus, decks and photographs. / I takt med att samhället blir mer och mer digitaliserat ökar mängden digital data snabbt. Tidningar är ett exempel på detta, som många bibliotek runt om i världen lagrar som digitala bilder. Detta möjliggör en stor möjlighet för forskning på tidningar, och ett särskilt forskningsområde är Dokument Layout Analys där man delar in dokumentet i olika segment och klassificerar dem. I denna avhandling användes moderna tidningssidor, tillhandahållna av KBLab, för att undersöka hur väl en djupinlärnings-modell utvecklad för generell Objektdetektering presterar inom detta område. Mer precist, tränades en Faster R-CNN Objektdetekteringsmodell på manuellt annoterade tidningssidor från två olika svenska förlag, nämligen Dagens Nyheter och Aftonbladet. Alla tidningssidor togs från utgåvor som publicerats mellan 2010 och 2020, vilket innebär att endast moderna tidningar behandlades. Metodiken i detta examensarbete innebar att först göra ett urval av utgåvor från givna förlag och tidsperioder och sedan manuellt annotera dessa genom att markera ut önskade layoutelement med begränsningsrutor. Klasserna som användes var: rubriker, underrubriker, ingress, diagram/infografik, fotografier, citat, tecknade serier, faktarutor, författares signatur, bildtexter, tablåer och tabeller. Givet den annoterade datan, tränades en Faster R-CNN med en ResNet-50-FPN ryggrad på både Dagens Nyheter och Aftonbladet träningsdatan och sedan utvärderades dem på olika testset. Resultat som mAP0.5:0.95 på 0.6 uppnåddes för alla klasser, medan klassvis utvärdering indikerar precision kring 0.8 för vissa klasser som tablåer, ingresser och fotografier. Document Layout Analysis Newspapers Object Detection Faster R-CNN Deep Learning Dokument Layout Analys Tidningar Objektdetektering Faster R-CNN Djupinlärning Computer and Information Sciences Data- och informationsvetenskap

Search results