  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

CONFIRM: Clustering of Noisy Form Images using Robust Matching

Tensmeyer, Christopher Alan 01 May 2016
Identifying the type of a scanned form greatly facilitates processing, including automated field segmentation and field recognition. Unlike most existing techniques, we focus on unsupervised type identification, where the set of form types is not known a priori, and on noisy collections that contain very similar document types. This work presents a novel algorithm, CONFIRM (Clustering Of Noisy Form Images using Robust Matching), which simultaneously discovers the types in a collection of forms and assigns each form to a type. CONFIRM matches typeset text and rule lines between forms to create domain-specific features, which we show outperform the Bag of Visual Words (BoVW) features employed by the current state of the art. To scale to large document collections, we use a bootstrap approach to clustering, in which only a small subset of the data is clustered directly, while the rest of the data is assigned to clusters in linear time. We show that CONFIRM reduces average cluster impurity by 44% compared to the state of the art on 5 collections of historical forms that contain significant noise. We also show competitive performance on the relatively clean NIST tax form collection.
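The bootstrap strategy described above can be sketched as follows. This is a minimal illustration under stated assumptions, not CONFIRM itself: forms are stood in for by numeric feature tuples compared with Euclidean distance (CONFIRM's text- and rule-line matching is not reproduced), and the directly clustered subset uses a simple k-medoids with farthest-first initialization, chosen here only for illustration. The names `bootstrap_cluster` and `k_medoids` are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    # Placeholder metric; CONFIRM instead compares forms by matching text and rule lines.
    return math.dist(a, b)


def k_medoids(items: Sequence[Point], k: int, dist: Distance, iters: int = 20) -> List[int]:
    """Cluster a small sample directly; returns indices of k medoids (assumes k distinct items)."""
    if not 0 < k <= len(items):
        raise ValueError("need 0 < k <= number of items")
    # Farthest-first initialization: deterministic and spreads the starting medoids apart.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(len(items)),
                           key=lambda i: min(dist(items[i], items[m]) for m in medoids)))
    for _ in range(iters):
        groups = {m: [m] for m in medoids}  # each medoid always belongs to its own group
        for i, x in enumerate(items):
            if i not in groups:
                groups[min(medoids, key=lambda m: dist(x, items[m]))].append(i)
        # Move each medoid to the member with the smallest total distance to its group.
        new_medoids = [min(members, key=lambda c: sum(dist(items[c], items[j]) for j in members))
                       for members in groups.values()]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return medoids


def bootstrap_cluster(data: Sequence[Point], k: int, sample_size: int,
                      dist: Distance = euclidean, seed: int = 0) -> Tuple[List[int], List[Point]]:
    """Cluster a random subset directly, then assign every item to its nearest medoid."""
    rng = random.Random(seed)
    sample = [data[i] for i in rng.sample(range(len(data)), sample_size)]
    medoids = [sample[m] for m in k_medoids(sample, k, dist)]
    # Single pass over the full collection: n * k distance evaluations, linear in n.
    labels = [min(range(len(medoids)), key=lambda c: dist(x, medoids[c])) for x in data]
    return labels, medoids
```

Because each item is compared against only k medoids, the assignment pass is linear in the collection size for fixed k; the expensive pairwise comparisons are confined to the sample.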

Evaluation of word segmentation algorithms applied on handwritten text

Isaac, Andreas January 2020
The aim of this thesis is to build a word segmentation algorithm and evaluate how it performs when extracting words from historical handwritten documents. Since historical documents often contain background noise, the thesis also investigates whether applying a background removal algorithm affects the final result. Three different types of historical handwritten documents are used to compare the output of two different word segmentation algorithms. The results indicate that the background removal algorithm increases the accuracy of the word segmentation. The developed word segmentation algorithm successfully extracts the majority of the words, while the obtained algorithm has difficulty with some documents. One conclusion is that the type of document plays the key role in whether a poor result is obtained. Hence, different algorithms may be needed for different types of documents rather than one algorithm for all.

Naming Movement: Nomenclature and Ways of Knowing Dance

Kim, Sue In January 2011
This study examines the dance terminologies and documentation of two court dance traditions, Korean Jeongjae and French Belle Dance. For Belle Dance, Raoul Feuillet's Chorégraphie (1700) and Pierre Rameau's Maître à Danser (1725) provide lists of movement terms, their definitions, and instructions for performing them. For Jeongjae, the nineteenth-century Jeongjae mudo holgi contains diagrams and descriptions of dance movements. Each set of sources has its own way of converting dance movement into language, revealing divergent perspectives on body movement in each culture. These divergent modes of documenting dance reflect the characteristic ways in which their historical and cultural contexts expressed and constructed knowledge of body movement. By comparing terminologies and documentation that carry historically and culturally specific concepts, I explore underlying assumptions about what kinds of information are considered knowledge and are preserved through articulation in words and graphic symbols. This study addresses the research question: what do dance terminologies and processes of documentation suggest about perspectives on dance movement in two distinct dance cultures? To articulate the differences, it examines the selected documents as a whole as well as specific dance terms. The significance of characteristic features found in the textual analysis is illuminated through an exploration of intertextual relationships between the dance texts and important sources of the period that focus on the body and how it is conceived in relation to the human being. This study suggests that dance documents, which translate selected aspects of dance movement into words and graphic symbols, encapsulate historically and culturally specific ways of knowing dance movement. Intending to capture movement analytically and visually, the Belle Dance treatises attempt to establish objective knowledge of dance.
This mode of knowing corresponds to the philosophical and practical milieus that produced the theory of mind-body dualism, the mathematical foundations of modern science, and a reliance on sense perception. In contrast, the Jeongjae documents take the performer's experience as the reference point, considering his or her inner experience as well as the observable results of movement. Correspondingly, Korean traditional culture adhered to a holistic view of the body and favored implicit expressions for describing body movement. / Dance

A comparative study of segmentation methods of historical documents

Yanque, Nury Yuleny Arosquipa 29 November 2018
There is a vast amount of information in old handwritten and machine-printed texts, and great efforts have been made in recent years to digitize these documents and make them available. However, Optical Character Recognition (OCR) systems have had limited success on these documents for a variety of reasons, such as paper-aging defects, faded ink, stains, uneven lighting, folds, bleed-through, ghosting, and poor contrast between text and background.
One of the key steps for the success of an OCR system is accurate segmentation of the written part of the image from its background (binarization), and this step is particularly sensitive to the defects typical of historical documents. So much so that over the last eight years, competitions for binarization methods of historical documents have been held, which have advanced the state of the art in the area. In this work, we carried out a comparative study of several segmentation methods for historical documents and propose a machine-learning-based method that retains the advantages of heuristic methods. The study covered both handwritten and machine-printed historical documents, and the proposed method was compared with state-of-the-art methods using the standard DIBCO metrics and an open-source OCR system. The results obtained by the proposed method are comparable with those of state-of-the-art methods in terms of OCR output, with some advantages on specific images.
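The DIBCO evaluation mentioned above includes, among other measures, a pixel-level F-measure and PSNR computed between a binarized image and its ground truth. The sketch below assumes both images are same-sized lists of rows with 1 for text pixels and 0 for background; it omits the pseudo-F-measure and DRD metrics that DIBCO also reports, and the function names are illustrative.

```python
import math
from typing import List, Tuple

Image = List[List[int]]  # one list per row; 1 = text (foreground), 0 = background


def pixel_counts(pred: Image, truth: Image) -> Tuple[int, int, int, int]:
    """Count TP, FP, FN, TN, treating text pixels as positives (images assumed same size)."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def f_measure(pred: Image, truth: Image) -> float:
    """Harmonic mean of pixel precision and recall."""
    tp, fp, fn, _ = pixel_counts(pred, truth)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def psnr(pred: Image, truth: Image, c: float = 1.0) -> float:
    """PSNR = 10 log10(C^2 / MSE); for 0/1 images, MSE is the fraction of mismatched pixels."""
    tp, fp, fn, tn = pixel_counts(pred, truth)
    mse = (fp + fn) / (tp + fp + fn + tn)
    return math.inf if mse == 0 else 10 * math.log10(c * c / mse)
```

Higher is better for both measures; a perfect binarization gives an F-measure of 1 and an infinite PSNR.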

Clustering of Image Search Results to Support Historical Document Recognition

Espinosa, Javier January 2014
Context. Image searching in historical handwritten documents is a challenging problem in computer vision and pattern recognition. The number of digitized documents is increasing every day, and the task of finding occurrences of a selected sub-image in a collection of documents is of special interest to historians and genealogists. Objectives. This thesis develops a technique for image searching in historical documents. The technique has three phases. First, the document is segmented into sub-images corresponding to its words. Second, each sub-image is described by a feature vector of measurable attributes of its content. Third, a clustering algorithm computes the distances between these vectors to decide which images match the one selected by the user. Methods. The research methodology is experimentation. A quasi-experiment is designed based on repeated measures over a single group of data. The image processing, feature selection, and clustering approach are the independent variables, whereas the accuracy measurements are the dependent variable. This design provides a network of measurements based on a set of interrelated outcomes. Results. The statistical analysis uses the F1 score to measure the accuracy of the experimental results, based on the true positives, false positives, and false negatives detected. The average F-measure for the experiment is F1 = 0.59, while the method's actual matching ratio is 66.4%. Conclusions. This thesis provides a starting point for developing a search engine for historical document collections based on pattern recognition. The main research findings focus on image enhancement and segmentation for degraded documents, and on image matching based on feature definition and cluster analysis.
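The matching and scoring steps can be illustrated with a small sketch. It assumes word images have already been reduced to feature vectors (the vectors and ids below are hypothetical), and it replaces the thesis's clustering step with a plain Euclidean distance threshold for simplicity. The F1 score is computed from true positives (TP), false positives (FP), and false negatives (FN) as 2TP / (2TP + FP + FN), which equals the harmonic mean of precision and recall.

```python
import math
from typing import Dict, Sequence, Set


def find_matches(query: Sequence[float], candidates: Dict[str, Sequence[float]],
                 threshold: float) -> Set[str]:
    """Ids of word images whose feature vectors lie within `threshold` of the query vector."""
    return {cid for cid, vec in candidates.items() if math.dist(query, vec) <= threshold}


def f1_score(retrieved: Set[str], relevant: Set[str]) -> float:
    tp = len(retrieved & relevant)  # correct matches
    fp = len(retrieved - relevant)  # returned but wrong
    fn = len(relevant - retrieved)  # missed occurrences
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


candidates = {"a": (0.0, 0.1), "b": (0.2, 0.0), "c": (3.0, 3.0), "d": (0.1, 0.1)}
retrieved = find_matches((0.0, 0.0), candidates, threshold=0.5)
print(f1_score(retrieved, relevant={"a", "b", "c"}))
```

Here "a", "b", and "d" are retrieved while the true occurrences are "a", "b", and "c", so TP = 2, FP = 1, FN = 1, and the script prints an F1 of about 0.667.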

Text Segmentation of Historical Degraded Handwritten Documents

Nina, Oliver 05 August 2010
The use of digital images of handwritten historical documents has increased in recent years. This has been made possible by the Internet, which gives users access to vast collections of historical documents and makes historical and data research more attainable. However, the sheer number of images available in these digital libraries is too large for a single user to read and process. Computers could help read these images through Optical Character Recognition (OCR) methods, which have had significant success on printed materials but only limited success on handwritten ones. Most OCR methods work well only when the images have been preprocessed to remove everything that is not text. This preprocessing step is usually known as binarization. Binarizing images of historical documents that are degraded and of poor quality is difficult and remains a focus of research in image processing. We propose two novel approaches to this problem. The first combines recursive Otsu thresholding and selective bilateral filtering to allow automatic binarization and segmentation of handwritten text images. The second adds background normalization and a post-processing step to the algorithm to make it more robust, so that it works even for images that present bleed-through artifacts. Our results show that these techniques segment the text in historical documents better than traditional binarization techniques.
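Otsu's method, the building block of the first approach, picks the grey level that maximizes the between-class variance of the image histogram. The sketch below is plain global Otsu on 8-bit grey values in pure Python; the recursive application and the selective bilateral filtering described in the thesis are not reproduced, and the assumption that ink is darker than the background is ours.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int], levels: int = 256) -> int:
    """Grey level t maximizing between-class variance; pixels <= t form the dark class."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0 = sum0 = 0  # pixel count and intensity sum of the dark class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0, mu1 = sum0 / w0, (total_sum - sum0) / w1
        # w0 * w1 * (mu0 - mu1)^2 is N^2 times the between-class variance: same argmax.
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binarize(pixels: Sequence[int], t: int) -> List[int]:
    """Map dark pixels (assumed to be ink) to 1 and the rest to 0."""
    return [1 if p <= t else 0 for p in pixels]
```

A single global threshold like this is exactly what tends to fail on unevenly degraded pages, which is the motivation for the recursive and filtered variants the thesis proposes.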

Erudite and popular history: the editing of historical documents in the work of Capistrano de Abreu

Santos, Pedro Afonso Cristovão dos 03 July 2009
This study concerns the editing and publication of historical and/or historiographical texts by João Capistrano de Abreu (1853-1927), a practice that took place during a period of intense dissemination of such texts in Europe and the Americas by numerous scholars of history and by publications (specialized or not), a movement already under way in the first decades of the nineteenth century. Taking many forms, this broad availability of documents and historiographical works from earlier periods helped (as was, in most cases, its explicit purpose) to extend access to many items previously restricted to archives and libraries that were difficult to consult, or to collectors wealthy enough to acquire them. In this way it popularized, to some extent (depending on the size of the literate circle in each context), the possibility of studying and writing history. Such dissemination, however, could take different forms, and it marked those who carried it out in ways that ranked them (from the mere compiler of texts to the historian proper), defining their degree of prestige in intellectual circles. Conversely, the way these editions of documents were produced also shaped the writing of history based on them: a critical edition, for instance, sought to facilitate the historical understanding of the text and presented the reader with the hallmarks of a particular historiography, such as respect for the provenance of documents, shown through rigorous citation, and the importance of reading sources in the original. In the case of Capistrano de Abreu, the research proceeds from two perspectives: his position with respect to other forms of disseminating historical and/or historiographical texts (and, consequently, to other forms of history writing), and a study of his own editions of them.

Automatic Transcription of Historical Documents: Transkribus as a Tool for Libraries, Archives and Scholars

Milioni, Nikolina January 2020
Digital libraries and archives are major portals to rich sources of information. They undertake large-scale digitization to enhance their digital collections and offer users valuable text data. Handwritten documents, however, are usually provided only as digitized images, without transcriptions. Text in a non-machine-readable format limits contemporary scholars' ability to conduct research, especially with digital humanities approaches such as distant reading and data mining. The purpose of this thesis is to evaluate the Transkribus platform, a linguistic tool developed mainly for producing automatic transcriptions of handwritten documents. The results are correlated with the findings of a questionnaire distributed to libraries and archives across Europe to learn more about their policies regarding manuscripts and the provision of transcriptions. A model for a specific Latin writing style is trained, and its accuracy is tested on various handwritten Latin pages. Finally, the tool's validation is discussed, as well as the extent to which it meets the general needs of cultural heritage institutions and humanities scholars.
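Handwritten text recognition accuracy, including for Transkribus models, is commonly reported as character error rate (CER): the edit distance between the recognized text and a reference transcription, divided by the length of the reference. The sketch below is a generic illustration of that metric, not Transkribus code, and it assumes plain-string transcriptions with no normalization of spacing or abbreviations.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def character_error_rate(hypothesis: str, reference: str) -> float:
    """CER = edit distance / number of characters in the reference transcription."""
    if not reference:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(hypothesis, reference) / len(reference)
```

For example, `character_error_rate("dominvs", "dominus")` returns 1/7 (about 0.143), since a single substituted letter in a seven-character reference counts as one error.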

Improvement of Optical Character Recognition on Scanned Historical Documents Using Image Processing

Aula, Lara January 2021
To improve accessibility to historical documents, many institutions have been digitizing their historical archives ever since the emergence of Optical Character Recognition (OCR). Old scanned documents can contain deterioration acquired over time or caused by old printing methods. Common visual attributes of these documents include variations in style and font, broken characters, varying ink intensity, noise, and damage caused by folding or tearing. Many of these attributes are unfavorable for modern OCR tools and can cause character recognition to fail. This study addresses this problem by using image processing methods to improve character recognition results. In addition, common image quality characteristics of scanned historical documents with unidentifiable text are analyzed. The OCR tool used in this research was the open-source Tesseract software. Image processing methods such as Gaussian low-pass filtering, Otsu's optimal thresholding method, and morphological operations were used to prepare the historical documents for Tesseract. The OCR output was evaluated using precision and recall: recall improved by 63 percentage points and precision by 18 percentage points. This shows that image preprocessing is an effective way to increase the readability of historical documents for OCR tools. Furthermore, the characteristics found to be especially disadvantageous for Tesseract are font deviations, the presence of extraneous objects, character fading, broken characters, and Poisson noise.
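Precision and recall for OCR output can be computed in several ways, and the exact procedure used in the thesis is not stated here. The sketch below assumes a bag-of-words comparison: a recognized word counts as correct if it also occurs in the ground truth, with repeated words credited at most as many times as they appear there. The function name is hypothetical.

```python
from collections import Counter
from typing import Tuple


def word_precision_recall(ocr_text: str, ground_truth: str) -> Tuple[float, float]:
    """Bag-of-words precision and recall of OCR output against a ground-truth transcription."""
    ocr_words = Counter(ocr_text.split())
    true_words = Counter(ground_truth.split())
    # Multiset intersection: each word is credited min(count in OCR, count in truth) times.
    correct = sum((ocr_words & true_words).values())
    precision = correct / sum(ocr_words.values()) if ocr_words else 0.0
    recall = correct / sum(true_words.values()) if true_words else 0.0
    return precision, recall
```

An improvement in percentage points, as reported above, is the difference between the scores after and before preprocessing, expressed on a 0 to 100 scale.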
