1

Finite-state canonicalization techniques for historical German

Jurish, Bryan January 2011
This work addresses issues in the automatic preprocessing of historical German input text for use by conventional natural language processing techniques. Conventional techniques cannot adequately account for historical input text, due on the one hand to conventional tools' reliance on a fixed application-specific lexicon keyed by contemporary orthographic surface forms, and on the other to the lack of consistent orthographic conventions in historical input text. Historical spelling variation is treated here as an error-correction problem or "canonicalization" task: an attempt to automatically assign each (historical) input word a unique extant canonical cognate, thus allowing direct application-specific processing (tagging, parsing, etc.) of the returned canonical forms without the need for any additional application-specific modifications. In the course of the work, various methods for automatic canonicalization are investigated and empirically evaluated, including conflation by phonetic identity, conflation by lemma-instantiation heuristics, canonicalization by a weighted finite-state rewrite cascade, and token-wise disambiguation by a dynamic Hidden Markov Model. / This work addresses the automatic preprocessing of historical German text for further processing by conventional computational-linguistic techniques. Without such preprocessing, conventional techniques cannot handle historical text satisfactorily because of the high degree of graphemic variation it exhibits. Variation in historical spelling is treated here as an error-correction problem or "canonicalization task": an attempt to assign each (historical) input word a unique extant equivalent, so that conventional techniques can operate directly on the returned canonical forms without further modification. Various methods for automatic canonicalization are investigated in the course of this work, including conflation by phonetic identity, conflation by lemma-instantiation heuristics, canonicalization by a cascade of weighted finite-state transducers, and disambiguation of conflation candidates by a dynamic Hidden Markov Model.
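The abstract lists several canonicalization strategies without illustrating them. The short Python sketch below shows the general flavour of one of them, rule-based canonicalization with weighted rewrites filtered against a modern lexicon; the rewrite rules, costs, and toy lexicon are invented for illustration and are not taken from the thesis.

# A minimal sketch (not the thesis implementation) of rule-based canonicalization:
# weighted character rewrite rules map historical German spellings to candidate
# modern forms, and a contemporary lexicon filters the candidates.

import heapq

# Hypothetical rewrite rules: (historical substring, modern substring, cost).
RULES = [
    ("th", "t", 0.5),   # e.g. "thun" -> "tun"
    ("ey", "ei", 0.5),  # e.g. "seyn" -> "sein"
    ("v", "u", 0.8),    # e.g. "vnnd" -> "unnd" (then nn -> n)
    ("nn", "n", 0.6),
]

LEXICON = {"tun", "sein", "und", "unter"}  # stand-in for a modern lexicon


def candidates(word, max_cost=2.0):
    """Best-first search over rule applications, cheapest rewrites first."""
    heap = [(0.0, word)]
    seen = {word: 0.0}
    while heap:
        cost, w = heapq.heappop(heap)
        yield cost, w
        for old, new, rule_cost in RULES:
            if old in w and cost + rule_cost <= max_cost:
                w2 = w.replace(old, new, 1)
                c2 = cost + rule_cost
                if c2 < seen.get(w2, float("inf")):
                    seen[w2] = c2
                    heapq.heappush(heap, (c2, w2))


def canonicalize(word):
    """Return the cheapest candidate attested in the modern lexicon, if any."""
    for _cost, cand in candidates(word.lower()):
        if cand in LEXICON:
            return cand
    return word  # fall back to the input form


if __name__ == "__main__":
    for historical in ["thun", "seyn", "vnnd"]:
        print(historical, "->", canonicalize(historical))

A full system of this kind would typically compile such rules into a weighted finite-state transducer and compose it with a lexicon automaton, rather than searching over string rewrites directly as this toy version does.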
2

Entity-Centric Text Mining for Historical Documents

Coll Ardanuy, Maria 07 July 2017
No description available.
3

Spelling Normalisation and Linguistic Analysis of Historical Text for Information Extraction

Pettersson, Eva January 2016
Historical text constitutes a rich source of information for historians and other researchers in the humanities. Many texts, however, are not available in electronic format, and even when they are, there is a lack of NLP tools designed to handle historical text. In my thesis, I aim to provide a generic workflow for automatic linguistic analysis and information extraction from historical text, with spelling normalisation as a core component of the pipeline. In the spelling normalisation step, the historical input text is automatically normalised to a more modern spelling, enabling the use of existing taggers and parsers trained on modern language data in the subsequent linguistic analysis step. In the final information extraction step, certain linguistic structures are identified based on the annotation labels assigned by the NLP tools, and ranked in accordance with the specific information need expressed by the user. An important consideration in my implementation is that the pipeline should be applicable to different languages, time periods, genres, and information needs simply by substituting the language resources used in each module. Furthermore, reusing existing NLP tools developed for the modern language is crucial: the lack of linguistically annotated historical data, combined with the high variability of historical text, makes it hard to train NLP tools aimed specifically at analysing historical text. In my evaluation, I show that spelling normalisation can be a very useful technique for providing easy access to historical information content, even in cases where little (or no) annotated historical training data is available. For the specific information extraction task of automatically identifying verb phrases describing work in Early Modern Swedish text, 91 of the 100 top-ranked instances are true positives in the best setting.
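As a rough illustration of the three-step workflow described above (normalisation, analysis with modern tools, extraction and ranking), the following Python sketch wires up toy stand-ins for each module; the spelling table, the dummy tagger, and the frequency-based ranking are invented placeholders, not the resources or models used in the thesis.

# A schematic sketch of the pipeline: spelling normalisation, linguistic
# analysis with tools trained on modern language, and extraction of
# structures matching an information need.

from collections import Counter

# Toy historical-to-modern spelling table (invented examples).
NORMALISATION = {"hafwer": "haver", "giordt": "gjort", "arbethe": "arbete"}


def normalise(tokens):
    """Map each historical token to a modern spelling where one is known."""
    return [NORMALISATION.get(t.lower(), t.lower()) for t in tokens]


def tag(tokens):
    """Dummy POS tagger standing in for a tagger trained on modern data."""
    verbs = {"haver", "gjort", "slog", "bar"}
    return [(t, "VB" if t in verbs else "NN") for t in tokens]


def extract_verb_phrases(tagged):
    """Collect simple verb + following-noun pairs as candidate 'work' phrases."""
    phrases = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1 == "VB" and t2 == "NN":
            phrases.append(f"{w1} {w2}")
    return phrases


def rank(phrases):
    """Rank candidates by how often they occur (a crude relevance proxy)."""
    return Counter(phrases).most_common()


if __name__ == "__main__":
    text = "hafwer giordt arbethe".split()
    print(rank(extract_verb_phrases(tag(normalise(text)))))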
4

Rozpoznávání historických textů pomocí hlubokých neuronových sítí / Convolutional Networks for Historic Text Recognition

Kišš, Martin January 2018
The aim of this work is to create a tool for the automatic transcription of historical documents. The work focuses mainly on the recognition of texts from the modern period written in the Fraktur typeface. The problem is solved with a newly designed recurrent convolutional neural network and a Spatial Transformer Network. Part of the solution is a generator of artificial historical texts. Using this generator, an artificial data set is created on which the convolutional neural network for line recognition is trained. The network is then tested on real historical lines of text, on which it achieves a character accuracy of up to 89.0 %. The main contributions of this work are the newly designed neural network for text line recognition and the artificial text generator, which makes it possible to train the network to recognize real historical lines of text.
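The abstract reports results as character accuracy. One common way to compute such a figure is one minus the normalised edit distance between a recognised line and its ground-truth transcription; the thesis may define its metric differently, so the Python sketch below is only an illustrative baseline.

def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 if characters match)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(recognised: str, reference: str) -> float:
    """1 - normalised edit distance, clipped at 0 for very poor hypotheses."""
    if not reference:
        return 1.0 if not recognised else 0.0
    err = levenshtein(recognised, reference) / len(reference)
    return max(0.0, 1.0 - err)


if __name__ == "__main__":
    # One wrong character out of 15 gives an accuracy of about 0.933.
    print(character_accuracy("Fraktvr Schrift", "Fraktur Schrift"))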
5

České království v kronice Otakara Štýrského / Czech Kingdom in The Chronicle of Otakar of Styria

Košátková, Anna January 2015
The subject of the Master's thesis The Czech Kingdom in The Chronicle of Otakar of Styria is the history of the Czech Kingdom in Otakar of Styria's versified chronicle. The goal of this work is a comprehensive view of the chronicle as a historical source for central European history during the second half of the 13th and the beginning of the 14th century. The Master's thesis includes an evaluation of the relationships between the central European sources of that time. It investigates both the knotty question of the author's life story and his motivation for writing a work of around 100,000 verses. Individual chapters examine the various social groups on which the author focuses (royal houses, aristocracy, the burgher class, the common people). No social group can be considered in isolation, and their interrelations are highlighted in the thesis. The following section introduces Otakar's description of certain central European regions (Austria, Styria, Carinthia, Hungary, Poland and the Holy Roman Empire), which is the foundation of my attempt to uncover the chronicler's sources and information base. The method used is based on the analysis of chronicle sources and the study of historical materials. The history of the Kingdom of Bohemia emerges from the above-mentioned circumstances. Based on this approach, the thesis of...
