1

Finite-state canonicalization techniques for historical German

Jurish, Bryan January 2011
This work addresses issues in the automatic preprocessing of historical German input text for use by conventional natural language processing techniques. Conventional techniques cannot adequately account for historical input text, due on the one hand to conventional tools' reliance on a fixed application-specific lexicon keyed by contemporary orthographic surface forms, and on the other to the lack of consistent orthographic conventions in historical input text. Historical spelling variation is treated here as an error-correction problem or "canonicalization" task: an attempt to automatically assign each (historical) input word a unique extant canonical cognate, thus allowing direct application-specific processing (tagging, parsing, etc.) of the returned canonical forms without the need for any additional application-specific modifications. In the course of the work, various methods for automatic canonicalization are investigated and empirically evaluated, including conflation by phonetic identity, conflation by lemma-instantiation heuristics, canonicalization by a weighted finite-state rewrite cascade, and token-wise disambiguation by a dynamic Hidden Markov Model. / This work addresses the automatic preprocessing of historical German text for further processing by conventional computational-linguistic techniques. Without such preprocessing, conventional techniques cannot handle historical text satisfactorily because of the high degree of graphemic variation it exhibits. Variation in historical spelling is treated here as an error-correction problem or "canonicalization task": an attempt to assign each (historical) input word a unique extant equivalent, so that conventional techniques can operate directly on the returned canonical forms without further modification. Various methods for automatic canonicalization are investigated in the course of this work, including conflation by phonetic identity, conflation by lemma-instantiation heuristics, canonicalization by a cascade of weighted finite-state transducers, and disambiguation of conflation candidates by a dynamic Hidden Markov Model.
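The abstract lists several canonicalization strategies without illustrating them. The short Python sketch below shows the general flavour of one of them, rule-based canonicalization with weighted rewrites filtered against a modern lexicon; the rewrite rules, costs, and toy lexicon are invented for illustration and are not taken from the thesis.

# A minimal sketch (not the thesis implementation) of rule-based canonicalization:
# weighted character rewrite rules map historical German spellings to candidate
# modern forms, and a contemporary lexicon filters the candidates.

import heapq

# Hypothetical rewrite rules: (historical substring, modern substring, cost).
RULES = [
    ("th", "t", 0.5),   # e.g. "thun" -> "tun"
    ("ey", "ei", 0.5),  # e.g. "seyn" -> "sein"
    ("v", "u", 0.8),    # e.g. "vnnd" -> "unnd" (then nn -> n)
    ("nn", "n", 0.6),
]

LEXICON = {"tun", "sein", "und", "unter"}  # stand-in for a modern lexicon


def candidates(word, max_cost=2.0):
    """Best-first search over rule applications, cheapest rewrites first."""
    heap = [(0.0, word)]
    seen = {word: 0.0}
    while heap:
        cost, w = heapq.heappop(heap)
        yield cost, w
        for old, new, rule_cost in RULES:
            if old in w and cost + rule_cost <= max_cost:
                w2 = w.replace(old, new, 1)
                c2 = cost + rule_cost
                if c2 < seen.get(w2, float("inf")):
                    seen[w2] = c2
                    heapq.heappush(heap, (c2, w2))


def canonicalize(word):
    """Return the cheapest candidate attested in the modern lexicon, if any."""
    for _cost, cand in candidates(word.lower()):
        if cand in LEXICON:
            return cand
    return word  # fall back to the input form


if __name__ == "__main__":
    for historical in ["thun", "seyn", "vnnd"]:
        print(historical, "->", canonicalize(historical))

A full system of this kind would typically compile such rules into a weighted finite-state transducer and compose it with a lexicon automaton, rather than searching over string rewrites directly as this toy version does.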
2

Entity-Centric Text Mining for Historical Documents

Coll Ardanuy, Maria 07 July 2017
No description available.
3

Spelling Normalisation and Linguistic Analysis of Historical Text for Information Extraction

Pettersson, Eva January 2016
Historical text constitutes a rich source of information for historians and other researchers in the humanities. Many texts, however, are not available in electronic format, and even when they are, there is a lack of NLP tools designed to handle historical text. In my thesis, I aim to provide a generic workflow for automatic linguistic analysis and information extraction from historical text, with spelling normalisation as a core component of the pipeline. In the spelling normalisation step, the historical input text is automatically normalised to a more modern spelling, enabling the use of existing taggers and parsers trained on modern language data in the subsequent linguistic analysis step. In the final information extraction step, certain linguistic structures are identified based on the annotation labels assigned by the NLP tools, and ranked in accordance with the specific information need expressed by the user. An important consideration in my implementation is that the pipeline should be applicable to different languages, time periods, genres, and information needs simply by substituting the language resources used in each module. Furthermore, reusing existing NLP tools developed for the modern language is crucial: the lack of linguistically annotated historical data, combined with the high variability of historical text, makes it hard to train NLP tools aimed specifically at analysing historical text. In my evaluation, I show that spelling normalisation can be a very useful technique for providing easy access to historical information content, even in cases where little (or no) annotated historical training data is available. For the specific information extraction task of automatically identifying verb phrases describing work in Early Modern Swedish text, 91 of the 100 top-ranked instances are true positives in the best setting.
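As a rough illustration of the three-step workflow described above (normalisation, analysis with modern tools, extraction and ranking), the following Python sketch wires up toy stand-ins for each module; the spelling table, the dummy tagger, and the frequency-based ranking are invented placeholders, not the resources or models used in the thesis.

# A schematic sketch of the pipeline: spelling normalisation, linguistic
# analysis with tools trained on modern language, and extraction of
# structures matching an information need.

from collections import Counter

# Toy historical-to-modern spelling table (invented examples).
NORMALISATION = {"hafwer": "haver", "giordt": "gjort", "arbethe": "arbete"}


def normalise(tokens):
    """Map each historical token to a modern spelling where one is known."""
    return [NORMALISATION.get(t.lower(), t.lower()) for t in tokens]


def tag(tokens):
    """Dummy POS tagger standing in for a tagger trained on modern data."""
    verbs = {"haver", "gjort", "slog", "bar"}
    return [(t, "VB" if t in verbs else "NN") for t in tokens]


def extract_verb_phrases(tagged):
    """Collect simple verb + following-noun pairs as candidate 'work' phrases."""
    phrases = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1 == "VB" and t2 == "NN":
            phrases.append(f"{w1} {w2}")
    return phrases


def rank(phrases):
    """Rank candidates by how often they occur (a crude relevance proxy)."""
    return Counter(phrases).most_common()


if __name__ == "__main__":
    text = "hafwer giordt arbethe".split()
    print(rank(extract_verb_phrases(tag(normalise(text)))))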
4

Rozpoznávání historických textů pomocí hlubokých neuronových sítí / Convolutional Networks for Historic Text Recognition

Kišš, Martin January 2018
The aim of this work is to create a tool for the automatic transcription of historical documents. The work focuses mainly on the recognition of texts from the modern period written in the Fraktur typeface. The problem is solved with a newly designed recurrent convolutional neural network and a Spatial Transformer Network. Part of the solution is a generator of artificial historical texts. Using this generator, an artificial data set is created on which the convolutional neural network for line recognition is trained. The network is then tested on real historical lines of text, on which it achieves a character accuracy of up to 89.0 %. The main contributions of this work are the newly designed neural network for text line recognition and the artificial text generator, which makes it possible to train the network to recognize real historical lines of text.
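The abstract reports results as character accuracy. One common way to compute such a figure is one minus the normalised edit distance between a recognised line and its ground-truth transcription; the thesis may define its metric differently, so the Python sketch below is only an illustrative baseline.

def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 if characters match)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(recognised: str, reference: str) -> float:
    """1 - normalised edit distance, clipped at 0 for very poor hypotheses."""
    if not reference:
        return 1.0 if not recognised else 0.0
    err = levenshtein(recognised, reference) / len(reference)
    return max(0.0, 1.0 - err)


if __name__ == "__main__":
    # One wrong character out of 15 gives an accuracy of about 0.933.
    print(character_accuracy("Fraktvr Schrift", "Fraktur Schrift"))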
5

České království v kronice Otakara Štýrského / Czech Kingdom in The Chronicle of Otakar of Styria

Košátková, Anna January 2015
The subject of the Master's thesis The Czech Kingdom in The Chronicle of Otakar of Styria is the history of the Czech Kingdom in Otakar of Styria's versified chronicle. The goal of this work is a comprehensive view of the chronicle as a historical source for central European history during the second half of the 13th and the beginning of the 14th century. The Master's thesis includes an evaluation of the relationships between the central European sources of that time. It investigates both the knotty question of the author's life story and his motivation for writing a work of around 100,000 verses. Individual chapters examine the various social groups on which the author focuses (royal houses, aristocracy, the burgher class, the common people). No social group can be considered in isolation, and their interrelations are highlighted in the thesis. The following section introduces Otakar's description of certain central European regions (Austria, Styria, Carinthia, Hungary, Poland and the Holy Roman Empire), which is the foundation of my attempt to uncover the chronicler's sources and information base. The method used is based on the analysis of chronicle sources and the study of historical materials. The history of the Kingdom of Bohemia emerges from the above-mentioned circumstances. Based on this approach, the thesis of...
