Global ETD Search

21	Grammatical and pragmatic use of referential expressions in picture-based narratives of bilingual and monolingual children in Russian and German Topaj, Nathalie 14 August 2020 Die Dissertation befasst sich mit der Verwendung referentieller Ausdrücke im narrativen Diskurs monolingualer und bilingualer Kinder im Russischen und Deutschen. Insgesamt wurden 188 Erzählungen untersucht, elizitiert durch Bildergeschichten von 60 bilingualen und 68 monolingualen Kindern in 3 Altersgruppen (4-, 5- und 6-Jährige). Das Hauptziel der Studie war herauszufinden, wie russisch-deutsch bilinguale Kinder und monolinguale Kinder der jeweiligen Sprachen mit der Wahl der referentiellen Ausdrücke im narrativen Diskurs umgehen und ob ihre Leistung und Entwicklung in Bezug auf die grammatische und pragmatische Verwendung referentieller Ausdrücke für die Einführung, Weiterführung und Wiedereinführung von Referenten ähnlich sind. Die Ergebnisse weisen darauf hin, dass Kinder bereits im Alter von 4 Jahren ein gut ausgebildetes Repertoire an referentiellen Ausdrücken haben und ein gutes Verständnis für deren pragmatische Verwendung sowie für die Unterscheidung zwischen den Informationsstatus von Referenten new (neu), given (gegeben) und accessible (zugänglich) zeigen. Die Verwendung von referentiellen Ausdrücken entwickelt sich bei monolingualen und bilingualen Kindern in der analysierten Altersspanne signifikant, insbesondere in Bezug auf ihre Wahl für die Einführung und Wiedereinführung von Referenten. Trotz teilweise signifikanter Unterschiede in den Altersgruppen monolingualer und bilingualer Kinder zeigen alle Stichproben ähnliche Ergebnisse spätestens im Alter von 6 Jahren, d.h. dass bilinguale Kinder in der Lage sind, im Laufe des Spracherwerbsprozesses bis zu diesem Alter die Referenzsysteme ihrer beiden Sprachen entsprechend zu reorganisieren und referentielle Ausdrücke zielsprachlich zu verwenden.
Gleichzeitig verwenden bilinguale Kinder ähnliche referentielle Strategien und zeigen teilweise parallele Entwicklungsmuster in beiden Sprachen. Solche Parallelen sind zum Teil auch zwischen den monolingualen Stichproben im Russischen und Deutschen zu beobachten. / This dissertation deals with the use of referential expressions in the narrative discourse of monolingual and bilingual children in Russian and German. A total of 188 narratives, elicited with picture stories from 60 bilingual and 68 monolingual children in 3 age groups (4, 5, and 6 years of age) were examined. The main aim of the study was to find out how Russian-German bilingual children and monolingual children of the respective languages deal with the choice of referential expressions in narrative discourse and whether their performance and development in terms of grammatical and pragmatic use of referential expressions for introducing, maintaining and reintroducing referents is similar. The results indicate that children already have a well-developed repertoire of referential expressions at age 4 and demonstrate a good understanding of the pragmatic use of referential expressions and of the distinction between different information statuses of referents, defined as new, given, and accessible. The use of referential expressions develops significantly in monolingual and bilingual children in the analyzed age range, especially with regard to the choice of referential expressions for the introduction and reintroduction of referents. Despite partly significant differences within age groups in monolingual and bilingual children, all samples show similar results by age 6 at the latest, i.e., bilingual children are able to reorganize the reference systems of their two languages accordingly during the language acquisition process up to this age and to use referential expressions in a manner that corresponds to the target language. 
At the same time, bilingual children use similar referential strategies and show partly parallel developmental patterns in their two languages. Such parallels are also observed between monolingual samples in Russian and German to some extent. narrativer Diskurs Referenz Informationsstatus bilingualer Spracherwerb bilinguale Kinder monolinguale Kinder Russisch Deutsch narrative discourse reference information status bilingual language acquisition bilingual children monolingual children Russian German 410 Linguistik ER 925 ddc:410
22	Resultatives / A view from Oceanic verb serialization Hopperdietzel, Jens Philipp 15 December 2020 Diese Dissertation untersucht die Argument- und Ereignisstruktur von Resultativkonstruktionen (z.B. Peter wischte den Tisch sauber.) aus der Perspektive zweier serialisierender, wenig untersuchter und bedrohter ozeanischer Sprachen, Daakaka und Samoanisch, in welchen sowohl die Manner- als auch die Result-Bedeutungskomponente durch verbale Prädikate ausgedrückt wird. Diese Beobachtung steht im Kontrast zu nicht-serialisierenden Sprachen wie dem Englischen, in welchen nur eine der beiden Bedeutungskomponenten durch das Hauptverb ausgedrückt wird. Im Zuge einer Untersuchung der morphosyntaktischen und semantischen Eigenschaften zweier Typen von Resultativkonstruktionen, resultative Sekundärprädikation und die means-Konstruktion, entwickelt diese Arbeit einen neuen konfigurationellen Ansatz innerhalb der Distributed Morphology, in welchem sprachübergreifende Variation als Interaktion von morphosyntaktischer und semantischer Komposition der jeweiligen Bedeutungskomponenten in Abhängigkeit von sprachspezifischen Restriktionen auf Wurzelbedeutung und Argumentstruktur beschrieben werden kann. Mit Hilfe eigener Feldforschung zeige ich, dass trotz der oberflächlichen Unterschiede zwischen serialisierenden und nicht-serialisierenden Sprachen ozeanische Resultativkonstruktionen die zugrundeliegende Struktur der means-Konstruktionen aufweisen, in welchen das Mannerverb an das kausative Hauptverb adjungiert wird und das darin enthaltene, unterspezifizierte kausative Ereignis spezifiziert. Folglich unterscheiden sich beide Sprachtypen nicht signifikant in ihrer morphosyntaktischen und semantischen Komposition, mit weitreichenden Implikationen für eine sprachübergreifende Typologie von Resultativkonstruktionen.
/ This dissertation approaches the event and argument structure of resultative constructions (e.g., Peter wiped the table clean) from the perspective of two understudied and endangered Oceanic languages, Daakaka and Samoan, in which both the manner and result components are realized by verbal predicates, i.e., resultative serial verb constructions (RSVCs). This observation contrasts with non-serializing languages, such as English, in which only one of the two meaning components is expressed by the main verb. By examining the morphosyntactic and semantic properties of two types of resultative constructions, namely resultative secondary predication and means constructions, I develop a novel configurational analysis within the generative framework of Distributed Morphology that models cross-linguistic variation in terms of the morphosyntactic size and the semantic composition of the respective meaning components and their interaction with idiosyncratic requirements on roots and argument structure. Based on original fieldwork, I demonstrate that despite the superficial differences in categorial status, Oceanic RSVCs are an instance of the means construction, in which the manner verb directly adjoins to a causative verb, modifying the underspecified causing event entailed in the event structure of the causative predicate. Consequently, serializing and non-serializing languages do not vary significantly in their morphosyntactic and semantic composition, with further implications for the typology of resultatives in the world’s languages. Syntax Semantik Typologie Ereignisstruktur Argumentstruktur Kausation Resultativkonstruktionen Wurzelbedeutung Ozeanische Sprachen Samoanisch Daakaka Syntax Semantics Typology Event structure Argument structure Causation Resultatives Root meaning Oceanic languages Samoan Daakaka 410 Linguistik EF 48300 ddc:410
23	Gender agreement in Native and Heritage Greek: an attraction study Paspali, Anastasia 29 November 2019 Diese Dissertation betrachtet die Beziehung zwischen Parser und Grammatik bei Muttersprachlern (Native Speakers, NS) und Heritage- (Erb-) Sprechern (HS) des Griechischen, indem sie die Mechanismen untersucht, die einer pseudo-Lizenzierung bei Verletzungen der Kongruenz des grammatischen Geschlechts zugrunde liegen. Diese Verletzungen sind Fehler, die auftreten, wenn eine intervenierende Phrase (Attraktor) nicht mit den Genusmerkmalen des Kopfnomens übereinstimmt, ein Phänomen, das in der Literatur (Gender-)Agreement Attraktion, hier Attraktion von Genuskongruenz, genannt wird. Die Dissertation testet, ob eine solche Attraktion von Genuskongruenz im Griechischen vorhanden ist und ob ein- und zweisprachige Muttersprachler gleichermaßen anfällig für Fehler bei der Attraktion sind. Die Dissertation untersucht für die Gruppe der HS außerdem die Genuskongruenz beim Echtzeit-Sprachverstehen und -produzieren. In der Arbeit zeige ich, dass sowohl NS als auch HS anfällig für Attraktionsfehler bei der Genuskongruenz sind. Das zeigen die Reaktionszeitmuster und die Urteile. Gleichzeitig zeigten bei mündlichen Erzählungen beide Sprechergruppen die gleichen Übergeneralisierungsmuster für maskulines Genus bei belebten Nomen sowie bei mündlichen Erzählungen und beschleunigten Grammatikalitätsurteilen für Neutrum bei unbelebten Nomen. Zusammengenommen deuten diese Ergebnisse darauf hin, dass NS und HS anfällig für die Attraktion von Genuskongruenz sind und dass beide Gruppen ähnliche Hinweise zum Abruf des Genus verwenden und somit ähnliche Attraktionsmuster aufweisen. HS unterscheiden sich jedoch von NS in der Verarbeitung der Genuskongruenz an sich, insbesondere bei femininen Kopfnomen (markiertes Genus) in Objekt-Klitika, was darauf hindeutet, dass sowohl Markiertheit als auch Kongruenz an den Schnittstellen die Leistung von HS beeinflusst.
Wenn Fehler auftreten, folgen beide Gruppen den gleichen Mustern der Übergeneralisierung. / This dissertation explores the relationship between the parser and the grammar in Native Speakers (NSs) and Heritage Speakers (HSs) of Greek by examining the mechanisms underpinning the illusory licensing of gender agreement violations: errors occurring when an intervening phrase (attractor) mismatches the gender cues of the head noun, a phenomenon which is usually called (gender) agreement attraction. In this work, I show that both NSs and HSs are prone to gender agreement attraction errors in the nominal domain of Greek, as their reaction time patterns and (speeded or scaled) judgements revealed. At the same time, both groups showed the same overgeneralization patterns of the masculine value in agreement errors with animate nouns in their oral narrations, and of the neuter value with inanimate nouns in their oral narrations and their online speeded judgements. Taken together, these results suggest that NSs and HSs are prone to gender agreement attraction in Greek and that both groups employ retrieval cues similarly, showing similar attraction patterns. However, HSs differ from NSs in the processing of gender agreement per se, particularly with feminine head nouns (marked gender value) on object-clitics, suggesting that markedness as well as agreement at the interfaces influence HSs’ performance. Finally, when errors occur, both groups follow the same overgeneralization patterns. (Gender-)Agreement Attraktion Heritage- (Erb-) Sprechern Kongruenz des grammatischen Geschlechts Genuskongruenz im Griechischen zweisprachige Verarbeitung gender agreement agreement attraction Greek gender Heritage speakers Heritage speakers bilingual processing online sentence comprehension 410 Linguistik 400 Sprache FC 4495 ddc:410 ddc:400
24	Modellorientierte Therapie bei Störungen des Leseerwerbs / Empirische Analyse der Wirksamkeit Bischof, Dorothea 24 April 2020 Ein ausreichendes Lesetempo, eine hohe Lesegenauigkeit und ein entwickeltes Leseverständnis sind in fast allen Schulfächern Voraussetzung für eine erfolgreiche Teilnahme am Unterricht. Internationale Schulleistungsstudien belegen, dass ein hoher Anteil von Schülern bereits in der Grundschule unzureichende Lesekompetenzen aufweist und daher auf zusätzliche Förder- oder Therapiemaßnahmen angewiesen ist. Im Hinblick darauf wurden in einem Gruppen-Prä-Post-Follow-Up-Design mit zweifacher Prä-Messung zwei unterschiedliche Interventionen zur Verbesserung der Lesefähigkeiten bei Zweit- und Drittklässlern evaluiert: Ein modellgeleitetes Therapieverfahren zur Verbesserung der Lesegeschwindigkeit von Wörtern und ein von Eltern durchgeführtes Fördertraining zur Verbesserung der Lesegenauigkeit und Lesegeschwindigkeit von Pseudowörtern. Zur Teilnahme an den Interventionen wurden 58 Zweit- und Drittklässler mit einem gravierenden Leserückstand ausgewählt und entweder dem Therapieprogramm oder dem Fördertraining zugeteilt. Beide Gruppen erhielten über 5 Wochen ein tägliches 45-minütiges Training. Während das Training der Therapiegruppe von einem ausgebildeten Therapeuten durchgeführt wurde und in der Schule stattfand, wurde das Training der Fördergruppe von Eltern zu Hause durchgeführt. Es wurden Veränderungen in der Lesegeschwindigkeit, dem Leseverständnis und verschiedenen Blickbewegungsparametern ausgewertet. / In order to follow the lessons in school, children must be able to read with speed as well as with accuracy and a proven ability to comprehend texts. International school performance studies show that a high proportion of students already have inadequate reading skills in elementary school and therefore need additional support or therapy measures.
Based on this observation, two different interventions for improving the reading skills of second- and third-graders were evaluated in a group pre-post follow-up design with double pre-measurement: a model-guided therapy method for improving the reading speed of words, and a parent-delivered training course for improving the reading accuracy and reading speed of pseudowords. 58 second- and third-graders with a serious reading backlog were selected to participate in the interventions and were assigned to either the therapy program or the parental training. Both groups received daily 45-minute training over a 5-week period. While the training of the therapy group was held by a trained therapist and took place at school, the training of the parental training group was carried out by parents at home. Changes in reading speed, reading comprehension and various eye movement parameters were evaluated. LRS modellorientierte Therapie Lesestörungen Grundschulkinder Logogen-Modell Zwei-Wege-Modell Reading/Writing Disability model-guided treatment dyslexia elementary school children logogen model dual route model 410 Linguistik DT 2100 ddc:410 ddc:371
25	Deverbal Nouns in Modern Hebrew: Between Grammar and Competition Ahdout, Odelia 19 September 2022 Diese Arbeit beschäftigt sich mit den morphosyntaktischen und derivationellen Eigenschaften von Nominalisierungen im modernen Hebräisch und ihrer strukturellen Repräsentation. Eine zentrale Fragestellung im Rahmen von ‚hybriden‘ Wortbildungen wie Nominalisierungen ist die Ähnlichkeit bzw. die Unähnlichkeit zu den ihr zugrundeliegenden Verben. Unter Heranziehung des Hebräischen, einer Sprache mit reicher morphologischer Markierung, sowohl bei Verben als auch bei Nominalisierungen, werden mehrere Divergenzen zwischen Verben und entsprechenden Nominalisierungen im Bereich der Argument- und Ereignisstruktur eliminiert. Ausgehend von der einflussreichen These der Gleichsetzung von Nominalisierung und Passivierung untersucht diese Studie die syntaktische Struktur und deren Interaktion mit dem Wortbildungsprozess der Nominalisierung und zeigt, dass Eigenschaften, die für Passivformen typisch sind, in Nominalisierungen fehlen. Dabei präsentiert diese Studie mit der Untersuchung morphosyntaktischer Faktoren und deren Beziehungen zu Nominalisierungen einen neuen Untersuchungsbereich, der Inkonsistenzen aufzeigt. Durch einen Vergleich von etwa 3000 Verben auf Basis der Verbklassenmorphologie ergibt sich eine signifikante Asymmetrie zwischen Nominalisierungen, die eine mediale/intransitive Markierung tragen, und Nominalisierungen, die als aktiv markiert sind, wobei sich die mediale Form in zwei klar definierten syntaktischen Kontexten als weniger produktiv erweist. Dies zeigt sich auch dadurch, dass alternierende Wurzeln, also Wurzeln, die sowohl aktive als auch mediale Verbformen ausbilden können, ihre Nominalisierungen auf Basis ihrer aktiven Form bilden.
Auf Basis der Konzepte von Konkurrenz und Markiertheit werden diese paradigmatischen Lücken nicht als grammatisch bedingte Inkompatibilitäten analysiert, sondern als eine generelle Präferenz für weniger markierte Formen (aktiv-markierte Nominalisierungen) gegenüber komplexeren (medial-markierte Nominalisierungen), wie in der Performanz häufig zu beobachten. / This study is concerned with the properties, structural representation and derivational patterns of deverbal nouns (DNs) in Modern Hebrew. A recurring question arises in the context of such ‘hybrid’ formations: precisely how similar or far apart are these derivatives from the verbs from which they originate? Enlisting Hebrew, a language with rich morphological marking on both verbs and DNs, several loci of divergence between verbs and respective DNs in the domain of argument and event structure are eliminated. Taking as a point of reference the influential view which equates the processes of nominalization and passivization, this study scrutinizes syntactic structure and its interaction with nominalization, showing that behaviours typical of passives are absent from DNs, a finding which weakens long-standing beliefs bearing on this class. A novel area of exploration offered in this study is the examination of morpho-syntactic factors and their interaction with nominalization, a domain where inconsistencies do arise. What emerges from a comparison of some 3000 verbs based on verb-class (templatic) morphology is a significant asymmetry between DNs carrying Middle (intransitive) marking and DNs marked as Active, wherein Middle forms are found to be less productive in two well-defined syntactic contexts. Not entirely absent, however, the same roots which fail to surface with Middle morphology are perfectly licit when derived from the corresponding Active verb (in case of alternating roots).
Building on the notions of competition and markedness, such paradigmatic gaps are analysed not as grammatically-determined incompatibilities, but as a consistent preference for less-marked forms (Active-marked DNs) over more complex ones (Middle-marked DNs), a trend which lies within the realm of performance. As such, Hebrew DNs constitute a case study of the interrelations between the syntactic and morphological modules, and pragmatics. Hebräisch Nominalisierung Argument- und Ereignisstruktur morphosyntax morphologische Markiertheit Wortbildung mediale (intransitive) Markierung Voice Semitische Sprachen Modern Hebrew Deverbal Nouns Transitivity Alternations Morphosyntax Semitic Grammatical Voice Competition Morphological Markedness Middle (intransitive) Marking Templatic Morphology 410 Linguistik EM 5830 ET 320 ddc:410
26	"Aber immer alle sagen das" The Status of V3 in German: Use, Processing, and Syntactic Representation Bunk, Oliver 11 November 2020 Für das Deutsche wird gemeinhin eine strikte V2-Beschränkung angenommen, die für deklarative Hauptsätze besagt, dass sich vor dem finiten Verb genau eine Konstituente befinden muss. In der Literatur werden häufig Beispiele angeführt, in denen sich zwei Konstituenten vor dem finiten Verb befinden und die somit gegen die V2-Beschränkung verstoßen. Diese syntaktische Konfiguration, so das Argument, führt zu Ungrammatikalität: (1) *Gestern Johann hat getanzt. (Roberts & Roussou 2002:137) Die Bewertung in (1) fußt jedoch nicht auf empirischer Evidenz, sondern spiegelt ein introspektives Urteil der Autorinnen wider. Daten zum tatsächlichen Sprachgebrauch zeigen, dass Sätze wie in (2) im Deutschen durchaus verwendet werden: (2) Aber immer alle sagen das. [BSa-OB, #16] Die Dissertation beschäftigt sich mit dem Status dieser V3-Deklarativsätze im Deutschen. Der Status wird aus drei einander ergänzenden Perspektiven auf Sprache untersucht: Sprachverwendung, Akzeptabilität und Verarbeitung. Hierzu werden Daten, die in einer Korpus-, einer Akzeptabilitäts- und einer Lesezeitstudie erhoben wurden, ausgewertet. Basierend auf den empirischen Befunden diskutiere ich V3-Modellierungen aus generativer Sicht und entwickle einen Modellierungsvorschlag aus konstruktionsgrammatischer Sicht. Die Arbeit zeigt, dass die Einbeziehung von nicht-standardsprachlichen Mustern wichtige Einblicke in die sprachliche Architektur gibt. Insbesondere psycholinguistisch gewonnene Daten als empirische Basis sind essenziell, um mentale sprachliche Prozesse zu verstehen und abbilden zu können. Die Analyse von V3 zeigt, dass solche Ansätze möglich und nötig sind, um Grammatikmodelle zu prüfen und weiterzuentwickeln.
Untersuchungen dieser Art stellen Grammatikmodelle in Frage, die oft aus einer standardsprachlichen Tradition heraus erwachsen sind und nur einen Ausschnitt der sprachlichen Realität erfassen. V3-Sätze entpuppen sich nach dieser Analyse als Strukturen, die fester Bestandteil der Grammatik sind. / German is usually considered to follow a strict V2-constraint. This means that exactly one constituent must precede the finite verb in declarative main clauses. The literature offers many examples of sentences with two preverbal constituents, which violate the V2-constraint. According to the literature, these configurations lead to ungrammatical structures. (1) *Gestern Johann hat getanzt. (Roberts & Roussou 2002:137) However, the evaluation in (1) is not based on empirical evidence but is introspective and thus might not reflect the linguistic reality. Empirical data from actual language use show that German speakers indeed use these kinds of sentences. (2) Aber immer alle sagen das. [BSa-OB, #16] The dissertation explores the status of these V3 declaratives in German, with ‘status’ comprising three complementary perspectives on language: language use, acceptability, and processing. To this end, I analyze data from three studies: a corpus study, an acceptability judgment study, and a reading time study. Based on the empirical evidence, I discuss existing analyses of V3 and V3-modeling from the generative perspective and develop an analysis taking a construction-based approach. The dissertation shows that including patterns from non-standard language allows for valuable insights into the architecture of language. In particular, psycholinguistic data as an empirical basis are essential to understand and model mental linguistic processes.
The analyses presented in the dissertation show that it is possible to follow such an approach in the field of syntactic variation, and it is indeed necessary in order to challenge and further develop existing grammatical theories and our understanding of grammar. Most grammatical models strongly rely on standard language, which is why they only capture a snippet of the linguistic reality. Taking empirical evidence into account, however, V3 sentences turn out to form an integral part of the German grammar. Syntaktische Variation V2 V3 Grammatiktheorie Generative Grammatik Konstruktionsgrammatik Verbstellung Psycholinguistik Syntactic variation V2 V3 Generative Grammar Grammatical theory Construction Grammar Verb placement Psycholinguistics 410 Linguistik GC 7328 GC 7205 ddc:410
27	Integrating Natural Language Processing (NLP) and Language Resources Using Linked Data Hellmann, Sebastian 09 January 2014 This thesis is a compendium of scientific works and engineering specifications that have been contributed to a large community of stakeholders to be copied, adapted, mixed, built upon and exploited in any way possible to achieve a common goal: Integrating Natural Language Processing (NLP) and Language Resources Using Linked Data. The explosion of information technology in the last two decades has led to a substantial growth in quantity, diversity and complexity of web-accessible linguistic data. These resources become even more useful when linked with each other, and the last few years have seen the emergence of numerous approaches in various disciplines concerned with linguistic resources and NLP tools. It is the challenge of our time to store, interlink and exploit this wealth of data accumulated in more than half a century of computational linguistics, of empirical, corpus-based study of language, and of computational lexicography in all its heterogeneity. The vision of the Giant Global Graph (GGG) was conceived by Tim Berners-Lee, aiming at connecting all data on the Web and enabling the discovery of new relations between these openly accessible data. This vision has been pursued by the Linked Open Data (LOD) community, where the cloud of published datasets comprises 295 data repositories and more than 30 billion RDF triples (as of September 2011). RDF is based on globally unique and accessible URIs and it was specifically designed to establish links between such URIs (or resources). This is captured in the Linked Data paradigm that postulates four rules: (1) referred entities should be designated by URIs, (2) these URIs should be resolvable over HTTP, (3) data should be represented by means of standards such as RDF, and (4) a resource should include links to other resources.
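The four Linked Data rules above can be illustrated with a minimal Python sketch. It is not taken from the thesis: the `example.org` URIs, the label, and the helper names `iri`, `literal` and `describe` are assumptions chosen for illustration, and the output is a simplified N-Triples-style serialization with only minimal string escaping.

```python
# Illustrative sketch only: example URIs, label and helper names are assumptions.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def iri(value: str) -> str:
    """Wrap an HTTP(S) URI in angle brackets (rules 1 and 2)."""
    if not value.startswith(("http://", "https://")):
        raise ValueError(f"not an HTTP(S) URI: {value}")
    return f"<{value}>"


def literal(text: str, lang: str) -> str:
    """Serialize a language-tagged string literal with minimal escaping."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def describe(resource: str, label: str, lang: str, same_as: list[str]) -> list[str]:
    """Return triple lines in RDF terms (rule 3): a label plus outgoing links."""
    lines = [f"{iri(resource)} {iri(RDFS_LABEL)} {literal(label, lang)} ."]
    # Rule 4: the description links the resource to other resources.
    for other in same_as:
        lines.append(f"{iri(resource)} {iri(OWL_SAME_AS)} {iri(other)} .")
    return lines


triples = describe(
    "http://example.org/resource/Leipzig",
    "Leipzig",
    "en",
    ["http://example.org/de/resource/Leipzig"],
)
print("\n".join(triples))
```

A real deployment would use an RDF library and serve these descriptions when the URIs are dereferenced; the sketch only shows the shape of the data.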
Although it is difficult to precisely identify the reasons for the success of the LOD effort, advocates generally argue that open licenses as well as open access are key enablers for the growth of such a network as they provide a strong incentive for collaboration and contribution by third parties. In his keynote at BNCOD 2011, Chris Bizer argued that with RDF the overall data integration effort can be “split between data publishers, third parties, and the data consumer”, a claim that can be substantiated by observing the evolution of many large data sets constituting the LOD cloud. As written in the acknowledgement section, parts of this thesis have received extensive feedback from other scientists, practitioners and industry in many different ways. The main contributions of this thesis are summarized here: Part I – Introduction and Background. During his keynote at the Language Resource and Evaluation Conference in 2012, Sören Auer stressed the decentralized, collaborative, interlinked and interoperable nature of the Web of Data. The keynote provides strong evidence that Semantic Web technologies such as Linked Data are on their way to becoming mainstream for the representation of language resources. The jointly written companion publication for the keynote was later extended as a book chapter in The People’s Web Meets NLP and serves as the basis for “Introduction” and “Background”, outlining some stages of the Linked Data publication and refinement chain. Both chapters stress the importance of open licenses and open access as an enabler for collaboration and the ability to interlink data on the Web as a key feature of RDF, and discuss scalability issues and decentralization. Furthermore, we elaborate on how conceptual interoperability can be achieved by (1) re-using vocabularies, (2) agile ontology development, (3) meetings to refine and adapt ontologies and (4) tool support to enrich ontologies and match schemata.
Part II - Language Resources as Linked Data. “Linked Data in Linguistics” and “NLP & DBpedia, an Upward Knowledge Acquisition Spiral” summarize the results of the Linked Data in Linguistics (LDL) Workshop in 2012 and the NLP & DBpedia Workshop in 2013 and give a preview of the MLOD special issue. In total, five proceedings – three published at CEUR (OKCon 2011, WoLE 2012, NLP & DBpedia 2013), one Springer book (Linked Data in Linguistics, LDL 2012) and one journal special issue (Multilingual Linked Open Data, MLOD to appear) – have been (co-)edited to create incentives for scientists to convert and publish Linked Data and thus to contribute open and/or linguistic data to the LOD cloud. Based on the disseminated call for papers, 152 authors contributed one or more accepted submissions to our venues and 120 reviewers were involved in peer-reviewing. “DBpedia as a Multilingual Language Resource” and “Leveraging the Crowdsourcing of Lexical Resources for Bootstrapping a Linguistic Linked Data Cloud” contain this thesis’ contribution to the DBpedia Project in order to further increase the size and inter-linkage of the LOD Cloud with lexical-semantic resources. Our contribution comprises extracted data from Wiktionary (an online, collaborative dictionary similar to Wikipedia) in more than four languages (now six) as well as language-specific versions of DBpedia, including a quality assessment of inter-language links between Wikipedia editions and internationalized content negotiation rules for Linked Data. In particular the work described in created the foundation for a DBpedia Internationalisation Committee with members from over 15 different languages with the common goal to push DBpedia as a free and open multilingual language resource. Part III - The NLP Interchange Format (NIF). “NIF 2.0 Core Specification”, “NIF 2.0 Resources and Architecture” and “Evaluation and Related Work” constitute one of the main contributions of this thesis.
The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. The core specification is included in and describes which URI schemes and RDF vocabularies must be used for (parts of) natural language texts and annotations in order to create an RDF/OWL-based interoperability layer with NIF built upon Unicode Code Points in Normal Form C. In , classes and properties of the NIF Core Ontology are described to formally define the relations between text, substrings and their URI schemes. contains the evaluation of NIF. In a questionnaire, we asked questions to 13 developers using NIF. UIMA, GATE and Stanbol are extensible NLP frameworks and NIF was not yet able to provide off-the-shelf NLP domain ontologies for all possible domains, but only for the plugins used in this study. After inspecting the software, the developers agreed, however, that NIF is adequate to provide a generic RDF output based on NIF using literal objects for annotations. All developers were able to map the internal data structure to NIF URIs to serialize RDF output (Adequacy). The development effort in hours (ranging between 3 and 40 hours) as well as the number of code lines (ranging between 110 and 445) suggest that the implementation of NIF wrappers is easy and fast for an average developer. Furthermore, the evaluation contains a comparison to other formats and an evaluation of the available URI schemes for web annotation. In order to collect input from the wide group of stakeholders, a total of 16 presentations were given with extensive discussions and feedback, which has led to a constant improvement of NIF from 2010 until 2013.
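The idea of addressing substrings by URIs built on Unicode code points in Normal Form C can be sketched in Python as follows. This is a hedged illustration, not the specification: the `#char=begin,end` fragment syntax, the document URI and the helper names `to_nfc`, `substring_uri` and `locate` are assumptions made for the example. The sketch relies only on the fact that Python strings are indexed by code point and that `unicodedata.normalize("NFC", ...)` produces Normal Form C.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Normalize text to Unicode Normal Form C before counting offsets."""
    return unicodedata.normalize("NFC", text)


def substring_uri(doc_uri: str, begin: int, end: int) -> str:
    """Build an offset-based URI; the '#char=' fragment syntax is assumed here."""
    return f"{doc_uri}#char={begin},{end}"


def locate(doc_uri: str, text: str, surface: str) -> tuple[str, int, int]:
    """Find the first occurrence of `surface` and return (URI, begin, end).

    Offsets are code-point positions in the NFC-normalized text.
    """
    normalized = to_nfc(text)
    needle = to_nfc(surface)
    begin = normalized.find(needle)
    if begin < 0:
        raise ValueError(f"{surface!r} does not occur in the text")
    end = begin + len(needle)
    return substring_uri(doc_uri, begin, end), begin, end


# 'e' followed by a combining acute accent: two code points before normalization.
raw = "Cafe\u0301 Leipzig"
uri, begin, end = locate("http://example.org/doc1", raw, "Leipzig")
print(uri)
```

Normalizing first matters because the decomposed and composed spellings of the same text have different lengths; two tools that skip or disagree on normalization would compute different offsets for the same substring.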
After the release of NIF (Version 1.0) in November 2011, a total of 32 vocabulary employments and implementations for different NLP tools and converters were reported (8 by the (co-)authors, including the Wiki-link corpus, 13 by people participating in our survey and 11 more that we have heard of). Several roll-out meetings and tutorials were held (e.g. in Leipzig and Prague in 2013) and are planned (e.g. at LREC 2014). Part IV - The NLP Interchange Format in Use. “Use Cases and Applications for NIF” and “Publication of Corpora using NIF” describe 8 concrete instances where NIF has been successfully used. One major contribution in is the usage of NIF as the recommended RDF mapping in the Internationalization Tag Set (ITS) 2.0 W3C standard and the conversion algorithms from ITS to NIF and back. The discussions in the standardization meetings and telephone conferences for ITS 2.0 concluded that there was no alternative RDF format or vocabulary other than NIF with the features required to fulfill the working group charter. Five further uses of NIF are described for the Ontology of Linguistic Annotations (OLiA), the RDFaCE tool, the Tiger Corpus Navigator, the OntosFeeder and visualisations of NIF using the RelFinder tool. These 8 instances provide an implemented proof-of-concept of the features of NIF. starts by describing the conversion and hosting of the huge Google Wikilinks corpus with 40 million annotations for 3 million web sites. The resulting RDF dump contains 477 million triples in a 5.6 GB compressed dump file in Turtle syntax. describes how NIF can be used to publish extracted facts from news feeds in the RDFLiveNews tool as Linked Data. Part V - Conclusions. provides lessons learned for NIF, conclusions and an outlook on future work. Most of the contributions are already summarized above.
One particular aspect worth mentioning is the increasing number of NIF-formatted corpora for Named Entity Recognition (NER) that have come into existence after the publication of the main NIF paper Integrating NLP using Linked Data at ISWC 2013. These include the corpora converted by Steinmetz, Knuth and Sack for the NLP & DBpedia workshop and an OpenNLP-based CoNLL converter by Brümmer. Furthermore, we are aware of three LREC 2014 submissions that leverage NIF: NIF4OGGD - NLP Interchange Format for Open German Governmental Data, N^3 – A Collection of Datasets for Named Entity Recognition and Disambiguation in the NLP Interchange Format and Global Intelligent Content: Active Curation of Language Resources using Linked Data, as well as an early implementation of a GATE-based NER/NEL evaluation framework by Dojchinovski and Kliegr. Further funding for the maintenance, interlinking and publication of Linguistic Linked Data as well as support and improvements of NIF is available via the expiring LOD2 EU project, as well as the CSA EU project called LIDER, which started in November 2013. Based on the evidence of successful adoption presented in this thesis, we can expect a decent to high chance of reaching critical mass of Linked Data technology as well as the NIF standard in the field of Natural Language Processing and Language Resources.

Contents

Part I - Introduction and Background
1 Introduction
1.1 Natural Language Processing
1.2 Open licenses, open access and collaboration
1.3 Linked Data in Linguistics
1.4 NLP for and by the Semantic Web – the NLP Interchange Format (NIF)
1.5 Requirements for NLP Integration
1.6 Overview and Contributions
2 Background
2.1 The Working Group on Open Data in Linguistics (OWLG)
2.1.1 The Open Knowledge Foundation
2.1.2 Goals of the Open Linguistics Working Group
2.1.3 Open linguistics resources, problems and challenges
2.1.4 Recent activities and on-going developments
2.2 Technological Background
2.3 RDF as a data model
2.4 Performance and scalability
2.5 Conceptual interoperability

Part II - Language Resources as Linked Data
3 Linked Data in Linguistics
3.1 Lexical Resources
3.2 Linguistic Corpora
3.3 Linguistic Knowledgebases
3.4 Towards a Linguistic Linked Open Data Cloud
3.5 State of the Linguistic Linked Open Data Cloud in 2012
3.6 Querying linked resources in the LLOD
3.6.1 Enriching metadata repositories with linguistic features (Glottolog → OLiA)
3.6.2 Enriching lexical-semantic resources with linguistic information (DBpedia (→ POWLA) → OLiA)
4 DBpedia as a Multilingual Language Resource: the Case of the Greek DBpedia Edition
4.1 Current state of the internationalization effort
4.2 Language-specific design of DBpedia resource identifiers
4.3 Inter-DBpedia linking
4.4 Outlook on DBpedia Internationalization
5 Leveraging the Crowdsourcing of Lexical Resources for Bootstrapping a Linguistic Linked Data Cloud
5.1 Related Work
5.2 Problem Description
5.2.1 Processing Wiki Syntax
5.2.2 Wiktionary
5.2.3 Wiki-scale Data Extraction
5.3 Design and Implementation
5.3.1 Extraction Templates
5.3.2 Algorithm
5.3.3 Language Mapping
5.3.4 Schema Mediation by Annotation with lemon
5.4 Resulting Data
5.5 Lessons Learned
5.6 Discussion and Future Work
5.6.1 Next Steps
5.6.2 Open Research Questions
6 NLP & DBpedia, an Upward Knowledge Acquisition Spiral
6.1 Knowledge acquisition and structuring
6.2 Representation of knowledge
6.3 NLP tasks and applications
6.3.1 Named Entity Recognition
6.3.2 Relation extraction
6.3.3 Question Answering over Linked Data
6.4 Resources
6.4.1 Gold and silver standards
6.5 Summary

Part III - The NLP Interchange Format (NIF)
7 NIF 2.0 Core Specification
7.1 Conformance checklist
7.2 Creation
7.2.1 Definition of Strings
7.2.2 Representation of Document Content with the nif:Context Class
7.3 Extension of NIF
7.3.1 Part of Speech Tagging with OLiA
7.3.2 Named Entity Recognition with ITS 2.0, DBpedia and NERD
7.3.3 lemon and Wiktionary2RDF
8 NIF 2.0 Resources and Architecture
8.1 NIF Core Ontology
8.1.1 Logical Modules
8.2 Workflows
8.2.1 Access via REST Services
8.2.2 NIF Combinator Demo
8.3 Granularity Profiles
8.4 Further URI Schemes for NIF
8.4.1 Context-Hash-based URIs
9 Evaluation and Related Work
9.1 Questionnaire and Developers Study for NIF 1.0
9.2 Qualitative Comparison with other Frameworks and Formats
9.3 URI Stability Evaluation
9.4 Related URI Schemes

Part IV - The NLP Interchange Format in Use
10 Use Cases and Applications for NIF
10.1 Internationalization Tag Set 2.0
10.1.1 ITS2NIF and NIF2ITS conversion
10.2 OLiA
10.3 RDFaCE
10.4 Tiger Corpus Navigator
10.4.1 Tools and Resources
10.4.2 NLP2RDF in 2010
10.4.3 Linguistic Ontologies
10.4.4 Implementation
10.4.5 Evaluation
10.4.6 Related Work and Outlook
10.5 OntosFeeder – a Versatile Semantic Context Provider for Web Content Authoring
10.5.1 Feature Description and User Interface Walkthrough
10.5.2 Architecture
10.5.3 Embedding Metadata
10.5.4 Related Work and Summary
10.6 RelFinder: Revealing Relationships in RDF Knowledge Bases
10.6.1 Implementation
10.6.2 Disambiguation
10.6.3 Searching for Relationships
10.6.4 Graph Visualization
10.6.5 Conclusion
11 Publication of Corpora using NIF
11.1 Wikilinks Corpus
11.1.1 Description of the corpus
11.1.2 Quantitative Analysis with Google Wikilinks Corpus
11.2 RDFLiveNews
11.2.1 Overview
11.2.2 Mapping to RDF and Publication on the Web of Data

Part V - Conclusions
12 Lessons Learned, Conclusions and Future Work
12.1 Lessons Learned for NIF
12.2 Conclusions
12.3 Future Work

ddc:000 Informatik ddc:Informationswissenschaft ddc:allgemeine Werke ddc:004 Datenverarbeitung; Informatik ddc:410 Linguistik
28	Kausative Konstruktionen mit dem Verb "machen" im Deutschen Fehrmann, Ingo 07 September 2018 (has links) Untersuchungsgegenstand der Dissertation sind sprachliche Strukturen, die aus einer Form des Verbs „machen“ und einer objektsprädikativen Adjektivphrase bestehen. Die Arbeit ist eingebettet in einen konstruktionsgrammatischen Rahmen, nach dem Sprache sich als strukturiertes Inventar von Konstruktionen (Form-Funktions-Beziehungen) beschreiben lässt. Ziele der Arbeit sind a) die korpusbasierte Ermittlung lexikalischer Kollokationen und Gebrauchstendenzen innerhalb der Zielstruktur sowie b) die systematische Beschreibung der damit verbundenen Form-Funktions-Beziehungen. Als Arbeitshypothese wurde übereinstimmend mit bisherigen Arbeiten zum selben sprachlichen Gegenstand eine kausative Bedeutung, also die Kodierung einer Ursache-Wirkung-Relation, angenommen. Da konstruktionsgrammatischen Ansätzen zufolge formale Unterschiede mit Unterschieden auf der Ebene der Funktion korrespondieren sollten, wurde empirisch untersucht, in welchen Fällen formale Unterschiede innerhalb der Zielstruktur tatsächlich systematisch zu unterschiedlichen funktionalen Interpretationen führen. Lexikalische Kollokationen innerhalb der Zielstruktur wurden statistisch anhand von Kollostruktionsanalysen („Covarying Collexeme Analysis“; vgl. Gries/Stefanowitsch, 2004) ermittelt. Zur Beschreibung der Bedeutung oder Funktion dienten Frame-semantische Beschreibungen englischer Verben aus dem FrameNet (vgl. Fillmore/Baker, 2010). Eine wesentliche Beobachtung besteht nun darin, dass entgegen der ursprünglichen Annahme keineswegs alle Vorkommen von „machen“ mit einer objektsprädikativen Adjektivphrase eine Ursache-Wirkung-Relation kodieren. Gerade die in der Kombination mit „machen“ hochfrequenten Adjektive korrelieren signifikant mit abweichenden, nicht im engeren Sinne kausativen, Interpretationen im Sinne der jeweils evozierten semantischen Frames. 
/ This dissertation focuses on combinations of a form of the German verb “machen” with an adjective phrase which, according to a working hypothesis, is said to have a resultative reading. The work is grounded in a Construction Grammar approach, viewing language as a structured inventory of Constructions, i.e. form-function mappings. The aims are a) to establish lexical collocations and usage tendencies within these structures involving “machen” and a resultative adjective phrase, based on corpus studies, and b) to describe systematically the relevant form-function mappings. As Construction Grammar approaches predict changes in function corresponding to changes in form, the formal collocations established according to aim a) are systematically analyzed with respect to their functional interpretations. The methods used involve a series of “Covarying Collexeme Analyses” (cf. Gries/Stefanowitsch, 2004) to study lexical collocations within the given formal structure, and the application of frame semantic descriptions of English verbs, as found in FrameNet (cf. Fillmore/Baker, 2010), to the German structures found in the corpora. The results indicate that, contrary to the working hypothesis, a great number of “machen” plus adjective tokens do not lead to a causative or resultative interpretation. In particular, the most frequent adjectives combined with “machen” exhibit a significant correlation with structures evoking different, not strictly causative, semantic frames. deutsche Sprache gebrauchsbasierte Linguistik Konstruktionsgrammatik lexikalische Kollokationen Kollostruktionsanalysen Covarying Collexeme Analysis Frame-Semantik kausative Verben German language usage-based linguistics Construction Grammar lexical collocations collostructional analysis Covarying Collexeme Analysis Frame Semantics causative verbs 410 Linguistik 430 Germanische Sprachen; Deutsch GC 7246 ddc:410 ddc:430
29	Modal Particles, Discourse Structure and Common Ground Management. Döring, Sophia 27 September 2018 (has links) Die vorliegende Arbeit beschäftigt sich mit dem Phänomen der deutschen Modalpartikeln (MPn), das in der linguistischen Forschung viel Aufmerksamkeit erhalten hat, aber fast immer nur innerhalb der Satzgrenzen betrachtet wurde. Es wurde mehrfach vorgeschlagen, dass MPn eine Funktion im Hinblick auf Common Ground-Management haben, jedoch wird nie ausgeführt, wie diese zustande kommt. In dieser Arbeit wird gezeigt, wie die Bedeutung und Funktion verschiedener MPn im Rahmen eines erweiterten Common Ground-Modells erfasst werden kann. In einem zweiten Schritt wird in zwei empirischen Studien die Interaktion von MPn mit Diskursstruktur analysiert, wobei Diskursstruktur hier im Rahmen von Diskursrelationen modelliert wird. Dafür wurden in einem Korpus von Parlamentsreden (126.000 Token) alle Sätze, die eine MP (ja, doch, eben, halt, wohl und schon wurden analysiert) enthalten, im Hinblick auf ihre Relationen zu adjazenten Diskurseinheiten annotiert. Verwendet wurden dafür die in der Rhetorischen Strukturtheorie (Mann & Thompson 1989) vorgeschlagenen Relationen. Die statistische Analyse der Ergebnisse zeigt signifikante Präferenzen der einzelnen MPn für bestimmte Diskursrelationen. Diese wurden anschließend in einem Lexical Choice Experiment überprüft und bestätigt, bei dem SprecherInnen im Kontext verschiedener Diskursrelationen auswählen sollten, welche MP am natürlichsten in einen Diskurs passt. SprecherInnen verwenden MPn, um zu zeigen, in welchem Verhältnis eine Proposition zu anderen steht oder um die Proposition auf eine bestimmte Art und Weise im Diskurs zu verankern, z.B. indem sie als Hintergrundinformation markiert wird. Die beiden empirischen Studien zeigen zum ersten Mal, wie SprecherInnen diese Funktionen nutzen – und teilweise ausnutzen – um Diskurs zu strukturieren, Diskursrelationen hervorzuheben und so Kohärenz zu fördern.
Gleichzeitig zeigt diese Arbeit, dass ein erweitertes Common Ground-Modell notwendig ist, um den Beitrag von MPn adäquat zu erfassen. / This work focuses on the phenomenon of German modal particles (MPs), which has received much attention in linguistic research, although the analysis has mostly been restricted to the sentence level. It has been proposed that the function of MPs can be described with respect to common ground management, but this has never been spelled out in detail. Here, the meaning and function of different MPs are captured in a broadened common ground model. In a second step, two empirical studies analyse the interaction of MPs and discourse structure, here modelled in terms of discourse relations. In a corpus of parliament speeches (126,000 word tokens), all sentences containing a modal particle (ja, doch, eben, halt, wohl and schon were analysed) were annotated for their discourse relations to adjacent discourse units. The statistical analysis of the results reveals clear preferences of the individual particles for different discourse relations. These preferences were tested again in a follow-up experiment, a lexical choice task in which speakers had to decide which particle fits most naturally in contexts of different discourse relations. The results confirmed the findings of the corpus study. Overall, MPs can be used to indicate to the addressee how a proposition asserted by the speaker is related to (an)other proposition(s) and to anchor information in discourse structure in a certain way, e.g. by marking it as background information. The results of the empirical studies show for the first time how speakers can make use of these functions, sometimes by exploiting them, to structure discourse, enhance the function of discourse relations and thereby establish coherence. At the same time, it becomes clear that a broader model of common ground is needed to capture this function of MPs in discourse appropriately.
Modalpartikeln Diskurspartikeln Rhetorische Strukturtheorie Diskursstruktur Kohärenz Diskursrelationen Rhetorische Relationen Salienz Korpusstudie Experiment Gemeinsamer Redehintergrund modal particles discourse particles German rhetorical structure theory discourse structure coherence discourse relations coherence relations salience corpus study experiment common ground common ground management 410 Linguistik GC 7386 ddc:410
30	Integrating Natural Language Processing (NLP) and Language Resources Using Linked Data Hellmann, Sebastian 12 January 2015 (has links) (PDF) This thesis is a compendium of scientific works and engineering specifications that have been contributed to a large community of stakeholders to be copied, adapted, mixed, built upon and exploited in any way possible to achieve a common goal: Integrating Natural Language Processing (NLP) and Language Resources Using Linked Data. The explosion of information technology in the last two decades has led to a substantial growth in quantity, diversity and complexity of web-accessible linguistic data. These resources become even more useful when linked with each other, and the last few years have seen the emergence of numerous approaches in various disciplines concerned with linguistic resources and NLP tools. It is the challenge of our time to store, interlink and exploit this wealth of data accumulated in more than half a century of computational linguistics, of empirical, corpus-based study of language, and of computational lexicography in all its heterogeneity. The vision of the Giant Global Graph (GGG) was conceived by Tim Berners-Lee, aiming at connecting all data on the Web and allowing the discovery of new relations between this openly accessible data. This vision has been pursued by the Linked Open Data (LOD) community, where the cloud of published datasets comprises 295 data repositories and more than 30 billion RDF triples (as of September 2011). RDF is based on globally unique and accessible URIs and was specifically designed to establish links between such URIs (or resources). This is captured in the Linked Data paradigm, which postulates four rules: (1) referred entities should be designated by URIs, (2) these URIs should be resolvable over HTTP, (3) data should be represented by means of standards such as RDF, and (4) a resource should include links to other resources.
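The four Linked Data rules listed above can be made concrete with a small Python sketch. It is illustrative only and not taken from the thesis: triples are modelled as plain string tuples, the helper names `is_http_uri` and `links_to_other_resources` are hypothetical, and the DBpedia URIs serve merely as sample data. Only rules (1), (2) and (4) are approximated syntactically; actually resolving a URI over HTTP or validating RDF is out of scope here.

```python
from urllib.parse import urlparse

# A triple is modelled as (subject, predicate, object); objects may be URIs or plain literals.
TRIPLES = [
    ("http://dbpedia.org/resource/Leipzig",
     "http://www.w3.org/2000/01/rdf-schema#label",
     "Leipzig"),
    ("http://dbpedia.org/resource/Leipzig",
     "http://dbpedia.org/ontology/country",
     "http://dbpedia.org/resource/Germany"),
]


def is_http_uri(term):
    """Rules 1 and 2 (syntactic approximation): the term is an HTTP(S) URI with a host."""
    parts = urlparse(term)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def links_to_other_resources(subject, triples):
    """Rule 4: at least one triple about `subject` points to a different HTTP URI."""
    return any(
        s == subject and is_http_uri(o) and o != subject
        for s, _p, o in triples
    )


if __name__ == "__main__":
    leipzig = "http://dbpedia.org/resource/Leipzig"
    print(is_http_uri(leipzig))
    print(links_to_other_resources(leipzig, TRIPLES))
```

In this toy data the second triple is what turns an isolated description into Linked Data: its object is itself a URI that can be looked up for further information.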
Although it is difficult to precisely identify the reasons for the success of the LOD effort, advocates generally argue that open licenses as well as open access are key enablers for the growth of such a network as they provide a strong incentive for collaboration and contribution by third parties. In his keynote at BNCOD 2011, Chris Bizer argued that with RDF the overall data integration effort can be “split between data publishers, third parties, and the data consumer”, a claim that can be substantiated by observing the evolution of many large data sets constituting the LOD cloud. As written in the acknowledgement section, parts of this thesis have received extensive feedback from other scientists, practitioners and industry in many different ways. The main contributions of this thesis are summarized here: Part I - Introduction and Background. During his keynote at the Language Resources and Evaluation Conference in 2012, Sören Auer stressed the decentralized, collaborative, interlinked and interoperable nature of the Web of Data. The keynote provides strong evidence that Semantic Web technologies such as Linked Data are on their way to becoming mainstream for the representation of language resources. The jointly written companion publication for the keynote was later extended as a book chapter in The People’s Web Meets NLP and serves as the basis for “Introduction” and “Background”, outlining some stages of the Linked Data publication and refinement chain. Both chapters stress the importance of open licenses and open access as an enabler for collaboration and the ability to interlink data on the Web as a key feature of RDF, and they discuss scalability issues and decentralization. Furthermore, we elaborate on how conceptual interoperability can be achieved by (1) re-using vocabularies, (2) agile ontology development, (3) meetings to refine and adapt ontologies and (4) tool support to enrich ontologies and match schemata.
Part II - Language Resources as Linked Data. “Linked Data in Linguistics” and “NLP & DBpedia, an Upward Knowledge Acquisition Spiral” summarize the results of the Linked Data in Linguistics (LDL) Workshop in 2012 and the NLP & DBpedia Workshop in 2013 and give a preview of the MLOD special issue. In total, five proceedings – three published at CEUR (OKCon 2011, WoLE 2012, NLP & DBpedia 2013), one Springer book (Linked Data in Linguistics, LDL 2012) and one journal special issue (Multilingual Linked Open Data, MLOD, to appear) – have been (co-)edited to create incentives for scientists to convert and publish Linked Data and thus to contribute open and/or linguistic data to the LOD cloud. Based on the disseminated call for papers, 152 authors contributed one or more accepted submissions to our venues and 120 reviewers were involved in peer-reviewing. “DBpedia as a Multilingual Language Resource” and “Leveraging the Crowdsourcing of Lexical Resources for Bootstrapping a Linguistic Linked Data Cloud” contain this thesis’ contribution to the DBpedia Project in order to further increase the size and inter-linkage of the LOD Cloud with lexical-semantic resources. Our contribution comprises extracted data from Wiktionary (an online, collaborative dictionary similar to Wikipedia) in more than four languages (now six) as well as language-specific versions of DBpedia, including a quality assessment of inter-language links between Wikipedia editions and internationalized content negotiation rules for Linked Data. In particular, the work described in “DBpedia as a Multilingual Language Resource” created the foundation for a DBpedia Internationalisation Committee with members from over 15 different languages with the common goal to push DBpedia as a free and open multilingual language resource. Part III - The NLP Interchange Format (NIF). “NIF 2.0 Core Specification”, “NIF 2.0 Resources and Architecture” and “Evaluation and Related Work” constitute one of the main contributions of this thesis.
The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. The core specification is included in “NIF 2.0 Core Specification” and describes which URI schemes and RDF vocabularies must be used for (parts of) natural language texts and annotations in order to create an RDF/OWL-based interoperability layer with NIF built upon Unicode Code Points in Normal Form C. In “NIF 2.0 Resources and Architecture”, classes and properties of the NIF Core Ontology are described to formally define the relations between text, substrings and their URI schemes. “Evaluation and Related Work” contains the evaluation of NIF. In a questionnaire, we surveyed 13 developers using NIF. UIMA, GATE and Stanbol are extensible NLP frameworks, and NIF was not yet able to provide off-the-shelf NLP domain ontologies for all possible domains, but only for the plugins used in this study. After inspecting the software, however, the developers agreed that NIF is adequate to provide a generic RDF output using literal objects for annotations. All developers were able to map the internal data structure to NIF URIs to serialize RDF output (Adequacy). The development effort (ranging between 3 and 40 hours) as well as the number of code lines (ranging between 110 and 445) suggest that the implementation of NIF wrappers is easy and fast for an average developer. Furthermore, the evaluation contains a comparison to other formats and an evaluation of the available URI schemes for web annotation. In order to collect input from the wide group of stakeholders, a total of 16 presentations were given with extensive discussions and feedback, which has led to a constant improvement of NIF from 2010 until 2013.
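To illustrate why the abstract ties NIF URIs to Unicode code points in Normal Form C, the following Python sketch builds an offset-based URI for a substring. It is a minimal sketch under stated assumptions, not the reference implementation: the fragment syntax `#char=begin,end` (in the style of RFC 5147) is assumed here, and the function name `nif_offset_uri` and the document URI `http://example.org/doc1` are hypothetical.

```python
import unicodedata


def nif_offset_uri(base_uri, text, substring):
    """Build an offset-based URI for the first occurrence of `substring` in `text`.

    Both strings are normalised to NFC first, so that begin and end offsets are
    counted over the same sequence of Unicode code points by every tool.
    The '#char=begin,end' fragment syntax is an assumption of this sketch.
    """
    context = unicodedata.normalize("NFC", text)
    needle = unicodedata.normalize("NFC", substring)
    begin = context.find(needle)  # Python str indices count code points
    if begin < 0:
        raise ValueError("substring not found in text")
    end = begin + len(needle)
    return f"{base_uri}#char={begin},{end}"


if __name__ == "__main__":
    # "é" is written here as "e" plus a combining acute accent (two code points).
    decomposed = "Cafe\u0301 Leipzig"
    print(nif_offset_uri("http://example.org/doc1", decomposed, "Leipzig"))
    # prints http://example.org/doc1#char=5,12
```

Without the normalisation step, the same substring would start at offset 6 in the decomposed input, so two tools processing differently encoded copies of one text would mint different URIs for the same annotation.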
After the release of NIF (Version 1.0) in November 2011, a total of 32 vocabulary employments and implementations for different NLP tools and converters were reported (8 by the (co-)authors, including the Wikilinks corpus, 13 by people participating in our survey and 11 more of which we have heard). Several roll-out meetings and tutorials were held (e.g. in Leipzig and Prague in 2013) and are planned (e.g. at LREC 2014). Part IV - The NLP Interchange Format in Use. “Use Cases and Applications for NIF” and “Publication of Corpora using NIF” describe 8 concrete instances where NIF has been successfully used. One major contribution in “Use Cases and Applications for NIF” is the usage of NIF as the recommended RDF mapping in the Internationalization Tag Set (ITS) 2.0 W3C standard and the conversion algorithms from ITS to NIF and back. The discussions in the standardization meetings and telephone conferences for ITS 2.0 concluded that there was no alternative RDF format or vocabulary other than NIF with the required features to fulfill the working group charter. Five further uses of NIF are described for the Ontology of Linguistic Annotations (OLiA), the RDFaCE tool, the Tiger Corpus Navigator, the OntosFeeder and visualisations of NIF using the RelFinder tool. These 8 instances provide an implemented proof-of-concept of the features of NIF. “Publication of Corpora using NIF” starts by describing the conversion and hosting of the huge Google Wikilinks corpus with 40 million annotations for 3 million web sites. The resulting RDF dump contains 477 million triples in a 5.6 GB compressed dump file in Turtle syntax. The chapter then describes how NIF can be used to publish extracted facts from news feeds in the RDFLiveNews tool as Linked Data. Part V - Conclusions. “Lessons Learned, Conclusions and Future Work” provides lessons learned for NIF, conclusions and an outlook on future work. Most of the contributions are already summarized above.
One particular aspect worth mentioning is the increasing number of NIF-formatted corpora for Named Entity Recognition (NER) that have come into existence after the publication of the main NIF paper Integrating NLP using Linked Data at ISWC 2013. These include the corpora converted by Steinmetz, Knuth and Sack for the NLP & DBpedia workshop and an OpenNLP-based CoNLL converter by Brümmer. Furthermore, we are aware of three LREC 2014 submissions that leverage NIF: NIF4OGGD - NLP Interchange Format for Open German Governmental Data, N^3 – A Collection of Datasets for Named Entity Recognition and Disambiguation in the NLP Interchange Format and Global Intelligent Content: Active Curation of Language Resources using Linked Data, as well as an early implementation of a GATE-based NER/NEL evaluation framework by Dojchinovski and Kliegr. Further funding for the maintenance, interlinking and publication of Linguistic Linked Data as well as support and improvements of NIF is available via the expiring LOD2 EU project, as well as the CSA EU project called LIDER, which started in November 2013. Based on the evidence of successful adoption presented in this thesis, we can expect a decent to high chance of reaching critical mass of Linked Data technology as well as the NIF standard in the field of Natural Language Processing and Language Resources. Linked Data RDF Datenintegration Sprachverarbeitung Datenverarbeitung Linguistik NLP Linked Data RDF Semantic Web Data Integration Data ddc:000 Informatik ddc:Informationswissenschaft ddc:allgemeine Werke ddc:410 Linguistik Sprachverarbeitung Automatische Spracherkennung Datenintegration Linked Data Semantic Web Semantisches Netz Wissensrepräsentation Ontologie

Search results