Global ETD Search

1	Treebanks and meter in 4th century Attic inscriptions Beaulieu, Marie-Claire, Blackwell, Christopher W. 17 March 2017 No description available. Keywords: Treebank; Greek; Epigraphy; Funerary Inscriptions; Linguistic Annotation; Perseids ddc:930
2	The Digital Rosetta Stone Berti, Monica, Jushaninowa, Julia, Naether, Franziska, Celano, Giuseppe G. A., Yordanova, Polina 20 April 2016 In cooperation with projects from colleagues in Berlin and powered by the British Museum in London, we present an ongoing project whose aim is to produce a digital edition of the Rosetta Stone (the “Decree of Memphis”). The project has two main goals: 1) textual alignment of the Hieroglyphic, Demotic and Greek versions of the Rosetta Stone; 2) morphosyntactic annotation of the three versions of the inscription. As first results, we present: 1) examples of alignment of the Hieroglyphic version of the text with translations into modern languages (through the Alpheios alignment editor); 2) the complete morphosyntactic annotation of the Greek text of the Rosetta Stone (through the Arethusa treebanking editor). Keywords: trilingual decree; alignment of languages and photo; linguistic annotation; tree banking; Alpheios ddc:930
3	Construction automatique d'outils et de ressources linguistiques à partir de corpus parallèles / Automatic creation of linguistic tools and resources from parallel corpora Zennaki, Othman 11 March 2019 This thesis focuses on the automatic construction of linguistic tools and resources for analyzing texts in low-resource languages. We propose an approach that uses Recurrent Neural Networks (RNNs) and requires only a parallel or multi-parallel corpus between a well-resourced source language and one or more low-resource target languages. This corpus is used to build a multilingual representation of the words of the source and target languages. We used this multilingual representation to train our neural models, and we investigated two architectures: simple (unidirectional) RNNs and bidirectional RNNs. We also proposed several RNN variants that take low-level linguistic information (for instance, part-of-speech tags) into account when building higher-level annotators (for instance, SuperSense taggers and syntactic dependency parsers). We demonstrated the genericity of our approach on several languages and on several linguistic annotation tasks. We built three types of multilingual annotators: part-of-speech taggers, SuperSense taggers, and syntactic dependency parsers, with very satisfactory performance.
Our approach has the following advantages: (a) it does not use word alignment information; (b) it does not require any prior knowledge of the target languages (the only assumption is that the source and target languages are not too syntactically divergent), which makes it applicable to a wide range of low-resource languages; (c) it produces genuinely multilingual annotators (one annotator for N languages).
Keywords: Parallel corpora; Comparable corpora; Linguistic annotation; Cross-language projection of annotations; Induction 004
4	Treebanks and meter in 4th century Attic inscriptions Beaulieu, Marie-Claire, Blackwell, Christopher W. January 2016 No description available. ddc:930
5	The corpus of Greek medical papyri and digital papyrology Reggiani, Nicola 20 April 2016 The ongoing project of digitising a corpus of ancient Greek texts on papyrus dealing with medical topics raises some problematic questions involving general issues of digital papyrology. The main electronic resource for papyrological texts, the Papyrological Navigator (papyri.info), was designed to host documentary items, while the special technical, even literary, nature of medical papyri (which include, besides documents related to medicine, also handbooks, school books, and treatises by both known and unknown authors) requires new ways to treat the relevant data (paratextual devices such as diacritics, punctuation, abbreviations, and layout features). Such issues are currently under discussion by the team in charge of the forthcoming Digital Corpus of Literary Papyri (DCLP), but further options need to be taken into consideration in order to develop a fully functional, interactive, dynamic database of ancient technical texts. In particular, this paper will present and discuss the potential of a multi-layer linguistic annotation (useful for meeting the needs of a multifaceted technical language) and of a multitextual digital edition (helpful given the fragmentary condition of the texts and their often problematic relationship with the known manuscript tradition). Keywords: Greek medical papyri; digital encoding with TEI/EpiDoc; multi-layer linguistic annotation; treebanking; lemmatisation ddc:930
6	The Digital Rosetta Stone: textual alignment and linguistic annotation Berti, Monica, Jushaninowa, Julia, Naether, Franziska, Celano, Giuseppe G. A., Yordanova, Polina January 2016 In cooperation with projects from colleagues in Berlin and powered by the British Museum in London, we present an ongoing project whose aim is to produce a digital edition of the Rosetta Stone (the “Decree of Memphis”). The project has two main goals: 1) textual alignment of the Hieroglyphic, Demotic and Greek versions of the Rosetta Stone; 2) morphosyntactic annotation of the three versions of the inscription. As first results, we present: 1) examples of alignment of the Hieroglyphic version of the text with translations into modern languages (through the Alpheios alignment editor); 2) the complete morphosyntactic annotation of the Greek text of the Rosetta Stone (through the Arethusa treebanking editor). ddc:930
7	The corpus of Greek medical papyri and digital papyrology: new perspectives from an ongoing project Reggiani, Nicola January 2016 The ongoing project of digitising a corpus of ancient Greek texts on papyrus dealing with medical topics raises some problematic questions involving general issues of digital papyrology. The main electronic resource for papyrological texts, the Papyrological Navigator (papyri.info), was designed to host documentary items, while the special technical, even literary, nature of medical papyri (which include, besides documents related to medicine, also handbooks, school books, and treatises by both known and unknown authors) requires new ways to treat the relevant data (paratextual devices such as diacritics, punctuation, abbreviations, and layout features). Such issues are currently under discussion by the team in charge of the forthcoming Digital Corpus of Literary Papyri (DCLP), but further options need to be taken into consideration in order to develop a fully functional, interactive, dynamic database of ancient technical texts. In particular, this paper will present and discuss the potential of a multi-layer linguistic annotation (useful for meeting the needs of a multifaceted technical language) and of a multitextual digital edition (helpful given the fragmentary condition of the texts and their often problematic relationship with the known manuscript tradition). ddc:930
8	[en] METHODOLOGIES FOR CHARACTERIZING AND DETECTING EMOTIONAL DESCRIPTION IN THE PORTUGUESE LANGUAGE / [pt] METODOLOGIAS PARA CARACTERIZAÇÃO E DETECÇÃO DA DESCRIÇÃO DE EMOÇÃO NA LÍNGUA PORTUGUESA BARBARA CRISTINA MARQUES P RAMOS 29 May 2023 [en] This thesis investigates how speakers of Portuguese use the language to express mentions of emotion, from a primarily linguistic perspective. The general objective of the research is to create resources that improve the annotation of the semantic field of emotions in Portuguese, building on the AC/DC project, which gathers and makes publicly available annotated corpora and resources for research on the Portuguese language, and on Emocionário, which is both a semantic annotation project and a lexicon of emotions.
Initially, the research gives an overview of emotion studies; aligns itself with perspectives that refute the universality of emotions and approaches that postulate basic emotions; and contrasts its interest in the mention of emotion with the already consolidated area of Sentiment Analysis, comparing five Portuguese sentiment and/or polarity lexicons with Emocionário. Starting from a broad sweep of the AC/DC corpora, three main paths were taken to investigate emotion words: (i) an analysis of the twenty-four emotion groups that already existed in the Emocionário lexicon, in order to delineate characteristics and challenges in the study of emotion in Portuguese; (ii) a thorough revision of one-third of the Emocionário lexicon groups; and (iii) searches for the lexical-syntactic pattern sentimento de N and for expressions annotated by the Esqueleto project that are used to describe emotion. The analysis of the corpora in light of the lemmas previously belonging to the Emocionário lexicon groups showed, among other characteristics, the relevance of lexicalized expressions for the analysis of emotion description, of the types of verb arguments and affixes that can cause variation in meaning, and of variations in verb tense and mood that lead to a change in meaning. Among the challenges are polysemous words and expressions and the difficulty of detecting different senses in words that share the same grammatical class when only morphosyntactic information is available. This analysis made it possible to structure and document a revision methodology that may be applied to the remaining groups in the future.
The main contributions of this thesis derive from the analyses and explorations of the corpora: the removal of lemmas with non-emotional senses from the Emocionário lexicon groups; the creation of the emotion groups Ausência and Outra, enriching the lexicon; the detection of more than nine hundred lemmas and expressions through the searches for the sentimento de N pattern and through the connections established between the semantic fields of emotion and the human body; and the discovery of lexical fields rarely mentioned in the literature on emotion, such as collectivity, estrangement, spirituality, kinship, and self-motivated acts, which helped in the investigation of how speakers of Portuguese crystallize emotions in language. [en] NATURAL LANGUAGE PROCESSING [en] LEXICONS [en] LINGUISTIC ANNOTATION [en] CORPORA ANALYSIS [en] EMOTION ANALYSIS [en] PORTUGUESE DESCRIPTION
