1 |
Treebanks and meter in 4th century Attic inscriptions. Beaulieu, Marie-Claire; Blackwell, Christopher W., 17 March 2017
No description available.
|
2 |
The Digital Rosetta Stone. Berti, Monica; Jushaninowa, Julia; Naether, Franziska; Celano, Giuseppe G. A.; Yordanova, Polina, 20 April 2016
In cooperation with projects run by colleagues in Berlin and with the support of the British Museum in London, we present an ongoing project that aims to produce a digital edition of the Rosetta Stone (the “Decree of Memphis”). The project has two main goals: 1) textual alignment of the Hieroglyphic, Demotic and Greek versions of the Rosetta Stone; 2) morphosyntactic annotation of the three versions of the inscription. As initial results, we present: 1) examples of alignment of the Hieroglyphic version of the text with translations into modern languages (using the Alpheios alignment editor); 2) the complete morphosyntactic annotation of the Greek text of the Rosetta Stone (using the Arethusa treebanking editor).
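Morphosyntactic annotation of this kind produces a dependency treebank: every word records its lemma, a morphological tag and a pointer to its syntactic head. The sketch below reads a tiny, hypothetical sample laid out in the AGDT style associated with Arethusa and lists the resulting dependency relations. The attribute names (`id`, `form`, `lemma`, `postag`, `head`, `relation`) are assumed from that style, and the two-word sentence is a toy example, not taken from the Rosetta Stone.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy sample in an AGDT-style layout (attribute names assumed);
# head="0" marks the artificial root of the sentence.
SAMPLE = """
<treebank>
  <sentence id="1">
    <word id="1" form="βασιλεύει" lemma="βασιλεύω" postag="v3spia---" head="0" relation="PRED"/>
    <word id="2" form="Πτολεμαῖος" lemma="Πτολεμαῖος" postag="n-s---mn-" head="1" relation="SBJ"/>
  </sentence>
</treebank>
"""


def dependencies(xml_text):
    """Return (dependent_form, relation, head_form) triples for every word."""
    root = ET.fromstring(xml_text.strip())
    triples = []
    for sentence in root.iter("sentence"):
        words = list(sentence.iter("word"))
        # Map each token id to its surface form; id "0" is the sentence root.
        forms = {"0": "ROOT"}
        forms.update({w.get("id"): w.get("form") for w in words})
        for w in words:
            triples.append((w.get("form"), w.get("relation"), forms[w.get("head")]))
    return triples


for dependent, relation, head in dependencies(SAMPLE):
    print(f"{dependent} --{relation}--> {head}")
```

Resolving heads through an id-to-form map keeps the sketch independent of word order, which matters for a language with free word order such as Greek.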
|
3 |
Construction automatique d'outils et de ressources linguistiques à partir de corpus parallèles / Automatic creation of linguistic tools and resources from parallel corpora. Zennaki, Othman, 11 March 2019
This thesis focuses on the automatic construction of linguistic tools and resources for analyzing texts in low-resource languages. We propose an approach that uses Recurrent Neural Networks (RNNs) and requires only a parallel or multi-parallel corpus between a well-resourced source language and one or more low-resource target languages. This parallel or multi-parallel corpus is used to build a multilingual representation of the words of the source and target languages. We used this multilingual representation to train our neural models and investigated two architectures: simple (unidirectional) RNNs and bidirectional RNNs. We also proposed several RNN variants that incorporate low-level linguistic information (for instance, morphosyntactic Part-Of-Speech tags) when training higher-level annotators (SuperSense taggers and syntactic dependency parsers). We demonstrated the genericity of our approach on several languages and on several annotation tasks (Part-Of-Speech tagging, SuperSense tagging and dependency parsing), obtaining very satisfactory results. Our approach has the following advantages: (a) it does not use word alignment information; (b) it requires no prior knowledge of the target languages (our only assumption is that the source and target languages are not too syntactically divergent), which makes it applicable to a wide range of low-resource languages; (c) it yields genuinely multilingual taggers (one tagger for N languages).
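To make the bidirectional architecture concrete, here is a minimal plain-Python sketch of a generic Elman-style bidirectional RNN tagger: the representation at each position concatenates a left-to-right and a right-to-left hidden state, and a linear output layer picks a tag. It assumes the input vectors come from a shared multilingual embedding space (stood in for here by random vectors), uses untrained random weights and omits training entirely, so the predicted tags are meaningless. It illustrates the data flow only and is not the thesis's implementation; the tag inventory is hypothetical.

```python
import math
import random


def random_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def rnn_pass(inputs, W_x, W_h, b):
    """Elman RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b); returns every h_t."""
    h = [0.0] * len(b)
    states = []
    for x in inputs:
        h = [
            math.tanh(
                sum(w * xj for w, xj in zip(W_x[i], x))
                + sum(w * hk for w, hk in zip(W_h[i], h))
                + b[i]
            )
            for i in range(len(b))
        ]
        states.append(h)
    return states


def bidirectional_states(inputs, fwd_params, bwd_params):
    """Concatenate left-to-right and right-to-left states at each position."""
    forward = rnn_pass(inputs, *fwd_params)
    backward = rnn_pass(inputs[::-1], *bwd_params)[::-1]
    return [fw + bw for fw, bw in zip(forward, backward)]


def tag(inputs, fwd_params, bwd_params, W_out, tagset):
    """Pick the highest-scoring tag for each token (linear output layer)."""
    tags = []
    for state in bidirectional_states(inputs, fwd_params, bwd_params):
        scores = [sum(w * s for w, s in zip(row, state)) for row in W_out]
        tags.append(tagset[scores.index(max(scores))])
    return tags


rng = random.Random(0)
EMB_DIM, HIDDEN = 4, 3
TAGSET = ["DET", "NOUN", "VERB"]  # hypothetical tag inventory
fwd = (random_matrix(HIDDEN, EMB_DIM, rng), random_matrix(HIDDEN, HIDDEN, rng), [0.0] * HIDDEN)
bwd = (random_matrix(HIDDEN, EMB_DIM, rng), random_matrix(HIDDEN, HIDDEN, rng), [0.0] * HIDDEN)
W_out = random_matrix(len(TAGSET), 2 * HIDDEN, rng)

# Stand-in for shared multilingual embeddings of a three-word sentence.
sentence = [[rng.uniform(-1, 1) for _ in range(EMB_DIM)] for _ in range(3)]
print(tag(sentence, fwd, bwd, W_out, TAGSET))
```

Because the tagger only ever sees vectors from the shared space, the same trained weights could in principle be applied to source- or target-language sentences alike, which is what allows one tagger to serve N languages.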
|
4 |
Treebanks and meter in 4th century Attic inscriptions. Beaulieu, Marie-Claire; Blackwell, Christopher W., January 2016
No description available.
|
5 |
The corpus of Greek medical papyri and digital papyrology. Reggiani, Nicola, 20 April 2016
The ongoing project of digitising a corpus of ancient Greek texts on papyrus dealing with medical topics raises some problematic questions that involve general issues of digital papyrology. The main electronic resource for papyrological texts, the Papyrological Navigator (papyri.info), was designed to host documentary items, whereas the technical, even literary nature of medical papyri (which include, besides documents related to medicine, handbooks, school books, and treatises by both known and unknown authors) requires new ways of handling the relevant data (paratextual devices such as diacritics, punctuation, abbreviations, and layout features). These issues are currently under discussion by the team responsible for the forthcoming Digital Corpus of Literary Papyri (DCLP), but further options need to be considered in order to develop a fully functional, interactive, dynamic database of ancient technical texts. In particular, this paper presents and discusses the potential of a multi-layer linguistic annotation (suited to the needs of a multifaceted technical language) and of a multitextual digital edition (helpful given the fragmentary condition of the texts and their often problematic relationship with the known manuscript tradition).
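One common way to realise multi-layer annotation is standoff markup: the base text is stored once, and each layer (tokens, lemmas, editorial or paratextual marks) is a separate list of character-offset spans over it, so layers can be added or revised without touching one another. The sketch below is a hypothetical illustration of that idea, not the data model of this project or of the DCLP; the class names, layer names and the transliterated phrase are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    start: int  # inclusive character offset into the base text
    end: int    # exclusive character offset
    value: str


@dataclass
class MultiLayerText:
    """Hypothetical standoff container: one base text, many independent layers."""
    base: str
    layers: dict = field(default_factory=dict)

    def annotate(self, layer, start, end, value):
        # Reject spans that are empty or fall outside the base text.
        if not 0 <= start < end <= len(self.base):
            raise ValueError(f"span {start}-{end} outside base text")
        self.layers.setdefault(layer, []).append(Span(start, end, value))

    def at(self, offset):
        """Return {layer: [values]} for every layer with a span covering offset."""
        result = {}
        for name, spans in self.layers.items():
            values = [s.value for s in spans if s.start <= offset < s.end]
            if values:
                result[name] = values
        return result


# Invented transliterated phrase ("warm water"); offsets index into this string.
doc = MultiLayerText("hudor thermon")
doc.annotate("token", 0, 5, "hudor")
doc.annotate("token", 6, 13, "thermon")
doc.annotate("lemma", 0, 5, "hudor")
doc.annotate("lemma", 6, 13, "thermos")
# Paratextual layer: suppose the first two letters of the second word were restored.
doc.annotate("restoration", 6, 8, "letters supplied by the editor")

print(doc.at(7))
```

Keeping the editorial layer separate from the linguistic ones means a fragmentary reading can be annotated linguistically without losing the record of which letters are actually preserved.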
|
6 |
The Digital Rosetta Stone: textual alignment and linguistic annotation. Berti, Monica; Jushaninowa, Julia; Naether, Franziska; Celano, Giuseppe G. A.; Yordanova, Polina, January 2016
In cooperation with projects run by colleagues in Berlin and with the support of the British Museum in London, we present an ongoing project that aims to produce a digital edition of the Rosetta Stone (the “Decree of Memphis”). The project has two main goals: 1) textual alignment of the Hieroglyphic, Demotic and Greek versions of the Rosetta Stone; 2) morphosyntactic annotation of the three versions of the inscription. As initial results, we present: 1) examples of alignment of the Hieroglyphic version of the text with translations into modern languages (using the Alpheios alignment editor); 2) the complete morphosyntactic annotation of the Greek text of the Rosetta Stone (using the Arethusa treebanking editor).
|
7 |
The corpus of Greek medical papyri and digital papyrology: new perspectives from an ongoing project. Reggiani, Nicola, January 2016
The ongoing project of digitising a corpus of ancient Greek texts on papyrus dealing with medical topics raises some problematic questions that involve general issues of digital papyrology. The main electronic resource for papyrological texts, the Papyrological Navigator (papyri.info), was designed to host documentary items, whereas the technical, even literary nature of medical papyri (which include, besides documents related to medicine, handbooks, school books, and treatises by both known and unknown authors) requires new ways of handling the relevant data (paratextual devices such as diacritics, punctuation, abbreviations, and layout features). These issues are currently under discussion by the team responsible for the forthcoming Digital Corpus of Literary Papyri (DCLP), but further options need to be considered in order to develop a fully functional, interactive, dynamic database of ancient technical texts. In particular, this paper presents and discusses the potential of a multi-layer linguistic annotation (suited to the needs of a multifaceted technical language) and of a multitextual digital edition (helpful given the fragmentary condition of the texts and their often problematic relationship with the known manuscript tradition).
|
8 |
[en] METHODOLOGIES FOR CHARACTERIZING AND DETECTING EMOTIONAL DESCRIPTION IN THE PORTUGUESE LANGUAGE / [pt] METODOLOGIAS PARA CARACTERIZAÇÃO E DETECÇÃO DA DESCRIÇÃO DE EMOÇÃO NA LÍNGUA PORTUGUESA. BARBARA CRISTINA MARQUES P RAMOS, 29 May 2023
This thesis seeks to understand how speakers of Portuguese use the language to express mentions of emotion, approached mainly from a linguistic perspective. The general objective of the research is to create resources that improve the annotation of the semantic field of emotions in Portuguese, building on the AC/DC project, which gathers and makes publicly available annotated corpora and tools for research on the Portuguese language, and on Emocionário, which is both a semantic annotation project and a lexicon of emotions. The research first gives an overview of emotion studies; it aligns itself with perspectives that reject both the universality of emotions and approaches that posit basic emotions, and it contrasts its interest in emotion description with the already consolidated area of Sentiment Analysis by comparing five Portuguese sentiment and/or polarity lexicons with Emocionário. Drawing on a broad sweep of the AC/DC corpora, three main paths were followed to investigate emotion words: (i) an analysis of the twenty-four emotion groups already present in the Emocionário lexicon, in order to outline characteristics and challenges in the study of emotion description in Portuguese; (ii) a complete revision of one third of the Emocionário lexicon groups; and (iii) searches for the lexico-syntactic pattern sentimento de N (“feeling of N”) and for expressions annotated by the Esqueleto project that are used to describe emotion. Analyzing the corpora in light of the lemmas previously belonging to the Emocionário lexicon groups showed, among other characteristics, the relevance of lexicalized expressions for analyzing emotion description, the types of verb arguments and affixes that can cause variation in meaning, and variations in verbal tense and mood that lead to changes in meaning. Among the challenges are polysemous words and expressions and the difficulty of distinguishing different senses of words that share the same grammatical class when only morphosyntactic information is available. This analysis made it possible to structure and document a revision methodology that may later be applied to the remaining groups. The main contributions of this thesis derive from the corpus analyses and explorations: the removal of lemmas with non-emotional senses from the Emocionário lexicon groups; the creation of the emotion groups Ausência (Absence) and Outra (Other), enriching the lexicon; the detection of more than nine hundred lemmas and expressions, stemming from the searches for the sentimento de N pattern and from the connections established between the semantic fields of emotion and the human body; and the discovery of lexical fields rarely mentioned in the literature on emotion, such as coletividade (collectivity), estranhamento (estrangement), espiritualidade (spirituality), parentesco (kinship) and atos automotivados (self-motivated acts), which helped in investigating how Portuguese speakers crystallize emotions in language.
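Search (iii) looks for a fixed lexico-syntactic pattern. As a rough sketch, assuming a corpus already tokenized into (word, lemma, part-of-speech) triples with a hypothetical `N` tag for nouns, the pattern sentimento de N can be matched with a three-token sliding window. This does not reproduce the AC/DC query interface; the function name, tag labels and toy sentences are invented for illustration.

```python
def find_sentimento_de_n(tokens, noun_tag="N"):
    """Return the noun lemmas that fill N in 'sentimento de N'.

    tokens: list of (word, lemma, pos) triples; the tag names are hypothetical.
    """
    hits = []
    for i in range(len(tokens) - 2):
        (_, lemma1, _), (word2, _, _), (_, lemma3, pos3) = tokens[i:i + 3]
        # Lemma match covers both 'sentimento' and plural 'sentimentos'.
        if lemma1 == "sentimento" and word2.lower() == "de" and pos3 == noun_tag:
            hits.append(lemma3)
    return hits


# Invented toy input, already tokenized, lemmatized and POS-tagged.
tokens = [
    ("Ele", "ele", "PRON"), ("tinha", "ter", "V"), ("um", "um", "DET"),
    ("sentimento", "sentimento", "N"), ("de", "de", "PREP"), ("culpa", "culpa", "N"),
    (".", ".", "PUNCT"),
    ("Sentimentos", "sentimento", "N"), ("de", "de", "PREP"), ("alívio", "alívio", "N"),
]
print(find_sentimento_de_n(tokens))  # → ['culpa', 'alívio']
```

Matching on the surface form "de" deliberately leaves out contractions such as do and da (de plus article); a real query would need to decide whether those belong to the pattern.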
|