1 |
Semi-automated co-reference identification in digital humanities collections
Croft, David. January 2014
Locating specific information within museum collections represents a significant challenge for collection users. Even when the collections and catalogues exist in a searchable digital format, formatting differences and the imprecise nature of the information to be searched mean that information can be recorded in a large number of different ways. This variation exists not just between different collections, but also within individual ones. As a result, traditional information retrieval techniques are badly suited to locating particular information in digital humanities collections, and searching therefore takes an excessive amount of time and resources. This thesis focuses on a particular search problem, that of co-reference identification: the process of identifying when the same real-world item is recorded in multiple digital locations. A real-world example of a co-reference identification problem for digital humanities collections is identified and explored, in particular the time-consuming nature of identifying co-referent records. To address this problem, the thesis presents a novel method for co-reference identification between digitised records in humanities collections. Whilst the specific focus of the thesis is co-reference identification, elements of the method also have applications for general information retrieval. The new method uses elements from a broad range of areas, including query expansion, co-reference identification, short text semantic similarity and fuzzy logic. It was tested against real-world collections information; the results suggest that, in terms of the quality of the co-referent matches found, the new method is at least as effective as a manual search. The number of co-referent matches found, however, is higher using the new method.
The approach presented here is capable of searching collections stored using differing metadata schemas. More significantly, the approach is capable of identifying potential co-reference matches despite the highly heterogeneous and syntax-independent nature of the Gallery, Library, Archive and Museum (GLAM) search space, and of the photo-history domain in particular. The most significant benefit of the new method is, however, that it requires comparatively little manual intervention. A co-reference search using it therefore has significantly lower person-hour requirements than a manually conducted search. In addition to the overall co-reference identification method, this thesis also presents:
• A novel and computationally lightweight short text semantic similarity metric. This new metric has a significantly higher throughput than the current prominent techniques, with only a negligible drop in accuracy.
• A novel method for comparing photographic processes in the presence of variable terminology and inaccurate field information. This is the first computational approach to do so.
|
2 |
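A influência da coesão e da coerência no processamento correferencial de pronomes e nomes repetidos em português brasileiro / The influence of cohesion and coherence on the co-referential processing of pronouns and repeated nouns in Brazilian Portuguese
Simões, Antonia Barros Gibson. 25 March 2014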
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / This dissertation, in the context of Experimental Psycholinguistics, investigates the influence of cohesion and coherence on the co-referential processing of repeated nouns and pronouns in Brazilian Portuguese, with the aim of broadening the understanding of co-referential processing. In two experiments using the online self-paced reading technique, we manipulated a cohesion element (more precisely, the presence or absence of connectives) as well as the coherence of the experimental sentences, in order to investigate how anaphora resolution behaves under the different experimental conditions created by these manipulations. Pronominal processing, observed in experiment 1, was not affected by incongruity or by the presence of connectives. The sentences of experiment 2 showed significant differences in reading time from the point at which the repeated-noun anaphor was read. We observed that processing incongruent linguistic elements together with repeated-noun anaphora is more costly to working memory than processing incongruities together with pronominal anaphora. Furthermore, by collecting and analysing the reading times of the participants in the experiments, we could observe the psychological reality of certain postulates of Textual Linguistics, helping to bring together two important areas of the study of human language. / Esta dissertação, no âmbito da Psicolinguística Experimental, tem como objetivo investigar a influência da coesão e da coerência no processamento correferencial de pronomes e nomes repetidos em português brasileiro, ampliando a compreensão sobre o processamento correferencial.
Através da aplicação de dois experimentos, utilizando a técnica online de leitura automonitorada, manipulamos um elemento de coesão (mais precisamente a presença ou ausência de conectivos) assim como a coerência das sentenças experimentais para investigar o comportamento da solução anafórica diante das diferentes condições experimentais suscitadas por tais manipulações. O processamento pronominal, observado no experimento 1, não foi afetado por questões de incongruência ou pela presença de conectivos. Sentenças do experimento 2 tiveram diferenças significativas no tempo de leitura, a partir da leitura da retomada anafórica com nome repetido. Observamos que o processamento de elementos linguísticos incongruentes e retomada anafórica com nomes repetidos é mais custoso para a memória de trabalho quando comparados ao processamento de incongruências e anáfora pronominal. Além disso, através da obtenção e análise dos tempos de leitura dos sujeitos participantes dos experimentos, pudemos observar a realidade psicológica de certos postulados da Linguística Textual, contribuindo para a aproximação entre duas áreas importantes de estudo a respeito da linguagem humana.
|
3 |
Processamento da co-referência: pronomes lexicais, nomes repetidos, hiperônimos e hipônimos como formas de retomada anafórica inter-sentencial do sujeito em português brasileiro / Processing co-reference: lexical pronouns, repeated NPs, hypernyms and hyponyms as forms of inter-sentential anaphoric resumption of the subject in Brazilian Portuguese
Queiroz, Karla Lima de. 22 December 2009
Co-reference is usually defined as a strategy of textual progression in which a previously mentioned entity, called the antecedent, is taken up again by means of an anaphor. Several scientific fields have studied co-reference because of its relevance to local coherence and, consequently, to discourse comprehension. However, a crucial question remains open: which cognitive mechanisms and linguistic principles underlie the choice of anaphor among the many forms available in the language? The present work belongs to the theoretical framework of Experimental Psycholinguistics that deals with sentence processing and, more specifically, with the processing of co-reference. It compares the efficiency of lexical pronouns vs. repeated nouns, and of hypernyms vs. hyponyms, as forms of inter-sentential anaphoric resumption of the subject in Brazilian Portuguese. It also examines the explanatory scope of Centering Theory (Grosz, Joshi and Weinstein, 1983, 1995), which postulates a slowdown effect, known as the Repeated NP Penalty, when a syntactically prominent antecedent is resumed with a repeated noun instead of a pronoun, and of the Informational Load Hypothesis (Almor 1990, 1999, 2000), an alternative conception that relates processing cost to discourse function. To that end, the research applied three on-line self-paced reading experiments and validated their results statistically with the t-test and ANOVA. In the first experiment, lexical pronouns were read faster than repeated nouns, in accordance with Centering Theory and the Informational Load Hypothesis. In the second experiment, co-reference was established more easily by hypernyms than by hyponyms, supporting the principle of optimisation between processing cost and discourse function defended by Almor (1990, 1999, 2000), whereas the Repeated NP Penalty, first observed by Gordon et al. (1993, 1995), is restricted to the dichotomy between lexical pronouns and repeated nouns.
Syntactic prominence was also called into question in the studies of Chambers and Smith (1999) for English and Leitão (2005) for Brazilian Portuguese, but the third and final experiment confirmed that it operates independently of the type of anaphor, although this does not rule out the influence of structural parallelism. / A co-referência é comumente definida como uma estratégia de progressão textual e caracterizada pela retomada de uma entidade prévia, também denominada antecedente, através de uma anáfora. Ela vem sendo estudada por várias áreas do conhecimento científico, devido a sua importância para a coerência local e, conseqüentemente, para a compreensão do discurso, mas uma questão crucial continua em aberto e requer maiores esclarecimentos: quais os mecanismos cognitivos e os princípios lingüísticos que subjazem a escolha da anáfora, entre a multiplicidade de formas existente na língua. O presente trabalho se insere no quadro teórico da Psicolingüística Experimental que trata do processamento de frases e, mais especificamente, do processamento da co-referência. Nele, comparamos a eficiência dos pronomes lexicais vs. nomes repetidos e dos hiperônimos vs. hipônimos, como formas de retomada anafórica inter-sentencial do sujeito em Português Brasileiro. Verificamos também a abrangência explicativa da Teoria da Centralização (Grosz, Joshi e Weinstein, 1983, 1995), que postula um efeito de retardamento, mais conhecido como Penalidade do Nome Repetido, ao retomar um antecedente proeminente sintaticamente usando um nome repetido em vez de um pronome, e da Hipótese da Carga Informacional (Almor 1990, 1999, 2000), com uma concepção alternativa que relaciona o custo de processamento e atenção discursiva.
Para isso, aplicamos três experimentos on-line de leitura auto-monitorada e validamos estatisticamente seus resultados através do Teste-T e da ANOVA. No primeiro experimento, os pronomes lexicais foram lidos mais rapidamente do que os nomes repetidos, em consonância com a Teoria da Centralização e com a Hipótese da Carga Informacional. No segundo experimento, a co-referência foi estabelecida mais facilmente pelos hiperônimos do que pelos hipônimos, ratificando o princípio de otimização entre custo de processamento e função discursiva, defendido por Almor (1990, 1999, 2000), enquanto a Penalidade do Nome-Repetido, constatada pioneiramente por Gordon et al. (1993, 1995), limita-se à dicotomia pronomes lexicais vs. nomes repetidos. A proeminência sintática também foi questionada no estudo de Chambers e Smith (1999) em Inglês e de Leitão (2005) em Português Brasileiro, mas o terceiro e último experimento comprovou sua atuação independente do tipo de anáfora, apesar de não descartar a influência do paralelismo estrutural.
|
4 |
Pozice antecedentu zájmena třetí osoby: tendence v současné češtině / Position of Antecedent of the 3rd Person Pronoun: Trends in Contemporary Czech
Poncarová, Alena. January 2013
The aim of this thesis is to find out how co-referential relationships in Czech texts are developed in terms of information structure and constituent structure. The work proceeded as follows. Firstly, I examined the available literature on information and constituent structures, paying special attention to the formation of co-referential relations. This information was then compared with my own research. Given the disparities between the approaches, it was necessary to introduce some basic terms that are used within my thesis (Chapter 2). Thereafter, I was able to introduce current approaches to the topic, including the methodological basis of my work, Centering Theory, a framework of foreign origin; unlike the Czech approaches, this theory includes predictions about the frequency of different ways of developing co-referential relationships (Chapter 3). Secondly, the knowledge gained from the literature was applied to two different types of data: corpus material from the Prague Dependency Treebank and material obtained through a questionnaire survey (Chapter 4). The most appropriate methods of developing co-referential chains in Czech text were determined by testing the prediction model against the obtained data set. Using this method we gained information about...
|
5 |
Syntaxe et sémantique de IT référentiel en anglais contemporain / The syntax and semantics of referential IT in contemporary English
Dali, Narjes. 31 May 2011
Partant du constat que le pronom IT connaît, en anglais contemporain, une grande richesse d’emplois, cette thèse propose une étude de IT référentiel et vise à examiner ses fonctions, son positionnement phrastique et son pouvoir référentiel. Ce pronom occupe toutes les places syntaxiques au sein de la phrase. Il a la spécificité de renvoyer à une entité beaucoup plus complexe qu’un groupe nominal. De plus, le rapport de IT avec ses antécédents est au cœur de cette étude qui examine aussi les différents facteurs jouant un rôle dans l’identification du bon référent où la présence textuelle ou situationnelle d’un antécédent n’est pas une condition nécessaire pour que le pronom soit référentiel. Un traitement global de tous les emplois référentiels de IT est proposé, car quelle que soit la place de l’objet désigné par IT, cet objet appartient à la mémoire commune du locuteur et de l’allocutaire. / In contemporary English, the pronoun IT is used in a great variety of contexts. This doctoral thesis proposes a study of referential IT and examines its functions, its positions in the sentence and its referential potential. This pronoun can occupy any syntactic position in the sentence. IT also has the particularity of referring to entities much more complex than a noun phrase. The relationship between IT and its antecedents is at the heart of the present study, which also examines the various factors that play a role in identifying the correct referent in cases where the textual or situational presence of an antecedent is not a necessary condition for the pronoun to be referential. A global treatment of all the referential uses of IT is proposed: whatever the location of the object designated by IT, this object belongs to the shared memory of the speaker and the addressee.
|
6 |
A Grammar of Ese Ejja, a Bolivian language of the Amazon / Grammaire de l'ese ejja, langue tacana d'Amazonie bolivienne
Vuillermet, Marine. 14 September 2012
L’ese ejja (takana) est une langue amazonienne en danger, parlée en Bolivie et au Pérou par environ 1 500 locuteurs. La première partie offre un profil sociolinguistique et décrit la méthodologie de collecte des données auprès d’une douzaine de locuteurs, lors de 5 terrains réalisés dans la communauté de Portachuelo, Bolivie, entre 2005 et 2009. La deuxième partie est une grammaire qui situe l’ese ejja typologiquement parmi les langues du monde, aréalement en tant que langue amazonienne et génétiquement au sein de la famille takana. Phonologiquement la langue est remarquable pour ses deux implosives sourdes et un système accentuel verbal très complexe sensible, entre autres, à la valence du radical. La complexité morphologique est frappante : parmi les 13 positions du prédicat verbal, on trouve des combinaisons lexicales de deux racines, de l’incorporation nominale et de nombreux suffixes plus ou moins lexicaux. Particulièrement intéressants sont les suffixes d’Aktionsart qui ont une sémantique d’adverbes, et le riche système (10 suffixes) de ‘mouvement associé’, aussi attesté dans la langue sœur cavineña et des langues australiennes. Les adjectifs les plus fréquents sont prédicatifs et peuvent productivement avoir un nom incorporé. Polygrammaticalisés, les 4 verbes de posture sont omniprésents dans la grammaire, dans les constructions locative, existentielle et possessive, et comme suffixes de présent et d’imperfectif. Enfin, il existe 2 systèmes de co-référence pour 4 types de subordonnées : tous les deux sont tripartites et vont au-delà de l’opposition binaire ‘sujet identique/différent’ mieux connue. Un DVD avec les fichiers audio des textes en annexe et le matériel de revitalisation produit est joint. / Ese Ejja (Takana) is an endangered language of the Amazon, spoken by about 1,500 people in Peru and Bolivia.
The first part is a sociolinguistic profile and describes the methodology: the data were recorded from a dozen speakers in the course of 5 field trips between 2005 and 2009 in Portachuelo, a Bolivian community. The second part is a grammar that places Ese Ejja typologically among the world's languages, areally as an Amazonian language and genetically within the Takanan family. Among its interesting phonological features are two voiceless implosives and a complex verbal accent system that is sensitive to, among other things, stem valency. The morphology of the verbal predicate is also intricate, with its 13 slots: roots can combine to form a compound stem, nouns can be incorporated, and numerous morphemes of (more or less) clear lexical origin can be suffixed. Of specific interest are the Aktionsart verbal suffixes, with their adverbial semantics, and the rich system of 10 ‘associated motion’ morphemes, also attested in the sister language Cavineña and in some Australian languages. Predicative adjectives are the more frequent of the two adjective classes, and productively incorporate nouns. The 4 posture verbs are polygrammaticalized and thus omnipresent in the grammar: they appear in basic locative, existential and possessive constructions and as present and imperfective suffixes. Two systems of co-reference are distributed among 4 types of subordinate clauses: both systems are tripartite, i.e. they go beyond the better-known ‘same subject/different subject’ binary opposition. A DVD with the audio files of the texts in the appendix and with the revitalization material produced accompanies the dissertation.
|