  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

A Performance Analysis Framework for Coreference Resolution Algorithms

Patel, Chandankumar Johakhim 29 August 2016 (has links)
No description available.
22

Prerequisites for Extracting Entity Relations from Swedish Texts

Lenas, Erik January 2020 (has links)
Natural language processing (NLP) is a vibrant area of research with many practical applications today, such as sentiment analysis, text labeling, question answering, machine translation and automatic text summarization. At the moment, research is mainly focused on the English language, although many other languages are trying to catch up. This work focuses on an area within NLP called information extraction, and more specifically on relation extraction, that is, extracting relations between entities in a text. This work aims to use machine learning techniques to build a Swedish language processing pipeline with part-of-speech tagging, dependency parsing, named entity recognition and coreference resolution, to use as a base for later relation extraction from archival texts. The obvious difficulty lies in the scarcity of annotated Swedish datasets. For example, no sufficiently large Swedish dataset for coreference resolution exists today. An important part of this work, therefore, is to create a Swedish coreference solver using distantly supervised machine learning: creating a Swedish dataset by applying an English coreference solver to an unannotated bilingual corpus, using a word aligner to transfer this machine-annotated English dataset to a Swedish one, and then training a Swedish model on this dataset. Using AllenNLP's end-to-end coreference resolution model, both for creating the Swedish dataset and for training the Swedish model, this work achieves an F1-score of 0.5. For named entity recognition, this work uses the Swedish BERT models released by the Royal Library of Sweden in February 2020 and achieves an overall F1-score of 0.95. To put all of these NLP models within a single language processing pipeline, spaCy is used as a unifying framework.
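The distant-supervision step described above — transferring English coreference annotations to Swedish through word alignments — can be sketched roughly as follows. This is an illustrative toy, not code from the thesis; the cluster format, the alignment dictionary and the example sentence pair are all invented.

```python
# Illustrative sketch (not the thesis code): projecting coreference clusters
# from an English sentence to its Swedish translation via word alignments.
# Cluster spans and the alignment dict are hypothetical example data.

def project_clusters(clusters, alignment):
    """Map English token-span clusters to Swedish using a word alignment.

    clusters:  list of clusters; each cluster is a list of (start, end)
               token spans (inclusive) over the English sentence.
    alignment: dict mapping English token index -> Swedish token index.
    """
    projected = []
    for cluster in clusters:
        new_cluster = []
        for start, end in cluster:
            tgt = sorted(alignment[i] for i in range(start, end + 1)
                         if i in alignment)
            if tgt:  # keep only spans with at least one aligned token
                new_cluster.append((tgt[0], tgt[-1]))
        if len(new_cluster) >= 2:  # a cluster needs two mentions to survive
            projected.append(new_cluster)
    return projected

# English: "Anna said she was tired" -> Swedish: "Anna sa att hon var trött"
alignment = {0: 0, 1: 1, 2: 3, 3: 4, 4: 5}
clusters = [[(0, 0), (2, 2)]]                 # {Anna, she}
print(project_clusters(clusters, alignment))  # [[(0, 0), (3, 3)]]
```

A real pipeline would get the alignments from a statistical word aligner and the clusters from the English coreference model; spans whose tokens are unaligned simply drop out, which is one source of the noise inherent in this approach.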
23

Knowledge acquisition from user reviews for interactive question answering

Konstantinova, Natalia January 2013 (has links)
Nowadays, the effective management of information is extremely important for all spheres of our lives, and applications such as search engines and question answering systems help users to find the information that they need. However, even when assisted by these various applications, people sometimes struggle to find what they want. For example, when choosing a product, customers can be confused by the need to consider many features before they can reach a decision. Interactive question answering (IQA) systems can help customers in this process, by answering questions about products and initiating a dialogue with the customers when their needs are not clearly defined. The focus of this thesis is how to design an interactive question answering system that will assist users in choosing a product they are looking for, in an optimal way, when a large number of similar products are available. Such an IQA system will be based on selecting a set of characteristics (also referred to as product features in this thesis) that describe the relevant product, and narrowing the search space. We believe that the order in which these characteristics are presented during these IQA sessions is highly important. Therefore, they need to be ranked so that the dialogue selects the product in an efficient manner. The research question investigated in this thesis is whether product characteristics mentioned in user reviews are important for a person who is likely to purchase a product and can therefore be used when designing an IQA system. We focus our attention on products such as mobile phones; however, the proposed techniques can be adapted for other types of products if the data is available. Methods from natural language processing (NLP) fields such as coreference resolution, relation extraction and opinion mining are combined to produce various rankings of phone features.
The research presented in this thesis employs two corpora which contain texts related to mobile phones, specifically collected for this thesis: a corpus of Wikipedia articles about mobile phones and a corpus of mobile phone reviews published on the Epinions.com website. Parts of these corpora were manually annotated with coreference relations, mobile phone features and relations between mentions of the phone and its features. The annotation is used to develop a coreference resolution module as well as a machine learning-based relation extractor. Rule-based methods for identification of coreference chains describing the phone are designed and thoroughly evaluated against the annotated gold standard. Machine learning is used to find links between mentions of the phone (identified by coreference resolution) and phone features. It determines whether a phone feature belongs to the phone mentioned in the same sentence or not. In order to find the best rankings, this thesis investigates several settings. One of the hypotheses tested here is that the relatively low results of the proposed baseline are caused by noise introduced by sentences which are not directly related to the phone and its features. To test this hypothesis, only sentences which contained mentions of the mobile phone and a phone feature linked to it were processed to produce rankings of the phone features. Selection of the relevant sentences is based on the results of coreference resolution and relation extraction. Another hypothesis is that opinionated sentences are a good source for ranking the phone features. In order to investigate this, a sentiment classification system is also employed to distinguish between features mentioned in positive and negative contexts. The detailed evaluation and error analysis of the methods proposed form an important part of this research and ensure that the results provided in this thesis are reliable.
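The idea of ranking product features by how they surface in opinionated review sentences could be sketched as below. This is a hypothetical toy, not the thesis system: the feature lexicon, the opinion flags and the double-weighting scheme are invented for illustration.

```python
# Hypothetical sketch of the feature-ranking idea: rank phone features by how
# often they are mentioned in opinionated review sentences. The sentence data,
# the feature lexicon and the weighting scheme are invented example inputs.
from collections import Counter

FEATURES = {"battery", "screen", "camera", "keyboard"}

def rank_features(sentences):
    """sentences: list of (tokens, is_opinionated) pairs."""
    counts = Counter()
    for tokens, is_opinionated in sentences:
        for tok in tokens:
            if tok in FEATURES:
                # opinionated mentions count double in this toy weighting
                counts[tok] += 2 if is_opinionated else 1
    return [f for f, _ in counts.most_common()]

reviews = [
    (["the", "battery", "lasts", "forever"], True),
    (["battery", "and", "screen", "are", "great"], True),
    (["the", "screen", "is", "3", "inches"], False),
]
print(rank_features(reviews))  # ['battery', 'screen']
```

In the thesis itself the relevant sentences are first filtered using coreference resolution and relation extraction, and the opinion flag would come from the sentiment classifier rather than being given.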
24

Coreference resolution with and for Wikipedia

Ghaddar, Abbas 06 1900 (has links)
Wikipedia is a resource of choice exploited in many NLP applications, yet we are not aware of recent attempts to adapt coreference resolution to this resource, a preliminary step to understanding Wikipedia texts. The first part of this master thesis consists of building an English coreference corpus, where all documents are from the English version of Wikipedia. We annotated each markable with its coreference type, mention type and the equivalent Freebase topic. Our corpus has no restriction on the topics of the documents being annotated, and documents of various sizes have been considered for annotation. Our annotation scheme follows the one of OntoNotes with a few disparities. In part two, we propose a testbed for evaluating coreference systems on a simple task: identifying the mentions of the concept described in a Wikipedia page (e.g., the mentions of President Obama in the Wikipedia page dedicated to that person). We show that by exploiting the Wikipedia markup (categories, redirects, infoboxes, etc.) of a document, as well as links to external knowledge bases such as Freebase (gender and number information, types of relations with other entities, etc.), we can acquire useful information on entities that helps to classify mentions as coreferent or not.
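One ingredient of exploiting Wikipedia markup, as described above, is matching mentions against a page's title and its redirects. A toy sketch of that single ingredient (the titles and redirects here are illustrative examples, and a real system would combine many more signals):

```python
# A toy sketch of one signal from Wikipedia markup: a mention is taken to
# corefer with the page's main concept if it matches the page title or one
# of its redirects. The title and redirect list are illustrative examples.

def is_concept_mention(mention, title, redirects):
    candidates = {title.lower()} | {r.lower() for r in redirects}
    return mention.lower() in candidates

title = "Barack Obama"
redirects = ["President Obama", "Barack Hussein Obama II"]
print(is_concept_mention("president obama", title, redirects))  # True
print(is_concept_mention("the senator", title, redirects))      # False
```

The second example shows why markup alone is insufficient: "the senator" can corefer with the page concept even though it matches no redirect, which is where infobox and knowledge-base features come in.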
25

Resolução de correferência em múltiplos documentos utilizando aprendizado não supervisionado / Co-reference resolution in multiples documents through unsupervised learning

Silva, Jefferson Fontinele da 05 May 2011 (has links)
One of the problems found in Natural Language Processing (NLP) systems is the difficulty of identifying which textual elements refer to the same entity. This phenomenon, in which a set of textual elements refers to a single entity, is called coreference. Coreference resolution systems can improve the performance of various NLP applications, such as automatic summarization, information extraction and question answering systems. Recently, research in NLP has explored the possibility of identifying coreferent elements across multiple documents. In this context, this work focuses on the development of an unsupervised method for coreference resolution in multiple documents, using Portuguese as the target language. To date, no system for this purpose is known for Portuguese. The results of the experiments with the system suggest that the developed method is superior to methods based on string matching.
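The string-matching baseline that such systems are compared against can be sketched in a few lines. This is a generic illustration of that class of baseline, not the baseline actually used in the thesis; the mention list and the normalization are invented.

```python
# A minimal string-matching coreference baseline of the kind the thesis
# compares against: mentions are clustered when their normalized strings
# match. The mention list and the normalization rule are invented examples.
from collections import defaultdict

def string_match_clusters(mentions):
    groups = defaultdict(list)
    for i, m in enumerate(mentions):
        key = m.lower().removeprefix("the ").strip()  # crude normalization
        groups[key].append(i)
    # only groups with two or more mentions form coreference clusters
    return [idx for idx in groups.values() if len(idx) > 1]

mentions = ["Lula", "the president", "lula", "Brasília"]
print(string_match_clusters(mentions))  # [[0, 2]]
```

The example also shows the baseline's weakness: "the president" and "Lula" corefer but share no string, which is precisely the kind of link an unsupervised learning method aims to recover.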
26

Populando ontologias através de informações em HTML - o caso do currículo lattes / Populating ontologies using HTML information - the currículo lattes case

Castaño, André Casado 06 May 2008 (has links)
The Lattes Platform is currently the main database of Brazilian researchers' resumés. It stores, in a standardized form, professional and academic data, bibliographical productions and other information about these researchers. From this database of Lattes resumés, several types of consolidated reports can be generated. The tools available for the Lattes Platform are unable to detect some of the problems that emerge when generating consolidated reports, such as duplicate citations or bibliographical productions classified differently by each author, yielding an incorrect total number of publications. This problem demands that the generated reports be revised by the researchers, and the flaws of this process are the main inspiration for this project. In this work we use resumés from the Lattes Platform as the source of information for populating an ontology, used mainly as a database to be queried for report generation. We analyze the whole process of extracting information from HTML files and its subsequent processing to insert it correctly into the ontology, according to its semantics. With the ontology correctly populated, we show some queries that can be performed, and we also analyze the methods and approaches used in the whole process, highlighting their strengths and weaknesses, in order to detail the difficulties involved in automatically populating (instantiating) an ontology.
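The population step described above — turning HTML fields into ontology assertions — might be sketched as below. This is a hedged illustration only: the HTML layout, the field names and the predicate-naming convention are invented, and real Lattes pages are considerably messier than this fragment.

```python
# Hedged sketch of the population step: pull field/value pairs out of a
# (much simplified) resumé HTML fragment and turn them into subject-
# predicate-object triples for the ontology. The HTML layout and the
# predicate names are invented for illustration.
import re

HTML = """
<div class="campo"><b>Nome</b>: Maria Silva</div>
<div class="campo"><b>Instituicao</b>: USP</div>
"""

def html_to_triples(html, subject):
    triples = []
    for field, value in re.findall(r"<b>(\w+)</b>:\s*([^<]+)", html):
        triples.append((subject, "has" + field, value.strip()))
    return triples

print(html_to_triples(HTML, "researcher:1"))
# [('researcher:1', 'hasNome', 'Maria Silva'), ('researcher:1', 'hasInstituicao', 'USP')]
```

In practice a proper HTML parser and an ontology API would replace the regular expression and the bare tuples; the sketch only shows the shape of the extraction-to-assertion mapping.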
27

A constraint-based hypergraph partitioning approach to coreference resolution

Sapena Masip, Emili 16 May 2012 (has links)
The objectives of this thesis are focused on research in machine learning for coreference resolution. Coreference resolution is a natural language processing task that consists of determining the expressions in a discourse that mention or refer to the same entity. The main contributions of this thesis are (i) a new approach to coreference resolution based on constraint satisfaction, using a hypergraph to represent the problem and solving it by relaxation labeling; and (ii) research towards improving coreference resolution performance using world knowledge extracted from Wikipedia. The developed approach is able to use an entity-mention classification model with more expressiveness than pair-based ones, and to overcome the weaknesses of previous approaches in the state of the art, such as contradictory links, classifications without context and lack of information when evaluating pairs. Furthermore, the approach allows the incorporation of new information by adding constraints, and research has been done on using world knowledge to improve performance. RelaxCor, the implementation of the approach, achieved state-of-the-art results and participated in the international competitions SemEval-2010 and CoNLL-2011, achieving second position in CoNLL-2011.
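The relaxation-labeling core of such an approach can be sketched schematically. This is not RelaxCor: the compatibility scores, the update rule's exact form and the two-mention example are invented to show only the general shape of the iteration (each mention holds a distribution over candidate entity labels, repeatedly nudged toward labels that constraints support).

```python
# A schematic sketch (not RelaxCor itself) of a relaxation-labeling update:
# each mention keeps a probability distribution over candidate entity labels,
# and distributions are iteratively pushed toward labels supported by the
# current labels of related mentions. Compatibility scores are invented.

def relax(probs, compat, iterations=20):
    """probs[m][e]: P(mention m has entity label e).
    compat[(m, n)][e]: support that mention n gives to label e of m."""
    for _ in range(iterations):
        new = []
        for m, dist in enumerate(probs):
            support = [0.0] * len(dist)
            for (a, b), s in compat.items():
                if a == m:
                    for e in range(len(dist)):
                        support[e] += s[e] * max(probs[b])
            updated = [p * (1.0 + sup) for p, sup in zip(dist, support)]
            z = sum(updated)
            new.append([u / z for u in updated])
        probs = new
    return probs

# two mentions, two entity labels; the constraint pushes mention 1 to label 0
probs = [[0.9, 0.1], [0.5, 0.5]]
compat = {(1, 0): [1.0, 0.0]}
result = relax(probs, compat)
print(result[1][0] > 0.9)  # True
```

The appeal of the formulation is visible even in the toy: constraints from any knowledge source (linguistic rules, Wikipedia-derived world knowledge) enter uniformly as compatibility terms.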
28

Applying Coreference Resolution for Usage in Dialog Systems

Rolih, Gabi January 2018 (has links)
Using references in language is a major part of communication, and understanding them is not a challenge for humans. Recent years have seen increased usage of dialog systems that interact with humans in natural language to assist them in various tasks, but even the most sophisticated systems still struggle with understanding references. In this thesis, we adapt a coreference resolution system for usage in dialog systems and try to understand what is needed for an efficient understanding of references in dialog systems. We annotate a portion of logs from a customer service system and perform an analysis of the most common coreferring expressions appearing in this type of data. This analysis shows that most coreferring expressions are nominal and pronominal, and they usually appear within two sentences of each other. We implement Stanford's Multi-Pass Sieve with some adaptations and dialog-specific changes and integrate it into a dialog system framework. The preprocessing pipeline makes use of already existing NLP tools, while some new ones are added, such as a chunker, a head-finding algorithm and an NER-like system. To analyze both user input and output of the system, we deploy two separate coreference resolution systems that interact with each other. An evaluation is performed on the system and its separate parts in the five most common evaluation metrics. The system does not achieve state-of-the-art numbers, but because of its domain-specific nature that is expected. Some parts of the system do not have any effect on the performance, while the dialog-specific changes contribute to it greatly. An error analysis is conducted and reveals some problems with the implementation, but more importantly, it shows how the system could be further improved by using other types of knowledge and dialog-specific features.
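The multi-pass sieve architecture mentioned above can be condensed into a few lines: high-precision passes run first and later, lower-precision passes only touch mentions still unresolved. This is a heavily simplified illustration of the architecture, not Stanford's implementation; the two toy passes and the example mentions are invented.

```python
# A condensed sketch of the multi-pass sieve idea (after Stanford's system,
# heavily simplified): high-precision passes run first and later passes only
# link mentions left unresolved. Pass logic and mentions are toy examples.

def exact_match_pass(mentions, links):
    for i, a in enumerate(mentions):
        for j in range(i + 1, len(mentions)):
            if a.lower() == mentions[j].lower():
                links.setdefault(j, i)  # keep the earliest antecedent

def pronoun_pass(mentions, links):
    pronouns = {"he", "she", "it", "they"}
    for j, m in enumerate(mentions):
        if m.lower() in pronouns and j not in links and j > 0:
            links[j] = j - 1  # naive: link to the closest previous mention

def sieve(mentions):
    links = {}  # mention index -> antecedent index
    for p in (exact_match_pass, pronoun_pass):  # precision-ordered passes
        p(mentions, links)
    return links

print(sieve(["Anna", "the manager", "Anna", "she"]))
# {2: 0, 3: 2}
```

The dialog-specific adaptations the thesis describes would slot in as additional passes or as changes to how candidate antecedents are ordered across speaker turns.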
29

Neural Language Models with Explicit Coreference Decision

Kunz, Jenny January 2019 (has links)
Coreference is an important and frequent concept in any form of discourse, and Coreference Resolution (CR) a widely used task in Natural Language Understanding (NLU). In this thesis, we implement and explore two recent models that include the concept of coreference in Recurrent Neural Network (RNN)-based Language Models (LM). Entity and reference decisions are modeled explicitly in these models using attention mechanisms. Both models learn to save the previously observed entities in a set and to decide if the next token created by the LM is a mention of one of the entities in the set, an entity that has not been observed yet, or not an entity. After a theoretical analysis in which we compare the two LMs to each other and to a state-of-the-art coreference resolution system, we perform an extensive quantitative and qualitative analysis. For this purpose, we train the two models and a classical RNN-LM as the baseline model on the OntoNotes 5.0 corpus with coreference annotation. While we do not reach the baseline in the perplexity metric, we show that the models' relative performance on entity tokens has the potential to improve when including the explicit entity modeling. We show that the most challenging point in the systems is the decision whether the next token is an entity token, while the decision which entity the next token refers to performs comparatively well. Our analysis in the context of a text generation task shows that a widespread error source for the mention creation process is the confusion of tokens that refer to related but different entities in the real world, presumably a result of the context-based word representations in the models. Our re-implementation of the DeepMind model by Yang et al. 2016 performs notably better than the re-implementation of the EntityNLM model by Ji et al. 2017, with a perplexity of 107 compared to a perplexity of 131.
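The explicit entity decision both models share — is the next token a mention of a stored entity, or of a new one? — can be reduced to a toy numeric sketch. All vectors and the flat new-entity score below are made up; the real models compute these scores with learned attention parameters inside the RNN.

```python
# A toy numeric sketch of the explicit coreference decision both models
# share: given the LM's hidden state, score "new entity" and each stored
# entity embedding, and pick the argmax. All numbers are made-up examples.

def entity_decision(hidden, entities, new_entity_score=0.0):
    """Return -1 for 'new entity', else the index of the chosen entity."""
    scores = [new_entity_score]
    for emb in entities:
        scores.append(sum(h * e for h, e in zip(hidden, emb)))  # dot product
    # softmax is monotone, so argmax over raw scores suffices for the choice
    best = max(range(len(scores)), key=scores.__getitem__)
    return best - 1

hidden = [0.5, 1.0]
entities = [[1.0, 0.0], [0.0, 2.0]]   # two previously observed entities
print(entity_decision(hidden, entities))  # 1
```

The thesis finding maps directly onto this sketch: choosing *among* the stored entities (the dot products) works comparatively well, while calibrating the new-entity/non-entity score against them is the hard part.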
30

[en] REVISION IN WRITING AND COREFERENCE ISSUES / [pt] O PROCESSO DE REVISÃO NA PRODUÇÃO ESCRITA E QUESTÕES DE CORREFERÊNCIA

ENEIDA FIGUEIRA DE ALMEIDA WERNER 17 January 2019 (has links)
[en] The purpose of this doctoral thesis is to investigate the writing revision process and the process of establishing coreference as to how they are monitored by groups with different degrees of writing experience. The research is part of the study of writing processing, focusing on the production process, and is theoretically anchored, with respect to writing research, in the Cognitive Writing Model of Flower and Hayes (1980) and in Hayes's Writing Revision Model (1987). In the studies of coreference, we consider the main theories that investigate the influence of factors favouring accessibility in memory: Accessibility Theory (Ariel, 1990), Centering Theory (Grosz, Joshi and Weinstein, 1995) and the Informational Load Hypothesis (Almor, 1999). We relate the theoretical questions to cognitive data obtained by means of experimental methodology. The laboratory used was LAPAL, at PUC-Rio. The experiments conducted were based on text production and revision tasks, and the keystroke-logging tool Inputlog (http://www.inputlog.net/) was used to record and analyse the data. Participants were graduate and post-graduate students from a public and a private institution in Rio de Janeiro. In the first experiment, the data analysed related to the global behaviour of writing production and coreference processing, elicited by image stimuli from two comic strips without verbal material. Concerning the measures related to writing production, we analysed the relation between the process and the product (in terms of the number of characters and words), as well as pauses and the types of revisions made. Regarding the measures of coreference processing, we examined the types of referential expressions selected to introduce and re-establish reference to discourse entities, the moment when coreferential elements were revised (immediate or delayed revisions) and the degree of specificity of the terms used in the substitutions (more/less specific). The second experiment aimed to investigate the factors that influence the choice of an anaphoric referential expression based on the information contained in the antecedent. A revision task was conducted with four texts of the same narrative type, in each of which we assessed the types of anaphoric expressions as a function of the degree of activation in memory favoured by accessibility to the antecedent. The independent variables were the syntactic function of the antecedent (more subject/less subject), its thematic role (more agent/less agent) and the distance between the antecedent and the anaphoric expression (same sentence/different sentence). Results from the first experiment pointed to differences between the types of revisions made (immediate/delayed) and the proportion of revisions made (deletions/insertions), indicating that the post-graduate group employed revision strategies and resources more qualitatively while monitoring their texts than the graduate group did. In the second experiment, statistical analysis conducted for each group separately revealed main effects of syntactic position (in both groups), distance (in both groups) and thematic role (in the post-graduate group). In addition, interaction effects were found between position and distance and between position, thematic role and distance (graduate group), and between position and distance (post-graduate group). The quality of the revisions also differed: the post-graduate group made more delayed revisions and, proportionally, more revisions that altered overall text quality. Taken together, the experiments conducted allowed us to identify differences between the experimental groups and suggest that schooling level plays an important role in writing and in the choices made in coreference processing.
