About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

181

Vocabulário e leitura: a elaboração de uma lista de palavras de uso acadêmico em português do Brasil / Vocabulary and reading: compiling a list of academic words in Brazilian Portuguese

Santos, Vanderlei dos 29 September 2006
Advisor: Matilde Virginia Ricardi Scaramucci. Master's thesis (mestrado), Universidade Estadual de Campinas, Instituto de Estudos da Linguagem. Degree: Mestre em Linguística Aplicada (Master's in Applied Linguistics), area Língua Estrangeira (Foreign Language).
Abstract: Corpus-based word lists have played an important role in foreign-language vocabulary teaching and learning. This study compiles a list of Brazilian Portuguese words that are frequent and of wide range in the Brazilian academic context, and discusses its usefulness for vocabulary teaching, for preparing modified texts, and for designing vocabulary tests for learners of Portuguese as a foreign language. Direct and indirect vocabulary teaching is also discussed. The list was compiled from a corpus of theses and dissertations from three Brazilian public universities, totalling a little over 7 million words (approximately 15,000 pages of text). Frequency and range were the criteria used to select the words included in the list. Each word is presented with its frequency and, in cases of polysemy, with the different meanings in which it occurs in the corpus.
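The selection procedure described above, frequency combined with range across subcorpora, can be sketched in a few lines of Python. The sketch below assumes the corpus is available as one directory of plain-text files per university; the tokenizer, thresholds and paths are placeholders rather than the values used in the thesis.

```python
import re
from collections import Counter
from pathlib import Path

def tokenize(text):
    """Lower-case word tokenizer; a stand-in for proper Portuguese tokenization."""
    return re.findall(r"[a-záàâãéêíóôõúç]+", text.lower())

def build_word_list(subcorpus_dirs, min_freq=100, min_range=3):
    """Keep words that are frequent overall and occur in at least min_range subcorpora."""
    total = Counter()    # overall frequency across the whole corpus
    ranges = Counter()   # number of subcorpora in which a word appears
    for directory in subcorpus_dirs:
        sub = Counter()
        for path in Path(directory).glob("*.txt"):
            sub.update(tokenize(path.read_text(encoding="utf-8")))
        total.update(sub)
        for word in sub:
            ranges[word] += 1
    return sorted(
        (w for w, f in total.items() if f >= min_freq and ranges[w] >= min_range),
        key=lambda w: -total[w],
    )

# Hypothetical layout: one directory of theses per university.
# academic_words = build_word_list(["unicamp/", "usp/", "ufrj/"])
```

Range is counted at the subcorpus level here; counting it per thesis or per discipline would only change how the inner counters are grouped.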
182

Modèle de comportement communicatif conventionnel pour un agent en interaction avec des humains : Approche par jeux de dialogue / A conventional communicative behaviour model for an agent interacting with humans: a dialogue-games approach

Dubuisson Duplessis, Guillaume 23 May 2014
This thesis aims at improving the communicative behaviour of software agents interacting with humans. To this end, we propose a data-driven methodology based on the study of a corpus of task-oriented human-human interactions. We present a framework that relies on dialogue games, grounded in the notions of social commitment and the conversational gameboard, to model the dialogue patterns observed in the data. We illustrate the specification of such games from extracted patterns by applying all the steps of our methodology to a corpus. The specified games are validated by showing that they appropriately describe the patterns appearing in the reference corpus. Finally, we show the interpretative and generative value of the model as a basis for the conventional communicative behaviour of an agent interacting with a human. We implement this model in the Dogma module, which an agent can use to manage its communicative behaviour in a two-party dialogue.
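As a rough illustration of the kind of object such a framework manipulates (the formalism in the thesis is richer), a dialogue game can be modelled as a named convention that pairs an initiative dialogue act with the responses it licenses and the commitments those responses create; all names in the sketch are invented.

```python
from dataclasses import dataclass, field

@dataclass
class DialogueGame:
    """A conventional exchange pattern: an initiative act, the responses it licenses,
    and the commitments each response places on the participants' gameboard."""
    name: str
    initiative: str                                 # dialogue act opening the game, e.g. "ask(q)"
    responses: dict = field(default_factory=dict)   # response act -> commitments it creates

# A toy question-answer game: answering commits the speaker to the answer's content.
qa_game = DialogueGame(
    name="question-answer",
    initiative="ask(q)",
    responses={
        "answer(p)": ["speaker committed to p"],
        "decline": ["speaker committed to not addressing q"],
    },
)

def licensed_responses(game: DialogueGame, act: str):
    """Return the responses conventionally expected after `act` within the game."""
    return list(game.responses) if act == game.initiative else []

print(licensed_responses(qa_game, "ask(q)"))  # ['answer(p)', 'decline']
```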
183

A Probabilistic Tagging Module Based on Surface Pattern Matching

Eklund, Robert January 1993
A problem with automatic tagging and lexical analysis is that it is never 100% accurate. In order to arrive at better figures, one needs to study the character of what is left untagged by automatic taggers. In this paper the untagged residue output by the automatic analyser SWETWOL (Karlsson 1992) at Helsinki is studied. SWETWOL assigns tags to words in Swedish texts mainly through dictionary lookup. The contents of the untagged residue files are described and discussed, and possible ways of solving different problems are proposed. One method of tagging residual output is proposed and implemented: the left-stripping method, through which untagged words are stripped of their left-most letters, searched for in a dictionary and, if found, tagged according to the information in that dictionary. If the stripped word is not found in the dictionary, a match is sought in ending lexica containing statistical information about the word classes associated with that particular word form (i.e., final letter cluster, whether this is a grammatical suffix or not) and the relative frequency of each word class. If a match is found, the word is given graduated tagging according to the statistical information in the ending lexicon. If a match is not found, the word is stripped of what is now its left-most letter and is recursively searched for in a dictionary and ending lexica (in that order). The ending lexica employed in this paper are retrieved from a reversed version of Nusvensk Frekvensordbok (Allén 1970) and contain endings of between one and seven letters. The contents of the ending lexica are described and discussed to a certain degree. The programs working according to the principles described are run on files of untagged residual output. Appendices include, among other things, LISP source code, untagged and tagged files, the ending lexica containing one- and two-letter endings, and excerpts from the ending lexica containing three- to seven-letter endings.
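The left-stripping method lends itself to a compact recursive sketch. The Python below (the original implementation was in LISP) assumes a full-form dictionary mapping words to tags and ending lexica mapping final letter clusters to word-class probabilities; the toy resources are invented.

```python
def left_strip_tag(word, dictionary, ending_lexica, max_ending=7):
    """Strip the left-most letter, then try dictionary lookup, then ending lexica
    (longest ending first); recurse until the word is exhausted.
    The full word is never looked up: residue words are by definition not in the dictionary."""
    stripped = word[1:]
    if not stripped:
        return None                                   # nothing left to analyse
    if stripped in dictionary:                        # dictionary lookup on the stripped form
        return {dictionary[stripped]: 1.0}
    for n in range(min(max_ending, len(stripped)), 0, -1):
        ending = stripped[-n:]
        if ending in ending_lexica:                   # graduated tagging from ending statistics
            return dict(ending_lexica[ending])
    return left_strip_tag(stripped, dictionary, ending_lexica, max_ending)

# Toy resources (invented) with Swedish-looking forms and endings.
dictionary = {"bil": "NOUN"}
ending_lexica = {"arna": {"NOUN": 0.95, "VERB": 0.05}}

print(left_strip_tag("miljöbilarna", dictionary, ending_lexica))
# -> {'NOUN': 0.95, 'VERB': 0.05}, found via the four-letter ending 'arna'
```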
184

Englishes Online: A comparison of the varieties of English used in blogs

Ruuska, Sofia January 2013
This study is based on data gathered from two corpora. It compares the written English of second-language users, in this case English used by Swedes in blogs, with the English used in blogs from the Birmingham Blog Corpus, which contains blogs written in English by authors of various nationalities. The aim is to compare Swedes' use of English in blogs with the English used in blogs in general. The study focuses on features typically associated with either American English (AmE) or British English (BrE) and investigates which variety is the more prominent online. The results indicate that features generally associated with AmE have a higher frequency in both corpora analysed in this thesis. The conclusion is therefore that AmE tends to dominate both Swedish and international authors' use of English in blogs.
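The comparison rests on relative frequencies of variety-specific features in each corpus. A minimal sketch of that calculation follows, with a handful of spelling pairs standing in for the actual feature set examined in the thesis.

```python
import re
from collections import Counter

# Illustrative spelling variants only; the thesis's feature set is broader.
FEATURES = {"color": "AmE", "colour": "BrE", "center": "AmE", "centre": "BrE",
            "realize": "AmE", "realise": "BrE"}

def variety_rates(text):
    """Per-million-word rate of AmE- and BrE-associated features in a text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(FEATURES[t] for t in tokens if t in FEATURES)
    per_million = 1_000_000 / max(len(tokens), 1)
    return {variety: n * per_million for variety, n in counts.items()}

sample = "I realise the colour of the center is off."
print(variety_rates(sample))   # roughly {'BrE': 222222.2, 'AmE': 111111.1}
```

Running the same counts over each corpus and normalizing per million words makes the two frequency profiles directly comparable.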
185

Fanfictions, linguística de corpus e aprendizagem direcionada por dados: tarefas de produção escrita com foco no uso autêntico de língua e atividades que visam à autonomia dos alunos de letras em analisar preposições / Fanfictions, corpus linguistics and data-driven learning: written production tasks focused on authentic language use and activities aimed at developing Letters students' autonomy in analysing prepositions

Garcia, William Danilo January 2020
Advisor: Paula Tavares Pinto. Degree: Master's (Mestre).
Abstract: Although the relation between Corpus Linguistics and Language Teaching had received attention even before the advent of computers, it intensified around the 1990s, when research on learner corpora and Data-Driven Learning came to the fore. Against this background, this study aimed to compile four learner corpora based on authentic use of the language, in order to develop teaching activities driven by the students' own data that could foster in them an autonomous profile of linguistic investigation (more precisely, of the prepositions with, in, on, at, for and to). Concerning the existing literature, we draw on Prabhu (1987), Skehan (1996), Willis (1996), Nunan (2004) and Ellis (2006) for Task-Based Language Teaching, and on Jenkins (2012) and Neves (2014) for fanfictions. For Corpus Linguistics, the study is based on Sinclair (1991), Berber Sardinha (2000) and Viana (2011); Granger (1998, 2002, 2013) is referenced for learner corpora, and Johns (1991, 1994), Berber Sardinha (2011) and Boulton (2010) for Data-Driven Learning. Methodologically, compositions were collected from Language Teaching undergraduate students, who completed a writing task in which they wrote a fanfiction. These texts formed two initial learner corpora, which were analysed with the AntConc tool (ANTHONY, 2018) in order to observe the occurrence of prepositions in English and whether they were accurately ... (Complete abstract: click electronic access below)
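The preposition analysis behind these activities is essentially a concordance task. AntConc was the tool actually used; the keyword-in-context sketch below only illustrates the principle, with an invented example sentence.

```python
import re

PREPOSITIONS = {"with", "in", "on", "at", "for", "to"}

def kwic(text, keywords, window=4):
    """Yield 'left context  [keyword]  right context' lines for each keyword hit."""
    tokens = re.findall(r"\w+", text.lower())
    for i, tok in enumerate(tokens):
        if tok in keywords:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            yield f"{left:>30}  [{tok}]  {right}"

fanfic = "Harry waited at the station for hours, thinking about the letter in his pocket."
for line in kwic(fanfic, PREPOSITIONS):
    print(line)
```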
186

Auf dem Weg zu einem TEI-Austauschformat für ägyptisch-koptische Texte / Towards a TEI interchange format for Egyptian-Coptic texts

Gerhards, Simone, Schweitzer, Simon January 2016
Several large Egyptological projects (TLA: http://aaew.bbaw.de/tla; Ramses: http://ramses.ulg.ac.be/; Rubensohn: http://elephantine.smb.museum/; Karnak: http://www.cfeetk.cnrs.fr/karnak/) are producing annotated corpora. For data exchange, a standardized interchange format based on TEI is urgently needed. To this end, these projects have joined forces to work out a common proposal. In our talk we present the current state of the discussion: What is the base text of the markup, the hieroglyphic annotation or the transliteration of the text? How should the different script formats be handled? Can the metadata in the header be standardized with the help of shared thesauri? What is annotated inline and what stand-off?
187

OCR of hand-written transcriptions of hieroglyphic text

Nederhof, Mark-Jan January 2016
Encoding hieroglyphic texts is time-consuming. If a text already exists as a hand-written transcription, there is an alternative, namely OCR. Off-the-shelf OCR systems seem difficult to adapt to the peculiarities of Ancient Egyptian. Presented is a proof-of-concept tool designed to digitize texts of Urkunden IV in the handwriting of Kurt Sethe. It automatically recognizes signs and produces a normalized encoding, suitable for storage in a database or for rendering on screen or on paper, and requiring little manual correction. The encoding of hieroglyphic text is RES (Revised Encoding Scheme) rather than (common dialects of) MdC (Manuel de Codage). Earlier papers argued against MdC and in favour of RES for corpus development. Arguments in favour of RES include the longevity of the encoding, as its semantics are font-independent. The present study provides evidence that RES is also much preferable to MdC in the context of OCR. With a well-understood parsing technique, the relative positioning of scanned signs can be straightforwardly mapped to suitable primitives of the encoding.
188

Conception et développement d'un outil d'aide à la traduction anglais/arabe basé sur des corpus parallèles / Design and development of an English/Arabic translation aid tool based on parallel corpora

Yahiaoui, Abdelghani 29 May 2017
We create an innovative English/Arabic translation aid tool to meet the growing need for online translation tools centered on the Arabic language. The tool combines dictionaries adapted to the specificities of the Arabic language with a bilingual concordancer derived from parallel corpora. Given its agglutinative and unvoweled nature, the Arabic word requires specific treatment. For this reason, and to build our lexical resources, we rely on Buckwalter's morphological analyzer, which, on the one hand, performs morphological analysis that takes into account the complex composition of the Arabic word (proclitic, prefix, stem, suffix, enclitic) and, on the other hand, provides translational resources that can be adapted for use within a translation system. This morphological analyzer is also compatible with the approach defined around the DIINAR database (DIctionnaire Informatisé de l'Arabe, Computerized Dictionary for Arabic), which was built, among others, by members of our research team. To address the issue of context in translation, a bilingual concordancer was developed from parallel corpora. Such corpora are a very valuable linguistic resource with multiple uses, in this case translation aid. We therefore studied these corpora and their alignment methods closely, and we proposed a mixed approach that significantly improves the quality of sub-sentential alignment of English-Arabic parallel corpora. Several technologies were used to implement this translation aid tool, which is available online (tarjamaan.com) and allows the user to search the translation of millions of words and expressions while viewing their original contexts. The tool has been evaluated with a view to optimizing it and extending it to support other language pairs.
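At its simplest, a bilingual concordancer over a sentence-aligned parallel corpus searches the source side and displays the aligned target side. The sketch below uses an invented two-sentence corpus; the actual tool adds sub-sentential alignment and morphological lookup through the Buckwalter analyzer and DIINAR resources.

```python
def concordance(parallel, query):
    """Return (source, target) pairs whose English side contains the query term.
    `parallel` is a list of sentence-aligned (english, arabic) pairs."""
    query = query.lower()
    return [(en, ar) for en, ar in parallel if query in en.lower()]

# Toy aligned corpus (invented examples).
corpus = [
    ("The agreement was signed yesterday.", "تم توقيع الاتفاقية أمس."),
    ("The weather is nice today.", "الطقس جميل اليوم."),
]

for en, ar in concordance(corpus, "agreement"):
    print(en, "|", ar)
```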
189

The Syntax of Similes: A Treebank-Based Exploration of Simile in Greek Poetry

Mambrini, Francesco 19 March 2018
No description available.
190

Diachronní srovnání synchronních korpusů / Diachronic comparison of synchronic corpora

Křen, Michal January 2012
The thesis presents a method for diachronic comparison of synchronic corpora that reflect the language of very close time periods. Its primary aim is to assess the possibilities and limitations of language-change detection based on the synchronic written corpora of the SYN series. The approach is corpus-driven, based on a statistical evaluation of differences among normalized average reduced frequencies of lemmata and lexical combinations. Several variants of the method are applied to various subcorpora of the SYN corpus and their results examined in detail. The difficulty of the comparison lies in the influence of corpus composition and in the interconnection of changes in language with changes in society. Since it is not easy to distinguish signs of diachronic shift from naturally occurring synchronic variability, the statistically detected significance of frequency differences is additionally verified by querying the base corpora. The interpretation of the results is also adjusted by knowledge of their exact composition. The conclusions are based mainly on newspapers, the written text type most receptive to change. The changes can be characterized as a thematic shift away from the newspapers' original political and economic orientation towards real-life and free-time topics...
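The frequency measure underlying the comparison, average reduced frequency (ARF), discounts occurrences that cluster in one part of the corpus. A common formulation is ARF = (1/v) * sum_i min(d_i, v), with v = N/f, where N is the corpus size in tokens, f the plain frequency and d_i the cyclic distances between successive occurrences. The sketch below follows that definition; the exact normalization used in the thesis may differ.

```python
def arf(positions, corpus_size):
    """Average reduced frequency of a word given its token positions in a corpus of
    `corpus_size` tokens: (1/v) * sum(min(d_i, v)) with v = N/f and cyclic gaps d_i."""
    f = len(positions)
    if f == 0:
        return 0.0
    v = corpus_size / f
    pos = sorted(positions)
    # Cyclic distances between successive occurrences (wrapping around the corpus).
    gaps = [pos[0] + corpus_size - pos[-1]] + [b - a for a, b in zip(pos, pos[1:])]
    return sum(min(d, v) for d in gaps) / v

# Evenly spread vs. clustered occurrences with the same plain frequency:
print(arf([0, 250, 500, 750], 1000))   # 4.0   (maximally dispersed)
print(arf([0, 1, 2, 3], 1000))         # ~1.01 (clustered, heavily discounted)
```

Two lemmata of equal plain frequency can thus differ sharply: the evenly spread one keeps its frequency, while the clustered one is discounted towards 1, which makes ARF-based comparisons less sensitive to corpus composition.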
