Global ETD Search

101	Verblexpor : um recurso léxico com anotação de papéis semânticos para o português Zilio, Leonardo January 2015 (has links) Esta tese propõe um recurso léxico de verbos com anotação de papéis semânticos, denominado VerbLexPor, baseado em recursos como VerbNet, PropBank e FrameNet. As bases teóricas da proposta são interdisciplinares e retiradas da Linguística de Corpus e do Processamento de Linguagem Natural (PLN), visando-se a contribuir para a Linguística e para a Computação. As hipóteses de pesquisa são: a) um mesmo conjunto de papéis semânticos pode ser aplicado a diferentes gêneros textuais; e b) as diferenças entre esses gêneros se destacam no ranqueamento dos papéis semânticos. O desenvolvimento do VerbLexPor se apoia em dois corpora: um especializado, com mais de 1,6 milhão de palavras, composto por artigos científicos de Cardiologia de três periódicos brasileiros; e um não especializado, com mais de 1 milhão de palavras composto por artigos do jornal popular Diário Gaúcho. Os corpora foram anotados com o parser PALAVRAS, e as informações de sentenças, verbos e argumentos foram extraídas e armazenadas em um banco de dados. O VerbLexPor tem 192 verbos e mais de 15 mil argumentos anotados distribuídos em mais de 6 mil sentenças. Observou-se que o corpus do Diário Gaúcho privilegia uma sintaxe direta e pouco uso de voz passiva e adjuntos, enquanto o corpus de Cardiologia apresenta mais voz passiva e um maior uso de INSTRUMENTOS na posição de sujeito, além de uma menor incidência de AGENTES. Foram realizados também alguns experimentos paralelos, como a anotação de papéis semânticos por vários anotadores e o agrupamento automático de verbos. Na tarefa de múltiplos anotadores, cada um anotou exatamente as mesmas 25 orações. Os anotadores receberam um manual de anotação e um treinamento básico (explicação sobre a tarefa e dois exemplos de anotação). Usou-se o cálculo de multi-π para avaliar a concordância entre os anotadores, e o resultado foi de π = 0,25. 
Os motivos para essa concordância baixa podem estar na falta de um treinamento mais completo. A tarefa de agrupamento de verbos mostrou que a sintaxe e a semântica são igualmente importantes para o agrupamento. Este estudo contribui para a área de Linguística, com um léxico de verbos anotados semanticamente, e também para a Computação, com dados que podem ser consultados e processados para diversas aplicações do PLN, principalmente por estarem disponíveis nos formatos XML e SQL. / This dissertation develops a lexical resource of verbs annotated with semantic roles, called VerbLexPor, based on resources such as VerbNet, PropBank, and FrameNet. The theoretical bases of this study lie in Corpus Linguistics and Natural Language Processing (NLP), and it aims to contribute to both Linguistics and Computer Science. The hypotheses are: a) a single set of semantic roles can be applied to different genres; and b) the differences among genres show up in the ranking of semantic roles. The development of VerbLexPor rests on two corpora: a specialized one, with more than 1.6 million words, composed of scientific papers in the field of Cardiology from three Brazilian journals; and a non-specialized one, with more than 1 million words, composed of articles from the popular newspaper Diário Gaúcho. The corpora were annotated with the parser PALAVRAS, and sentence, verb and argument information was extracted and stored in a database. VerbLexPor has 192 verbs and more than 15 thousand arguments annotated with semantic roles, distributed among more than 6 thousand sentences. We observed that the Diário Gaúcho corpus favors a more direct syntax, with less use of passive voice and adjuncts, while the Cardiology corpus has more passive voice, more INSTRUMENTS in subject position, and fewer AGENTS. We also conducted some parallel experiments, such as semantic role annotation by multiple annotators and automatic verb clustering. 
In the multiple-annotator task, every annotator annotated exactly the same 25 sentences. They received an annotation manual and basic training (an explanation of the task and two annotation examples). We used multi-π to evaluate agreement among annotators, and the result was π = 0.25. This low agreement may be due to the lack of more thorough training. The verb clustering task showed that syntax and semantics are equally important for clustering. This study contributes to Linguistics, with a verb lexicon annotated with semantic roles, and also to Computer Science, with data that can be queried and processed for various NLP applications, especially because the data are available in both XML and SQL formats. Língua portuguesa Linguística computacional Corpus Linguagem especializada Semantic role labeling Lexical resource NLP Corpus linguistics
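The abstract above reports agreement as multi-π but does not give the formula. The following is a minimal sketch, assuming the usual Fleiss-style definition: observed pairwise agreement per item, corrected by the chance agreement expected from the pooled category proportions. The function name `multi_pi` and the toy role labels are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def multi_pi(annotations):
    """Multi-annotator pi for items that each received one label per annotator.

    annotations: list of items; each item is a list with one label per
    annotator, and every item must have the same number of annotators.
    """
    n_items = len(annotations)
    if n_items == 0:
        raise ValueError("at least one item is required")
    n_coders = len(annotations[0])
    if n_coders < 2 or any(len(item) != n_coders for item in annotations):
        raise ValueError("every item needs the same number (>= 2) of labels")

    agreeing_pairs = 0
    pooled = Counter()  # how often each label was used overall
    for item in annotations:
        counts = Counter(item)
        pooled.update(counts)
        # ordered annotator pairs that chose the same label for this item
        agreeing_pairs += sum(n * (n - 1) for n in counts.values())

    observed = agreeing_pairs / (n_items * n_coders * (n_coders - 1))
    total = n_items * n_coders
    expected = sum((n / total) ** 2 for n in pooled.values())
    if expected == 1.0:
        raise ValueError("only one label was ever used; pi is undefined")
    return (observed - expected) / (1 - expected)


# Hypothetical toy data: 3 annotators labelling the roles of 4 arguments.
toy = [
    ["AGENT", "AGENT", "AGENT"],
    ["AGENT", "INSTRUMENT", "AGENT"],
    ["THEME", "THEME", "PATIENT"],
    ["INSTRUMENT", "INSTRUMENT", "INSTRUMENT"],
]
print(round(multi_pi(toy), 3))
```

Values near 0 mean agreement is no better than chance and 1 means perfect agreement, which is why a result of 0.25 is read as low.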
102	Irish English modal verbs from the fourteenth to the twentieth centuries Van Hattum, Marije January 2012 (has links) The thesis provides a corpus-based study of the development of Irish English modal verbs from the fourteenth to the twentieth centuries in comparison to mainland English. More precisely, it explores the morpho-syntax of CAN, MAY, MUST, SHALL and WILL and the semantics of BE ABLE TO, CAN, MAY and MUST in the two varieties. The data of my study focuses on the Kildare poems, i.e. fourteenth-century Irish English religious poetry, and a self-compiled corpus consisting of personal letters, largely emigrant letters, and trial proceedings from the late seventeenth to the twentieth centuries. The analysis of the fourteenth and nineteenth centuries is further compared to a similar corpus of English English. The findings are discussed in the light of processes associated with contact-induced language change, new-dialect formation and supraregionalization. Contact-induced language change in general, and new-dialect formation in particular, can account for the findings of the fourteenth century. The semantics of the Irish English modal verbs in this century were mainly conservative in comparison to English English. The Irish English morpho-syntax showed an amalgam of features from different dialects of Middle English in addition to some forms which seem to be unique to Irish English. The Irish English poems recorded a high number of variants per function in comparison to a selection of English English religious poems, which does not conform to predictions based on the model of new-dialect formation. I suggest that this might be due to the fact that the English language had not been standardized by the time it was introduced to Ireland, and thus the need to reduce the number of variants was not as great as it is suggested to be in the post-standardization scenarios on which the model is based. 
In seventeenth- and eighteenth-century Ireland, increased Irish/English bilingualism caused the formation of a second-language (L2) variety of English. In the nineteenth century the bilingual speakers massively abandoned the Irish language and integrated into the English-speaking community. As a result, the varieties of English as spoken by the bilingual speakers and as spoken by the monolingual English speakers blended and formed a new variety altogether. The use of modal verbs in this new variety of Irish English shows signs of colonial lag (e.g. in the development of a deontic possibility meaning for CAN). Additionally, the subtle differences between BE ABLE TO and CAN in participant-internal possibility contexts and between epistemic MAY and MIGHT in present time contexts were not fully acquired by the L2 speakers, which resulted in a higher variability between the variants in the new variety of Irish English. In the late nineteenth and early twentieth centuries the use of modal verbs converged on the patterns found in English English, either as a result of linguistic accommodation in the case of informants who had migrated to countries such as Australia and the United States, or as a result of supraregionalization in the case of those who remained in Ireland.
103	Corpop : um corpus de referência do português popular escrito do Brasil Pasqualini, Bianca Franco January 2018 (has links) Esta tese propõe um corpus do Português popular brasileiro escrito, denominado CorPop, com textos selecionados com base no nível de letramento médio dos leitores do país. As bases teórico-metodológicas do CorPop são interdisciplinares e inserem-se no âmbito dos Estudos da Linguagem e disciplinas afins, como Estudos do Léxico e Linguística de Corpus, Linguística Textual e Psicolinguística, dialogando também com estudos de Processamento de Língua Natural. Desse modo, esta investigação abriga-se na Linha de Pesquisa Lexicografia, Terminologia e Tradução: Relações Textuais do PPG-Letras-UFRGS, e nosso recorte, por isso, tende ao destaque para o Léxico. O desenvolvimento do CorPop deu-se através da compilação de dados sobre o nível de letramento dos leitores brasileiros e das características que poderiam compor um padrão de simplicidade textual em um corpus de textos adequados a esses leitores. Tais dados foram coletados das pesquisas do Indicador de Alfabetismo Funcional (INAF) e Retratos da Leitura no Brasil, além de um questionário com leitores. Os textos selecionados para o CorPop são (1) textos do jornalismo popular do Projeto PorPopular (jornal Diário Gaúcho), consumido maciçamente pelas classes C e D, que é o leitor médio brasileiro; (2) textos e autores mais lidos pelos respondentes das últimas edições da pesquisa Retratos da Leitura no Brasil; (3) coleção “É Só o Começo” (adaptação de clássicos da literatura brasileira para leitores com baixo letramento, adaptação esta realizada por linguistas); (4) textos do jornal Boca de Rua, produzido por pessoas em situação de rua, com baixa escolaridade e baixo letramento; e (5) textos do Diário da Causa Operária, imprensa operária brasileira produzida também por pessoas dentro da faixa média de letramento do país. 
Realizamos, após a coleta, preparação e processamento dos textos do corpus, uma série de experimentos com a lista bruta de frequências e com a lista de frequências lematizada do CorPop. Os resultados obtidos mostram aplicações promissoras do CorPop em diversas tarefas linguísticas, desde simplificação de textos até uso como vocabulário controlado para redação de paráfrases definitórias em dicionários e comprovam que um corpus pequeno pode ter a mesma validade que um corpus de grandes proporções. / This thesis proposes a corpus of written popular Brazilian Portuguese, called CorPop, with texts selected according to the average literacy level of the country's readers. CorPop's theoretical and methodological bases are interdisciplinary and fall within the scope of Language Studies and related disciplines, such as Lexical Studies and Corpus Linguistics, Text Linguistics and Psycholinguistics, in dialogue with Natural Language Processing studies as well. The research therefore belongs to the research line Lexicography, Terminology and Translation: Textual Relations of PPG-Letras-UFRGS, and our approach accordingly emphasizes the lexicon. CorPop was developed by compiling data on the literacy level of Brazilian readers and on the characteristics that could make up a standard of textual simplicity in a corpus of texts suitable for these readers. These data were collected from the surveys Indicador de Alfabetismo Funcional (INAF, the Functional Literacy Indicator) and Retratos da Leitura no Brasil, as well as from a questionnaire answered by readers. 
The texts selected for CorPop are (1) popular journalism texts from the PorPopular Project (the newspaper Diário Gaúcho), read in large numbers by social classes C and D, which correspond to the average Brazilian reader; (2) the texts and authors most read by respondents to the latest editions of the survey Retratos da Leitura no Brasil; (3) the collection "É Só o Começo" (classics of Brazilian literature adapted by linguists for readers with low literacy); (4) texts from the newspaper Boca de Rua, produced by people living on the streets, with little schooling and low literacy; and (5) texts from Diário da Causa Operária, a Brazilian workers' newspaper also produced by people within the country's average literacy range. After collecting, preparing and processing the texts of the corpus, we carried out a series of experiments with CorPop's raw frequency list and its lemmatized frequency list. The results show promising applications of CorPop in several linguistic tasks, from text simplification to use as a controlled vocabulary for writing definitions in dictionaries. They also confirm that a small corpus can be as valid as a very large one. Língua portuguesa Leitura : Compreensão Lingüística de corpus Corpus of popular Brazilian Portuguese Corpus linguistics Text simplification
104	Análise de um corpus de produção escrita em português por crianças e adultos indígenas bilíngues/monolíngues de Dourados/MS a partir da linguistíca de corpus Espindola, Sandra January 2014 (has links) Com a finalidade de entender a origem das dificuldades apresentadas por crianças e adultos indígenas na produção de textos em português, surgiu a presente investigação, a partir da Linguística de Corpus. Para tanto, foi construído um corpus de 483 textos de crianças e 349 textos de adultos escritos em língua portuguesa produzidos por crianças e adultos indígenas e não indígenas. A amostra do grupo das crianças contou um total de 175 crianças, sendo 111 indígenas (71 bilíngues Guarani/Kaiowá e 40 Terena monolíngues) e 64 não indígenas, falantes monolíngues de português, alunos do 4º e do 5º ano do Ensino Fundamental. O grupo de adultos foi formado por um total de 118 adultos, sendo 74 indígenas (36 bilíngues Guarani/Kaiowá e 38 Terena monolíngues) e 44 não indígenas, falantes monolíngues de português, do 1º e do último ano do Ensino Superior. Os objetivos específicos da pesquisa foram: (a) verificar se existem diferenças entre o tipo de dificuldades reveladas pelos indígenas monolíngues e bilíngues de diferentes etnias – Kaiowá/Guarani e Terena – em comparação com os monolíngues não indígenas na produção de textos narrativos em português; (b) na comparação entre os dois grupos etários, crianças e adultos, observar em que medida o caminho percorrido do ensino básico à formação acadêmica interferiu no desenvolvimento da habilidade de escrita de textos; e (c) no caso dos grupos de participantes adultos, investigar se o tempo de permanência no curso de graduação (alunos que estão no primeiro e no quarto ano de curso) interfere no nível de dificuldade na produção de textos. Os dados foram analisados através da ferramenta AntConc, a partir do viés teórico da Linguística de Corpus. 
A partir dessa proposta de pesquisa espera-se contribuir para que os professores, tanto os que atendem os acadêmicos quanto os que atendem as crianças, compreendam como a escrita desses dois grupos indígenas se estrutura. Essas informações são essenciais para futuras orientações nos trabalhos de leitura e escritas propostos pela escola e pelos cursos universitários que recebem acadêmicos indígenas. / This research arose from the need to understand the origin of the difficulties that indigenous children and adults face when producing texts in Portuguese, and it draws on Corpus Linguistics. To that end, a corpus was built of 483 texts by children and 349 texts by adults, written in Portuguese by indigenous and non-indigenous children and adults. The children's sample comprised 175 children: 111 indigenous (71 Guarani/Kaiowá bilinguals and 40 Terena monolinguals) and 64 non-indigenous monolingual speakers of Portuguese, all students in the 4th and 5th years of elementary school. The adult group comprised 118 adults: 74 indigenous (36 Guarani/Kaiowá bilinguals and 38 Terena monolinguals) and 44 non-indigenous monolingual speakers of Portuguese, in the first and last years of higher education. The specific objectives were: (a) to determine whether the kinds of difficulties shown by monolingual and bilingual indigenous writers of different ethnic groups (Kaiowá/Guarani and Terena) differ from those of non-indigenous monolinguals in the production of narrative texts in Portuguese; (b) in comparing the two age groups, children and adults, to observe to what extent the path from basic education to higher education affected the development of text-writing skills; and (c) for the adult groups, to investigate whether time spent in the undergraduate course (students in the first versus the fourth year) affects the level of difficulty in producing texts. The data were analyzed with the AntConc tool, within the theoretical framework of Corpus Linguistics. 
This research is expected to help teachers, both those who teach the university students and those who teach the children, understand how the writing of these two indigenous groups is structured. This information is essential for future guidance on the reading and writing work proposed by schools and by university courses that receive indigenous students. Escrita Língua portuguesa Ensino e aprendizagem Educação indígena Produção textual Language education Teaching indigenous Corpus linguistics
105	As formações x-inho nas modalidades oral e escrita: um estudo contrastivo baseado na lingüística de corpus / The suffix "inho" in spoken and written discourse:a contrastive study based on corpus linguistics Marcos Antônio Gonçalves 13 February 2006 (has links) As formações x-inho são descritas, na maioria das gramáticas de Língua Portuguesa como contendo noções dimensiva e afetiva. Entretanto, essas mesmas gramáticas não incluem os fatores extraligüísticos e contextuais nos quais os anunciadores estão inseridos quando optam por uma formação em x-inho. Sob esta perspectiva, tem-se no presente trabalho, o objetivo de investigar a produtividade das formações x-inho em dois corpora eletrônicos: um oral, subdividido em dois subcorpora contendo respectivamente narrativas e descrições e um escrito, oriundo exclusivamente das variadas seções e cadernos de um jornal de grande circulação e qualidade. A dissertação quantifica as ocorrências das formações x-inho em cada um dos corpora. Em seguida cada uma dessas ocorrências é analisada para se verificar que tipo de noção (dimensiva, afetiva positiva ou negativa, intensificadora, etc) ela contem. Por fim são contrastados os dados de freqüência e dispersão de cada uma das noções encontradas para cada um dos corpora. A metodologia de nossa análise está centrada na área de investigação lingüística denominada Lingüística de corpus, que serve de base para que os dados colhidos sejam analisados e interpretados. / The items ending in -inho are described in the majority of grammars of Portuguese as conveying two notions, namely affect and dimension. However, the same grammars do not seem to include either the extralinguistic or contextual factors in which speakers are inserted when they opt for a word ending in -inho. 
The aim of the present work is thus to investigate the productivity of such items in two electronic corpora: an oral one, further subdivided into two sub-corpora containing narratives and descriptions respectively, and a written one, compiled exclusively from the various sections of a widely read quality newspaper. The dissertation quantifies the occurrences of items ending in -inho in each of the corpora. Next, each of these occurrences is analysed and classified to determine which notion (dimension, positive affect, negative affect, intensification, etc.) it conveys. Finally, the frequency and dispersion data for each of the notions found are contrasted across the corpora. The methodology of our analysis is centered on the area known as Corpus Linguistics, which provides a basis for the data to be analysed and interpreted. Lingüística de corpus Sufixo Análise lingüística Modalidades da língua Modality of linguistics Corpus linguistics Suffix Linguistics analysis LINGUISTICA
106	Gramaticalização e Preposições Complexas do Português: um estudo baseado em corpus / Grammaticalization and complex prepositions Camilla Canella Moraes Luzorio 31 March 2008 (has links) Este trabalho apresenta um estudo que aplica a teoria de gramaticalização a um corpus eletrônico diacrônico a fim de dar conta das mudanças ocorridas em estruturas da língua portuguesa normalmente denominadas Preposições Complexas. O estudo teve como objetivos: 1) investigar as preposições complexas em face de, em face a, face a, em vista de, em frente de, em frente a e frente a com vistas a compreender seu funcionamento em termos sintáticos e semânticos a fim de verificar se elas estão se gramaticalizando; 2) examinar textos de períodos históricos diferentes de modo que se compreenda a possível trajetória empreendida por tais formas entre os séculos XIV e XX; 3) averiguar se os itens frente a e face a podem ser considerados reduções das formas em frente a e em face a, respectivamente. A teoria da gramaticalização forneceu um arcabouço teórico para explicar os fenômenos de mudança que afetam os itens lingüísticos. O processo de gramaticalização consiste na passagem de uma construção de um status lexical para um status gramatical ou de um status menos gramatical para um mais gramatical. Um dos fatores desencadeantes desse processo é a freqüência de uso que leva o item a ser mais previsível e estável. A Lingüística de Corpus entra nesta pesquisa fornecendo a metodologia de compilação, extração e observação dos dados, pois à semelhança dos estudos de Hoffman (2005) foi realizada uma investigação baseada em corpora eletrônicos. O corpus base foi o Corpus do Português, composto por textos em língua portuguesa escritos a partir do século XIV até o século XX, disponível online em http://www.corpusdoportugues.org/. 
Verificou-se que as preposições complexas analisadas ascenderam a escala de gramaticalidade, pois se expandiram suas possibilidades de uso através do desenvolvimento de polissemias de semântica abstrata. Constatou-se, ainda, que, em muitos sentidos, elas coexistem como camadas, mas que pode haver uma tendência que conduzirá a escolha de uma forma para expressar cada sentido evidenciado. / The present dissertation applies the theory of grammaticalization to a digital diachronic corpus in order to account for the changes that have taken place in certain structures of Portuguese usually called complex prepositions. The research had three objectives. First, it investigated the complex prepositions em face de, em face a, face a, em vista de, em frente de, em frente a and frente a, in order to understand how they function syntactically and semantically and, in turn, to evaluate whether they are undergoing grammaticalization. Secondly, it examined texts from different historical periods, so as to map the possible trajectory taken by the aforementioned forms between the 14th and the 20th centuries. Thirdly, it sought to verify whether the items frente a and face a may be considered reductions of em frente a and em face a, respectively. The theoretical framework was taken from grammaticalization theory, which explains the phenomena of change that affect linguistic items. The process of grammaticalization consists in a construction passing from lexical to grammatical status, or from less grammatical to more grammatical status. One of the factors triggering this process is frequency of use, which makes the item more predictable and stable. Corpus Linguistics provided the methodology for the compilation, extraction and observation of the data: as in Hoffman (2005), the investigation was based on electronic corpora. 
The study corpus was the Corpus do Português, which consists of texts in Portuguese written between the 14th and the 20th centuries, available at http://www.corpusdoportugues.org/. The study suggests that the complex prepositions analysed have become increasingly grammaticalised, because they have acquired additional abstract meanings. It has also been observed that, for many meanings, the forms coexist as layers. However, there seems to be a tendency for one form to become the preferred way of expressing each of these meanings. Gramaticalização Preposição complexa Lingüística de Corpus Grammaticalization Complex prepositions Corpus Linguistics LINGUISTICA
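Since the abstract above treats frequency of use across centuries as central evidence, the counting step can be illustrated with a minimal sketch. It assumes plain-text samples grouped by century. The names `count_forms` and `per_million` and the sample sentence are hypothetical, and nothing here reproduces the thesis's actual procedure or the Corpus do Português interface. As a simplification, contracted forms such as "em face do" or "frente ao" are not matched.

```python
import re
from collections import Counter

# The seven complex prepositions listed in the abstract.
FORMS = [
    "em face de", "em face a", "face a", "em vista de",
    "em frente de", "em frente a", "frente a",
]

# Matches are found left to right and never overlap, so an occurrence of
# "em face a" is consumed whole and is not counted again as "face a".
PATTERN = re.compile(
    r"\b(?:" + "|".join(f.replace(" ", r"\s+") for f in FORMS) + r")\b",
    re.IGNORECASE,
)


def count_forms(text):
    """Raw counts of each complex preposition in one text."""
    return Counter(" ".join(m.group(0).lower().split())
                   for m in PATTERN.finditer(text))


def per_million(texts_by_century):
    """Frequency per million tokens of each form, for each century."""
    table = {}
    for century, text in texts_by_century.items():
        n_tokens = len(re.findall(r"\w+", text))
        counts = count_forms(text)
        table[century] = {
            form: (counts[form] * 1_000_000 / n_tokens if n_tokens else 0.0)
            for form in FORMS
        }
    return table


# Hypothetical one-sentence sample, only to show the call.
sample = {"XX": "Ficou em face a ele e face a ela."}
print(per_million(sample)["XX"]["face a"])
```

Normalizing per million tokens keeps centuries with very different amounts of text comparable. Keeping the long and short forms apart is what makes the question of whether frente a and face a are reductions testable at all.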
107	Brasil brasileiro: o léxico e a identidade nacional / Brazilian Brazil : lexis and national identity Lúcia Deborah Ramos de Araújo 15 May 2010 (has links) Esta pesquisa dedica-se a realizar um trabalho com base no diálogo entre teorias semióticas e a Linguística de Córpus, estudando, especificamente, marcas linguísticas que possam caracterizar o perfil do brasileiro e suas características socioculturais plurais. Interessam-nos, sobretudo, os substantivos e adjetivos em função nomeadora e/ou qualificadora dos termos Brasil e brasileiro. Com isso, pretende-se oferecer um panorama bastante próximo da realidade linguística do brasileiro e de sua identidade. Para que os resultados sejam significativos, contamos com o concurso da Linguística de Córpus, servindo-nos de base a obra Linguística de Corpus (SARDINHA, 2004). Com a Linguística de Córpus, adotando a pesquisa direcionada pelo córpus (corpus-driven research) como metodologia, se pôde levantar, quantificar e tabular os signos em uso, identificando-lhes a frequência e a organização em feixes lexicais para avaliá-los quanto à significância no trato comunicativo. No desenvolvimento da análise e leitura crítica dos dados coletados, amparou-nos a Semiótica de extração peirceana, mais especificamente da Teoria da Iconicidade Verbal (SIMÕES, 2007), que permitiu delinear o potencial icônico das palavras de busca e de seus colocados. Com relação ao conceito de identidade em suas faces filosófica, social e antropológica, fornecem-nos suporte os pensamentos de NIETZSCHE (1991) acerca da necessidade do esquecimento para a construção de uma identidade e de HALL (1998), quanto aos eixos temporais que presidem o processamento discursivo dos fatos históricos e, por conseguinte, da construção identitária. O contraponto entre estes últimos autores contribui para a definição dos gêneros textuais interessantes à pesquisa, basicamente os textos argumentativos, publicados em jornais de grande circulação, no eixo Rio-São Paulo. 
A respeito da identidade na sociedade em rede, característica da contemporaneidade, apoia-nos obra de CASTELLS (2006). Os estudos específicos sobre a identidade nacional amparam-se sobretudo em DAMATTA (1978 e 1989) e LEITE (2007). A pesquisa demonstrou que a iconicidade lexical vem a ser mais apropriadamente delineada a partir de um universo de dados amplo, ao qual se tem acesso a partir da Linguística de Córpus, sendo, portanto, correto afirmar que os traços componentes da identidade brasileira podem ser apreendidos em seu estágio atual com base na análise de um córpus construído a partir de textos publicados em jornais, representativos das vozes e do pensamento de um estrato social formador de opinião. No contexto de transformações sociais e políticas que ocorrem no Brasil entre os anos 2005 e 2010, a investigação da identidade nacional e a apuração do autoconceito do brasileiro pôde apontar que alguns paradigmas historicamente estabelecidos estão sendo alterados, enquanto outros ainda persistem. O perfil identitário apurado pela pesquisa favorece a construção, por parte do estudioso da linguagem e, mais especificamente, do docente de língua portuguesa, de uma visão atualizada da identidade nacional, no recorte analisado, permitindo um trabalho consciente com as habilidades e competências vinculadas ao desenvolvimento da identidade nacional, conforme orientam os Parâmetros Curriculares Nacionais / This research has the purpose to perform a survey based on the dialogue between semiotic theories and Corpus Linguistics, studying, specifically, the language marks that may characterize the profile of the various Brazilian socio-cultural characteristics. Our special interest is to focus on the nouns and adjectives that nominate and / or qualify the terms 'Brazil' and 'Brazilian'. Through this study, we intend to reach a panorama which is very close to the linguistic reality of the Brazilian people and their identity. 
We have worked with Corpus Linguistics, based on the book Linguística de Corpus (SARDINHA, 2004). We chose corpus-driven research as our method, which made it possible to survey, quantify and tabulate the signs in use, identifying their frequency and their organization into lexical bundles, so that they could be evaluated for their significance in communication. The theories and works that supported this thesis were the semiotics of Charles Sanders PEIRCE (2000), the works on semiotics by ECO (2007) and SANTAELLA (1996, 2000 and 2001), and the Theory of Verbal Iconicity (SIMÕES, 2007), which aims to establish the iconic potential of the search words and their collocates. Regarding the philosophical, social and anthropological readings on identity, this work is supported by the thoughts of NIETZSCHE (1991) in an article on the need for forgetting in order to build an identity. Another work that supports our conclusions is HALL's paper (1998) on the timelines that govern the discursive processing of historical facts and thus the construction of identity. The counterpoint between these latter authors contributes to the definition of the text genres relevant to this research: basically argumentative texts published in major newspapers in Rio and São Paulo. Regarding identity in the network society as a contemporary issue, the work of CASTELLS (2006) was of great help. The studies on Brazilian identity by DAMATTA (1978 and 1989) and LEITE (2002) also underpin the considerations of this thesis. The research showed that lexical iconicity is more appropriately outlined from a broad universe of data, provided here by a large corpus (approximately 8 million words) handled with Corpus Linguistics methodology. 
It is therefore correct to say that the components of Brazilian identity may be grasped in their current state through the analysis of a corpus built from texts published in newspapers, which represent the voices and thoughts of an opinion-forming social stratum. The investigation of national identity and of Brazilians' self-concept, in the context of the social and political transformations that occurred in Brazil between 2005 and 2010, pointed out that some historically established paradigms are changing, while others persist. The National Curriculum Parameters in Brazil establish topics on national identity to be developed by teachers of Portuguese. The results of this work are meant to be helpful to those teachers. Iconicidade Ensino Língua Portuguesa Linguística de córpus Identity Corpus Linguistics Portuguese Language Education Iconicity LINGUA PORTUGUESA
108	Análise de quadrigramas na escrita em inglês como língua estrangeira: um estudo baseado em corpus / Analysis of quadrigrams in EFL writing: a corpus-based study Gustavo Estef Lino da Silveira 19 March 2014 (has links) O presente estudo tem como objetivo geral traçar um perfil das escolhas léxico-gramaticais da escrita em inglês de um grupo de aprendizes brasileiros na cidade do Rio de Janeiro, ao longo dos anos de 2009 a 2012, através da análise de sua produção de quadrigramas (ou blocos de quatro itens lexicais usados com frequência por vários aprendizes) em composições escritas como parte da avaliação final de curso. Como objetivo específico, a pesquisa pretendeu analisar se os quadrigramas produzidos estavam dentre aqueles que haviam sido previamente ensinados para a execução da redação ou se pertenceriam a alguma outra categoria, isto é, quadrigramas já incorporados ao uso da língua ou quadrigramas errôneos usados com abrangência pela população investigada. Para tal, foram coletadas composições escritas por aprendizes de mesmo nível de proficiência de várias filiais de um mesmo curso livre de inglês na cidade do Rio de Janeiro. Em seguida, essas composições foram digitadas e anotadas para constituírem um corpus digital facilmente identificável em termos do tipo e gênero textual, perfil do aprendiz, filial e área de origem do Rio de Janeiro. O estudo faz uso de preceitos e métodos da Linguística de Corpus, área da Linguística que compila grandes quantidades de textos e deles extrai dados com o auxílio de um programa de computador para mapear uso, frequência, distribuição e abrangência de determinados fenômenos linguístico ou discursivo. O resultado demonstra que os aprendizes investigados usaram poucos quadrigramas ensinados e, coletivamente, preferiram usar outros que não haviam sido ensinados nas aulas específicas para o nível cursado. 
O estudo também demonstrou que quando o gênero textual faz parte de seu mundo pessoal, os aprendizes parecem utilizar mais quadrigramas previamente ensinados. Isto pode querer dizer que o gênero pode influenciar nas escolhas léxico-gramaticais corretas. O estudo abre portas para se compreender a importância de blocos léxico-gramaticais em escrita em L2 como forma de assegurar fluência e acuracidade no idioma e sugere que é preciso proporcionar maiores oportunidades de prática e conscientização dos aprendizes quanto ao uso de tais blocos / This study seeks to trace the profile of the lexico-grammatical choices in the English writing of a group of Brazilian apprentices in the city of Rio de Janeiro between 2009 and 2012. To this end, it analyses the apprentices' production of 4-grams (that is, blocks of four lexical items used frequently by a number of apprentices) in compositions written as part of their final course assessment. Specifically, the research aimed to analyse whether the 4-grams produced by the apprentices had been taught previously as part of their composition lessons or whether they belonged to some other category, namely 4-grams already internalized as part of their language use or erroneous 4-grams used frequently and extensively by the subjects investigated. To this end, compositions written by apprentices at the same proficiency level were collected at various branches of a private English school in the city of Rio de Janeiro. Subsequently, these compositions were typed and tagged in order to compile a digital corpus easily identified in terms of text type and genre, apprentice profile, branch and area of the city of Rio de Janeiro. The study makes use of the precepts and methods of Corpus Linguistics, an area of Linguistics that compiles large quantities of texts and extracts data from them with the help of a computer programme in order to map the use, frequency, distribution and range of certain linguistic or discursive phenomena. 
The results demonstrate that the apprentices studied made little use of the 4-grams that had been taught to them and, collectively, preferred to use other 4-grams that had not been taught in the lessons specific to their level. The study has also shown that when the textual genre is part of their personal world, the apprentices seem to make use of more previously taught 4-grams. This may suggest that genre can influence the choice of correct lexico-grammatical items. The study creates a research space for understanding the importance of lexico-grammatical chunks in L2 writing as a means of ensuring fluency and accuracy in the target language. In addition, it suggests that more opportunities for practice should be offered to learners so that they become aware of the use of such chunks. Escrita em inglês Linguística de corpus Quadrigramas Applied linguistics Writing in English Corpus linguistics 4-grams LINGUISTICA APLICADA
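The abstract above describes 4-grams as blocks of four lexical items used frequently by several learners, which implies two measures: frequency (total occurrences) and range (number of compositions containing the block). The following is a minimal sketch of how such counts might be computed. It is not the study's actual tool or procedure: the tokenization rule, the function names and the three sample compositions are all hypothetical.

```python
import re
from collections import Counter


def four_grams(text):
    """Return every sequence of four consecutive words in a text.

    Tokenization is an assumption: lowercase, letters and apostrophes only.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return [tuple(tokens[i:i + 4]) for i in range(len(tokens) - 3)]


def frequency_and_range(compositions):
    """Count total occurrences (frequency) and number of texts (range) per 4-gram."""
    frequency = Counter()
    text_range = Counter()
    for text in compositions:
        grams = four_grams(text)
        frequency.update(grams)
        text_range.update(set(grams))  # each composition counts once per 4-gram
    return frequency, text_range


# Hypothetical toy compositions, not data from the study.
compositions = [
    "On the other hand, I think that it is important.",
    "On the other hand, the city is very big.",
    "I think that it is a good idea.",
]

frequency, text_range = frequency_and_range(compositions)

# 4-grams that occur in at least two different compositions.
shared = sorted(g for g, n in text_range.items() if n >= 2)
for gram in shared:
    print(" ".join(gram), frequency[gram], text_range[gram])
```

Separating range from frequency keeps a block repeated many times by a single writer from being mistaken for one shared across the group.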
109	Pronoun Usage in the State of the Union Address and Weekly Addresses by Donald Trump : A Critical Discourse Analysis and Corpus Linguistics Approach Tęcza, Karolina Katarzyna January 2018 (has links) In the modern world of politics, convincing the audience is the key to democratically gaining power in society, and the amount of power politicians gain depends on how convincing they are. In this competitive domain, elites use discourse not only to persuade the audience, but also to manipulate it. According to van Dijk (2006), persuasion is a legitimate and ethical way to influence the audience, while manipulation is an illegitimate and unethical way of doing so. The present study examines pronoun usage in the political discourse of Donald Trump, specifically in the State of the Union Address and 37 Weekly Addresses. The quantitative approach to the data was taken by incorporating corpus linguistic methods, namely frequency counts, concordances, word list tools, and downsampling. The qualitative approach was taken by using methods from rhetoric and Critical Discourse Analysis. To analyse the examined phenomenon, the Aristotelian persuasion framework, Fairclough’s theory on the pronouns "we" and "you", van Dijk’s triangulation framework with its focus on manipulation, and Wieczorek’s taxonomy of speakers were used. The study concluded that in both the State of the Union Address and the Weekly Addresses, Donald Trump frequently and interchangeably uses the pronouns "we" and "our" to refer to two groups with unequal power relations to one another. The identified patterns placed within the societal context of the examined text persuade the recipients. Pronouns such as "we", "our", "I", and "they" play a key role in the elements of ethos and pathos. 
Furthermore, the identified patterns placed within the societal context of the examined text also showed that Donald Trump uses discourse structure to exploit properties of short-term and long-term memory in order to manipulate the audience. Critical Discourse Analysis Corpus Linguistics Persuasion Manipulation Political Discourse Pronoun Use Languages and Literature Språk och litteratur
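Two of the quantitative methods named in the abstract above, frequency counts and concordances, can be illustrated with a short sketch. This is not the software used in the thesis: the pronoun set, the tokenization, the context width and the sample sentence are all assumptions made for illustration, and the sample is invented rather than quoted from any address.

```python
import re
from collections import Counter

# Assumed pronoun set for illustration only.
PRONOUNS = {"we", "our", "i", "you", "they"}


def tokenize(text):
    """Lowercase a text and split it into word tokens (simple assumed rule)."""
    return re.findall(r"[a-z']+", text.lower())


def pronoun_counts(text):
    """Frequency count of the selected pronouns in a text."""
    return Counter(tok for tok in tokenize(text) if tok in PRONOUNS)


def concordance(text, keyword, width=3):
    """Keyword-in-context lines: (left context, keyword, right context)."""
    tokens = tokenize(text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Invented sample sentence, not a quotation.
sample = "We will protect our workers. They said we could not do it, but we did."

counts = pronoun_counts(sample)
kwic = concordance(sample, "we", width=2)

print(counts.most_common())
for left, key, right in kwic:
    print(f"{left:>12} | {key} | {right}")
```

A frequency count shows which pronouns dominate, while the concordance lines supply the surrounding context needed to judge, qualitatively, which group each "we" refers to.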
110	Corpus-based study of the use of English general extenders spoken by Japanese users of English across speaking proficiency levels and task types Watanabe, Tomoko January 2015 (has links) There is a pronounced shift in English language teaching policy in Japan with the recognition not only of the importance of spoken English and interactional competence in a globalised world, but also the need to emphasise it within English language pedagogy. Given this imperative to improve the oral communication skills of Japanese users of English (JUEs), it is vital for teachers of English to understand the cultural complexities surrounding the language, one of which is the use of vague language, which has been shown to serve both interpersonal and interactional functions in communications. One element of English vague language is the general extender (for example, "or something"). The use of general extenders by users of English as a second language (L2) has been studied extensively. However, there is a lack of research into the use of general extenders by JUEs, and their functional differences across speaking proficiency levels and contexts. This study sought to address the knowledge gap, critically exploring the use of general extenders spoken by JUEs across speaking proficiency levels and task types. The study drew on quantitative and qualitative corpus-based tools and methodologies using the National Institute of Information and Communications Technology Japanese Learner English Corpus (Izumi, Uchimoto, & Isahara, 2004), which contains transcriptions of a speaking test. An in-depth analysis of individual frequently-occurring general extenders was carried out across speaking proficiency levels and test tasks (description, narrative, interview and role-play) in order to reveal the frequency, and the textual and functional complexity of general extenders used by JUEs. 
In order to ensure the relevance of the application of the findings to the context of language education, the study also sought language teachers’ beliefs on the use of general extenders by JUEs. Three general extenders ("or something (like that)", "and stuff", and "and so on") were explored due to their high frequency within the corpus. The study showed that the use of these forms differed widely across the JUEs’ speaking proficiency levels and the task types undertaken: "or something (like that)" is typically used in description tasks at the higher level and in interview and description tasks at the intermediate level; "and stuff" is typical of the interview at the higher level; and "and so on" is typical of the interview at the lower-intermediate level. The study also revealed that a greater proportion of the higher-level JUEs use general extenders than do those at lower levels, while those with lower speaking proficiency levels who do use general extenders do so at a high density. A qualitative exploration of concordance lines and extracts revealed a number of interpersonal and discourse-oriented functions across speaking proficiency levels: "or something (like that)" functions to show uncertainty about information or linguistic choice and helps the JUEs to hold their turn; "and stuff" serves to make the JUEs’ expression emphatic; "and so on" appears to show the JUEs’ lack of confidence in their language use, and signals the desire to give up their turn. The findings suggest that the use of general extenders by JUEs is multifunctional, and that this multi-functionality is linked to various elements, such as the level of language proficiency, the nature of the task, the real-time processing of their speech and the power asymmetry where the time and floor are mainly managed by the examiners. 
The study contributes to extending understanding of how JUEs use general extenders to convey interpersonal and discourse-oriented functions in the context of language education, in speaking tests and possibly also in classrooms, and provides new insights into the dynamics of L2 users’ use of general extenders. It brings into question the generally held view that the use of general extenders by L2 users as a group is homogeneous. The findings from this study could assist teachers to understand JUEs’ intentions in their speech and to aid their speech production. More importantly, it may raise language educators’ awareness of how the use of general extenders by JUEs varies across speaking proficiency levels and task types. These findings should have pedagogical implications in the context of language education, and assist teachers in improving interactional competence, in line with emerging English language teaching policy in Japan. 428.2
