Global ETD Search

131	VerbLexPor: a lexical resource with semantic role annotation for Portuguese
Zilio, Leonardo. January 2015.
This dissertation presents VerbLexPor, a lexical resource of verbs annotated with semantic roles, based on resources such as VerbNet, PropBank, and FrameNet. Its theoretical bases are interdisciplinary, drawn from Corpus Linguistics and Natural Language Processing (NLP), so that it contributes to both Linguistics and Computer Science. The research hypotheses are: a) a single set of semantic roles can be applied to different genres; and b) the differences among genres show up in the ranking of semantic roles. VerbLexPor is built on two corpora: a specialized one, with more than 1.6 million words, composed of scientific papers in Cardiology from three Brazilian journals; and a non-specialized one, with more than 1 million words, composed of articles from the popular newspaper Diário Gaúcho. The corpora were parsed with PALAVRAS, and sentence, verb, and argument information was extracted and stored in a database. VerbLexPor contains 192 verbs and more than 15 thousand arguments annotated with semantic roles, distributed over more than 6 thousand sentences. Diário Gaúcho favors a direct syntax, with little use of the passive voice and of adjuncts, whereas the Cardiology corpus shows more passive voice, more INSTRUMENTS in subject position, and fewer AGENTS. Parallel experiments were also conducted: semantic role labeling by multiple annotators and automatic verb clustering. In the multiple-annotator task, every annotator labeled exactly the same 25 sentences after receiving an annotation manual and basic training (an explanation of the task and two annotation examples). Agreement among annotators, measured with multi-π, was π = 0.25. This low agreement may stem from insufficiently thorough training. The verb clustering task showed that syntax and semantics are equally important for clustering verbs. The study contributes to Linguistics with a verb lexicon annotated with semantic roles, and to Computer Science with data that can be queried and processed for various NLP applications, especially since the data are available in both XML and SQL formats.
Keywords: Portuguese language; computational linguistics; corpus; specialized language; semantic role labeling; lexical resource; NLP; corpus linguistics
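The agreement figure reported for the multiple-annotator task can be made concrete. The abstract does not say how multi-π was computed; the Python below is a minimal sketch assuming the usual multi-coder generalization of Scott's π (the same computation as Fleiss's kappa): observed agreement is the proportion of agreeing annotator pairs per sentence, and expected agreement comes from the pooled label distribution. The role labels in the example are hypothetical, not the thesis data.

```python
from collections import Counter

def multi_pi(items):
    """Multi-coder agreement (multi-pi, Fleiss-style) for nominal labels.

    `items` holds one list per annotated unit, with one label per annotator;
    every unit must be labelled by the same number of annotators (>= 2).
    """
    n_items = len(items)
    n_coders = len(items[0])
    if n_coders < 2:
        raise ValueError("at least two annotators are required")
    label_totals = Counter()
    agreeing_pairs = 0
    for labels in items:
        if len(labels) != n_coders:
            raise ValueError("every unit needs a label from every annotator")
        counts = Counter(labels)
        label_totals.update(counts)
        # ordered pairs of annotators who chose the same label for this unit
        agreeing_pairs += sum(n * (n - 1) for n in counts.values())
    observed = agreeing_pairs / (n_items * n_coders * (n_coders - 1))
    # expected agreement from the pooled distribution of labels over all annotators
    total = n_items * n_coders
    expected = sum((t / total) ** 2 for t in label_totals.values())
    if expected == 1:
        raise ValueError("only one label used: agreement is undefined")
    return (observed - expected) / (1 - expected)

# Hypothetical labels from three annotators for four arguments.
sample = [
    ["AGENT", "AGENT", "THEME"],
    ["THEME", "THEME", "THEME"],
    ["AGENT", "INSTRUMENT", "THEME"],
    ["AGENT", "AGENT", "AGENT"],
]
print(round(multi_pi(sample), 3))  # 0.268
```

Values near 0 mean agreement barely above what the pooled label distribution would produce by chance; 1 means perfect agreement.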
132	Analysis of a corpus of written production in Portuguese by bilingual/monolingual indigenous children and adults from Dourados/MS from the perspective of Corpus Linguistics
Espindola, Sandra. January 2014.
This investigation arose from the aim of understanding the origin of the difficulties that indigenous children and adults face when producing texts in Portuguese, approached from the perspective of Corpus Linguistics. To that end, a corpus of 483 texts by children and 349 texts by adults, written in Portuguese by indigenous and non-indigenous participants, was built. The children's sample comprised 175 children: 111 indigenous (71 Guarani/Kaiowá bilinguals and 40 Terena monolinguals) and 64 non-indigenous monolingual speakers of Portuguese, all students in the 4th and 5th years of elementary school. The adult group comprised 118 adults: 74 indigenous (36 Guarani/Kaiowá bilinguals and 38 Terena monolinguals) and 44 non-indigenous monolingual speakers of Portuguese, in the first and final years of higher education. The specific objectives of the research were: (a) to determine whether the kinds of difficulty shown by monolingual and bilingual indigenous participants of different ethnic groups (Kaiowá/Guarani and Terena) differ from those of non-indigenous monolinguals when producing narrative texts in Portuguese; (b) by comparing the two age groups, children and adults, to observe to what extent the path from basic education to higher education affected the development of text-writing skills; and (c) for the adult groups, to investigate whether time spent in the undergraduate program (first-year versus fourth-year students) affects the level of difficulty in producing texts. The data were analyzed with the AntConc tool within the theoretical framework of Corpus Linguistics. The research is expected to help teachers, both those who teach university students and those who teach children, understand how the writing of these two indigenous groups is structured. This information is essential for future guidance in the reading and writing work proposed by schools and by the university programs that admit indigenous students.
Keywords: writing; Portuguese language; teaching and learning; indigenous education; text production; language education; indigenous teaching; corpus linguistics
133	Semidiplomatic edition and lexicographical study of eighteenth-century heirship qualification proceedings from the Orphans' Court of São Paulo
Fabio Gimenez. 28 January 2015.
The objectives of this work are: to present a reliable, line-by-line (justalinear) semidiplomatic transcription, numbered every five lines, of five civil records of heirship qualification proceedings drawn up in the eighteenth century in the city of São Paulo by the Orphans' Court, which can be read without loss of information relative to the original documents and should interest scholars researching both historical and linguistic data; to organize a vocabulary of every word occurring in these documents, using the AntConc 3.2.4w software; and to outline the internal history of the documents, of the Orphans' Court, and of the orphans' laws then in force, together with a brief typological analysis.
Keywords: law; philology; history; lexicography; corpus linguistics
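AntConc's word list is what the abstract names for building the vocabulary. As an illustration only, the sketch below shows the underlying idea in Python: tokenize, lowercase, and count word types, most frequent first. The letters-only tokenization rule and the sample phrase are assumptions, not AntConc's exact settings; ties are ordered by Unicode code point rather than Portuguese collation, and historical spellings are not normalized.

```python
import re
from collections import Counter

# Runs of Unicode letters only (digits and underscores excluded), so accented
# Portuguese words such as "órfão" stay whole.
WORD = re.compile(r"[^\W\d_]+")

def word_list(text):
    """Return (word, frequency) pairs, most frequent first, ties in code-point order."""
    counts = Counter(WORD.findall(text.lower()))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

sample = "O órfão e a órfã; o Juiz dos Órfãos"  # hypothetical phrase
for word, freq in word_list(sample):
    print(f"{freq:>3}  {word}")
```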
134	A study of the lexical item BEM (Portuguese-French) through Corpus Linguistics
Gisele Galafacci. 29 September 2014.
In the context of teaching and learning a foreign language, the bilingual dictionary is a linguistic tool meant to help learners build knowledge of the language they are learning. However, this tool has shown many gaps, which cause learners difficulties in comprehension and, above all, in expression in the foreign language. These gaps concern the informational content of its microstructure, which generally offers a small number of definitions that, in most cases, are not accompanied by usage examples. This work therefore examines the description of the lexical item BEM in lexicographical works, both monolingual and bilingual, to check whether their informational content, structure, and organization are functional for users in teaching and learning contexts. It also proposes a corpus-based description of the same lexical item, carried out with the tools of Corpus Linguistics, which serves as the methodology of this work. The hypothesis is that a corpus-based lexicographical description can make a difference to understanding how lexical items are used when teaching and learning a foreign language. The study is justified by a need observed in teaching practice: to give students linguistic tools that better help them understand how French words are used, especially in oral and written production.
Keywords: bilingual lexicography; Corpus Linguistics; foreign language production
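A corpus-based description such as the one proposed for BEM usually starts from concordance lines. The sketch below is a minimal key-word-in-context (KWIC) routine in Python, assuming simple regex tokenization and a fixed window of tokens on each side; it is not the concordancer used in the study, and the sample sentence is invented.

```python
import re

# Words (Unicode-aware) or single punctuation marks.
TOKEN = re.compile(r"\w+|[^\w\s]")

def kwic(text, node, span=4):
    """Return (left context, hit, right context) for each occurrence of `node`."""
    tokens = TOKEN.findall(text)
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() == node.lower():
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            rows.append((left, token, right))
    return rows

sample = "Ele fala bem francês. Está tudo bem, obrigado."  # invented example
for left, hit, right in kwic(sample, "bem"):
    print(f"{left:>25} [{hit}] {right}")
```

Sorting the rows by their right context is a common next step, since it groups uses such as adverbial "bem" and formulaic "tudo bem".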
135	Building a bilingual football glossary with the aid of Corpus Linguistics
Paulo Augusto Almeida Seemann. 26 March 2012.
When translating a specialized text on football from Spanish into Brazilian Portuguese or vice versa, the translator faces a myriad of football-specific terms that are missing from many current dictionaries and glossaries, or that appear there only in a limited way, leaving out many real usage situations. In this study, a bilingual and bidirectional glossary was built containing the football terms most frequently used in written communication in the Spanish-Portuguese language pair. My initial assumption was that Corpus Linguistics would provide the necessary means for this task. Corpus Linguistics makes it possible to study a language or language variety by computer, using empirical evidence found in a corpus, defined as a set of texts in electronic format compiled according to predefined criteria. The dissertation is divided into five parts. The introduction discusses some historical aspects of Portuguese and Spanish, the influence of football on our society, the problems found in dictionaries and glossaries, and the potential of football news retrieved from the Internet as a basis for building the proposed glossary. The second part discusses Corpus Linguistics as an approach and a research method and presents the types of corpora; it then briefly addresses the question of equivalence in translation, describes the composition of the study corpus, and explains how the terms and their translation equivalents were selected, by comparing football news from Brazil, Spain, and Argentina and by extracting and examining keywords with specific electronic tools. The third part discusses the terminological issues involved in the study, especially the decisions taken for the macro- and microstructure of the glossary. The fourth part proposes a way of presenting the glossary to the user and provides a sample of entries. The fifth and final part presents the concluding remarks: Corpus Linguistics, as an approach and a methodology, proved effective for building the bilingual glossary, since exploring the specialized corpora made it possible to identify the main football terms used in written communication in Brazilian, Spanish, and Argentine journalism, together with their translation equivalents. The result is a bilingual reference work on football with nearly four thousand entries, all with authentic usage examples.
Keywords: football; bilingual glossary; Corpus Linguistics; terminology; translation
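The keyword step compares a study corpus of football news against other text. The abstract does not name the keyness statistic; the sketch below assumes the log-likelihood measure commonly offered by keyword tools, computed from a word's frequency in the study corpus (a) and in a reference corpus (b) together with the two corpus sizes (c, d). Only words relatively more frequent in the study corpus are kept. The toy token lists are hypothetical.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Log-likelihood keyness.

    a, b: frequency of the word in the study and reference corpora;
    c, d: total tokens in the study and reference corpora.
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:  # a zero frequency contributes nothing (0 * log 0 is taken as 0)
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(study_tokens, reference_tokens, top=10):
    """Rank words that are relatively more frequent in the study corpus."""
    study, reference = Counter(study_tokens), Counter(reference_tokens)
    c, d = sum(study.values()), sum(reference.values())
    scored = [
        (word, log_likelihood(a, reference[word], c, d))
        for word, a in study.items()
        if a / c > reference[word] / d  # positive keywords only
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]

# Hypothetical toy corpora.
study = "gol gol gol o o".split()
reference = "o o o o casa".split()
print(keywords(study, reference))
```

Candidate keywords still need manual inspection before they become glossary terms, since high keyness alone does not make a word a term.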
136	A Linguistic Analysis of the Written Production of Second Language Learners: The Variation of Article Usage by Adult Chinese Learners of English
Wu, Junyu. January 2014.
This study tests Robertson's lexical transfer principle, which posits that Chinese learners use demonstratives (particularly this) and the numeral one as markers of definiteness and indefiniteness. The principle is tested by analysing written production by Chinese learners collected from the Spoken and Written English Corpus of Chinese Learners 2.0 (SWECCL 2.0). The purpose is to understand the variation in article usage by adult Chinese learners of English. More specifically, the study examines to what extent articles and possessive and demonstrative pronouns are used in Chinese learners' English, and how the definite and indefinite articles are used by these learners. The findings corroborate Robertson's lexical transfer principle. In addition, Chinese learners prefer demonstrative determiners, the possessive determiner our, and the numeral one for marking definiteness and indefiniteness. In particular, the learners use the demonstrative determiners that and this in the anaphoric function instead of the definite article, and the demonstrative determiner those is frequently used in the cataphoric function. Moreover, the learners use the numeral one as a marker of indefiniteness, and also as a marker of definiteness in the anaphoric function. Further, the possessive determiner our is used as a marker of definiteness in larger-situation uses referring to something unique. The study also shows that the definite article is used to mark indefiniteness, and that in some particular contexts it functions as a Chinese specifier in Chinese learners' English. The indefinite article, in turn, is frequently used in quantifier phrases but rarely in other functions. Three main reasons may explain why Chinese learners vary in their use of determiners. First, their choice of determiners is influenced by linguistic context. Second, as a learning strategy, Chinese learners try to ignore the anaphoric and cataphoric functions of articles, which they are not yet ready to process. Third, interlanguage grammar influences optionality in the use of articles.
Keywords: second language acquisition; corpus linguistics; keyword analysis. Subject: General Language Studies and Linguistics
137	Prudes versus sluts: An analysis of how attitudes are expressed through colloquial terminology
Blixt, Emely. January 2018.
This paper performs a corpus-based critical discourse analysis of the terms "vamp", "slut", "prude" and "spinster" and of how they are used in context from the 1920s to the 2000s. Their occurrences were categorized by the attitude connected to them: positive, neutral, or negative. The attributive adjectives used in context with each term were also examined. The results showed consistently negative attitudes towards "prude" and "spinster", while attitudes towards "vamp" and "slut" were mixed, both negative and positive.
Keywords: critical discourse analysis; corpus linguistics; historical sociolinguistics; attributive features. Subject: Humanities and the Arts
138	Putting Your Ass on the Line: The Conceptualization of Risk in English and Spanish
Arcon, Tjasa. January 2010.
The present study sets out to shed light on the conceptualization of risk in two languages, English and Spanish. To reveal how risk is perceived in the minds of speakers of the two languages, I undertook a comprehensive cross-linguistic survey of the conceptual metaphors related to risk-taking. This was done by examining the conventional collocations of the noun and the verb risk in English, and of the noun riesgo and the verb arriesgar(se) in Spanish. In addition, I analysed the idioms that deal with risk and risk-taking in both languages. This contrastive, cross-cultural linguistic study of the conceptual field of risk and risk-taking was conducted within the frameworks of both corpus linguistics and cognitive linguistics: I worked with naturally occurring data gathered from various corpora and used conceptual metaphor theory to analyse potential conceptual metaphors related to risk.
Keywords: conceptual metaphor; risk; English; Spanish; corpus linguistics; cognitive linguistics. Subject: Specific Languages
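Collecting the conventional collocations of risk starts from co-occurrence counts around the node word. The Python below is a minimal sketch assuming pre-tokenized text and a span of three tokens on either side; it only counts co-occurrences and leaves out the association measures (such as MI or log-likelihood) that corpus tools usually add. The node forms and the sentence are illustrative, not data from the study.

```python
from collections import Counter

def collocates(tokens, node_forms, span=3):
    """Count words within `span` tokens to the left and right of any node form."""
    nodes = {form.lower() for form in node_forms}
    lowered = [token.lower() for token in tokens]
    counts = Counter()
    for i, token in enumerate(lowered):
        if token in nodes:
            window = lowered[max(0, i - span):i] + lowered[i + 1:i + 1 + span]
            # other occurrences of the node itself are not counted as collocates
            counts.update(w for w in window if w not in nodes)
    return counts

tokens = "They took a huge risk and the risk paid off".split()  # invented sentence
print(collocates(tokens, {"risk", "risks", "risked", "risking"}).most_common(3))
```

For Spanish the same routine works with node forms of riesgo and arriesgar(se); grouping inflected forms by lemma would need a lemmatizer, which this sketch does not include.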
139	The use of the pronouns we, us, and our in political speeches: A comparative study of the inaugural addresses of Bush and Obama
Verhoek, Simone. January 2016.
Pronouns carry considerable importance in language. A speaker's identity and connection to the audience emerge through the consistent use of certain pronouns (De Fina, 1995). This research examines the use of we, us, and our in political discourse, specifically in the inaugural addresses of George W. Bush in 2005 and Barack Obama in 2009. The aim is to examine the frequencies and co-occurrences of these pronouns and to compare their use in the two speeches; more specifically, it asks how these pronouns affect the message and enhance credibility with the hearers. This is done through (a) a quantitative corpus linguistic analysis and (b) a qualitative analysis of the contexts of use. The results show a difference in the frequency of pronoun use; however, the pronouns are used in rather similar ways in the two speeches.
Keywords: political discourse; participation framework; personal pronouns; inaugural addresses; corpus linguistics. Subject: Specific Languages
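The quantitative half of the method, frequencies of we, us, and our, can be sketched as raw counts plus a rate per 1,000 words, so that speeches of different lengths can be compared. This Python illustration rests on assumptions: tokenization is letters-only, so a contraction such as "we'll" splits into two tokens, and "US" (the country) is lowercased into "us", which is one reason a qualitative reading of each context is still needed. The sample text is invented.

```python
import re
from collections import Counter

def pronoun_rates(text, pronouns=("we", "us", "our"), per=1000):
    """Return {pronoun: (raw count, count per `per` words)}."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return {p: (0, 0.0) for p in pronouns}
    counts = Counter(t for t in tokens if t in pronouns)
    return {p: (counts[p], counts[p] * per / len(tokens)) for p in pronouns}

sample = "We will rise. Our task is ours, and it is for us."  # invented text
for pronoun, (raw, rate) in pronoun_rates(sample).items():
    print(f"{pronoun:<4}{raw:>3}{rate:>9.1f}")
```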
140	Thou Shalt Not Split...?: A Corpus-Based Study on Split Infinitives in American English
Johansson, Simon. January 2015.
This essay aims to shed light on the prevalence of the to + adverb + verb and to not + verb split infinitives in American English, both historically and in present-day usage, and on how their use varies across contexts that call for different levels of formality. Although students are taught to avoid splitting infinitives, numerous grammarians and linguists question this prescriptive view. Data were gathered from two large corpora, the Corpus of Historical American English (COHA) and the Corpus of Contemporary American English (COCA). The results show that the frequency of the split infinitive has risen rapidly and is still rising, and that the construction is increasingly becoming a standard, accepted feature of American English. The split infinitive was most common in informal spoken language. However, the largest increase in its prevalence was found in the most formal setting, academic texts.
Keywords: split infinitive; corpus linguistics; COHA; COCA; American English. Subject: General Language Studies and Linguistics
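Searching a tagged corpus for split infinitives amounts to matching a short part-of-speech pattern. The sketch below assumes the text is already tagged with Penn Treebank tags (TO for "to", RB/RBR/RBS for adverbs, with "not" tagged RB, and VB for the base form of a verb). COHA and COCA use their own CLAWS tagset and query syntax, so this only illustrates the pattern, not the queries used in the essay. It accepts exactly one intervening word, matching the two patterns named in the abstract.

```python
def find_split_infinitives(tagged):
    """Find 'to + adverb + verb' and 'to not + verb' in (word, Penn tag) pairs."""
    hits = []
    for i in range(len(tagged) - 2):
        (w1, t1), (w2, t2), (w3, t3) = tagged[i:i + 3]
        # TO = infinitive marker, RB* = adverb ("not" is tagged RB), VB = base verb
        if w1.lower() == "to" and t1 == "TO" and t2.startswith("RB") and t3 == "VB":
            kind = "to not + verb" if w2.lower() == "not" else "to + adverb + verb"
            hits.append((kind, f"{w1} {w2} {w3}"))
    return hits

# Hypothetical pre-tagged sentence.
tagged = [("We", "PRP"), ("want", "VBP"), ("to", "TO"), ("boldly", "RB"), ("go", "VB"),
          ("and", "CC"), ("to", "TO"), ("not", "RB"), ("stop", "VB")]
for kind, phrase in find_split_infinitives(tagged):
    print(f"{kind}: {phrase}")
```

Counting hits per decade (COHA) or per genre (COCA) and dividing by section size would give the normalized frequencies needed for the historical and register comparisons.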
