Global ETD Search

1	The automatic selection of concordance lines Collier, Alex January 1999 (has links) This thesis presents the results of an experiment into the automatic selection of concordance lines from very large corpora. Corpora now exist that exceed 100 million words in size, but the increase in size of corpora brings with it certain problems. These problems are discussed in the light of information obtained from professional corpus users, and the continuing centrality of the concordance as the main means of interpreting the contents of the corpus is highlighted. A possible means of overcoming the problems associated with the use of large corpora is presented. This solution is based upon software which was designed for the purposes of textual abridgement, this being carried out via an automatic analysis of lexico-cohesive bonds within the text. An analogy is drawn between conventional text and concordances; this analogy is then further explored by processing sets of concordance lines with the modified abridgement software. In order to determine the success of the approach in identifying concordance lines which illustrate key features of the node word, an evaluation exercise is carried out, involving expert corpus users as respondents. 005 Textual abridgement; Corpora
2	A corpus-assisted study on modal verbs in consecutive interpreting He, Yuan, William January 2018 (has links) University of Macau / Faculty of Arts and Humanities. / Department of English Corpora (Linguistics) Consecutive interpreting
3	Probabilistic topic modeling and classification probabilistic PCA for text corpora Cheng, Chi Wa 01 January 2011 (has links) No description available. Computational linguistics Corpora (Linguistics)
4	Using Language Corpora to Enhance Grammatical Proficiency in Chinese Chen, Hsiao-Chien 10 July 2012 (has links) (PDF) School curriculum and pedagogy change over time and are affected by changes in technology. One little-used technology in foreign language classrooms is the electronic language corpus. In corpus-based linguistics, language corpora are often used as tools to analyze and observe various language features, including discourse, pragmatics, and syntax. However, language corpora can also act as a tool to assist language teachers by providing greater exposure to features of the language. Using language corpora is especially helpful in exposing learners to so-called authentic language used in a target language culture. Moreover, students can gradually enhance their language proficiency by using a well-developed corpus. In the foreign language teaching world, most corpus studies focus on using English corpora in ESL settings, and only a few studies focus on using foreign language corpora in other foreign language teaching settings. The light use of corpora in other foreign language settings may be partly related to the lack of user-friendly foreign language corpora, or a lack of understanding of how to manipulate different foreign language corpora effectively. This study seeks to demonstrate how language corpora can be used in advanced Chinese classrooms, and how Chinese corpora can help students to enhance their language proficiency. This study's results show that corpus use in advanced Chinese classrooms can help advanced Chinese learners to improve their understanding of grammar taught in class. Chinese corpora corpora and language teaching Other Languages, Societies, and Cultures
5	Criação de um ambiente para o processamento de córpus de Português Histórico / Creation of an environment for processing of Historical Portuguese Corpora Candido Junior, Arnaldo 02 April 2008 (has links) Corpora have been increasingly used within the areas of Linguistics and Natural Language Processing. As a result, new and larger corpora have been compiled, and processing systems and standards for encoding and interchange of electronic texts have been developed. However, the methodology for compiling historical corpora differs from the methodologies used to compile corpora of contemporary language.
Another drawback is that most corpus processing systems provide few resources for the treatment of historical corpora, although there are numerous corpora of this type. Similarly, the systems for dictionary creation do not satisfactorily meet the needs of historical dictionaries. The present study is part of a larger project, the Historical Dictionary of Brazilian Portuguese (HDBP), which aims to compile a dictionary on the basis of a corpus of Brazilian Portuguese texts from the sixteenth through the eighteenth centuries (including some texts from the early nineteenth century). Here, we present the challenges of processing the corpus of the HDBP project and establish the criteria for creating the entries of a historical dictionary. This study has developed a computational environment for processing the corpus, building glossaries, and creating the entries of the HDBP. This system can be easily adapted to the needs and scope of other historical dictionary projects. Corpora creation Historical corpora Tools for corpora processing
7	The awareness of semantic prosody and its implications for the EFL vocabulary teaching :a study Choi, Ka Fai January 2018 (has links) University of Macau / Faculty of Arts and Humanities. / Department of English Semantic prosody Corpora (Linguistics) Vocabulary -- Study and teaching
8	What's the buzz? :a discursive approach to news values of Buzzfeed News Wang, An Ni, Annie January 2018 (has links) University of Macau / Faculty of Arts and Humanities. / Department of English News Web sites Discourse analysis Corpora (Linguistics)
9	Traços linguístico-tradutórios em "As Três Marias" de Rachel de Queiroz, à luz dos Estudos da Tradução Baseados em Corpus / Bispo, Mirian Pereira. January 2019 (has links) Advisor: Celso Fernando Rocha / Committee: Regiani Aparecida Santos Zacarias / Committee: Camila Hofling / Abstract: This research aims to analyze the linguistic-translational traits, with emphasis on the most recurrent lexicon, present in a Parallel Corpus consisting of the work As Três Marias, by the Brazilian writer Rachel de Queiroz, and its translation, The Three Marias, produced by Fred P. Ellisson in 1963. The theoretical and methodological framework was based on Corpus-based Translation Studies (BAKER, 1995; TAGNIN, VIANA, 2015; FERNANDES, 2006), as well as Corpus Linguistics (BERBER SARDINHA, 2004), which assisted us in the selection and investigation of lexical items. From the words with the highest keyness identified by the KeyWords tool of the WordSmith Tools 6.0 program, we chose four items, namely: Olhos, Coração, Amor and Medo. Based on these selected lexical items, it was possible to propose a reading of the theme of the novel. Through the Concord tool, we observed in the concordance lines the metaphorical and symbolic use of the words. We then analyzed the correspondents of the most frequent items based on the concepts of Descriptive Translation Studies (LAVIOSA, 2011, 2004) and some lexical principles (BIDERMAN, 1987, 1996, 1998, 2001; BASILIO, 2000), and for a more symbolic reading of the words we used dictionaries of symbols (CHEVALIER & GHEERBRANT, 1994; LEXIKON, 1990; CIRLOT, 1979). When comparing the excerpts chosen for analysis, we could verify the use of the following correspondents in English: Olhos → Eyes, Coração → Heart, Amor → Love and Medo → Afraid/... (Complete abstract: click electronic access below) / Master's Translations. Corpus linguistics. Lexicography. Corpora (Linguistics)
10	The use of the general nouns people and thing by L2 learners of English : A corpus-based study Gerdin, Göran January 2006 (has links) With the advent of corpora documenting learner English, a new and interesting field of research has become available. Learner corpora provide a new type of data which can inform thinking both in second language acquisition research and in foreign language teaching research. Analyses of learner corpora normally report on features which are typically ‘overused’ and ‘underused’, when contrasted to comparable native speaker corpora, in addition to those which are ‘misused’ by the learners. Ringbom (1998) conducted a study in which he identified one common aspect of non-native speaker corpora: the high frequency of general nouns, such as people and thing. The aim of this paper was to test Ringbom’s findings and attempt to identify how English as a second language learners’ usage of these particular nouns in written production differs from that of native speakers by conducting a corpus comparison of comparable learner and native speaker corpora. The results of this study clearly support Ringbom’s findings; additionally, it was found that the learners’ written production does not appear vaguer and ‘non-native like’ merely because they overuse the general nouns people and thing, but it also seems as if the learners use these nouns in a more restricted range of meanings whereas the natives’ usage is more diversified. Moreover, this study has identified some of the issues that teachers of English as a second language should be aware of when helping their students to avoid using the general nouns people and thing in a non-native like manner. Learner corpora ICLE general nouns Linguistics
