Global ETD Search

231	Dimensões de variação em manuais aeronáuticos: um estudo baseado na análise multidimensional / Dimensions of variation in aircraft manuals: a study based on multidimensional analysis
Zuppardo, Maria Carolina. 24 April 2014.
Conselho Nacional de Desenvolvimento Científico e Tecnológico
This study had three aims: to identify the dimensions of variation that characterize aircraft manuals, to place aircraft manuals on the dimensions of variation for English defined by Biber in 1988, and to verify whether there is statistically significant variation among aircraft manufacturers, aircraft models, and types of manual along the dimensions found for aircraft manuals. The theoretical underpinning is Corpus Linguistics, an area of Applied Linguistics that views language as a probabilistic system and studies languages or language varieties by applying computational and interactive tools to large collections of texts held in electronic format (corpora). More specifically, the study is informed by Multidimensional Analysis, a corpus-based approach that uses statistical procedures to study associations between linguistic features and registers in large amounts of text in order to identify dimensions of variation. The corpus comprises operational and maintenance manuals for commercial and corporate aircraft, totalling 10,009,040 words across 154 texts. Placed on Biber's 1988 dimensions, aircraft manuals are highly informational and non-narrative, with explicit reference, abstract information, and covert expression of persuasion. In addition, the multidimensional analysis revealed three dimensions of variation specific to aircraft manuals: (1) Broad system descriptions; (2) Decision-making discourse versus Procedural discourse; and (3) Problem-solving discourse versus Specific parts descriptions. The study aims to contribute to English for Aviation and to the body of corpus-based research by determining the dimensions of variation of aircraft manuals and their salient linguistic features. It also discusses limitations, directions for further research, and suggestions for pedagogical applications of the findings.
Keywords: Aircraft manuals; Corpus linguistics; Multidimensional analysis
232	Good cop, bad cop? : A corpus analysis on the semantic prosody of the noun cop / Good cop, bad cop? : En korpus analys om den semantisk prosodiska förändringen hos substantivet cop
Lund, Simon. January 2018.
Public opinion of the police varies greatly across populations in the United States. This corpus linguistic study investigates a possible change in the semantic prosody of the word cop. It also examines how the word is distributed across subcorpora, to see whether that distribution may be linked to a change in semantic prosody. The source is the Corpus of Historical American English, which covers 1800 to 2009 and is divided into four subcorpora: fiction, news, popular magazines, and non-fiction books. Fiction accounts for a median of 51% of the material throughout the corpus. Hits are sorted into two categories: neutral/positive, where cop is used with neutral or positive connotations, and negative, where it is used with negative connotations. The period studied runs from 1859 to 2009 and is divided into four smaller, more manageable periods. The distribution across subcorpora shows that cop occurs mostly in fiction. Negative uses are in the majority in periods 1 and 2. Period 3 shifts somewhat between the categories, but from 1940 onwards neutral/positive uses are in the majority, and period 4 shows a strong positive trend. The study concludes that cop has shifted from a negative to a neutral/positive semantic prosody. Further studies of cop in other corpora from the United States and the rest of the English-speaking world would extend knowledge of the noun's semantic prosody and could give some insight into public attitudes towards the police.
Keywords: Corpus linguistics; semantic prosody; cop; corpus; General Language Studies and Linguistics
233	Discours d'entreprise et organisation de l'information : apports de la textométrie dans la construction de référentiels terminologiques adaptables au contexte / Corporate discourses and information organization : Contribution of the textual statistics to the construction of terminological thesaurus adaptable to the context
Erlos, Frédéric. 16 January 2009.
Organizing information on an intranet (an organization's internal network built on Internet technologies) calls for new approaches to matching the tree structure of sites with the language use of their audiences. One way to take this usage into account is to explore textual data representative of a specific communication situation. The exploration relies on textometric techniques: the hierarchical index of forms, concordances, repeated segments, the map of a text's sections, co-occurrence computation, and correspondence factor analysis. Lexical units are then extracted from a corpus of corporate communication texts (annual reports) to build a terminological thesaurus of a particular type. To take the communication context into account, three kinds of reference points are proposed:
- the organization's own repository of objects, as evoked in its discourse;
- the pragmatic properties of proper names;
- the collection, starting from a selection of proper names, of part of the characteristic vocabulary of the corpus used as the source of the terminological thesaurus.
The collection is therefore not limited to terminological units: it also includes common-language words and proper names. Units from the corpus vocabulary are chosen according to the type of semantic relations they establish with the proper names in the texts. Finally, the results are assessed in terms of productivity, reliability, and representativeness.
Keywords: Intranet; Information organization; Corpus linguistics; Textual statistics; Terminology; Proper name
234	The Israeli-Palestinian Conflict in American, Arab, and British Media: Corpus-Based Critical Discourse Analysis
Kandil, Magdi Ahmed. 27 May 2009.
The Israeli-Palestinian conflict is one of the longest and most violent conflicts in modern history. The language used to represent it in the media is frequently commented on by scholars and political commentators (e.g., Ackerman, 2001; Fisk, 2001; Mearsheimer & Walt, 2007). To date, however, few studies in applied linguistics have used systematic methods of linguistic analysis to investigate thoroughly how influential media outlets represent the conflict. The current study aims to partially bridge this gap by combining methods and analytical frameworks from Critical Discourse Analysis (CDA) and Corpus Linguistics (CL) to analyze the discursive representation of the conflict in American, Arab, and British media, represented by CNN, Al-Jazeera Arabic, and the BBC respectively. CDA is primarily interested in how power and ideology are enacted and resisted through language in social and political contexts, and it has frequently been criticized for the arbitrary selection of a small number of texts or text fragments for analysis. To strengthen CDA, Stubbs (1997) suggested that analysts use techniques from CL, which employs computational approaches to perform quantitative and qualitative analysis of actual patterns of use in a large and principled collection of natural texts. In this study, the corpus-based keyword technique is first used to identify the topics that tend to be emphasized, downplayed, or left out in the coverage of the conflict in three corpora compiled from the news websites of Al-Jazeera, CNN, and the BBC. Topics found to be key in the coverage (such as terrorism, occupation, settlements, and the recent Israeli disengagement plan) are then studied in context using other corpus tools, especially the concordancer and the collocation finder. The analysis reveals some of the strategies each news website employs to control the positive or negative representation of the different actors involved in the conflict. The corpus findings are interpreted using CDA frameworks, especially Van Dijk's (1998) ideological square.
Keywords: Settlements; Collocation; Corpus linguistics; Critical discourse analysis; Israeli-Palestinian conflict; Terrorism; Keyword analysis; Concordance; Applied Linguistics; First and Second Language Acquisition
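The keyword step described in entry 234 compares word frequencies in a study corpus against a reference corpus. The following is a minimal sketch, assuming the log-likelihood (G2) keyness statistic that many corpus tools offer; the abstract does not say which statistic the thesis used, and the function names and toy token lists are invented for illustration.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 keyness score.

    a, b: frequency of the word in the study and reference corpus.
    c, d: total token counts of the study and reference corpus.
    """
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_study)
    if b > 0:
        score += b * math.log(b / expected_ref)
    return 2 * score


def keywords(study_tokens, reference_tokens, top_n=5):
    """Rank words that are relatively more frequent in the study corpus."""
    study, ref = Counter(study_tokens), Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    scored = []
    for word, a in study.items():
        b = ref.get(word, 0)
        if a / c > b / d:  # keep positive keywords only
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical toy corpora, not data from the thesis.
study = ["settlement"] * 5 + ["the"] * 5
reference = ["the"] * 9 + ["talks"]
top = keywords(study, reference)
```

Words that score highly here would then be inspected in context with a concordancer, as the abstract describes; the statistic only flags candidates and does not interpret them.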
235	Lietuvių kalbos samplaikos / Multi-word lexemes in the Lithuanian language
Kovalevskaitė, Jolanta. 12 April 2012.
The object of the study is multi-word lexemes (samplaikos in Lithuanian), defined as fixed combinations of two or more inflected and uninflected words that form a lexical unit with a single meaning, most often used in the function of a non-autonomous (function) part of speech. The goal of the dissertation is to investigate the autonomy of Lithuanian multi-word lexemes as lexical units characterized by stability of form and content. Two monolingual corpora (the non-annotated Corpus of the Contemporary Lithuanian Language and the morphologically annotated Lithuanian language corpus) and the parallel German-Lithuanian corpus were used for the extraction and analysis of data. Several research methods were applied: descriptive, corpus-based, statistical, and contrastive. The statements to be defended are as follows:
1. According to the broad conception of phraseology, multi-word lexemes are a subtype of multi-word units. They have been considered an object of phraseology since their frequency and fixedness were confirmed by corpus analysis.
2. Multi-word lexemes vary in their degree of fixedness. The analysis of collocation strength between the elements of multi-word lexemes and of deviations in morphological paradigm indicates that the degree of fixedness is determined by their composition.
3. The context of multi-word lexemes is characterized by stability or variability. The context of more stable multi-word lexemes is variable, which makes them more autonomous. Less stable multi-word lexemes, whose context is more restricted, are less autonomous.
4. More autonomous multi-word lexemes are more likely to be translation units than less autonomous ones. The more autonomous a multi-word lexeme is, the... [see full text]
Keywords: Philology; Multi-word units; Phraseology; Lexical item; Translation unit; Corpus linguistics
236	Multi-word lexemes in the Lithuanian Language / Lietuvių kalbos samplaikos
Kovalevskaitė, Jolanta. 12 April 2012.
Same dissertation as entry 235, listed with the English title first; the abstract and keywords are identical to those of entry 235.
237	PragmaSUM: novos métodos na utilização de palavras-chave na sumarização automática / PragmaSUM: new methods for using keywords in automatic summarization
Rocha, Valdir Júnior Cordeiro. 05 December 2017.
Dissertation (Professional Master's), Programa de Pós-Graduação em Educação, Universidade Federal dos Vales do Jequitinhonha e Mucuri, 2017.
With wider access to the internet and the creation of tools that let people produce content, the amount of available information is growing rapidly. Texts on the most diverse subjects, by the most diverse authors, are created every day. It is impossible to absorb all the information available, which makes it difficult to choose what is most appropriate for a particular interest or audience. Automatic text summarization presents a text in condensed form and can also simplify it, saving time and widening access to the information for many different types of reader. The automatic summarizers currently described in the literature offer no methods for personalizing summaries for each type of reader, and consequently produce imprecise results. This work aims to apply the PragmaSUM automatic text summarizer to educational texts, with new summarization techniques that use keywords. Personalizing the summary with keywords is intended to increase accuracy and improve the performance of PragmaSUM and its summaries. To this end, a corpus consisting only of scientific articles in the field of education was created for tests and comparisons between different summarizers and summarization methods. Summarizer performance was measured with the Recall, Precision, and F-Measure metrics in the ROUGE tool and validated with Friedman's ANOVA and Kendall's coefficient of concordance. The results indicate improved performance when keywords are used in summarization with PragmaSUM, and point to the importance of choosing these keywords appropriately to classify the content of the source text.
Keywords: PragmaSUM; Automatic summarization of texts; ROUGE; Corpus linguistics; Computational linguistics
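Entry 237 evaluates summaries with Recall, Precision, and F-Measure from the ROUGE tool. As an illustration only, the sketch below computes a simplified ROUGE-1 (unigram overlap) for one candidate summary against one reference. It is not the ROUGE tool itself: it assumes whitespace tokenization and lowercasing, applies no stemming or stopword handling, and the function name `rouge_1` and the example sentences are hypothetical.

```python
from collections import Counter


def rouge_1(candidate, reference):
    """Return (precision, recall, f_measure) from clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection clips each word's count to the smaller side.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example sentences, not data from the thesis.
p, r, f = rouge_1("the cat sat", "the cat sat on the mat")
```

In this toy case every candidate word appears in the reference, so precision is 1.0, while only three of the six reference words are covered, so recall is 0.5. A short summary can be precise and still miss content, which is why the three metrics are reported together.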
238	A generic and open framework for multiword expressions treatment : from acquisition to applications
Ramisch, Carlos Eduardo. January 2012.
The treatment of multiword expressions (MWEs), like take off, bus stop and big deal, is a challenge for NLP applications. This kind of linguistic construction is not only arbitrary but also much more frequent than one would initially guess. This thesis investigates the behaviour of MWEs across different languages, domains and construction types, proposing and evaluating an integrated methodological framework for their acquisition. There have been many theoretical proposals to define, characterise and classify MWEs. We adopt a generic definition stating that MWEs are word combinations which must be treated as a unit at some level of linguistic processing. They present a variable degree of institutionalisation, arbitrariness, heterogeneity and limited syntactic and semantic variability. There has been much research on automatic MWE acquisition in recent decades, and the state of the art covers a large number of techniques and languages. Other tasks involving MWEs, namely disambiguation, interpretation, representation and applications, have received less emphasis in the field. The first main contribution of this thesis is an original methodological framework for automatic MWE acquisition from monolingual corpora. This framework is generic, language independent and integrated, and it has a freely available implementation, the mwetoolkit. It is composed of independent modules which may themselves use multiple techniques to solve a specific sub-task in MWE acquisition. The evaluation of MWE acquisition is modelled using four independent axes. We underline that the evaluation results depend on parameters of the acquisition context, e.g., nature and size of corpora, language and type of MWE, analysis depth, and existing resources. The second main contribution is the application-oriented evaluation of the proposed methodology in two applications: computer-assisted lexicography and statistical machine translation. For the former, we evaluate the usefulness of automatic MWE acquisition with the mwetoolkit for creating three lexicons: Greek nominal expressions, Portuguese complex predicates and Portuguese sentiment expressions. For the latter, we test several integration strategies in order to improve the treatment of English phrasal verbs when translated into Portuguese by a standard statistical MT system. Both applications can benefit from automatic MWE acquisition, as expressions acquired automatically from corpora can both speed up the work and improve the quality of the results. The promising results of previous and ongoing experiments encourage further investigation into the optimal way to integrate MWE treatment into other applications. We therefore conclude the thesis with an overview of past, ongoing and future work.
Keywords: Natural language processing; Computational linguistics; Multiword expressions; Lexical acquisition; Machine translation; Lexicography; Corpus linguistics
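Entry 238 describes automatic acquisition of MWE candidates from monolingual corpora. One common ingredient of such pipelines is ranking adjacent word pairs with an association measure. The sketch below uses pointwise mutual information (PMI) on bigrams as an example. It is an illustrative assumption and not the mwetoolkit API or the thesis's method; `bigram_pmi`, the `min_count` threshold, and the toy sentence are hypothetical.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (base 2)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:  # rare pairs give unreliable PMI estimates
            continue
        p_pair = count / n_bigrams
        p_w1 = unigrams[w1] / n_unigrams
        p_w2 = unigrams[w2] / n_unigrams
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy text, not data from the thesis.
tokens = "the bus stop is near the bus stop by the road".split()
ranked = bigram_pmi(tokens)
```

On this toy input the pair ("bus", "stop") ranks above ("the", "bus"), because "the" also occurs freely with other words. Real acquisition would add linguistic filters such as part-of-speech patterns and evaluate the candidates against existing resources, in line with the evaluation parameters the abstract lists.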
239	Bases teórico-metodológicas para elaboração de um glossário bilíngue (português-inglês) de treinamento de força : subsídios para o tradutor Dornelles, Márcia dos Santos January 2015 (has links) O terminógrafo, ao elaborar um produto terminográfico bilíngue para tradutores, deve preocupar-se não só em repertoriar, nas duas línguas, os termos próprios de uma (sub)área do conhecimento, mas também em apresentá-los inseridos em suas combinatórias típicas, ou seja, associados aos elementos que a eles se combinam em nível sintagmático, de forma recorrente nos textos daquela especialidade. Isso porque o tradutor precisa produzir um texto de chegada adequado ao padrão de linguagem em foco, de forma a espelhar o modus dicendi daquele campo. Assim, seu texto soará natural à comunidade de leitores, evitando-se ruídos na comunicação. Diante da falta de produtos terminográficos bilíngues sobre Treinamento de Força (TF), dirigido a tradutores, esta investigação tem como objetivo central apresentar bases teórico-metodológicas para a elaboração de um glossário português-inglês da terminologia do TF. Esse glossário é aqui apresentado como um protótipo, uma amostra de um todo, destinado a auxiliar especialmente tradutores brasileiros que trabalhem na direção português→inglês, mas que pode ser aproveitado também por pesquisadores e estudantes dessa temática que precisem produzir artigos científicos em inglês. Ele inclui guia do usuário, uma árvore de domínio em português do TF, lista de termos em português e 30 exemplares de fichas terminológicas em formato estendido. Outro objetivo do estudo é oferecer uma descrição do comportamento dos termos em português e inglês, e das unidades fraseológicas especializadas (UFE) eventivas (BEVILACQUA, 2003; 2004) em português no âmbito dos artigos científicos sobre TF. Como referencial teórico, valemo-nos dos princípios da Teoria Comunicativa da Terminologia (TCT) e dos fundamentos e diretrizes da Linguística de Corpus (LC). 
Seguir a TCT (CABRÉ, 1999a; 1999b; 2001a; 2001b; 2003; 2009) implica adotar o termo como objeto central de estudo e concebê-lo, antes de tudo, como uma unidade lexical da língua natural que adquire valor especializado dentro de um contexto especializado, segundo critérios semânticos, discursivos e pragmáticos. Seguir a LC (BIBER, 2012; BERBER SARDINHA, 2004) implica uma visão probabilística da língua, pressupondo que, embora muitos traços linguísticos sejam possíveis teoricamente, não ocorrem com a mesma frequência. Ganham realce no estudo os temas da variação terminológica, da tradução funcional e do artigo científico como gênero especializado. Nosso corpus de estudo é constituído de 70 artigos de periódicos científicos de destaque no âmbito do TF, escritos originalmente em português e inglês. São, portanto, dois subcorpora, um em cada língua, que são comparáveis. Para exploração e análise do corpus, utilizamos o software AntConc (ANTHONY, 2011), especialmente as funcionalidades keyword list, n-grams e concordance. Como material de apoio, utilizamos livros-texto e artigos científicos de referência sobre TF, um glossário particular pré-existente de Educação Física, a Terminologia Anatômica Internacional, o Google Acadêmico, o Wikipédia, entre outros. Também contamos com a colaboração de dois consultores especialistas em TF. A pesquisa contempla, então, uma parte teórica e uma parte aplicada que se inter-relacionam e se inserem na dupla face da Terminologia, visto que há uma descrição de uma linguagem especializada a partir de um dado ponto de vista teórico e o desenho de um produto concreto. 
When designing a bilingual terminographic product for translators, a terminographer must not only record, in both languages, the terms specific to a (sub)field of knowledge, but also present these terms within their typical combinations, that is, together with the elements they combine with syntagmatically and recurrently in the texts of that domain. This is because a translator needs to produce a target text that conforms to the language pattern in question and reflects the modus dicendi of that specialized field. The resulting text will then sound natural to its community of readers, avoiding noise in communication. Given the lack of bilingual terminographic products on Strength Training (ST) aimed at translators, the main purpose of this study is to provide theoretical and methodological foundations for developing a Portuguese-English glossary of ST terminology. The glossary is presented here as a prototype, a sample of a whole, designed especially to assist Brazilian translators working from Portuguese into English, although it can also serve researchers and students of the subject who need to write scientific papers in English. It includes a user guide, a domain tree of ST in Portuguese, a list of terms in Portuguese, and 30 sample terminology records in extended format. A further objective is to describe the behavior of terms in Portuguese and English, and of eventive specialized phraseological units (BEVILACQUA, 2003; 2004) in Portuguese, in scientific articles on ST. As a theoretical framework, we draw on the principles of the Communicative Theory of Terminology (CTT) and on the foundations and guidelines of Corpus Linguistics (CL).
Following CTT (CABRÉ, 1999a; 1999b; 2001a; 2001b; 2003; 2009) means taking the term as the central object of study and conceiving it, first of all, as a lexical unit of natural language that acquires specialized value within a specialized context, according to semantic, discursive, and pragmatic criteria. Following CL (BIBER, 2012; BERBER SARDINHA, 2004) means adopting a probabilistic view of language, which assumes that, although many linguistic features are theoretically possible, they do not occur with the same frequency. The study also highlights terminological variation, the functional approach to translation, and the scientific article as a specialized genre. Our corpus consists of 70 articles from leading scientific journals on ST, originally written in Portuguese and English; these form two comparable subcorpora, one in each language. To explore and analyze the corpus, we used the AntConc software (ANTHONY, 2011), especially its keyword list, n-grams, and concordance tools. As support material, we used textbooks and reference scientific papers on ST, a pre-existing personal glossary of Physical Education, the International Anatomical Terminology, Google Scholar, and Wikipedia, among others. We also had the collaboration of two expert consultants in ST. The research thus comprises a theoretical part and an applied part that are interrelated and reflect the dual nature of Terminology, since it offers both a description of a specialized language from a given theoretical point of view and the design of a concrete product. Keywords: Communicative theory of terminology; Corpus linguistics; Terminography; Bilingual glossary; Strength training
240	Asymetrie větných segmentů při překladu z japonštiny do češtiny / Asymmetry of Sentence Segments in Japanese to Czech translations Jirkal, Martin January 2018 The data in the Czech-Japanese Parallel Corpus show that apparent qualitative shifts arise between corresponding sentence segments in the source and target languages as a result of translation from Japanese. My goal is therefore to analyse this asymmetry of sentences in translations from Japanese into Czech and to evaluate its causes and effects. The issue is viewed through the theory of translation universals (explicitation, implicitation, normalization, simplification). The thesis also considers the theory of information density, although applying it proved problematic, to say the least, during the research. The introduction discusses how translation studies views these theories and describes the sample selection process in detail. The central part of the thesis discusses the results of the analysis of asymmetric sentences; it summarizes the language features and situations that create this asymmetry and asks which general trends can be considered to exist in Japanese-Czech translation on the basis of that summary. Finally, the distribution of asymmetric segments across the six analysed translations is studied, as well as the potential influence of translators on their creation. Keywords: Japanese,...
