21 |
Cafés do Brasil: estudo de variantes em português e inglês na língua falada / Brazilian Coffee: Portuguese and English Variants in the Spoken Language
Ginezi, Luciana Latarini. 28 January 2008
O objetivo deste trabalho é analisar a ocorrência de variantes terminológicas na linguagem de especialidade do Café e verificar a possibilidade de se construir um produto terminológico bilíngüe baseado na oralidade. Inspirado no trabalho profissional de interpretação consecutiva e intermitente, o estudo utiliza corpora falados, uma vez que podemos estabelecer uma relação clara entre a oralidade e a interpretação, modalidade oral da tradução. Todas as dificuldades encontradas na construção dos corpora falados são explicitadas e algumas sugestões são feitas para futuras pesquisas. A pesquisa segue os princípios da Lingüística de Corpus (LC), tanto na elaboração dos corpora como também na análise dos dados, essa com o uso da ferramenta computacional WordSmith Tools, agilizando o processo e dando a ele confiabilidade. O estudo justifica-se pela importância do conhecimento de variantes terminológicas nas línguas de especialidade e na sua modalidade falada, por intérpretes e por profissionais da área e, também, pelas possibilidades oferecidas pela LC para a pesquisa socioterminológica na oralidade. Assim, compilamos dois corpora falados monolíngües, um em português do Brasil e outro em inglês de países diversos, com o tema Café, subdividido em colheita e processamento, composto por entrevistas face-a-face da pesquisadora com profissionais da área cafeeira e por conversações entre profissionais, em ambas as línguas. Também construímos um corpus bilíngüe, composto por interpretações entre falantes de inglês e de português. Em seguida, analisamos os dados dos corpora, buscando encontrar variantes. Ao final do trabalho, elaboramos um vocabulário bilíngüe a partir dos dados coletados e das análises efetuadas. / The aim of this research is to analyze the presence of terminological variants in the specialty language of coffee and to verify the possibility of building a bilingual vocabulary based on spoken language. 
The study is inspired by professional consecutive and liaison interpreting and uses spoken corpora, since a close relation can be established between spoken language and interpreting, the oral mode of translation. The difficulties faced in building the spoken corpora are described, together with some suggestions for future research. The principles of Corpus Linguistics are followed both in the design of the corpora and in their exploration, which was carried out with Mike Scott's WordSmith Tools. The study is justified by the importance, for interpreters and for professionals in a specialty area, of knowing the terminological variants used in spoken language, and by the possibilities Corpus Linguistics offers for socioterminological research on the spoken variety. Two monolingual spoken corpora were compiled, one in Brazilian Portuguese and the other in English as spoken in several countries. The main topic is coffee, subdivided into harvesting and processing. In both languages the corpora consist of face-to-face interviews between the researcher and coffee professionals, as well as conversations among professionals. A bilingual corpus of interpreted exchanges between Portuguese and English speakers is also included. After analyzing the data, we present a bilingual vocabulary of spoken language that includes the variants found for most of the terms.
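The analysis itself was done in WordSmith Tools. The short Python sketch below is a rough stand-in, not WordSmith's implementation. It shows two operations a variant search usually relies on: counting the frequency of candidate variant forms and building a key-word-in-context (KWIC) concordance. The helper names, the sample sentence and the candidate variants `picking` and `harvesting` are hypothetical and do not come from the thesis corpus.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens (Unicode-aware, so letters like 'ç' and 'ã' are kept)."""
    return re.findall(r"\w+", text.lower())


def variant_counts(tokens: list[str], variants: list[str]) -> dict[str, int]:
    """Raw frequency of each candidate variant in the token list."""
    freq = Counter(tokens)
    return {v: freq[v] for v in variants}


def kwic(tokens: list[str], node: str, span: int = 4) -> list[str]:
    """Key-word-in-context lines: `span` tokens on either side of each hit of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


# Hypothetical snippet of a transcribed interview (not from the thesis corpus).
sample = ("Selective picking is slow. Harvesting by hand, or picking, "
          "takes weeks; mechanical harvesting is faster.")
tokens = tokenize(sample)
print(variant_counts(tokens, ["picking", "harvesting"]))
for line in kwic(tokens, "picking", span=2):
    print(line)
```

Multiword terms would need n-gram matching in place of single-token comparison. The sketch keeps to single tokens for brevity.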
|
22 |
O emprego do presente indicativo em entrevistas com enfoque no passado / The use of the present indicative in interviews focused on the past
Fatori, Marcos José [UNESP]. March 2006
Sponsor: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / O presente do indicativo é um dos tempos verbais mais empregados na língua portuguesa falada. Na medida em que é utilizado para expressar tanto o presente como o passado e o futuro, pode-se dizer que se trata do tempo mais versátil de nossa língua. No entanto, há poucos estudos acerca de seu emprego. Tendo em vista tal fato, resolvemos desenvolver esta pesquisa, que teve por objetivo principal analisar, num corpus constituído de textos orais, especificamente de entrevistas com enfoque em história de vida, os valores semânticos assumidos pelo presente do indicativo, bem como verificar a relação que se estabelece entre esse tempo verbal e os tipos de verbo (ação, processo, ação-processo e estado), os argumentos de primeiro grau (agente, paciente, instrumental, causativo, objetivo, locativo, experimentador), que se apresentam na função de sujeito e a pessoa (1ª, 2ª, 3ª) em que este se realiza. Em virtude do alto índice de ocorrência dos pretéritos perfeito e imperfeito do indicativo nas entrevistas, verificamos também a relação estabelecida entre esses dois tempos verbais e os tipos de verbo, de sujeito e de pessoa, com o intuito de podermos comparar os resultados com os do presente do indicativo. Como embasamento teórico para nossa discussão, foram utilizadas as pesquisas de Weinrich (1974) e de Corôa (1985). / The present indicative is one of the most frequently used tenses in spoken Portuguese. Because it can express the present as well as the past and the future, it can be considered the most versatile tense in the language. However, few studies have examined its use. We therefore undertook this research. Its main aim was to analyze the semantic values taken on by the present indicative in a corpus of oral texts, specifically life-history interviews focused on the past.
We also examined the relation between this tense and three factors: the type of verb (action, process, action-process and state); the type of first-degree argument that functions as subject (agent, patient, instrumental, causative, objective, locative, experiencer); and the grammatical person (1st, 2nd, 3rd) in which the subject is realized. The preterite perfect and imperfect indicative were also frequent in the interviews, so we examined the relation between these two tenses and the same verb, subject and person types, in order to compare the results with those for the present indicative. The analysis draws theoretically on the work of Weinrich (1974) and Corôa (1985).
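The abstract does not say how the co-occurrence counts were obtained. A minimal Python sketch follows. It assumes that each finite verb in the transcripts has already been annotated by hand with its tense and its verb type (the four types named above), and it shows how a tense-by-verb-type cross-tabulation and per-tense proportions could be computed. The annotations are invented toy data, not the thesis's results.

```python
from collections import Counter

# Toy annotations, one (tense, verb_type) pair per finite verb. These are
# invented for illustration; the thesis's own coding and data differ.
OCCURRENCES = [
    ("present", "action"), ("present", "state"), ("present", "state"),
    ("preterite perfect", "action"), ("preterite perfect", "process"),
    ("preterite imperfect", "state"), ("preterite imperfect", "action-process"),
]


def crosstab(pairs):
    """Count how often each tense co-occurs with each verb type."""
    return Counter(pairs)


def shares_for(pairs, tense):
    """Proportion of each verb type among the occurrences of one tense."""
    counts = Counter(vtype for t, vtype in pairs if t == tense)
    total = sum(counts.values())
    # An unseen tense gives an empty Counter, so no division happens.
    return {vtype: n / total for vtype, n in counts.items()}


if __name__ == "__main__":
    for (tense, vtype), n in sorted(crosstab(OCCURRENCES).items()):
        print(f"{tense:<20} {vtype:<15} {n}")
    print(shares_for(OCCURRENCES, "present"))
```

The same functions would also cover subject-argument types or grammatical person: the second element of each pair is simply replaced by that annotation.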
|
23 |
Vybrané aspekty jazyka rozhlasových pořadů o vaření / Selected aspects of radio programmes on cooking
Jelínková, Blanka. January 2012
This thesis discusses selected linguistic aspects of two radio cooking programmes (Rozhlas u plotny, Pochoutky). The theoretical part deals with the tradition of spoken-language research, its resources and the phenomena studied. The final section of that chapter is devoted to approaches to spoken language that focus on coherence and cohesion. For the analysis, transcripts of the two programmes were made using specific transcription rules, adapted from several existing transcription conventions to suit this type of analysis. In Rozhlas u plotny the guest is a professional chef, while in Pochoutky the guest is an actor or a presenter. The first part of the analysis briefly describes and compares the language of these two types of speakers in terms of phonetics, morphology, lexicology, syntax and stylistics. The main part of the thesis then concentrates on textual syntax and describes the cohesion and coherence of spoken dialogue. From this point of view, it compares two types of dialogue (and two types of speakers): predominantly continuous dialogue and predominantly interrupted dialogue (e.g. interrupted by overlapping talk). It gives examples of the various ways cohesion and coherence are maintained in each. Key...
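The abstract does not describe how interrupted dialogue was identified. Purely as an illustration, assume that turns carry start and end times (the thesis's own transcription rules may not record them). Under that assumption, one could measure the share of turns that begin while the preceding turn by another speaker is still running. The `Turn` type, the toy timings and the 0.5 cut-off for "predominantly interrupted" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    start: float  # seconds from the start of the recording
    end: float


def overlap_share(turns: list[Turn]) -> float:
    """Share of turns (after the first) that start before the preceding turn,
    spoken by a different speaker, has ended."""
    ordered = sorted(turns, key=lambda t: t.start)
    if len(ordered) < 2:
        return 0.0
    overlaps = sum(
        1
        for prev, cur in zip(ordered, ordered[1:])
        if cur.speaker != prev.speaker and cur.start < prev.end
    )
    return overlaps / (len(ordered) - 1)


def dialogue_type(turns: list[Turn], cutoff: float = 0.5) -> str:
    """Hypothetical rule: more than `cutoff` overlapping turns means 'interrupted'."""
    return "interrupted" if overlap_share(turns) > cutoff else "continuous"


# Toy exchange between a host and a guest chef (invented timings).
talk = [
    Turn("host", 0.0, 5.0),
    Turn("chef", 4.5, 10.0),
    Turn("host", 10.0, 12.0),
    Turn("chef", 11.0, 15.0),
]
print(overlap_share(talk), dialogue_type(talk))
```

Only the immediately preceding turn is checked. A fuller version would compare each turn against every earlier turn that is still running.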
|
25 |
Linguistic practice on contemporary Jordanian radio: publics and participation
Fras, Jona Jan. January 2018
Contemporary studies of media Arabic often pass over issues of media form and the broader relevance of language use. The present thesis addresses these issues directly by examining the language used in Jordanian non-government radio programmes. It examines recordings and transcriptions of a range of programme genres, primarily morning talk shows and 'service programmes' (barāmiž ḳadamātiyya), as well as Islamic advice programmes, both of which feature significant audience input via call-ins. The data are examined through an interpretive form of discourse analysis that compares radio programmes as 'units of interaction' and draws on linguistic anthropological theory, which analyses language as a form of performance. This is supported by sociolinguistic data obtained from the recordings, including phoneme frequency analysis, and by the author's experience of six months of fieldwork in Jordan in 2014-15. The analysis focuses on four major themes: (1) the influence of media context on language use, specifically the sonic exclusivity and temporal evanescence of radio, as well as the impact of digital media; (2) the indexicality of certain locally salient sociolinguistic variables, and the use to which they are put in radio talk; (3) the role of language in constructing the identity, or persona, of broadcasters; and (4) the role of language in constructing and validating authoritative discourse, in particular that of Islamic texts and scripture in religious programming. The thesis analyzes these themes using selected recording excerpts as demonstrative case studies. It shows that specific strategies of Arabic use in the radio setting crucially affect both the publics of radio talk (its addressed audiences) and the frameworks of participation in this talk, that is, how and to what extent broadcasters and members of the public can participate in mediated discourse.
The results demonstrate the unique value of an interpretive study of linguistic performance for highlighting broader social issues. One is the inclusion and exclusion of particular segments of society through linguistic strategies (Jordanians versus non-Jordanians, Ammanis versus non-Ammanis, and pious Muslims versus non-believers). Another is the use of language to reassert, or occasionally challenge, dominant ideologies and discourses, such as those of gender, nationalism and religion. This study thus contributes an examination of contemporary Jordanian non-government radio language in its social and political context, something not attempted before. It provides important insights into both the nature of contemporary Arabic media language and its broader social and cultural import.
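The abstract mentions phoneme frequency analysis but does not give the procedure. A minimal sketch follows. It assumes that each occurrence of a sociolinguistic variable has already been coded by ear as one of its variants, and it computes each speaker's variant proportions. The speaker labels, and the choice of a (q) variable with variants [q], [g] and [ʔ], are illustrative assumptions rather than data from the thesis.

```python
from collections import Counter, defaultdict

# Hypothetical coded tokens of the variable (q): (speaker, realised variant).
TOKENS = [
    ("presenter", "g"), ("presenter", "g"), ("presenter", "ʔ"),
    ("caller", "ʔ"), ("caller", "ʔ"), ("caller", "ʔ"), ("caller", "q"),
]


def variant_rates(tokens):
    """Map each speaker to the proportion of tokens realised with each variant."""
    counts = defaultdict(Counter)
    for speaker, variant in tokens:
        counts[speaker][variant] += 1
    rates = {}
    for speaker, c in counts.items():
        total = sum(c.values())
        rates[speaker] = {v: n / total for v, n in c.items()}
    return rates


for speaker, shares in variant_rates(TOKENS).items():
    print(speaker, {v: round(p, 2) for v, p in shares.items()})
```

Proportions, not raw counts, are compared because speakers produce very different numbers of tokens in a call-in programme.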
|
26 |
Le reflet de la langue parlée dans la presse écrite française et allemande / The reflection of spoken language in French and German print media
Friedl, Isabelle. 28 November 2009
Ce travail s’est donné comme but d’analyser un corpus de presse écrite constitué de sept titres allemands et de sept titres français datant de 2006, en vue de répertorier tous les phénomènes de langue de conception parlée [terminologie Koch-Oesterreicher 1985 et 1990, aussi: oralité] à l’intérieur de ce corpus et ce afin d’observer le degré de perméabilité des différents titres vis-à-vis de ces phénomènes et les normes scripturales journalistiques en vigueur dans les deux pays. Pour ce faire, il a été dressé un inventaire de catégories permettant de passer au peigne fin le corpus pour en recueillir, dans une banque de données, les phrases-tokens présentant au moins un trait de langue de CONCEPTION parlée, d’oralité. Ces catégories servant de filtre ont été arrêtées suite à l’élaboration, dans la première partie du travail, d’une liste contenant les caractéristiques observables dans les langues orales, de conception parlée, des deux pays. Les résultats sont interprétés dans une optique tridimensionnelle: celle du cadre énonciatif, celle du contrat de communication et celle de l’oralité fictive. Une analyse plus détaillée de chaque catégorie se trouve par ailleurs en annexes [annexes n° 1, tome II]. Il s’avère alors que les magazines pour jeunes sont très réceptifs en matière de phénomènes d’oralité et que la langue de la presse allemande y est plus ouverte que son correspondant français. / This thesis analyzes a print-media corpus of seven German and seven French newspapers and magazines from 2006. The aim is to inventory all phenomena of conceptually spoken language [terminology of Koch and Oesterreicher 1985 and 1990; also: orality] found in the corpus, in order to observe how permeable the different titles are to these phenomena and to assess the journalistic writing norms currently in force in both countries.
To do so, the author drew up an inventory of categories used to comb through the corpus and to collect into a database the sentence tokens presenting at least one feature of conceptually spoken language. These filter categories were established from a list, compiled in the first part of the thesis, of the characteristics observable in the conceptually spoken, oral languages of the two countries. The results are interpreted from three perspectives: the enunciative framework, the communication contract and fictive orality. A more detailed analysis of each category is given in the appendices [appendix no. 1, volume II]. The results show that youth magazines are very receptive to orality phenomena and that the language of the German press is more open to them than its French counterpart.
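The filtering step described above (categories used as filters, with every sentence that matches at least one category stored in a database) can be sketched in Python. The three categories and their regular expressions are hypothetical examples of common orality markers, not the thesis's actual inventory. A plain list stands in for the database.

```python
import re

# Hypothetical filter categories with a few familiar orality markers each.
# The thesis's own category inventory is not reproduced here. Markers such as
# "mal" or "ben" also have non-oral uses, so matches would need manual review.
CATEGORIES = {
    "interjection": re.compile(r"\b(?:hein|bah|ben|na ja)\b", re.IGNORECASE),
    "modal_particle_de": re.compile(r"\b(?:halt|eben|mal|doch)\b", re.IGNORECASE),
    "contraction": re.compile(r"\b(?:t['’]as|gibt['’]s|geht['’]s)\b", re.IGNORECASE),
}


def collect(sentences):
    """Keep every sentence that matches at least one category, with its matches."""
    hits = []
    for sentence in sentences:
        matched = [name for name, rx in CATEGORIES.items() if rx.search(sentence)]
        if matched:
            hits.append({"sentence": sentence, "categories": matched})
    return hits


corpus = [
    "Bah, c'est comme ça, hein.",
    "Das ist halt so.",
    "Le gouvernement a présenté son budget.",
    "Wie geht's dir?",
]
for row in collect(corpus):
    print(row)
```

Each row keeps the list of matched categories, so per-category counts for each title can be derived afterwards.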
|
27 |
Incremental Parsing with Adjoining Operation
Matsubara, Shigeki; Kato, Yoshihide. 01 December 2009
No description available.
|
28 |
Construction of linefeed insertion rules for lecture transcript and their evaluation
Matsubara, Shigeki; Ohno, Tomohiro; Murata, Masaki. January 2010
No description available.
|
29 |
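Inserção de estrangeirismos procedentes da tecnologia da informação na língua materna: significados que fluem - uma abordagem fenomenológica / Insertion of foreign words from information technology into the mother tongue: meanings that flow - a phenomenological approach
Oliveira, Nádia Fátima de. 30 June 2005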
Sponsor: Conselho Nacional de Desenvolvimento Científico e Tecnológico / The theme of this investigation, "The Insertion of Foreign Words from Information Technology into the Mother Tongue", arose from my work with students of the technology courses and the Engineering bachelor's program at Sociedade Educacional de Santa Catarina (SOCIESC) in Joinville, Santa Catarina. Most students who choose to take a college course at this institution come from mechanical or metallurgical industries and fall into one of two groups: a) those who have been working in their field for some time and are seeking to update their knowledge or advance in their profession; or b) those who have been working for only a short time, or are just entering the job market, but are seeking specialization in some area. The former group arrives at university with a vocabulary full of technical words and expressions; the latter group only encounters this vocabulary on starting their tertiary courses. Over the last three decades, technological development has advanced so rapidly that it has made computing widely popular. This has changed the linguistic habits of people who work in largely automated industries and in institutions that prepare people for the job market. By extension, in the family environment, where computers and the web are present, computing language is bringing about cultural changes that, by all indications, are spilling over into the mother tongue. Using a phenomenological approach, this research aims mainly to understand and interpret how speakers of Brazilian Portuguese use these foreign words and expressions, which are produced by technological advances and inserted into the mother tongue.
The participants were deliberately selected (six professors and three college students), forming a group of nine interviewees who spontaneously helped unveil two essences: "As a natural expression, foreign words flow into and influence the mother tongue" and "Foreign words reveal themselves in the man/world relationship". These essences are reciprocally correlated and emerged from the following dimensions: computing as a promising field of work; the simple and the natural in the language of computing; globalization drives the use of foreign words; foreign words are inserted into the written language; and teaching in technical education without a teaching qualification. Reflecting on the interviewees' speech, I continue my work, committed to linguistic education. / O tema/problema desta investigação - Inserção de Estrangeirismos Procedentes da Tecnologia da Informação na Língua Materna - fluiu a partir do trabalho que desenvolvo com alunos dos cursos de tecnologia e do bacharelado em Engenharia, na SOCIEDADE EDUCACIONAL DE SANTA CATARINA - SOCIESC - em Joinville. A maioria das pessoas que optam por fazer o terceiro grau nessa instituição, provém de indústrias mecânicas ou metalúrgicas e apresentam duas características: a) estão há algum tempo no mercado de trabalho e buscam atualização ou ascensão profissional; b) trabalham há pouco tempo ou estão entrando no mercado de trabalho, mas buscam especialização em alguma área. O primeiro grupo adentra o recinto universitário trazendo uma bagagem lingüística eivada de palavras e expressões técnicas; o segundo grupo se depara no recinto universitário com essas palavras ou expressões.
Ocorre que, nas últimas três décadas, o desenvolvimento tecnológico avançou vertiginosamente, popularizando a informática e mudando os costumes lingüísticos de quem trabalha em indústrias praticamente automatizadas e em instituições que preparam pessoas para o mercado de trabalho. Por extensão, no ambiente familiar, onde a informática e a rede informacional se fazem presentes, a linguagem computacional está provocando mudanças culturais e, ao que tudo indica, respingam na língua materna. Trata-se de uma abordagem fenomenológica, objetivando compreender e interpretar o uso que os falantes do português do Brasil fazem desses estrangeirismos, produzidos pelo avanço tecnológico, e inseridos na língua materna. Os participantes da pesquisa foram selecionados intencionalmente - seis professores e três alunos universitários - e compuseram um rol de nove entrevistados que contribuíram, de forma espontânea, para o desvelar das duas essências: Como expressão "natural", estrangeirismos fluem e influenciam a língua materna; e Estrangeirismos se des-velam na relação homem/mundo. Essas essências se correlacionam entre si e fluíram das seguintes dimensões: a informática como promissor campo de trabalho; O simples e o natural na linguagem da informática; A globalização provoca o uso de estrangeirismos; Estrangeirismos inserem-se na língua escrita; e O exercício da docência, no ensino técnico, sem habilitação docente. Refletindo sobre as falas dos sujeitos entrevistados, prossigo, comprometida com um trabalho visando à educação lingüística.
|
30 |
Advanced Quality Measures for Speech Translation / Mesures de qualité avancées pour la traduction de la parole
Le, Ngoc Tien. 29 January 2018
Le principal objectif de cette thèse est d'estimer de manière automatique la qualité de la traduction de langue parlée (Spoken Language Translation ou SLT), ce que l'on appelle l'estimation de confiance (Confidence Estimation ou CE). Pour un signal audio, le système de SLT génère des hypothèses représentées par des séquences de mots, qui contiennent parfois des erreurs. En raison de multiples facteurs, une sortie de SLT de qualité insatisfaisante peut causer différents problèmes aux utilisateurs finaux. Par conséquent, il est utile de savoir avec quel degré de confiance les tokens corrects peuvent être identifiés au sein de l'hypothèse. L'estimation de confiance a pour objectif d'obtenir des scores qui quantifient le niveau de confiance, ou d'annoter les tokens cibles en appliquant un seuil de décision (par exemple, seuil par défaut = 0,5). Dans le cadre de cette thèse, nous proposons une boîte à outils, qui consiste en un framework personnalisable et flexible et en une plate-forme portable, pour l'estimation de confiance au niveau des mots (Word-level Confidence Estimation ou WCE) de SLT. En premier lieu, les erreurs dans le SLT ont tendance à se produire sur les hypothèses de la reconnaissance automatique de la parole (Automatic Speech Recognition ou ASR) et sur celles de la traduction automatique (Machine Translation ou MT), qui sont représentées par des séquences de mots. Ce phénomène est étudié par l'estimation de confiance (CE) au niveau des mots à l'aide de modèles de champs aléatoires conditionnels (Conditional Random Fields ou CRF). Cette tâche, relativement nouvelle, est définie et formalisée comme un problème d'étiquetage séquentiel dans lequel chaque mot de l'hypothèse de SLT est annoté comme bon ou mauvais selon un ensemble de traits importants.
Nous proposons plusieurs outils servant à estimer la confiance des mots (WCE) en fonction de notre évaluation automatique de la qualité de la transcription (ASR), de la qualité de la traduction (MT), ou des deux (combinaison ASR et MT). Ce travail de recherche a été rendu possible par la construction d'un corpus spécifique, qui contient 6,7k énoncés pour lesquels un quintuplet est normalisé comme suit : (1) sortie d’ASR, (2) transcription en verbatim, (3) traduction textuelle, (4) traduction vocale et (5) post-édition de la traduction. La conclusion de nos multiples expérimentations, utilisant les traits conjoints entre ASR et MT pour WCE, est que les traits de MT demeurent les plus influents, tandis que les traits d’ASR peuvent apporter des informations complémentaires intéressantes. En deuxième lieu, nous proposons deux méthodes pour distinguer les erreurs d’ASR de celles de MT, dans lesquelles chaque mot de l'hypothèse de SLT est annoté comme good (bon), asr_error (erreur d’ASR) ou mt_error (erreur de MT). Nous contribuons donc à l’estimation de confiance au niveau des mots (WCE) pour SLT en identifiant la source des erreurs au sein des systèmes de SLT. En troisième lieu, nous proposons une nouvelle métrique, intitulée Word Error Rate with Embeddings (WER-E), qui est exploitée afin de rendre cette tâche possible. Cette approche génère de meilleures hypothèses de SLT lors de la recherche de la meilleure hypothèse parmi les N meilleures avec WER-E. En somme, nos stratégies proposées pour l'estimation de la confiance se révèlent avoir un impact positif sur plusieurs applications de SLT.
Les outils robustes d’estimation de la qualité pour SLT peuvent être utilisés pour ré-évaluer des graphes de traduction de la parole ou pour fournir des retours d’information aux utilisateurs dans la traduction vocale interactive ou dans des scénarios de transcription parole-texte assistée par ordinateur. Mots-clés : Estimation de la qualité, Estimation de confiance au niveau des mots (WCE), Traduction de langue parlée (SLT), traits joints, Sélection des traits. / The main aim of this thesis is to investigate the automatic quality assessment of spoken language translation (SLT), called Confidence Estimation (CE) for SLT. For several reasons, SLT output of unsatisfactory quality may cause various problems for end users. It is therefore useful to know how confident we can be in each token of the hypothesis. The first contribution of this thesis is LIG-WCE, a toolkit that provides a customizable and flexible framework and a portable platform for Word-level Confidence Estimation (WCE) of SLT. WCE for SLT is a relatively new task, defined and formalized as a sequence-labelling problem in which each word in the SLT hypothesis is tagged as good or bad according to a large feature set, using Conditional Random Fields (CRF). The labels are obtained by applying a decision threshold to the confidence scores (for example, a default threshold of 0.5). We propose several word confidence estimators based on our automatic evaluation of transcription (ASR) quality, translation (MT) quality, or both (combined/joint ASR+MT). This research was possible because we built a specific corpus of 6.7k utterances. For each utterance, a quintuplet was assembled: ASR output, verbatim transcript, text translation, speech translation and post-edited translation.
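The final labelling step just described (apply a decision threshold, by default 0.5, to word confidence scores) is shown in the Python sketch below. It covers only that step. The scores are assumed to come from a trained word-level classifier such as the CRF models used in the thesis, which is not implemented here. The example words and scores are invented, and counting a score exactly at the threshold as "good" is an assumption.

```python
def label_words(words: list[str], scores: list[float],
                threshold: float = 0.5) -> list[tuple[str, str]]:
    """Tag each hypothesis word 'good' if its confidence >= threshold, else 'bad'."""
    if len(words) != len(scores):
        raise ValueError("one confidence score is needed per word")
    return [(w, "good" if s >= threshold else "bad") for w, s in zip(words, scores)]


def bad_rate(labels: list[tuple[str, str]]) -> float:
    """Proportion of words flagged as 'bad' in one hypothesis."""
    return sum(1 for _, tag in labels if tag == "bad") / len(labels) if labels else 0.0


hyp = ["the", "cat", "sat", "on", "mat"]
conf = [0.9, 0.4, 0.7, 0.5, 0.2]  # invented classifier scores
labels = label_words(hyp, conf)
print(labels)
print(bad_rate(labels))
print(label_words(hyp, conf, threshold=0.6))  # a stricter threshold flags "on" too
```

Raising the threshold trades more false "bad" flags for fewer missed errors. This is why the threshold is kept as a parameter rather than fixed at 0.5.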
Our experiments using joint ASR and MT features for WCE show that MT features remain the most influential, while ASR features can bring interesting complementary information. As another contribution, we propose two methods to disentangle ASR errors from MT errors, in which each word in the SLT hypothesis is tagged as good, asr_error or mt_error. We thus explore how WCE for SLT can help identify the source of SLT errors. Furthermore, we propose a simple extension of the WER metric, Word Error Rate with Embeddings (WER-E), which uses word embeddings to penalize substitution errors differently according to their context. For instance, the proposed metric should catch near matches (mainly morphological variants) and penalize this kind of error less, since it has a more limited impact on translation performance. Our experiments show that the new metric correlates better with SLT performance than WER does. Oracle experiments also show that our metric can find better hypotheses (to be translated) in the ASR N-best list. Finally, a preliminary experiment in which ASR tuning is based on our new metric shows encouraging results. To conclude, we have proposed several strategies for CE of SLT that could have a positive impact on several SLT applications. Robust quality estimators for SLT can be used for re-scoring speech translation graphs or for providing feedback to the user in interactive speech translation or computer-assisted speech-to-text scenarios. Keywords: Quality estimation, Word confidence estimation (WCE), Spoken Language Translation (SLT), Joint Features, Feature Selection.
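The abstract does not give the exact WER-E cost function, so the Python sketch below illustrates only the general idea, under stated assumptions:
- The alignment is a standard Levenshtein alignment with unit insertion and deletion costs.
- A substitution costs 1 minus the non-negative part of the cosine similarity between the two words' embeddings, so a near match such as a morphological variant costs less than an unrelated word.
- Words without an embedding fall back to a full cost of 1.
- The total is divided by the reference length, as in WER.

The 2-dimensional vectors are toy values, not trained embeddings. `oracle_best` is a hypothetical helper that mimics an oracle N-best selection against a reference.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def sub_cost(r: str, h: str, emb: dict[str, list[float]]) -> float:
    """Substitution cost in [0, 1]: 0 for identical words, lower for similar ones."""
    if r == h:
        return 0.0
    if r in emb and h in emb:
        return 1.0 - max(0.0, cosine(emb[r], emb[h]))
    return 1.0


def wer_e(ref: list[str], hyp: list[str], emb: dict[str, list[float]]) -> float:
    """Embedding-weighted edit distance divided by the reference length."""
    n, m = len(ref), len(hyp)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = float(i)  # deletions only
    for j in range(1, m + 1):
        d[0][j] = float(j)  # insertions only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + 1.0,  # deletion
                d[i][j - 1] + 1.0,  # insertion
                d[i - 1][j - 1] + sub_cost(ref[i - 1], hyp[j - 1], emb),  # match/substitution
            )
    return d[n][m] / max(n, 1)


def oracle_best(ref, nbest, emb):
    """Pick the N-best hypothesis closest to the reference under WER-E."""
    return min(nbest, key=lambda h: wer_e(ref, h, emb))


EMB = {"cat": [1.0, 0.0], "cats": [0.9, 0.1], "dog": [0.0, 1.0]}  # toy vectors
ref = ["the", "cat", "sleeps"]
nbest = [["the", "dog", "sleeps"], ["the", "cats", "sleeps"]]
for hyp in nbest:
    print(" ".join(hyp), round(wer_e(ref, hyp, EMB), 3))
print("oracle:", " ".join(oracle_best(ref, nbest, EMB)))
```

Plain WER scores both hypotheses identically (one substitution in three words). The embedding-weighted cost separates the near match "cats" from the unrelated "dog". This is the behavior the abstract describes.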
|