  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
231

Good cop, bad cop? : A corpus analysis on the semantic prosody of the noun cop / Good cop, bad cop? : En korpus analys om den semantisk prosodiska förändringen hos substantivet cop

Lund, Simon January 2018 (has links)
Public opinion of the police varies greatly across different populations in the United States. This corpus-linguistic study investigates a possible change in the semantic prosody of the word cop, and also examines the distribution of the keyword across subcorpora to see whether distribution has driven an overall change in semantic prosody. The source is the Corpus of Historical American English, dating from 1800 to 2009, which is divided into four subcorpora (fiction, news, popular magazines, and non-fiction books) with a median of 51% fictional material throughout. The concordance hits are divided into two categories: positive/neutral, where cop carries neutral or positive connotations, and negative, where it carries negative connotations. The period studied, 1859 to 2009, is split into four shorter periods for manageability. The distribution across subcorpora shows that cop appears mostly in fictional material. Negative uses form the majority in periods 1 and 2; period 3 shifts between the categories, but from 1940 onwards neutral/positive uses are in the majority, and period 4 shows a strong positive trend. The study thus indicates that cop has undergone a change in semantic prosody from negative to neutral/positive. Further studies of cop in other corpora from the United States and the rest of the English-speaking world would extend our knowledge of the noun's semantic prosody and offer some insight into public opinion of the police.
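The two-category coding described in the abstract amounts to computing, per period, the share of positive/neutral concordance hits. A minimal sketch of that computation; the period labels and coded hits below are invented for illustration, not data from the thesis:

```python
from collections import Counter

# Each concordance hit for "cop" is hand-coded as "pos_neutral" or
# "negative" and assigned to a period (labels invented for illustration).
hits = [
    ("1859-1899", "negative"), ("1859-1899", "negative"), ("1859-1899", "pos_neutral"),
    ("1950-2009", "pos_neutral"), ("1950-2009", "pos_neutral"), ("1950-2009", "negative"),
]

def prosody_shares(coded_hits):
    """Return, per period, the share of positive/neutral uses."""
    totals, positives = Counter(), Counter()
    for period, label in coded_hits:
        totals[period] += 1
        if label == "pos_neutral":
            positives[period] += 1
    return {p: positives[p] / totals[p] for p in totals}

shares = prosody_shares(hits)  # the share rises in the later period
```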
232

Discours d'entreprise et organisation de l'information : apports de la textométrie dans la construction de référentiels terminologiques adaptables au contexte / Corporate discourses and information organization : Contribution of the textual statistics to the construction of terminological thesaurus adaptable to the context

Erlos, Frédéric 16 January 2009 (has links)
Organizing information on an intranet (an organization's internal network built on Internet technologies) requires new approaches to matching the structure of intranet sites to the linguistic usage of their audiences. One way to take these usages into account is to explore textual data representative of a specific communication situation. Such exploration is carried out with textometric techniques such as the hierarchical index of forms, concordances, repeated segments, the map of a text's sections, co-occurrence computation, and correspondence analysis. This corpus-based approach extracts, from texts of corporate communication (annual reports), the lexical units used to build a terminological thesaurus of a particular type. To take the communication context into account, three kinds of reference points are proposed: the organization's own set of reference objects as evoked in its discourse; the pragmatic properties of proper names; and the collection, starting from a selection of proper names, of part of the characteristic vocabulary of the corpus used as the source of the thesaurus. This collection is therefore not limited to terminological units alone: it also includes words of the common language and proper names. Units belonging to the corpus vocabulary are selected according to the type of semantic relations they establish with proper names in the texts. Finally, the results are assessed in terms of productivity, reliability, and representativeness.
233

ANNIS: A graph-based query system for deeply annotated text corpora

Krause, Thomas 11 January 2019 (has links)
This dissertation describes the design and implementation of an efficient query system for linguistic corpora. The existing system ANNIS, based on a relational database, specializes in supporting corpora with very different kinds of annotations and uses graphs as a unified representation of those annotations. For this dissertation, a solely graph-based main-memory database was developed as a successor to ANNIS. Corpora are partitioned into edge components, and different implementations for representing and searching these components are used for different types of subgraphs. Operations of the ANNIS Query Language (AQL) are implemented as combinations of reachability queries on these components, and each component implementation provides functions optimized for this type of query. This approach exploits the different structures of the different kinds of annotations without losing the common representation as a graph. Additional optimizations, such as parallel execution of parts of a query, were also implemented and evaluated. Since AQL has an existing implementation that is openly available to researchers as a web-based service, real-life AQL queries could be recorded and used as the basis for benchmarking the new implementation. More than 4,000 queries over 18 corpora (most of which are available under an open-access license) were compiled into a realistic workload covering very different types of corpora and queries with a wide range of complexity. The new graph-based implementation was compared against the existing relational-database implementation: it executes the workload about 10× faster than the baseline, and experiments show that the different edge-component storage implementations account for a large share of this improvement.
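The core idea of answering query operators as reachability queries on partitioned edge components can be sketched as follows. The component names and the toy graph are invented for illustration; this is not ANNIS's actual storage model:

```python
from collections import deque

# Annotation edges partitioned into named components, each stored as a
# simple adjacency map (real implementations vary per component type).
components = {
    "dominance": {"s1": ["np1", "vp1"], "np1": ["det1", "n1"]},
    "pointing":  {"n1": ["n2"]},
}

def is_reachable(component, src, dst):
    """BFS reachability inside a single edge component, as an AQL-style
    operator would be answered on that component's storage."""
    graph = components[component]
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

Because each component is self-contained, a specialized storage (e.g. pre-computed order indexes for linear structures) can replace the generic BFS without changing the query interface.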
234

The Israeli-Palestinian Conflict in American, Arab, and British Media: Corpus-Based Critical Discourse Analysis

Kandil, Magdi Ahmed 27 May 2009 (has links)
The Israeli-Palestinian conflict is one of the longest and most violent conflicts in modern history. The language used to represent this important conflict in the media is frequently commented on by scholars and political commentators (e.g., Ackerman, 2001; Fisk, 2001; Mearsheimer & Walt, 2007). To date, however, few studies in applied linguistics have attempted a thorough investigation of the language used to represent the conflict in influential media outlets using systematic methods of linguistic analysis. The current study aims to partially bridge this gap by combining methods and analytical frameworks from Critical Discourse Analysis (CDA) and Corpus Linguistics (CL) to analyze the discursive representation of the Israeli-Palestinian conflict in American, Arab, and British media, represented by CNN, Al-Jazeera Arabic, and the BBC respectively. CDA, which is primarily interested in how power and ideology are enacted and resisted through language use in social and political contexts, has frequently been criticized for the arbitrary selection of a small number of texts or text fragments for analysis. To strengthen CDA analysis, Stubbs (1997) suggested that CDA analysts utilize techniques from CL, which employs computational approaches to perform quantitative and qualitative analysis of actual patterns of use in large, principled collections of natural texts. In this study, the corpus-based keyword technique is first used to identify the topics that tend to be emphasized, downplayed, and/or left out in the coverage of the Israeli-Palestinian conflict in three corpora compiled from the news websites of Al-Jazeera, CNN, and the BBC. Topics found to be key in the coverage of the conflict, such as terrorism, occupation, settlements, and the recent Israeli disengagement plan, are then studied in context using several other corpus tools, especially the concordancer and the collocation finder. The analysis reveals some of the strategies employed by each news website to control the positive or negative representation of the different actors involved in the conflict. The corpus findings are interpreted using informative CDA frameworks, especially Van Dijk's (1998) ideological square.
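The keyword technique mentioned above compares a word's frequency in a target corpus against a reference corpus, most commonly with Dunning's log-likelihood statistic. A minimal sketch of the standard statistic (the counts in the test below are invented), not the exact implementation used in the study:

```python
import math

def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Dunning log-likelihood 'keyness' of a word: how surprising its
    frequency in corpus A is given its frequency in reference corpus B."""
    expected_a = size_a * (freq_a + freq_b) / (size_a + size_b)
    expected_b = size_b * (freq_a + freq_b) / (size_a + size_b)
    ll = 0.0
    if freq_a:
        ll += freq_a * math.log(freq_a / expected_a)
    if freq_b:
        ll += freq_b * math.log(freq_b / expected_b)
    return 2 * ll
```

A word distributed proportionally across both corpora scores 0; the more skewed its distribution toward one corpus, the higher its keyness, which is how topics like "occupation" or "terrorism" surface as key in one outlet's coverage.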
235

Lietuvių kalbos samplaikos / Multi-word lexemes in the Lithuanian language

Kovalevskaitė, Jolanta 12 April 2012 (has links)
The object of the study is multi-word lexemes (samplaikos in Lithuanian), defined as fixed combinations of two or more inflected and uninflected words that are grammatically and semantically perceived as one unit of unitary meaning, most often used in the function of a non-autonomous (function) part of speech. The goal of the dissertation is to investigate the autonomy of multi-word lexemes in the Lithuanian language as lexical units characterized by stability of form and content. Two monolingual corpora (the non-annotated Corpus of the Contemporary Lithuanian Language and the morphologically annotated Lithuanian language corpus) and the parallel German-Lithuanian corpus have been used for the extraction and analysis of data. Several research methods have been applied: descriptive, corpus-based, statistical, and contrastive. The statements to be defended are as follows: 1. According to the broad conception of phraseology, multi-word lexemes are a subtype of multi-word units. They are considered an object of phraseology, since their frequency and fixedness have been confirmed by corpus analysis. 2. There are variations in the degree of fixedness of multi-word lexemes. The analysis of collocation strength between the elements of multi-word lexemes and of deviations in the morphological paradigm indicates that the degree of fixedness is largely determined by their composition. 3. The context of multi-word lexemes is characterized by stability or variability. The context of more stable multi-word lexemes is variable, which makes them more autonomous; less stable multi-word lexemes, whose context is more constrained, are less autonomous. 4. More autonomous multi-word lexemes are more likely to serve as translation units than less autonomous ones. The more autonomous a multi-word lexeme is, the... [see full text]
237

PragmaSUM: novos métodos na utilização de palavras-chave na sumarização automática / PragmaSUM: new methods in the use of keywords in automatic summarization

Rocha, Valdir J?nior Cordeiro 05 December 2017 (has links)
With expanding Internet access and tools that let anyone create content, the amount of available information grows rapidly. Texts on the most diverse subjects, by the most diverse authors, are created every day. It is impossible to absorb all the information available, which makes it hard to choose what is most appropriate for a given interest or audience. Automatic text summarization, besides presenting a text in condensed form, can also simplify it, saving readers time and widening access to the information for very different types of readers. The automatic summarizers currently described in the literature offer no methods for tailoring summaries to each type of reader and consequently produce imprecise results. This work applies the PragmaSUM automatic text summarizer to educational texts with new keyword-based summarization techniques. Tailoring summaries with keywords aims to increase precision and improve the performance of PragmaSUM and its summaries. To this end, a corpus consisting solely of scientific articles from the educational field was built for tests and comparisons between different summarizers and summarization methods. Summarizer performance was measured with the Recall, Precision and F-Measure metrics of the ROUGE tool and validated with the Friedman ANOVA statistical test and Kendall's coefficient of concordance. The results show improved performance when keywords are used in summarization with PragmaSUM, underlining the importance of choosing these keywords appropriately to classify the content of the source text. / Dissertação (Mestrado Profissional), Programa de Pós-Graduação em Educação, Universidade Federal dos Vales do Jequitinhonha e Mucuri, 2017.
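The ROUGE-1 scores used above reduce to unigram overlap between a system summary and a reference summary. A minimal sketch of the metric itself (not the ROUGE toolkit's implementation):

```python
from collections import Counter

def rouge1(candidate_tokens, reference_tokens):
    """ROUGE-1 recall, precision and F-measure from clipped unigram overlap."""
    cand, ref = Counter(candidate_tokens), Counter(reference_tokens)
    overlap = sum((cand & ref).values())      # multiset intersection
    recall = overlap / sum(ref.values())
    precision = overlap / sum(cand.values())
    f = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return recall, precision, f

# Toy example: 3 of the 4 reference unigrams are covered.
r, p, f = rouge1("the cat sat".split(), "the cat sat down".split())
```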
238

A generic and open framework for multiword expressions treatment : from acquisition to applications

Ramisch, Carlos Eduardo January 2012 (has links)
The treatment of multiword expressions (MWEs), like take off, bus stop and big deal, is a challenge for NLP applications. This kind of linguistic construction is not only arbitrary but also much more frequent than one would initially guess. This thesis investigates the behaviour of MWEs across different languages, domains and construction types, proposing and evaluating an integrated methodological framework for their acquisition. There have been many theoretical proposals to define, characterise and classify MWEs. We adopt a generic definition stating that MWEs are word combinations which must be treated as a unit at some level of linguistic processing. They present a variable degree of institutionalisation, arbitrariness, heterogeneity, and limited syntactic and semantic variability. There has been much research on automatic MWE acquisition in recent decades, and the state of the art covers a large number of techniques and languages. Other tasks involving MWEs, namely disambiguation, interpretation, representation and applications, have received less emphasis in the field. The first main contribution of this thesis is an original methodological framework for automatic MWE acquisition from monolingual corpora. This framework is generic, language-independent, integrated, and has a freely available implementation, the mwetoolkit. It is composed of independent modules which may themselves use multiple techniques to solve a specific sub-task of MWE acquisition. The evaluation of MWE acquisition is modelled along four independent axes. We underline that evaluation results depend on parameters of the acquisition context, e.g., the nature and size of the corpora, the language and type of MWE, the depth of analysis, and existing resources. The second main contribution of this thesis is the application-oriented evaluation of our methodology in two applications: computer-assisted lexicography and statistical machine translation. For the former, we evaluate the usefulness of automatic MWE acquisition with the mwetoolkit for creating three lexicons: Greek nominal expressions, Portuguese complex predicates and Portuguese sentiment expressions. For the latter, we test several integration strategies to improve the treatment of English phrasal verbs when translated into Portuguese by a standard statistical MT system. Both applications can benefit from automatic MWE acquisition, as expressions acquired automatically from corpora can both speed up the work and improve the quality of the results. The promising results of previous and ongoing experiments encourage further investigation into the optimal way to integrate MWE treatment into other applications. The thesis concludes with an overview of past, ongoing and future work.
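Association measures of the kind used in automatic MWE acquisition can be illustrated with pointwise mutual information over bigrams. This is a minimal sketch of the general technique under invented data, not the mwetoolkit's API:

```python
import math
from collections import Counter

def pmi(bigram, tokens):
    """Pointwise mutual information of a bigram within a token sequence:
    how much more often the pair co-occurs than chance would predict."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    w1, w2 = bigram
    p_xy = bigrams[bigram] / (n - 1)              # joint probability
    p_x, p_y = unigrams[w1] / n, unigrams[w2] / n  # marginal probabilities
    return math.log2(p_xy / (p_x * p_y))

tokens = "the bus stop is near the old bus stop".split()
score = pmi(("bus", "stop"), tokens)  # strongly associated pair
```

High-PMI candidates like "bus stop" would then be filtered and validated against the other axes (frequency, variability) described above.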
239

Bases teórico-metodológicas para elaboração de um glossário bilíngue (português-inglês) de treinamento de força : subsídios para o tradutor

Dornelles, Márcia dos Santos January 2015 (has links)
O terminógrafo, ao elaborar um produto terminográfico bilíngue para tradutores, deve preocupar-se não só em repertoriar, nas duas línguas, os termos próprios de uma (sub)área do conhecimento, mas também em apresentá-los inseridos em suas combinatórias típicas, ou seja, associados aos elementos que a eles se combinam em nível sintagmático, de forma recorrente nos textos daquela especialidade. Isso porque o tradutor precisa produzir um texto de chegada adequado ao padrão de linguagem em foco, de forma a espelhar o modus dicendi daquele campo. Assim, seu texto soará natural à comunidade de leitores, evitando-se ruídos na comunicação. Diante da falta de produtos terminográficos bilíngues sobre Treinamento de Força (TF), dirigido a tradutores, esta investigação tem como objetivo central apresentar bases teórico-metodológicas para a elaboração de um glossário português-inglês da terminologia do TF. Esse glossário é aqui apresentado como um protótipo, uma amostra de um todo, destinado a auxiliar especialmente tradutores brasileiros que trabalhem na direção português→inglês, mas que pode ser aproveitado também por pesquisadores e estudantes dessa temática que precisem produzir artigos científicos em inglês. Ele inclui guia do usuário, uma árvore de domínio em português do TF, lista de termos em português e 30 exemplares de fichas terminológicas em formato estendido. Outro objetivo do estudo é oferecer uma descrição do comportamento dos termos em português e inglês, e das unidades fraseológicas especializadas (UFE) eventivas (BEVILACQUA, 2003; 2004) em português no âmbito dos artigos científicos sobre TF. Como referencial teórico, valemo-nos dos princípios da Teoria Comunicativa da Terminologia (TCT) e dos fundamentos e diretrizes da Linguística de Corpus (LC). 
Seguir a TCT (CABRÉ, 1999a; 1999b; 2001a; 2001b; 2003; 2009) implica adotar o termo como objeto central de estudo e concebê-lo, antes de tudo, como uma unidade lexical da língua natural que adquire valor especializado dentro de um contexto especializado, segundo critérios semânticos, discursivos e pragmáticos. Seguir a LC (BIBER, 2012; BERBER SARDINHA, 2004) implica uma visão probabilística da língua, pressupondo que, embora muitos traços linguísticos sejam possíveis teoricamente, não ocorrem com a mesma frequência. Ganham realce no estudo os temas da variação terminológica, da tradução funcional e do artigo científico como gênero especializado. Nosso corpus de estudo é constituído de 70 artigos de periódicos científicos de destaque no âmbito do TF, escritos originalmente em português e inglês. São, portanto, dois subcorpora, um em cada língua, que são comparáveis. Para exploração e análise do corpus, utilizamos o software AntConc (ANTHONY, 2011), especialmente as funcionalidades keyword list, n-grams e concordance. Como material de apoio, utilizamos livros-texto e artigos científicos de referência sobre TF, um glossário particular pré-existente de Educação Física, a Terminologia Anatômica Internacional, o Google Acadêmico, o Wikipédia, entre outros. Também contamos com a colaboração de dois consultores especialistas em TF. A pesquisa contempla, então, uma parte teórica e uma parte aplicada que se inter-relacionam e se inserem na dupla face da Terminologia, visto que há uma descrição de uma linguagem especializada a partir de um dado ponto de vista teórico e o desenho de um produto concreto. 
/ When designing a bilingual terminographic product for translators, a terminographer must be concerned not only with including, in both languages, the specific terms of a (sub)field of knowledge, but also with presenting these terms within their typical phraseological structures, that is, associated with the elements they combine with syntagmatically and recurrently in the texts of that domain. This is because a translator needs to produce a target text appropriate to the language pattern in focus, so as to reflect the modus dicendi of that specialized field. In this way, the text produced will sound much more natural to the community of readers, thereby avoiding noise in communication. Given the lack of bilingual terminographic products on Strength Training (ST), addressed to translators, the main purpose of this research study is to provide theoretical and methodological foundations for the development of a Portuguese-English glossary of ST terminology. This glossary is presented here as a prototype – a sample of a whole – especially designed to assist Brazilian translators working in the Portuguese to English direction, but it can also be useful for researchers and students of this subject to produce scientific papers in English. It includes a user guide, a domain tree of ST in Portuguese, a list of terms in Portuguese, and 30 sample terminology records in extended format. Another objective of the study is to provide a description of the behavior of terms in Portuguese and English, and of eventive specialized phraseological units (BEVILACQUA, 2003; 2004) in Portuguese on ST scientific articles. As theoretical framework, we based on the principles of the Communicative Theory of Terminology (CTT) and on the foundations and guidelines of Corpus Linguistics (CL). 
Following CTT (CABRÉ, 1999a; 1999b; 2001a; 2001b; 2003; 2009) implies adopting the term as the central object of study and conceiving it, first of all, as a lexical unit of natural language that acquires specialized value within a specialized context, according to semantic, discursive and pragmatic criteria. Following CL (BIBER, 2012; BERBER SARDINHA, 2004) implies a probabilistic view of language, assuming that, although many linguistic features are theoretically possible, they do not occur with the same frequency. The topics of terminological variation, the functional approach to translation, and the scientific article as a specialized genre are also highlighted in the study. Our corpus consists of 70 articles from leading scientific journals on ST, originally written in Portuguese and English. They form two comparable subcorpora, one in each language. For the exploration and analysis of the corpus, we used the AntConc software (ANTHONY, 2011), especially the keyword list, n-grams and concordance tools. As supporting material, we used textbooks and reference scientific papers on ST, a pre-existing personal glossary of Physical Education, the International Anatomical Terminology, Google Scholar, and Wikipedia, among others. We also had the collaboration of two expert consultants in ST. The research therefore comprises a theoretical part and an applied part that interrelate and reflect the dual nature of Terminology, since it involves both the description of a specialized language from a given theoretical point of view and the design of a concrete product.
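The corpus-exploration operations named in the abstract (n-gram counts and concordance lines) are standard corpus-linguistics techniques. A minimal sketch of how they work, independent of AntConc and using an invented example text (the tokens, window size, and sample sentence are illustrative assumptions, not data from the thesis):

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def concordance(tokens, keyword, window=3):
    """KWIC-style lines: `window` tokens of context on each side of `keyword`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Invented toy sentence standing in for a corpus
text = "strength training improves strength and muscle strength endurance"
tokens = text.split()

bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts.most_common(2))
print(concordance(tokens, "strength"))
```

A tool like AntConc performs the same basic operations over whole corpora, with tokenization, sorting, and filtering options on top.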
240

Asymetrie větných segmentů při překladu z japonštiny do češtiny / Asymmetry of Sentence Segments in Japanese to Czech translations

Jirkal, Martin January 2018 (has links)
Data from the Czech-Japanese Parallel Corpus show that qualitative shifts between corresponding sentence segments in the source and target languages arise in the process of translation from Japanese. The goal of this thesis is therefore to analyse this asymmetry of sentences in translations from Japanese to Czech and to evaluate its causes and effects. The issue is viewed through the theory of translation universals (explicitation, implicitation, normalization, simplification). It is also concerned with the theory of information density, although applying that theory proved problematic during the research. The theoretical outlook of translatology on these theories and the detailed process of sample selection are discussed in the introduction of the thesis. The results of the analysis of asymmetric sentences are discussed in the central part of the thesis, which is mainly concerned with a summary of the language features and situations that create this asymmetry, and with the question of which general trends in Japanese-Czech translation can be inferred from that summary. Finally, the distribution of asymmetric segments in the six analysed translations is studied, as well as the potential influence of the translators on their creation. Keywords: Japanese,...
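One simple way to operationalize the asymmetry the abstract describes is to compare segment counts across aligned sentence pairs. The sketch below is an illustrative assumption, not the thesis's actual method: it splits sentences on clause-level punctuation (a crude proxy; real Japanese segmentation would need a proper parser) and flags pairs whose counts differ. The example pairs are invented:

```python
import re

def count_segments(sentence):
    """Count clause-like segments, splitting on Western and Japanese punctuation."""
    parts = [p for p in re.split(r"[,、。.;]", sentence) if p.strip()]
    return len(parts)

# Invented stand-ins for aligned (source, target) sentence pairs
aligned_pairs = [
    ("A, B, C.", "A and B. C."),  # 3 segments vs 2 -> asymmetric
    ("X.", "X."),                 # 1 vs 1 -> symmetric
]

asymmetric = [
    (src, tgt) for src, tgt in aligned_pairs
    if count_segments(src) != count_segments(tgt)
]
print(f"{len(asymmetric)} of {len(aligned_pairs)} pairs are asymmetric")
```

Counts of such flagged pairs per translation would then support the kind of distributional comparison across translators that the thesis undertakes.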
