  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
81

A fraseologia do futebol: um estudo bilingue português-inglês direcionado pelo corpus / Football Phraseology: A bilingual Portuguese-English corpus-driven study.

Matuda, Sabrina 09 August 2011
This study investigates football terminology in English and Portuguese and attempts to establish phraseological equivalents. Phraseological units were chosen over individual terms because terms usually occur in a larger context rather than as isolated lexical items. We believe that a term tends to be accompanied by a collocate, making up a collocation, which is frequently part of an extended unit of meaning. The study is therefore grounded in Corpus Linguistics and Textual Terminology. To explain cultural differences, technical translation is viewed as a communicative act subject to cultural constraints, and the concept of form-representation is called upon to elucidate such differences. Our corpus consists of approximately two million words: 1,002,897 in English and 917,073 in Portuguese. Each corpus is divided into four subcorpora: laws of the game, newspaper reports on match results, live minute-by-minute commentaries, and live commentaries by sports journalists and football fans via social media such as Twitter and Facebook. The analysis was carried out semi-automatically on tagged corpora, for which we used Helmut Schmid's TreeTagger (for part-of-speech tagging) and Mike Scott's WordSmith Tools (to explore the corpus). Overall, the study showed that extracting phraseological units is a promising approach to building a glossary that aims to record the authentic use of specialized language, in this case the language of football. The study concludes with a model for a bilingual Portuguese-English phraseological glossary with entries built around the term goal (Portuguese gol).
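A minimal sketch of the idea that a term is usually accompanied by collocates: it counts the words that occur within a small window around a node word such as "goal". The function name, the window size and the sample sentences are assumptions made for illustration; they are not taken from the thesis, which used TreeTagger and WordSmith Tools rather than this code.

```python
import re
from collections import Counter

def collocates(text, node, window=2):
    """Count words found within `window` tokens of each occurrence of `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts

# Invented sample sentences (hypothetical, not from the thesis corpus).
sample = ("He scored a goal in the first half. "
          "She scored the winning goal late on. "
          "The referee disallowed the goal for offside.")
top = collocates(sample, "goal", window=2)
print(top.most_common(3))
```

This sketch ignores sentence boundaries and part-of-speech tags, both of which a real phraseological study would probably want to take into account.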
82

The Past Tenses of Early Middle Japanese

Hård, Arthur January 2018
Early Middle Japanese is one of the oldest attested stages of Japanese. Its rich legacy consists of several literary works from the Heian era (8th to 12th centuries), some of which are still appreciated and widely read today. Despite a long tradition of research both within and outside Japan, quite a few details of the language remain incompletely understood. The present study addresses a long-standing question in the verbal domain of Early Middle Japanese, namely the semantics of the two so-called “past tenses” in -ki and -ker-. I tested the major hypotheses regarding their use by means of quantitative, corpus-based methods. Specifically, I trained a machine learning algorithm to predict which of -ki and -ker- is likelier given a set of grammatical and semantic variables. Analysis of the results indicates that the suffixes likely embody a contrast between witnessed and non-witnessed past tense. It is also possible that mirativity (the grammaticalized expression of surprise at learning something unexpected) and aspect influence the choice of past tense suffix.
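A minimal sketch of predicting a suffix from categorical variables. The abstract does not name the algorithm or the variables, so everything here is an assumption: a naive Bayes classifier with add-one smoothing stands in for the unnamed method, and the feature names and training rows are invented.

```python
import math
from collections import Counter, defaultdict

def train(rows):
    """rows: list of (features_dict, label). Returns count tables for naive Bayes."""
    label_counts = Counter()
    feat_counts = defaultdict(Counter)  # (label, feature name) -> value counts
    values = defaultdict(set)           # feature name -> values seen in training
    for feats, label in rows:
        label_counts[label] += 1
        for name, val in feats.items():
            feat_counts[(label, name)][val] += 1
            values[name].add(val)
    return label_counts, feat_counts, values

def predict(model, feats):
    """Return the label with the highest smoothed log-probability."""
    label_counts, feat_counts, values = model
    total = sum(label_counts.values())
    best, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)
        for name, val in feats.items():
            c = feat_counts[(label, name)][val]
            score += math.log((c + 1) / (n + len(values[name])))  # add-one smoothing
        if score > best_score:
            best, best_score = label, score
    return best

# Invented training rows (hypothetical variables, not the thesis data).
rows = [
    ({"witnessed": "yes", "clause": "main"}, "-ki"),
    ({"witnessed": "yes", "clause": "sub"}, "-ki"),
    ({"witnessed": "yes", "clause": "main"}, "-ki"),
    ({"witnessed": "no", "clause": "main"}, "-ker-"),
    ({"witnessed": "no", "clause": "sub"}, "-ker-"),
    ({"witnessed": "no", "clause": "main"}, "-ker-"),
]
model = train(rows)
print(predict(model, {"witnessed": "yes", "clause": "main"}))
```

With real data, the interesting part would be inspecting which variables carry the most weight, which is how a model of this kind can bear on hypotheses such as the witnessed versus non-witnessed contrast.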
83

A lingüística de corpus a serviço do tradutor: proposta de um dicionário de culinária voltado para a produção textual / Corpus linguistics at the translator's service: proposal of an online dictionary of culinary aiming at text production

Teixeira, Elisa Duarte 01 December 2008
Dictionaries have always been, and still are, one of the translator's main tools. Nevertheless, terminography does not seem to have benefited systematically, at least in Brazil, from the close relation between technical dictionaries and this specific and increasingly significant target audience: technical translators. In the culinary field, for instance, which has seen growing demand for translations in Brazil, the dictionaries available for the English-Portuguese language pair may contribute to the understanding of the source text, but they provide no information on how the terms are actually used in real texts. In other words, they do not help the translator in a crucial step of the translation activity: text production in the target language. The thesis advanced here is that a dictionary which seeks to meet the translator's text-production needs should focus on the aspects which characterize technical texts from the point of view of translation itself; that is, it should describe and propose translation equivalents or solutions for the Specialized Translation Units (STUs) occurring in these texts, which the translator in the area often comes across in her/his practice, whether they are terminological or not. Corpus Linguistics (CL), an empirical approach which regards language as a probabilistic system, has devoted itself to identifying recurring lexico-grammatical patterns in language by observing authentic texts organized as electronic corpora. It is, therefore, the field of study we deem capable of providing the most adequate theoretical and methodological support for compiling STUs from real texts. To house these units, we propose an online bidirectional English-Portuguese dictionary aimed at the technical translator working in the culinary field. The steps followed in the development of this study are organized in six chapters. The first deals with theoretical and practical aspects of technical translation and discusses the role of terminology in translation practice. The second examines the specificities of the culinary translator's job in Brazil and characterizes the culinary recipe, the focus of this study, in terms of text genre and typology. In the third chapter, the theoretical and methodological foundations of CL are presented, as well as the criteria used to compile the corpus that serves as the basis for identifying the STUs. Chapter IV describes the exploration of this corpus: first, we present a study in which lexico-grammatical patterns are searched for manually using the WordSmith Tools program; next, a methodology for the semi-automatic extraction of STUs from the corpus is described. Chapter V presents our proposal for a bidirectional online Culinary Dictionary for Translators, describing its macro- and microstructure. Chapter VI contains the final considerations. The results show that CL, if used not only as a methodology but also as a theoretical approach to the investigation of specialized corpora, enables the production of more useful and trustworthy dictionaries for the translator, because it takes into account any association patterns between words that have a high probability of occurring in texts representative of the field, a fact which fully justifies including these patterns in a dictionary aimed at the translator as a producer of texts.
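A minimal sketch of semi-automatic extraction of recurring lexico-grammatical patterns, here reduced to counting word n-grams that recur across recipe lines. The abstract does not specify the extraction procedure, so the n-gram approach, the function name, the thresholds and the sample recipe lines are all assumptions for illustration.

```python
import re
from collections import Counter

def recurrent_ngrams(lines, n=3, min_freq=2):
    """Return the word n-grams that occur at least `min_freq` times."""
    counts = Counter()
    for line in lines:
        tokens = re.findall(r"\w+", line.lower())
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_freq}

# Invented recipe lines (hypothetical, not from the thesis corpus).
recipes = [
    "Preheat the oven to 180 degrees.",
    "Preheat the oven to 200 degrees.",
    "Bring to a boil and simmer.",
]
patterns = recurrent_ngrams(recipes)
print(patterns)
```

Candidates produced this way would still need manual review before being treated as translation units, which is what makes the process semi-automatic.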
84

A construção de um glossário bilíngue de futebol com o apoio da Linguística de Corpus. / Building a bilingual glossary on Football with the aid of Corpus Linguistics

Seemann, Paulo Augusto Almeida 26 March 2012
When trying to translate a specialized text on football from Spanish into Brazilian Portuguese or vice versa, the translator is faced with a myriad of football-specific terms which are not found in many current dictionaries and glossaries, or which are covered only in a limited way that leaves out many real situations of use. In this study, a bilingual and bidirectional glossary was built with the most frequent football terms routinely used in written communication in the Spanish-Portuguese language pair. My initial assumption was that Corpus Linguistics would provide the necessary means for such a task. Corpus Linguistics enables one to study a language or a language variety by computer, using empirical evidence found in a corpus, defined as a set of texts in electronic format compiled according to predefined criteria. This dissertation is divided into five parts. In the introduction, some historical aspects of Portuguese and Spanish are discussed, as well as the influence of football on our society, the problems found in dictionaries and glossaries, and the potential of football news from the Internet as a basis for building the proposed glossary. In the second part, I discuss Corpus Linguistics as an approach and a research method, and present the different types of corpora and the composition of the study corpus. The question of equivalence in translation is then briefly addressed, along with the steps adopted to select the terms and their translation equivalents: a comparison of football news from Brazil, Spain and Argentina, and the extraction and observation of keywords with the aid of dedicated electronic tools. In the third part, I discuss the terminological issues involved in this study, especially the decisions taken for the macro- and microstructure of the glossary. In the fourth part, I show how the glossary can be presented to the user and provide a sample of entries. In the fifth and last part, I present the final considerations and conclude that Corpus Linguistics, as an approach and a methodology, proved effective for building the bilingual glossary, since exploring the specialized corpora made it possible to identify the main football terms used in written communication in Brazilian, Spanish and Argentine journalism, together with their translation equivalents. The result is a bilingual reference work on football containing nearly four thousand entries, all with authentic examples of usage.
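A minimal sketch of keyword extraction by comparing a study corpus with a reference corpus. The abstract does not say which statistic or tool was used, so the log-likelihood keyness score below is an assumption (it is one widely used option), and the token lists are invented.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) for a word seen a times in a study corpus of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected count in the study corpus
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    g2 = 0.0
    if a:
        g2 += a * math.log(a / e1)
    if b:
        g2 += b * math.log(b / e2)
    return 2 * g2

def keywords(study, reference, top=5):
    """Rank words that are over-represented in `study` relative to `reference`."""
    sf, rf = Counter(study), Counter(reference)
    c, d = len(study), len(reference)
    scored = []
    for word, a in sf.items():
        b = rf[word]
        if a / c > b / d:  # keep only positive keywords
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]

# Invented token lists (hypothetical, not from the thesis corpus).
study = "gol gol gol penalti el el de".split()
reference = "el el el de de casa balon".split()
ranked = keywords(study, reference)
print(ranked)
```

Running the same procedure on news from each country and comparing the resulting keyword lists is one plausible way to shortlist candidate terms and their equivalents.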
85

The story of pu: the grammaticalisation in space and time of a Modern Greek complementiser

Nicholas, Nick January 1998
This work is concerned with tracing the historical development of the various functions of the Modern Greek connective pu. This connective has a considerable range of functions, and there have been attempts in the literature to group together these functions in a synchronically valid framework. It is my contention that the most illuminating way of regarding the functional diffusion of pu—and of any content word—is by looking, not only at one synchronic distribution (that of Standard Modern Greek), but at the full range of synchronic distributions in the sundry diatopic variants (dialects) of Modern Greek, and that such a discussion must be informed by the diachrony of the form. / This I attempt to do within the framework of grammaticalisation theory, whereby the development of grammatical forms is considered in the context of reanalysis and analogical extension of forms. As a diachronicist model, this allows for fluidity between function distinctions, and puts in place a historically-oriented alignment of semantic transitions which a strictly synchronicist account would miss. Work on pu has already been done in this framework; however, such work has considered the distribution of pu in Standard Greek alone, with only a brief consideration of its ancient antecedents. I contend that the picture formed of its distribution under such constraints leads to several false generalisations. 
/ In order to arrive at a truer picture of the factors determining the development of pu, there are three facets that need to be considered in detail: / (a) its synchronic distribution in Standard Modern Greek, a variant for which extensive corpora and native speaker judgements are readily available; / (b) its distribution in the various modern dialects, to establish the possible diversification of developments for the particle, and to ensure that one potential pathway is not privileged as a universal tendency at the expense of other, divergent developments (a problem identifiable in treatments of this topic, hitherto looking only at the standard language); / (c) a detailed investigation of the use of the etymon of the particle, hópou, in Ancient Greek. It is one of the major contentions of grammaticalisation theory that the past meaning of a particle influences its subsequent meanings. In order to test the relevance of this principle fully, it is necessary to investigate the functionality of hópou not in isolation, but in the context of the entire Ancient Greek grammatical system. / Due to time and scope constraints, I attempt only these first three tasks in this thesis. I do not attempt a detailed look at areal diffusion or the mediaeval Greek semantic transitions involved, nor at the use of pu in collocation.
86

Expressions of Future in Present-day English: A Corpus-based Approach

Berglund, Ylva January 2005
This corpus-based study of the use of expressions of future in English has two aims: to examine how certain expressions of future are used in Present-day English, and to explore how electronic corpora can be exploited for linguistic study. The expressions focused on in this thesis are five auxiliary or semi-auxiliary verb phrases frequently discussed in studies of future reference in English: will, ’ll, shall, going to and gonna. The study examines the patterned ways in which the expressions are used in association with various linguistic and non-linguistic (or extra-linguistic) factors. The linguistic factors investigated are co-occurrence with particular words and co-occurrence with items of particular grammatical classes. The non-linguistic factors examined are medium (written vs. spoken), text category, speaker characteristics (age, sex, social class, etc.), region and time. The data for the study are exclusively drawn from computer-readable corpora of Present-day English. Corpus analyses are performed with automatic and interactive methods, and exploit both quantitative and qualitative analytical techniques. The study finds that the use of these expressions of future varies with a number of factors. Differences between spoken and written language are particularly prominent and usage also varies between different types of text, both within spoken and written corpora. Variation between groups of speakers is also attested. Although the linguistic co-occurrence patterns are similar to some degree, there are nonetheless differences in the collocational patterns in which the expressions are used. Methodological issues related to corpus-based studies in general are discussed in the light of the insights gained from this study of expressions of future.
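A minimal sketch of comparing the five expressions across corpora of different sizes by normalizing raw counts to frequencies per million words. The regular expressions, function names and sample sentence are assumptions for illustration; in particular, the plain string match below would also count locative "going to" and the noun "will", which a real study would need to filter out.

```python
import re
from collections import Counter

EXPRESSIONS = ["will", "'ll", "shall", "going to", "gonna"]

def count_expressions(text):
    """Count surface matches of each expression in `text` (case-insensitive)."""
    text = text.lower()
    counts = Counter()
    for expr in EXPRESSIONS:
        if expr == "'ll":
            pattern = r"['’]ll\b"  # clitic form, straight or curly apostrophe
        else:
            pattern = r"\b" + re.escape(expr) + r"\b"
        counts[expr] = len(re.findall(pattern, text))
    return counts

def per_million(count, corpus_size):
    """Normalize a raw count to a frequency per million tokens."""
    return count * 1_000_000 / corpus_size

# Invented sample text (hypothetical, not from the corpora used in the thesis).
sample = "I will go. She'll stay. We are going to win. I'm gonna try. Shall we?"
counts = count_expressions(sample)
print(counts)
print(per_million(3, 1_500_000))
```

Normalized frequencies of this kind are what make a spoken corpus and a written corpus of different sizes comparable.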
87

The Greek Interjections: Studies on the Syntax, Semantics and Pragmatics of the Interjections in Fifth-Century Drama

Nordgren, Lars January 2012
This thesis investigates the linguistic and philological characteristics of the primary interjections in Ancient Greek drama. It employs Ameka’s definition and classification from 1992 as its theoretical base, and provides a comprehensive research survey. The thesis has a data-driven approach, and is based on all items traditionally classified as interjections. The chapter on morphology and syntax presents the unique characteristics of interjections. For example, NPs co-occurring with interjections form an interjection phrase, which follows a specific pattern in accordance with a phrase schema. The chapter on semantics, the main part of the thesis, employs an analytical model based on a moderate minimalism approach. This assumes that all items have a core meaning that can be identified without the aid of context, yet allows different, but related, meanings. The definition adopted in the present thesis states that interjections share only formal characteristics, and thus can be divided into categories based on their semantic features, which are defined using Kaplan’s notion of informational equivalence. The thesis deals with three such categories, each with its individual semantic properties: (1) expressive interjections, which express the speaker’s experience of emotion and/or cognition; (2) conative interjections, which express what the speaker wants the addressee or auditor to do; and (3) imitative interjections, which depict or reproduce sounds or events. Items in category 1 are the most frequent and thus receive most attention. In the chapter on pragmatics, it is proposed that the primary function of interjections is to express the core semantics in a specified context. Felicity conditions are suggested for an utterance to convey the primary meaning of an interjection. Interjections are also shown to have various secondary functions, e.g. that of strengthening markers. Finally, a lexicon is provided, which offers individual informational equivalents of all interjections under study.
88

Ponctuation et syntaxe dans la langue française médiévale. Étude d'un corpus de chartes originales écrites à Liège entre 1236 et 1291

Mazziotta, Nicolas 21 December 2007
%%%Un résumé mis en forme disponible dans les fichiers joints%%% Nous avons commencé par faire le pari que la syntaxe pouvait expliquer la majorité des signes de ponctuation. Cette optique nous a guidé durant toute notre étude, dont le but était de répondre à la question: «Comment, d'après ce qu'on peut observer dans les chartes écrites en français à Liège avant 1292, la ponctuation originale interagit-elle avec la syntaxe dans la langue française médiévale?» Nous avons d'emblée positionné notre étude par rapport à la réflexion sur la ponctuation médiévale, osant le pari que la syntaxe peut servir de point de référence pour expliquer la plus grande partie de la ponctuation des chartes. Nous avons ensuite décrit la constitution du corpus. Face à une pareille question, il n'était pas envisageable de commencer immédiatement à dépouiller les documents: il nous fallait définir avec exactitude les différents concepts dont nous allions avoir besoin. *** Première partie: modélisation *** La première partie du travail a ainsi été consacrée à la définition, sur des bases empiriques, des concepts mobilisés. Partant du sens commun et des principes fondamentaux de l'analyse linguistique classique (tenant du structuralisme et du fonctionnalisme), nous avons exploité les matériaux à notre disposition pour en dégager des notions, dans une approche inductive par son rapport aux faits, mais déductive par sa progression. Ainsi, au chapitre 2, l'observation du tracé des unités graphiques sur le parchemin nous a amené à abstraire les catégories nécessaires à une modélisation de l'ensemble des unités de la langue écrite, pour lesquelles nous proposons une terminologie neuve reflétant notre analyse. Nous avons progressivement défini _langue écrite_, puis _scriptèmes_, _grammèmes_, etc., progressant des unités les plus générales aux unités les plus particulières. 
Ce n'est qu'à ce prix que nous avons pu enfin délimiter exactement, le moins intuitivement possible, notre propre acception du mot _ponctuation_: «ensemble des ponctogrammes d'une langue écrite spécifique}. Dans cette définition, le terme _ponctogramme_ désigne une unité minimale de la langue écrite (_scriptème_) n'organisant pas l'espace (_grammème_), exprimant un contenu (_plérégramme_), ne dépendant pas matériellement d'une autre unité (_autogramme_), construit à l'aide de traits qui ne se combinent pas obligatoirement sur un même axe (_nébulogramme_) et non paraphrasable par d'autres unités significatives... Employer ce terme ne pouvait se faire qu'à la fin d'un exposé détaillé, passant en revue tous les hyperonymes impliqués. De manière moins audacieuse du point de vue de la terminologie employée, nous avons également tenté d'exposer notre conception de la syntaxe (chapitre 3). À nouveau, c'est le corpus qui nous a servi de guide: une fois les phrases délimitées de manière empirique, toutes les structures syntaxiques ont été passées en revue, nommées et intégrées dans un système théorique fondé sur la notion, héritée d'Alain Lemaréchal, de _relation minimale_. Nous sommes parti de l'existence d'un lien sémantique entre les unités en présence et nous avons caractérisé la manière dont ce lien était _spécifié_. Nous croyons, au delà de l'intérêt pratique de cette première partie, que les concepts dégagés peuvent être jugés suffisamment généraux sinon pour servir à la comparaison d'autres systèmes graphiques ou syntaxiques, du moins afin de constituer une base à leur description. *** Deuxième partie: analyse des données*** Une fois les concepts définis et l'ensemble du corpus annoté, il a été envisageable de répondre à la question posée. Néanmoins, l'ensemble des données disponibles, de par sa nature et son abondance, rendait l'approche traditionnelle -- ou plutôt _manuelle_ -- difficilement applicable. 
C'est pourquoi nous avons ouvert la seconde partie du travail en annonçant le recours à des méthodes plus outillées: les statistiques (introduites au chapitre 4). Ces méthodes présentées, nous avons sélectionné six caractéristiques morphosyntaxiques et positionnelles que nous avons jugées fondamentales pour décrire tous les constituants. Ces variables répondaient à six questions: 1/ du point de vue de l'ordre linéaire des mots, le constituant est-il le premier de la structure qu'il sert à construire? 2/ le constituant est-il le dernier de la structure qu'il sert à construire? 3/ quelle est la nature et le niveau d'intégration syntaxique de la structure qui le contient? 4/ quelle est la fonction du constituant? 5/ est-il de nature propositionnelle (mode personnel ou non)? 6/ est-il relaté? Nous avons ensuite pu mettre en relation les réponses à ces questions et la simple présence de ponctuation de part et d'autre des constituants, sans tenir compte, dans un premier temps, de la forme des ponctogrammes. Pour ce faire, nous avons essentiellement employé les techniques statistiques les plus classiques en sciences humaines: l'analyse des tableaux de contingence à l'aide du test du chi². Après avoir évalué la relation entre chacune des six variables et la ponctuation, nous avons constaté l'inefficacité de la méthode, ce qui nous a conduit à en rechercher une autre, permettant d'envisager simultanément toutes les variables morphosyntaxiques et positionnelles, en particulier. Ces nouveaux dépouillements nous ont permis de repérer, au milieu de la masse de constituants inégalement marqués par la présence d'un ponctogramme, ceux dont le marquage ou le rejet du marquage avait la plus faible probabilité d'être dû au hasard. 
Ce qui est ressorti de cette première étape, où les données étaient réduites à une représentation très abstraite, c'est une liste de points forts concernant: - la différence de fréquence entre le marquage de la phrase et celui des autres propositions; - la spécificité du marquage d'un certain nombre de types d'arguments; - le rejet manifeste du marquage du prédicat; - la faible fréquence de marquage à la suite des relateurs; - la forte présence de marquage devant les coordonnants. Nous avons ainsi pu observer que la ponctuation n'était pas obligatoire, mais que sa présence était certainement liée à un contexte syntaxique spécifique. Ensuite, ces grandes lignes ont pu être inspectées de manière plus concrète: pour chaque tendance qui le justifiait, nous avons évalué la probabilité que l'attraction ou la répulsion observée soit généralisée. Nous avons adopté la position pragmatique selon laquelle toute tendance suffisamment fréquente pouvait être considérée comme générale si le fait de retirer les chartes qui la manifestaient de manière significative de l'échantillon ne changeait pas significativement la probabilité d'attraction. Il en est ressorti que la plupart des tendances observées étaient générales ou trop faiblement illustrées pour être évaluées de ce point de vue. Par ailleurs, nous avons essayé de mettre en relation la ponctuation avec le contexte immédiat, ce qui nous a laissé observer que beaucoup de constituants étaient davantage, voire exclusivement marqués au contact d'autres constituants attirant également le marquage ou dans un contexte de coordination. Cet examen détaillé des tendances mises en évidence au chapitre 5 permet en fin de compte de faire le tri parmi les tendances et de repérer celles qui sont manifestement dues à l'entourage du constituant ou au document dans lequel il est attesté. 
By examining the attestations more intuitively, we were also able to identify, as we expected, a number of tendencies tied to factors outside morphosyntax: the punctuation of formulas specific to the discourse type, that of numerals, and the presence of a punctogram before personal names. Furthermore, examining the attestations in detail led us to propose revisions to the model of morphosyntactic analysis presented in chapter 3: 1/ the lexemes used should be taken into account; 2/ the notion of coordination could be extended to groupings of constituents that we did not treat as coordinated; 3/ it might be worthwhile to treat coordinators in the same way as other relators. We also stressed that the analysis of structures in immediate syntax would benefit from being less abstract. This study of marking frequency yielded a set of environments favourable to the presence of punctuation. At that point, we were able to reintroduce considerations concerning the _form_ of the punctograms and to use _Correspondence Analysis_ (French: _Analyse Factorielle des Correspondances_, AFC) to describe the data. We performed a cross-tabulation to measure the associations between the form of the punctograms and the tendency toward marking specific to the position in which each punctogram occurred (which included the absence of any environment attracting marking). After an exploratory analysis, we completed our study with a series of tests evaluating the probability that the groupings between the form of the punctograms and the environment in which they are found were due to chance. In most of the cases observed, the contrasts brought out by the AFC corresponded to significant oppositions. 
The detailed study of form led to the following conclusion: punctograms other than <·> are rarer, and their use appears more specific to a given environment. In other words: not only did scribes not punctuate just anywhere, they also did not use the signs interchangeably. Since the methods could not deal effectively with sparsely attested punctograms, we simply commented on them, setting statistics aside in favour of a more philological study. These observations led, on the one hand, to a critique of the validity of the transcription: 1/ some distinctions between forms may be superfluous; 2/ some units may be confused with others. On the other hand, the form of the punctograms raises the question of the relationship between punctograms and the rest of the graphic system.
89

Discourse markers within the university lecture genre: A contrastive study between Spanish and North-American lectures

Bellés Fortuño, Begoña 02 February 2007
The doctoral thesis presented here can be framed within three linguistic fields: genre analysis, contrastive rhetoric and corpus analysis. Genre analysis (Swales 1981, 1990; Dudley-Evans & Henderson 1990a, 1990b; Henderson & Hewings 1990; Bhatia 1993, 2002; Skulstad 1996, 2002; Flowerdew 1994, 2002) is one part of the broad field of discourse analysis (Barber 1962; Halliday, Strevens & McIntosh 1964). In this study we focus on the lecture, within the so-called academic classroom genres (Fortanet 2004b). The lecture is a spoken genre and as such has certain peculiarities of spoken genres, as opposed to written academic genres. Our study centres on the comparison and contrast of two languages, Peninsular Spanish and American English, since the corpus consists of Spanish and North-American lectures; studies in contrastive rhetoric are therefore taken as a reference. We focus on one specific aspect of language, discourse markers. By analysing discourse markers in spoken academic language in Spanish and North-American English, we aim to see how they are used, in order to benefit native and non-native speakers of Spanish and English in higher education.
90

Topical Opinion Retrieval

Skomorowski, Jason January 2006
With a growing amount of subjective content distributed across the Web, there is a need for a domain-independent information retrieval system that supports ad hoc retrieval of documents expressing opinions on the specific topic of a user's query. While opinion detection and sentiment analysis have received much attention in recent years, little research has been done on identifying subjective content targeted at a specific topic, i.e. content expressing topical opinion. This thesis presents a novel method for ad hoc retrieval of documents that contain subjective content on the topic of the query. Documents are ranked by the likelihood that each document expresses an opinion on a query term, approximated as the likelihood that any occurrence of the query term is modified by a subjective adjective. A domain-independent, user-based evaluation of the proposed methods was conducted; it shows statistically significant gains over Google ranking as the baseline.
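The ranking idea in this abstract can be illustrated with a rough sketch. It is not the thesis's actual method: the tiny adjective lexicon, the use of a fixed token window as a stand-in for "modified by", the constant probability `p_modified`, and the function names `opinion_score` and `rank` are all assumptions made for illustration. The score is the probability that at least one occurrence of the query term is modified, treating occurrences as independent.

```python
import re

# Tiny illustrative lexicon (an assumption); a real system would need
# a much larger resource of subjective adjectives.
SUBJECTIVE_ADJECTIVES = {"great", "terrible", "awful", "excellent", "poor"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def opinion_score(text, query_term, window=2, p_modified=0.5):
    """Estimate the probability that some occurrence of the (single-word)
    query term is modified by a subjective adjective.

    An occurrence counts as a candidate when a lexicon adjective appears
    within `window` tokens before it; each candidate is assumed to be a
    true modification with probability `p_modified`.
    """
    tokens = tokenize(text)
    term = query_term.lower()
    prob_none_modified = 1.0
    for i, token in enumerate(tokens):
        if token != term:
            continue
        preceding = tokens[max(0, i - window):i]
        if any(word in SUBJECTIVE_ADJECTIVES for word in preceding):
            prob_none_modified *= 1.0 - p_modified
    return 1.0 - prob_none_modified


def rank(documents, query_term):
    """Return document ids ordered by decreasing opinion score."""
    return sorted(
        documents,
        key=lambda doc_id: opinion_score(documents[doc_id], query_term),
        reverse=True,
    )


# Hypothetical mini-collection for illustration only.
docs = {
    "a": "A terrible camera and a great camera.",
    "b": "The camera is on the table.",
    "c": "Great camera.",
}
print(rank(docs, "camera"))
```

A window of preceding tokens is a crude proxy for syntactic modification; a parser-based check or distance-dependent probabilities would be natural refinements.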
