Global ETD Search

1	[en] THE CORPUS NEVER LIES: ON THE IDENTIFICATION AND USE OF MULTIWORD EXPRESSIONS / [pt] O CÓRPUS NÃO MENTE JAMAIS: SOBRE A IDENTIFICAÇÃO E USO DE COMBINAÇÕES MULTIVOCABULARES DO TIPO VERBO MAIS SINTAGMA NOMINAL
MILENA DE UZEDA GARRAO, 22 August 2006
Much recent research on the identification and use of multiword expressions (MWEs) rests on a representational view of word meaning. This study argues that it is much more interesting to identify MWEs from a non-representational perspective. Taking this path avoids relying on time-consuming and controversial human intuitions for MWE identification and definition. The methodology was tested on verb phrases of the V+NP pattern, which is very frequent in Brazilian Portuguese. It is a statistical, corpus-based analysis that can be summed up in three successive steps: 1) a robust corpus of Brazilian Portuguese as the basis for analysis; 2) application of a statistical test to the corpus, namely the log-likelihood test (Banerjee and Pedersen, 2003), to detect the most frequent MWEs of the V+NP pattern (such as tomar café) and to discard chance syntactic co-occurrences of the same lexical items; 3) application of similarity measures (Baeza-Yates and Ribeiro-Neto, 1999) between all paragraphs containing a given MWE (for example, fazer campanha) and all paragraphs containing the noun outside the MWE (campanha). This last step is used to assess the MWE's degree of compositionality. We conclude that the higher the similarity between the paragraphs containing the MWE and the paragraphs containing the noun outside the expression, the more compositional the MWE. The study therefore has both practical and theoretical implications for semantics.
Keywords: MULTIWORD EXPRESSIONS; VERBAL COLLOCATIONS; CORPUS LEXICOGRAPHY; CORPUS SEMANTICS
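Steps 2 and 3 of the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact formulas or term weighting used in the thesis, so this assumes a Dunning-style log-likelihood ratio (G^2) over a 2x2 contingency table and a plain term-frequency cosine as the similarity measure. The function names, parameter names and toy data are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(n11, n1p, np1, npp):
    """G^2 association score for a verb+noun pair (assumed formulation).

    n11: times the verb and the noun co-occur
    n1p: total occurrences of the verb
    np1: total occurrences of the noun
    npp: total number of pairs in the corpus
    """
    observed = [
        n11,                    # verb with noun
        n1p - n11,              # verb without noun
        np1 - n11,              # noun without verb
        npp - n1p - np1 + n11,  # neither
    ]
    # Expected counts if verb and noun were independent.
    expected = [
        n1p * np1 / npp,
        n1p * (npp - np1) / npp,
        (npp - n1p) * np1 / npp,
        (npp - n1p) * (npp - np1) / npp,
    ]
    # Cells with zero observations contribute nothing to the sum.
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )


def cosine_similarity(paragraphs_a, paragraphs_b):
    """Cosine between the term-frequency vectors of two paragraph groups."""
    a = Counter(w for p in paragraphs_a for w in p.lower().split())
    b = Counter(w for p in paragraphs_b for w in p.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy data (invented for illustration, not taken from the thesis).
with_mwe = ["o candidato vai fazer campanha na cidade"]
noun_alone = ["a campanha do candidato na cidade foi longa"]
print(log_likelihood(50, 60, 70, 10000))
print(cosine_similarity(with_mwe, noun_alone))
```

Under these assumptions, a high G^2 score flags a verb+noun pair as a candidate MWE, and a higher cosine between the two paragraph groups would be read as a higher degree of compositionality.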
2	Les thèmes et le temps dans Le Monde diplomatique (1990-2008) / Themes and time in Le Monde diplomatique (1990-2008)
Metwally, Heba, 11 December 2017
The wide availability of digitized texts is changing our scientific ambitions, and handling big data has become a major challenge for scholars conducting corpus-based studies. Because texts are naturally produced over time, large corpora most often take the form of chronological corpora. Studying such time series deepens our understanding of serial data and calls into question the suitability of traditional statistical analysis. Le Monde diplomatique is a monthly newspaper distributed worldwide and recognized in academia as a first-hand source. In 2015 it had 37 international editions in 20 languages. As a committed French journal offering serious analysis of politics, economics, culture and current affairs, it has been the subject of numerous university studies. We offer a documented thematic analysis of the evolution of its discourse from the aftermath of the fall of the Berlin Wall to the end of the Global War on Terror (GWOT), a short but eventful period spanning the end of the 20th century and the beginning of the 21st. Analysing large corpora that stretch over time requires adjusting our practices in corpus semantics and statistical data analysis. To that end, a chronological corpus of more than 5000 articles published between 1990 and 2008 (about 11 million word occurrences) is reduced to a scale model. This reduced yet authentic model supports a reading that connects the descriptive levels of the text in order to study meaning over time.
Keywords: Chronological corpus; Textometry; Media analysis; Corpus semantics; Semantics of themes; Logogenesis
3	Sémantique de corpus et didactique des langues : application à des discours journalistiques et politiques de langue arabe / Corpus Semantics and language learning : application to journalistic discourses and political speeches in Arabic language
Makouar, Nadia, 01 December 2014
The purpose of this research in corpus linguistics is to apply, in accordance with the concepts and principles of interpretive semantics, a method of contrastive text analysis to the learning of Arabic, using the textometry tool Lexico 3. The study is based on two corpora: one of journalistic discourse (on the Arab revolutions of 2011) and one of political speeches (by Gamal Abdel Nasser and Anwar Sadat). We hypothesize, first, that tool-assisted semantics makes it possible to characterize the ideological and political orientations of the different enunciators. Second, we assume that the analyses will suggest didactic approaches applicable to learning Arabic, in particular for written comprehension and production.
The first part of the study presents corpus linguistics, situates and describes text semantics within the language sciences, and sets out some characteristics of the Arabic language. The second part presents our analyses of journalistic and political texts, highlighting the positions of newspapers on the revolutions in Egypt and Bahrain and the differences in enunciation between the two Egyptian presidents regarding the policies pursued in Egypt and in the Arab world.
The third and last part presents the theoretical and practical sides of our didactic proposals. It grounds our position in interdisciplinarity by drawing on the paradigm of "knowledge" in language teaching (as distinct from the notion of "competence"). Finally, this part describes an experiment with seven students of Arabic. It shows the difficulties and benefits of the experiment and demonstrates that it is possible to conceive of a process of raising awareness of language data, one that must also mark a break with the simple transmission of information to the learner.
Keywords: Corpus Linguistics; Corpus Semantics; Interpretive Semantics; Media Analysis; Media Literacy; Arab Spring 2011; Political Speeches; Lexico 3; Egypt; Bahrain; Arabic Language Learning
