Global ETD Search

21	Annotation syntaxico-sémantique des actants en corpus spécialisé (Syntactic-semantic annotation of actants in a specialized corpus) Hadouche, Fadila 12 1900
Semantic role annotation assigns role labels such as Agent, Patient, Instrument, Location, and Destination to the participants of a predicative lexical unit, whether actants or circumstants (arguments or adjuncts). The task requires rich lexical resources or large corpora of sentences annotated manually by linguists, on which automated approaches (statistical or machine learning) can rely. Previous work has focused mainly on English, which has rich resources such as PropBank, VerbNet, and FrameNet that have fed automated annotation systems. Annotation in other languages, for which no manually annotated corpus is available, often relies on the English FrameNet by projection. A resource such as FrameNet is essential for automated annotation systems, yet manually annotating thousands of sentences is tedious and time-consuming for linguists. This thesis proposes an automatic system to assist linguists in this task, so that they need only validate the annotations the system proposes. We consider only verbs, which are more likely than other predicative units (nouns and adjectives) to be accompanied by actants realized in the sentence. The verbs are specialized terms from the fields of computing and the Internet (e.g., accéder 'access', configurer 'configure', naviguer 'browse', télécharger 'download') whose actantial structures have been enriched manually with semantic roles.
The actantial structure of the verbal lexical units is described according to the principles of Explanatory and Combinatorial Lexicology (LEC, from the French Lexicologie Explicative et Combinatoire) of Mel’čuk and draws in part, for the semantic roles, on the notion of Frame Element as described in Fillmore's Frame Semantics (FS). Both theories lead to dictionaries that differ from those produced by traditional approaches. The computing and Internet verbs, annotated manually in several contexts, make up our specialized corpus. Our system, which assigns semantic roles to actants automatically, is based on rules or on classifiers trained on more than 2300 contexts. The list of roles is restricted because some roles in our corpus have too few manually annotated examples. The system handles only the roles Patient, Agent, and Destination, each of which has more than 300 examples. The remaining roles, each with fewer than 100 annotated examples, are grouped in a class named Autre ('Other'). We divided the annotation task into subtasks: identifying the participants (actants and circumstants), and assigning semantic roles only to the actants, which contribute to the meaning of the verbal lexical unit. We parsed the corpus sentences with the Syntex syntactic parser to extract syntactic information describing the participants of a verbal lexical unit in a sentence; this information served as features in our learning model. We proposed two techniques for identifying participants: a rule-based technique, for which we extracted about thirty rules, and a technique based on machine learning. The same techniques were used to distinguish actants from circumstants.
For assigning semantic roles to actants, we proposed a semi-supervised clustering method over instances, which we compared with semantic role classification. We used CHAMELEON, an agglomerative (bottom-up) hierarchical algorithm.
Keywords: Actant; Circumstant; Semantic roles; Syntactic features; Classification; Clustering; CHAMELEON algorithm; Frame Semantics (FS); DiCoInfo and FrameNet
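The role inventory described in entry 21 (keep the frequent roles, pool the rest under Autre) can be sketched roughly as follows. This is only an illustration and not the thesis's code: the function name, the parameter names, and the toy label counts are all invented here. The abstract does not say how a role with between 100 and 300 examples would be treated, so the sketch simply assumes that every role not above the 300-example cutoff is pooled.

```python
from collections import Counter

OTHER_LABEL = "Autre"  # name of the catch-all class mentioned in the abstract


def collapse_rare_roles(labels, keep_above=300, other_label=OTHER_LABEL):
    """Replace every role that has at most `keep_above` examples by `other_label`.

    Assumption: anything not strictly above the cutoff is pooled, including
    roles in the 100-300 range that the abstract leaves unspecified.
    """
    counts = Counter(labels)
    kept = {role for role, n in counts.items() if n > keep_above}
    return [role if role in kept else other_label for role in labels]


# Hypothetical label distribution, invented purely for illustration.
toy_labels = (
    ["Patient"] * 900
    + ["Agent"] * 700
    + ["Destination"] * 350
    + ["Instrument"] * 60
    + ["Lieu"] * 40
)

collapsed = collapse_rare_roles(toy_labels)
print(Counter(collapsed))
```

With these made-up counts, Instrument and Lieu fall below the cutoff and end up in Autre, while the three frequent roles keep their own labels.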
23	Description du lexique spécialisé chinois et constitution d'une ressource didactique adaptée pour locuteurs non sinophones (Description of the Chinese specialized lexicon and creation of a teaching resource adapted for non-Chinese speakers) Han, Zhiwei 10 1900
Teaching and learning the Chinese specialized lexicon is a path strewn with obstacles. For non-native learners, specialized lexical combinations (SLCs) (L’Homme, 2000) raise syntactic and semantic difficulties and thus represent a major challenge in acquiring lexical skills. However, the literature on teaching Chinese for specific purposes (Q. Li, 2011) offers few methodological proposals for resolving these difficulties. In this research, we explore how a lexical description method based on semantic and syntactic representation helps non-native learners solve the lexical problems raised by SLCs. The thesis aims to design a method for describing SLCs that helps non-Chinese speakers resolve lexical difficulties. The method is applied to the development of CHINOINFO, a Chinese-French lexical resource covering the field of computing. The resource is designed for French-speaking learners of Chinese and can also serve as a writing tool for language professionals (translators, technical writers, and proofreaders) and for professionals in the field. The secondary objective is to evaluate the effectiveness of CHINOINFO among French-speaking learners studying Chinese at a university in Quebec or in China.
This research draws on concepts from three theoretical frameworks. First, Explanatory and Combinatorial Lexicology (Mel’čuk et al., 1995) provides the theoretical basis for grounding the description of SLCs in the semantic representation of the specialized lexicon. Second, the collection and analysis of SLCs are guided by the lexico-semantic approach to terminology (L’Homme, 2020a). Finally, we draw on the cognitive approach to second-language teaching (Chastain, 1990) to devise a structured presentation of lexical knowledge.
The methodology unfolded in three phases. We first assembled a specialized Chinese corpus from which we extracted a sample of SLCs and the information needed to describe them. Analysis of the corpus data led us to anticipate three types of syntactic-semantic difficulties raised by SLCs: 1) distinguishing the senses of a polysemous term across different SLCs; 2) differentiating the distinct meanings of SLCs that share the same form; and 3) choosing the appropriate co-occurrents of a term. In the second phase, we used different strategies to describe the syntactic-semantic properties of SLCs. A descriptive method incorporating the proposed solutions was then applied to the creation of CHINOINFO. This online resource lists 91 basic computing terms, for which we encoded a total of 282 related terms and 644 SLCs. The organization of data within entries was largely inspired by the adaptation of DiCoInfo (Observatoire de linguistique Sens-Texte, 2022) into a learner's dictionary (Alipour, 2014). Various technical means were used to make the resource user-friendly.
The final phase was a comparative experiment to evaluate the pedagogical effectiveness of CHINOINFO. Two groups of French-speaking learners, the control group (CG) and the experimental group (EG), took a lexical test with several reference tools at their disposal; only the EG had access to CHINOINFO. We also used survey questionnaires to collect the participants' profiles and their assessment of the test and of the reference tools provided. Overall, the comparative analysis of the test results shows that the EG was more successful in resolving the three types of difficulties raised by SLCs. Participants were fairly satisfied with the organization of the test. The EG had less difficulty completing the test, as its members felt better equipped than the CG to find answers in the reference tools. The EG also commented favorably on the usefulness of CHINOINFO for solving lexical problems in the experiment. In conclusion, the results offer indications of the contribution of CHINOINFO as a resource for learning SLCs, which suggests the value of the proposed lexical description method in a pedagogical context.
Keywords: specialized lexicon; specialized lexical combination; polysemy; Explanatory and Combinatorial Lexicology; lexical-semantic approach; cognitive-code learning theory; Chinese for Specific Purposes; lexical resource; corpus-based approach
