Global ETD Search

1	Traitements linguistiques pour la reconnaissance automatique de la parole appliquée à la langue arabe : de l'arabe standard vers l'arabe dialectal [Linguistic processing for automatic speech recognition applied to Arabic: from Standard Arabic to Dialectal Arabic] Boujelbane Jarraya, Rahma 05 December 2015
The dialects of Arabic (DA) show large phonological, morphological, lexical and syntactic variation compared with Modern Standard Arabic (MSA). Until recently, these dialects existed only in oral form, and most existing resources for Arabic are limited to MSA, which has led to an abundance of tools for the automatic processing of that variety. Given the significant differences between MSA and DA, the performance of these tools collapses when they are applied to DA, and ambiguity increases notably in computational approaches to DA. This thesis addresses that problem by modeling the speech used in Tunisian media. This data source contains a significant amount of code switching (CS) between the normative language, MSA, and the dialect spoken in Tunisia (DT). The disorderly presence of the dialect in the discourse poses a serious problem for natural language processing (NLP) and makes this spoken variety an under-resourced language: the resources required to model it are almost nonexistent. The objective of this thesis is therefore to fill this gap in order to build a language model dedicated to an automatic speech recognition system for the speech of Tunisian media. To that end, we describe a methodology for creating resources and evaluate it on a language modeling task. The results obtained are encouraging.
Oral corpus, Tunisian dialect, Language model, Resources, 004
2	A practical approach to the standardisation and elaboration of Zulu as a technical language Van Huyssteen, Linda 30 November 2003
The lack of terminology in Zulu can be overcome if the language is developed to meet international scientific and technical demands. This lack of terminology can be traced back to the absence of proper language policy implementation with regard to the African languages. Even though Zulu possesses the basic elements necessary for its development, such as orthographical standards, dictionaries, grammars and published literature, a number of problems exist within the technical elaboration and standardisation processes:
* Inconsistencies in the application of standard rules, in relation to both orthography and terminology.
* The lack of standardisation of the (technical) word-formation patterns in Zulu. (Generally, the role of culture in elaboration has largely been overlooked.)
* The avoidance of exploiting written technical text corpora as a resource for terminology. (Text encoding by means of corpus query tools in term extraction has just begun in Zulu and needs to be properly exemplified.)
* The avoidance of introducing oral technical corpora as a resource for improving the acceptability of technical terminology by, for instance, designing a type of reusable corpus annotation.
This study contributes towards solving these problems by offering a practical approach within the context of the real written, standard and oral Zulu language, mainly within the medical terminological domain. This approach offers a reusable methodological foundation with proper language exemplification that can guide terminologists in terminological research, or to some extent even train them, to achieve effective technical elaboration and eventual standardisation. This thesis aims at attaining consistent standardisation on the orthographical level in order to ease the elaboration task of the terminologist.
It also aims at standardising the methods of word- (term-) formation, linking them to cultural factors such as taboo. The thesis further emphasises the significance of using written and oral technical corpora as a terminology resource. This is made possible, for instance, through the application of corpus linguistics: in semi-automatic term extraction from a written technical corpus to aid lemmatisation (listing entries), and in corpus annotation to improve the acceptability of terminology, based on the comparison of standard terms with oral terms. / Linguistics / D. Litt et Phil. (Linguistics)
Corpus planning, Standardisation, Zulu language elaboration, Technical language, Orthographical standard, Word-formation, Culture related aspects, Corpus linguistics, Frequency count, Actual frequency, Concordance, Written corpus, Oral corpus, Text encoding, Oral corpus annotation, 496.39865, Zulu language -- Grammar
3	A practical approach to the standardisation and elaboration of Zulu as a technical language Van Huyssteen, Linda 30 November 2003
Abstract, keywords and classification identical to entry 2. / Linguistics and Modern Languages / D. Litt et Phil. (Linguistics)
4	Des ligateurs de cause : étude contrastive entre le français parlé à Paris et l’arabe parlé à Tripoli (Libye). Propriétés syntaxiques et fonctionnements pragmatico-discursifs / Ligators of cause: contrastive study between the spoken Arabic of Tripoli and the spoken French of Paris. Syntactic properties and pragmatic-discursive function Benmoftah, Najah 15 April 2016
This thesis in contrastive linguistics describes and contrasts the syntactic properties and pragmatic-discursive functions of parce que in the French spoken in the seventh arrondissement of Paris and of some of its equivalents in the Arabic spoken in Tripoli (Libya): liʔǝnna, ʕlēxāṭǝṛ, māhu and biḥkum.
In the Arabic spoken in Tripoli, these ligators may belong to two different grammatical classes: they may be conjunctional ligators and/or prepositional ligators, depending on their degree of grammaticalization. While liʔanna and māhu are conjunctional ligators that introduce subordinate clauses organized around verbal or non-verbal predicates, ʕlēxāṭǝṛ and biḥkum can either be used as prepositional ligators introducing circumstantial complements or be grammaticalized as conjunctional ligators introducing causal clauses. In addition, these ligators can occupy a canonical position, in which the ligator follows a main clause and introduces a causal clause, or a non-canonical position, of which there are two cases: either the utterance begins with the causal clause, which is introduced by a ligator of cause and followed by the main clause, or the utterance begins with the main clause, followed by a causal clause that is not introduced by a ligator of cause; in that case the ligator stands at the end of the causal clause and closes the utterance. From a pragmatic point of view, changing the order of the constituents, when ligators and causal clauses are not in canonical position, focalizes the causal clause.
Unlike in the Arabic of Tripoli, examination of the Corpus du Français Parlé Parisien des années 2000 (CFPP2000) shows that parce que is a conjunctional ligator that introduces causal clauses organized around verbal predicates and only very rarely non-verbal ones. Parce que can occupy a canonical position, when it follows a main clause and introduces a causal clause, and a non-canonical position, when it follows the presentative c’est and introduces a causal clause. However, it cannot be postposed, nor does it accept a suffix. When parce que introduces several causal clauses, it can be repeated, but in the reduced form que, giving a series of que. Nor can parce que link two utterances coordinated by et. From a pragmatic point of view, when the utterance begins with c’est parce que, this structure focalizes the causal clause.
Cause, Ligators of cause, Spoken French in Paris, Oral corpus, Causality, Subordination, Spoken Arabic in Tripoli, Focalization
