1 |
Assellema, ça va? : aspects of ethnolinguistic vitality, language attitudes and behaviour in Tunisia
Lawson, Sarah Rosemary January 2001
No description available.
|
2 |
Automatic Readability Detection for Modern Standard Arabic
Forsyth, Jonathan Neil 19 March 2014
Research for automatic readability prediction of text has increased in the last decade and has shown that various machine learning methods can effectively address this problem. Many researchers have applied machine learning to readability prediction for English, while Modern Standard Arabic (MSA) has received little attention. Here I describe a system which leverages machine learning to automatically predict the readability of MSA. I gathered a corpus comprising 179 documents that were annotated with the Interagency Language Roundtable (ILR) levels. Then, I extracted lexical and discourse features from each document. Finally, I applied the Tilburg Memory-Based Learning (TiMBL) machine learning system to read these features and predict the ILR level of each document using 10-fold cross validation for both 3-level and 5-level classification tasks and an 80/20 division for a 5-level classification task. I measured performance using the F-score. For 3-level and 5-level classifications my system achieved F-scores of 0.719 and 0.519 respectively. I discuss the implication of these results and the possibility of future development.
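The evaluation setup described in this abstract (k-fold cross-validation scored with an F-score) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: a plain 1-nearest-neighbour classifier stands in for TiMBL, the F-score is macro-averaged (the abstract does not say which averaging was used), and all function names are hypothetical.

```python
def k_fold_indices(n_items, k):
    """Yield (train, test) index lists for k interleaved folds."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


def nearest_neighbour(train_x, train_y, query):
    """Predict the label of the closest training vector (squared Euclidean distance).

    Assumption: a stand-in for a memory-based learner, not TiMBL itself.
    """
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    best = min(range(len(train_x)), key=lambda i: dist(train_x[i], query))
    return train_y[best]


def macro_f_score(gold, predicted):
    """Unweighted mean of per-class F1 over the classes present in gold."""
    scores = []
    for label in sorted(set(gold)):
        tp = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, predicted) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, predicted) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


def cross_validate(features, labels, k=10):
    """Predict every item once, from a model built on the other k-1 folds."""
    predicted = [None] * len(labels)
    for train, test in k_fold_indices(len(labels), k):
        train_x = [features[i] for i in train]
        train_y = [labels[i] for i in train]
        for i in test:
            predicted[i] = nearest_neighbour(train_x, train_y, features[i])
    return macro_f_score(labels, predicted)
```

Each document is held out exactly once, so the reported score is computed over the whole corpus rather than over a single test split; an 80/20 division, by contrast, scores only the held-out fifth.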
|
3 |
Conditional Sentences in Egyptian Colloquial and Modern Standard Arabic: A Corpus Study
Bentley, Randell S. 01 March 2015
This thesis examines the differences between conditional sentences in Egyptian Colloquial (EC) and Modern Standard Arabic (MSA). It focuses on two conditional particles, ‘iḏa and law. The verb tenses that follow the conditional particle distinguish EC from MSA usage. Grammars of EC and MSA provide a prescriptive baseline for comparison with empirical data from Arabic corpora. The study uses data from the ArabiCorpus along with a corpus of Egyptian Colloquial that was compiled specifically for this study. The results demonstrate that each particle (‘iḏa and law) and register (EC and MSA) favors a certain tense. The data also contrast with the rules prescribed by grammars of MSA. Present tense verbs appear in the proposed condition for the particle law in 22 out of 400 tokens (5.5%). Verb tense also plays an important role in determining the connecting particle in MSA sentences. The results demonstrate that the selection of connecting particles for law does not occur by chance but is instead systematic in nature. An apodosis containing a past tense verb strongly favors the connector la, while one with a non-past tense verb strongly favors the connector fa.
|
4 |
Exploiting phonological constraints and automatic identification of speaker classes for Arabic speech recognition
Alsharhan, Iman January 2014
The aim of this thesis is to investigate a number of factors that could affect the performance of an Arabic automatic speech understanding (ASU) system. The work described in this thesis belongs to the automatic speech recognition (ASR) phase, but the fact that it is part of an ASU project rather than a stand-alone piece of work on ASR influences the way in which it is carried out. Our main concern in this work is to determine the best way to exploit the phonological properties of the Arabic language in order to improve the performance of the speech recogniser. One of the main challenges facing the processing of Arabic is the effect of the local context, which induces changes in the phonetic representation of a given text, thereby causing the recognition engine to misclassify it. The proposed solution is to develop a set of language-dependent grapheme-to-allophone rules that can predict such allophonic variations and eventually provide a phonetic transcription that is sensitive to the local context for the ASR system. The novel aspect of this method is that the pronunciation of each word is extracted directly from a context-sensitive phonetic transcription rather than a predefined dictionary that typically does not reflect the actual pronunciation of the word. Besides investigating the boundary effect on pronunciation, the research also seeks to address the problem of Arabic's complex morphology. Two solutions are proposed to tackle this problem, namely, using underspecified phonetic transcription to build the system, and using phonemes instead of words to build the hidden Markov models (HMMs). The research also seeks to investigate several technical settings that might have an effect on the system's performance. These include training on the sub-population to minimise the variation caused by training on the main undifferentiated population, as well as investigating the correlation between training size and performance of the ASR system.
|
5 |
Grammatical Aspects of Rural Palestinian Arabic
January 2019
This study explores some grammatical aspects of Rural Palestinian Arabic (RPA), spoken in the vicinity of the city of Tulkarm in the Northwest part of the West Bank, and compares the variety to Modern Standard Arabic (MSA) and Urban Palestinian Arabic (UPA). The study introduces an overview of the Arabic language and its colloquial dialects and the status of diglossia in the Arab world. Subject-verb agreement in MSA and RPA is also discussed.
The focus of this study is on the pronominal system and negation in both MSA and RPA. It investigates the correlations between dependent subject pronouns and independent pronouns and their phonological and syntactic relationships. I argue that dependent subject pronouns are reduced forms of the independent subject pronoun. The study explains how dependent subject pronouns are formed by deleting the initial syllable, except for the first person singular and the third person masculine plural, which use suppletive forms instead. Dependent object pronouns are also derived from their independent counterparts by the deletion of the second syllable, with the exception of third person plural pronouns, which take the same form as clitics attached to their hosts.
I argue that dependent subject pronouns are agreement affixes used to mark verb argument features, whereas pronominal object and possessive pronouns are clitics attached to their hosts, which can be verbs, nouns, prepositions, and quantifiers. This study investigates other uses of subject pronouns, such as the use of third person pronouns as copulas in both MSA and RPA. Additionally, third person pronouns are used as question pronouns for yes/no questions in RPA.
The dissertation also explores the morphosyntactic properties of sentential negation in RPA in comparison to sentential negation in MSA. The study shows that the negative markers ma: and -iš are used to negate perfective and imperfective verbs, while muš precedes non-verbal predicates, such as adjectives, prepositional phrases (PPs), and participles. The main predicate in the negative phrase does not need the noun phrase (NP) to raise to T if there is no need to merge with the negative element.
Keywords: Standard Arabic, Rural Palestinian Arabic, Urban Palestinian Arabic, independent pronouns, dependent pronouns, pronominal clitics, copula pronouns, negation / Dissertation/Thesis / Doctoral Dissertation English 2019
|
6 |
Grammatical Gender Processing in Standard Arabic as a First and a Second Language
Alamry, Ali 17 December 2019
The present dissertation investigates grammatical gender representation and processing in Modern Standard Arabic (MSA) as a first (L1) and a second (L2) language. It mainly examines whether L2 speakers can process gender agreement in a native-like manner, and the extent to which L2 processing is influenced by the properties of the L2 speakers’ L1. Additionally, it examines whether L2 gender agreement processing is influenced by noun animacy (animate and inanimate) and word order (verb-subject and subject-verb). A series of experiments using both online and offline techniques was conducted to address these questions. In all of the experiments, gender agreement between verbs and nouns was examined. The first series of experiments examined native speakers of MSA (n=49) using a self-paced reading (SPR) task, an event-related potential (ERP) experiment, and a grammaticality judgment (GJ) task. Results of these experiments revealed that native speakers were sensitive to grammatical violations. Native speakers showed longer reaction times (RTs) in the SPR task, and a P600 effect in the ERP experiment, in response to sentences with mismatched gender agreement as compared to sentences with matched gender agreement. They also performed at ceiling in the GJ task. The second series of experiments examined L2 speakers of MSA (n=74) using an SPR task and a GJ task. Both experiments included adult L2 speakers who were divided into two subgroups, -Gender and +Gender, based on whether or not their L1s have a grammatical gender system. The results of both experiments revealed that both groups were sensitive to gender agreement violations. The L2 speakers showed longer RTs in the SPR task in response to sentences with mismatched gender agreement as compared to sentences with matched gender agreement. No difference was found between the L2 groups in this task. The L2 speakers also performed well in the GJ task, as they were able to correctly identify the grammatical and ungrammatical sentences.
Interestingly, in this task the -Gender group outperformed the +Gender group, which could be due to proficiency in the L2, as the former group obtained a better score on the proficiency task, or it could be that the +Gender group showed negative transfer from their L1s. Based on the results of these two experiments, this dissertation argues that late L2 speakers are not restricted to their L1 grammar, and thus they are able to acquire the gender agreement system of their L2 even if this feature is not instantiated in their L1. The results provide converging evidence for the FTFA rather than the FFFH model, as it appears that the -Gender group was able to reset their L1 gender parameter according to the L2 gender values. Although the L2 speakers were advanced, they showed slower RTs than the native speakers in the SPR task, and lower accuracy in the GJ task. However, it is possible that they are still in the process of acquiring gender agreement in MSA and have not reached their final stage of acquisition. This is supported by the fact that some L2 speakers from both the -Gender and +Gender groups performed as well as native speakers in both the SPR and GJ tasks. Regarding the effect of animacy, the L2 speakers had slower RTs and lower accuracy on sentences with inanimate nouns than on those with animate ones, which is in line with previous L2 studies (Anton-Medez, 1999; Alarcón, 2009; Gelin & Bugaiska, 2014). The native speakers, on the other hand, showed no effect of animacy in either the SPR task or the GJ task. Further, no N400 effect was observed as a result of semantic gender agreement violations in the ERP experiment. Finally, the results revealed a potential effect of word order. Both the native and L2 speakers showed longer RTs on VS word order than on SV word order in the SPR task. Further, the native speakers showed an earlier and greater P600 effect on VS word order than on SV word order in the ERP experiment.
This result suggests that processing gender agreement violations is more complex in the VS word order than in the SV word order, due to the inherent asymmetry in the subject-verb agreement system between the two word orders in MSA.
|
7 |
Traitement automatique du dialecte tunisien à l'aide d'outils et de ressources de l'arabe standard : application à l'étiquetage morphosyntaxique / Natural Language Processing Of Tunisian Dialect using Standard Arabic Tools and Resources : application to Part-Of-Speech Tagging
Hamdi, Ahmed 04 December 2015
The development of natural language processing tools for Arabic dialects is hampered by the lack of resources for them. As a consequence of diglossia, there is one variety of Arabic, Modern Standard Arabic, for which many resources have been developed and used to build natural language processing tools. Given the closeness of the Arabic dialects to Modern Standard Arabic, one approach is to perform a surface conversion of the dialect into Modern Standard Arabic so that the existing tools for standard Arabic can be used. In this work, we focus on processing the Tunisian dialect. We propose a system that converts Tunisian into an approximate form of standard Arabic, on which tools designed for the latter give good results. To validate this approach, we used a part-of-speech tagger designed for standard Arabic. The tagger assigns part-of-speech tags to the output of our conversion system, and these tags are finally projected back onto the Tunisian text. Our system reaches an accuracy of 89% after conversion, an absolute increase of ∼20% over tagging before conversion. / Developing natural language processing tools usually requires a large number of resources (lexica, annotated corpora, ...), which often do not exist for less-resourced languages. One way to overcome the lack of resources is to devote substantial effort to building new ones from scratch. Another approach is to exploit existing resources of closely related languages.
Taking advantage of the closeness of standard Arabic and its dialects, one way to solve the problem of limited resources consists in converting Arabic dialects into standard Arabic in order to use the tools developed to handle the latter. In this work, we focus especially on processing the Tunisian Arabic dialect. We propose a system that converts Tunisian into a close approximation of standard Arabic, for which the application of natural language processing tools designed for the latter provides good results. In order to validate our approach, we focused on part-of-speech tagging. Our system achieved an accuracy of 89%, which represents an absolute improvement of ∼20% over a standard Arabic tagger baseline.
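The convert, tag, and project steps described in this abstract can be sketched as below. This is only an illustration under strong assumptions: conversion is reduced to a one-to-one token lookup, the tagger is a lookup table, and every name and data item (`convert_to_msa`, `tag_msa`, `tag_dialect`, the lexicon and tag table) is hypothetical. The thesis's actual conversion system and tagger are not specified here.

```python
def convert_to_msa(tokens, lexicon):
    """Map each dialect token to an approximate MSA form.

    Assumption: one-to-one conversion; unknown tokens pass through unchanged.
    """
    return [lexicon.get(token, token) for token in tokens]


def tag_msa(msa_tokens, tag_table, default_tag="NOUN"):
    """Toy stand-in for an MSA part-of-speech tagger (a plain lookup table)."""
    return [tag_table.get(token, default_tag) for token in msa_tokens]


def tag_dialect(tokens, lexicon, tag_table):
    """Convert to MSA, tag the converted text, project tags back by position."""
    msa_tokens = convert_to_msa(tokens, lexicon)
    tags = tag_msa(msa_tokens, tag_table)
    return list(zip(tokens, tags))


def accuracy(gold_tags, predicted_tags):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = sum(1 for g, p in zip(gold_tags, predicted_tags) if g == p)
    return correct / len(gold_tags)
```

Because the conversion in this sketch keeps token positions aligned, projecting tags back is a simple zip; a conversion that splits or merges tokens would need an explicit alignment step. The absolute improvement quoted in the abstract corresponds to the difference between `accuracy` measured with and without the conversion step.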
|
8 |
Du terme prédicatif au cadre sémantique : méthodologie de compilation d'une ressource terminologique pour les termes arabes de l'informatique
Ghazzawi, Nizar 08 1900
The description of terms in traditional terminological resources is limited to certain details, such as the term (which is usually a noun), its definition, and its equivalent in a foreign language. This description seldom takes into account other details, which can be of high importance for users, especially if they consult resources to enhance their knowledge of the domain, to improve professional writing, or to find contexts where the term is realized. The information that might be useful includes the description of the actantial structure of the terms, contexts from authentic resources and the inclusion of other parts of speech such as verbs.
Verbs and deverbal nouns, or predicative terminological units (PTUs), which are often ignored by traditional terminology, are of great importance, especially for expressing actions, processes or events. But the description of these units requires a model of terminological description that takes into account their special features. Some terminologists (Condamines 1993, Mathieu-Colas 2002, Gross and Mathieu-Colas 2001, and L’Homme 2012, 2015) have proposed description models based on different theoretical frameworks.
Our research consists of proposing a methodology of terminological description of PTUs of the Arabic language, in particular Modern Standard Arabic (MSA), according to the theory of Frame Semantics of Fillmore (1976, 1977, 1982, 1985) and its application, the FrameNet project (Ruppenhofer et al. 2010). The specialized domain in which we are interested is computing. In our research, we compiled a corpus that we collected from online material and we based our method on an existing online terminological resource called the DiCoInfo (L’Homme 2008) in our pursuit to compile our own. Our objectives are the following. First, we will lay the foundations of an MSA version of the aforementioned resource. This version has its own features: 1) we target specific units, namely verbal and deverbal PTUs; 2) the developed methodology for the compilation of the original DiCoInfo should be adapted to take into account a Semitic language. Afterwards, we will create a framed version of this resource. In this version, we organize the PTUs in semantic frames according to the model of FrameNet. Since this frame version has a multilingual dimension, we add English and French PTUs to the resource.
Our methodology consists of automatically extracting the verbal and nominal terminological units (VTUs and NTUs), such as Ham~ala (حمل) (download) and taHmiyl (تحميل) (downloading). To do this, we adapted an existing automatic extractor, TermoStat (Drouin 2004), to MSA. Then, with the help of terminological validation criteria (L’Homme 2004), we validate the terminological status of a subset of the candidates. After the validation, we create terminological files with an XML editor for each retained VTU and NTU. These files contain elements such as the actantial structure of the PTUs and up to 20 annotated contexts. The last step consists of creating semantic frames from the MSA PTUs. We also associate English and French PTUs with the created frames. This association resulted in the creation of a second terminological resource called “DiCoInfo: A Framed Version”. In this resource, the PTUs that share the same semantic features and actantial structures are organized in semantic frames. For example, the semantic frame Product_development groups PTUs such as Taw~ara (طور) (develop), to develop and développer.
As a result of our methodology, we obtained a total of 106 MSA PTUs compiled in the MSA version of DiCoInfo and 57 semantic frames associated with these units in the framed version. Our research shows that MSA can be described using the methodology that we set up.
|
9 |
The Arabic verb : form and meaning in the vowel-lengthening patterns
Danks, Warwick January 2010
The research presented in this dissertation adopts an empirical Saussurean structuralist approach to elucidating the true meaning of the verb patterns characterised formally by vowel lengthening in Modern Standard Arabic (MSA). The verbal system as a whole is examined in order to place the patterns of interest (III and VI) in context, the complexities of Arabic verbal morphology are explored and the challenges revealed by previous attempts to draw links between form and meaning are presented. An exhaustive dictionary survey is employed to provide quantifiable data to empirically test the largely accepted view that the vowel lengthening patterns have mutual/reciprocal meaning. Finding the traditional explanation inadequate and prone to too many exceptions, alternative commonalities of meaning are similarly investigated. Whilst confirming the detransitivising function of the ta- prefix which derives pattern VI from pattern III, analysis of valency data also precludes transitivity as a viable explanation for pattern III meaning compared with the base form. Examination of formally similar morphology in certain nouns leads to the intuitive possibility that vowel lengthening has aspectual meaning. A model of linguistic aspect is investigated for its applicability to MSA and used to isolate the aspectual feature common to the majority of pattern III and pattern VI verbs, which is determined to be atelicity. A set of verbs which appear to be exceptional in that they are not attributable to atelic aspectual categories is found to be characterised by inceptive meaning and a three-phase model of event time structure is developed to include an inceptive verbal category, demonstrating that these verbs too are atelic. Thus the form-meaning relationship which is discovered is that the vowel lengthening verbal patterns in Modern Standard Arabic have atelic aspectual meaning.
|
10 |
The Effect of Rephrasing Word Problems on the Achievements of Arab Students in Mathematics
Mahajne, Asad; Amit, Miriam 07 May 2012
Language is the device through which students learn and form their mathematical knowledge: it underlies their ability to define concepts, express mathematical ideas and solve mathematical problems. Difficulties with language are most apparent in word problems, where the clarity of the text and the way the student reads it directly affect understanding of the problem and therefore its solution, and can delay the problem-solving process. The connection between language and mathematical achievement is especially significant for Arab students, because the language used in schools and textbooks is classical (traditional) Arabic, which differs greatly from the language used in everyday conversations with family and friends (spoken Arabic). Our research examines whether rephrasing word problems can affect the achievement of Arab students in such problems. The experimental group received mathematics instruction using word-problem materials that were rewritten in a “middle language” closer to the students’ everyday language (spoken Arabic) while keeping the mathematical level of the problems unchanged. The findings showed that students in the experimental group improved their achievement in word and geometric problems significantly more than students in the control group.
|