1

Arabic text recognition of printed manuscripts : efficient recognition of off-line printed Arabic text using Hidden Markov Models, Bigram Statistical Language Model, and post-processing

Al-Muhtaseb, Husni Abdulghani January 2010
Arabic text recognition has not been researched as thoroughly as text recognition for other natural languages. The need for automatic Arabic text recognition is clear. In addition to traditional applications such as postal address reading, check verification in banks, and office automation, there is considerable interest in searching scanned documents available on the internet and in searching handwritten manuscripts. Other possible applications include building digital libraries, recognizing text on digitized maps, recognizing vehicle license plates, serving as the first phase in text readers for visually impaired people, and understanding filled-in forms. This research aims to contribute to the field of optical character recognition (OCR) of printed Arabic text by developing novel techniques and schemes that advance the performance of state-of-the-art Arabic OCR systems. Statistical and analytical analysis of Arabic text was carried out to estimate the probabilities of occurrence of Arabic characters for use with Hidden Markov Models (HMM) and other techniques. Since no publicly available dataset of printed Arabic text existed for recognition purposes, one was created. In addition, a minimal Arabic script is proposed. The proposed script contains all basic shapes of Arabic letters and provides an efficient representation of Arabic text in terms of effort and time. Based on the success of HMM in speech and text recognition, the use of HMM for the automatic recognition of Arabic text was investigated. The HMM technique adapts to noise and font variations and does not require word or character segmentation of Arabic line images. In the feature extraction phase, experiments were conducted with a number of different features to investigate their suitability for HMM. Finally, a novel set of features, which resulted in high recognition rates for different fonts, was selected.
The developed techniques do not need word or character segmentation before the classification phase, as segmentation is a byproduct of recognition. This seems to be the most advantageous feature of using HMM for Arabic text, as segmentation tends to produce errors that are usually propagated to the classification phase. Eight different Arabic fonts were used in the classification phase. The recognition rates ranged from 98% to 99.9%, depending on the font used. As far as we know, these are new results in their context. Moreover, the proposed technique could be used for other languages. A proof-of-concept experiment was conducted on English characters, with a recognition rate of 98.9% using the same HMM setup. The same techniques were applied to Bangla characters, with a recognition rate above 95%. The recognition of multi-font printed Arabic text was also carried out using the same technique. Fonts were categorized into different groups. New high recognition results were achieved. To enhance the recognition rate further, a post-processing module was developed to correct the OCR output through character-level and word-level post-processing. The use of this module increased the recognition rate by more than 1%.
2

On the Swahili documents in Arabic script from the Congo (19th century)

Luffin, Xavier 14 August 2012
While documents written in Swahili in Arabic characters from East Africa have long been well documented, whether correspondence or literature, the existence of such documents from Central Africa, and in particular from the Congo, is still very poorly known. Yet, besides the testimony of various European observers and actors from the beginning of colonization, several documents, most of them kept in Belgium, have survived to the present day. They consist mainly of the correspondence of Swahili merchants established in the former Stanley Falls district, but also of treaties, "diplomatic" exchanges and personal notes, dating mainly from the last two decades of the 19th century. These documents prove to be an interesting source both for the history of the precolonial Congo and for the diachronic study of Swahili and its geographical expansion. / Though the existence of Swahili documents in Arabic script originating from East Africa – mainly Tanzania and Kenya – has been well documented for a long time (see for instance Büttner 1892, Allen 1970, Dammann 1993 and the recent Swahili Manuscripts Database of the SOAS), very little regarding such manuscripts in Central Africa, and especially the Congo, has been reported up to now. However, several museums and archives in Belgium and elsewhere hold documents written in Swahili in Arabic script coming from what is today the DRC, along with other documents in the Arabic language. All of them date back to the last two decades of the 19th century. Most of these documents are to be found in the Historical Archives of the Royal Museum of Central Africa (MRAC), Tervuren, but some other Belgian institutions, like the African Archives (AA) of the Belgian Ministry of Foreign Affairs, the Library of the University of Liège (ULg) and the Army Museum (MRA) in Brussels, also hold some examples of these documents.
Other possible sources should be explored, like the personal archives of families whose ancestors worked in the Congo during the colonial period – most of the Swahili documents in Tervuren are personal papers belonging to former Belgian officers, which were donated to the Museum after their death – as well as the archives of Christian missionary orders. Nothing is known about the presence of such documents in the DRC today, but we can suppose that some of them have been preserved in places like mosques, Koranic schools or personal archives.
3

On the Swahili documents in Arabic script from the Congo (19th century)

Luffin, Xavier January 2007
While documents written in Swahili in Arabic characters from East Africa have long been well documented, whether correspondence or literature, the existence of such documents from Central Africa, and in particular from the Congo, is still very poorly known. Yet, besides the testimony of various European observers and actors from the beginning of colonization, several documents, most of them kept in Belgium, have survived to the present day. They consist mainly of the correspondence of Swahili merchants established in the former Stanley Falls district, but also of treaties, "diplomatic" exchanges and personal notes, dating mainly from the last two decades of the 19th century. These documents prove to be an interesting source both for the history of the precolonial Congo and for the diachronic study of Swahili and its geographical expansion. / Though the existence of Swahili documents in Arabic script originating from East Africa – mainly Tanzania and Kenya – has been well documented for a long time (see for instance Büttner 1892, Allen 1970, Dammann 1993 and the recent Swahili Manuscripts Database of the SOAS), very little regarding such manuscripts in Central Africa, and especially the Congo, has been reported up to now. However, several museums and archives in Belgium and elsewhere hold documents written in Swahili in Arabic script coming from what is today the DRC, along with other documents in the Arabic language. All of them date back to the last two decades of the 19th century. Most of these documents are to be found in the Historical Archives of the Royal Museum of Central Africa (MRAC), Tervuren, but some other Belgian institutions, like the African Archives (AA) of the Belgian Ministry of Foreign Affairs, the Library of the University of Liège (ULg) and the Army Museum (MRA) in Brussels, also hold some examples of these documents.
Other possible sources should be explored, like the personal archives of families whose ancestors worked in the Congo during the colonial period – most of the Swahili documents in Tervuren are personal papers belonging to former Belgian officers, which were donated to the Museum after their death – as well as the archives of Christian missionary orders. Nothing is known about the presence of such documents in the DRC today, but we can suppose that some of them have been preserved in places like mosques, Koranic schools or personal archives.
4

Arabic text recognition of printed manuscripts. Efficient recognition of off-line printed Arabic text using Hidden Markov Models, Bigram Statistical Language Model, and post-processing.

Al-Muhtaseb, Husni A. January 2010
Arabic text recognition has not been researched as thoroughly as text recognition for other natural languages. The need for automatic Arabic text recognition is clear. In addition to traditional applications such as postal address reading, check verification in banks, and office automation, there is considerable interest in searching scanned documents available on the internet and in searching handwritten manuscripts. Other possible applications include building digital libraries, recognizing text on digitized maps, recognizing vehicle license plates, serving as the first phase in text readers for visually impaired people, and understanding filled-in forms. This research aims to contribute to the field of optical character recognition (OCR) of printed Arabic text by developing novel techniques and schemes that advance the performance of state-of-the-art Arabic OCR systems. Statistical and analytical analysis of Arabic text was carried out to estimate the probabilities of occurrence of Arabic characters for use with Hidden Markov Models (HMM) and other techniques. Since no publicly available dataset of printed Arabic text existed for recognition purposes, one was created. In addition, a minimal Arabic script is proposed. The proposed script contains all basic shapes of Arabic letters and provides an efficient representation of Arabic text in terms of effort and time. Based on the success of HMM in speech and text recognition, the use of HMM for the automatic recognition of Arabic text was investigated. The HMM technique adapts to noise and font variations and does not require word or character segmentation of Arabic line images. In the feature extraction phase, experiments were conducted with a number of different features to investigate their suitability for HMM. Finally, a novel set of features, which resulted in high recognition rates for different fonts, was selected.
The developed techniques do not need word or character segmentation before the classification phase, as segmentation is a byproduct of recognition. This seems to be the most advantageous feature of using HMM for Arabic text, as segmentation tends to produce errors that are usually propagated to the classification phase. Eight different Arabic fonts were used in the classification phase. The recognition rates ranged from 98% to 99.9%, depending on the font used. As far as we know, these are new results in their context. Moreover, the proposed technique could be used for other languages. A proof-of-concept experiment was conducted on English characters, with a recognition rate of 98.9% using the same HMM setup. The same techniques were applied to Bangla characters, with a recognition rate above 95%. The recognition of multi-font printed Arabic text was also carried out using the same technique. Fonts were categorized into different groups. New high recognition results were achieved. To enhance the recognition rate further, a post-processing module was developed to correct the OCR output through character-level and word-level post-processing. The use of this module increased the recognition rate by more than 1%. / King Fahd University of Petroleum and Minerals (KFUPM)
5

L'utilisation de l'arabe écrit en caractères arabes par les Juifs aux XIXe et XXe siècles / The use of Arabic as a written language in Arabic characters by the Jews in the XIXth and XXth centuries

Langella, Maria-Luisa 10 December 2011
The use of the Arabic language, in Arabic characters, by the Jews between the end of the XIXth century and the end of the XXth century is one aspect of the long-standing relationship between the Jews and the Arabic language, and constitutes a distinctive linguistic phenomenon that has so far been little researched. In order to outline and describe it, and building on Shmuel Moreh's pioneering work in Israel, we have established a bibliographic corpus of 654 entries for texts and works published by Jewish authors in the Arabic language in Arabic characters. Its analysis has enabled us to highlight the limited extent of this phenomenon. First, from a chronological point of view: although the first entry in our corpus dates back to 1847 and the last one to 2008, most of this literature was produced between 1930 and 1970. Secondly, from a geographical point of view: this phenomenon is associated mainly with Egypt, Iraq and later Israel. In this regard, it must be noted that the phenomenon was exported to Israel after the departure of the Jews from the Arab countries, principally during the 1950s, and involves almost exclusively émigré writers. Thirdly, it involves only a small number of individuals out of the total number of authors listed in our corpus. Despite all these considerations, this literature is characterised by a certain degree of dynamism. This can be seen first in the heterogeneity of the genres observed, spanning poetry, theatre, novels, short stories, essays and journalism, and then in its use of different varieties of Arabic, such as Classical Arabic or local dialects.
