Global ETD Search

1	Arabic text recognition of printed manuscripts : efficient recognition of off-line printed Arabic text using Hidden Markov Models, Bigram Statistical Language Model, and post-processing Al-Muhtaseb, Husni Abdulghani January 2010 (has links) Arabic text recognition has not been researched as thoroughly as the recognition of text in other natural languages, yet the need for it is clear. In addition to traditional applications such as postal address reading, bank check verification, and office automation, there is strong interest in searching scanned documents available on the internet and handwritten manuscripts. Other possible applications include building digital libraries, recognizing text on digitized maps, recognizing vehicle license plates, serving as the first phase of text readers for visually impaired people, and understanding filled-in forms. This research aims to contribute to the field of optical character recognition (OCR) of printed Arabic text by developing novel techniques and schemes that advance the performance of state-of-the-art Arabic OCR systems. Statistical and analytical analyses of Arabic text were carried out to estimate the probabilities of occurrence of Arabic characters for use with Hidden Markov Models (HMMs) and other techniques. Since no publicly available dataset of printed Arabic text exists for recognition purposes, one was created. In addition, a minimal Arabic script is proposed. The proposed script contains all basic shapes of Arabic letters and provides an efficient representation of Arabic text in terms of effort and time. Based on the success of HMMs in speech and text recognition, their use for the automatic recognition of Arabic text was investigated. The HMM technique adapts to noise and font variations and does not require word or character segmentation of Arabic line images.
In the feature extraction phase, experiments were conducted with a number of different features to investigate their suitability for HMMs. Finally, a novel set of features that yielded high recognition rates across different fonts was selected. The developed techniques do not need word or character segmentation before the classification phase, because segmentation is a byproduct of recognition. This appears to be the main advantage of using HMMs for Arabic text, since explicit segmentation tends to produce errors that propagate to the classification phase. Eight different Arabic fonts were used in the classification phase, with recognition rates ranging from 98% to 99.9% depending on the font. To our knowledge, these results are new in this context. Moreover, the proposed technique could be applied to other languages: a proof-of-concept experiment on English characters achieved a recognition rate of 98.9% using the same HMM setup, and the same techniques applied to Bangla characters achieved a recognition rate above 95%. Recognition of multi-font printed Arabic text was also performed with the same technique, with fonts categorized into different groups, and new high recognition rates were achieved. To enhance the recognition rate further, a post-processing module was developed to correct the OCR output through character-level and word-level post-processing. This module increased the recognition rate by more than 1%. 005.3
2	Modelování dynamiky prosodie pro rozpoznávání řečníka / Modelling Prosodic Dynamics for Speaker Recognition Jančík, Zdeněk January 2008 (has links) Most current automatic speaker recognition systems extract speaker-dependent features from short-term spectral information. This approach ignores long-term information. I explored an approach that uses the fundamental frequency and energy trajectories of each speaker, modelling prosodic dynamics over single phonemes or syllables. It is known from the literature that prosodic systems do not perform as well as acoustic ones, but they improve the overall system when fused with them. I verified this assumption by fusing my results with a state-of-the-art acoustic system from BUT. Data from standard evaluation campaigns organized by the National Institute of Standards and Technology (NIST) were used for all experiments.
3	Arabic text recognition of printed manuscripts. Efficient recognition of off-line printed Arabic text using Hidden Markov Models, Bigram Statistical Language Model, and post-processing. Al-Muhtaseb, Husni A. January 2010 (has links) / King Fahd University of Petroleum and Minerals (KFUPM) Arabic text recognition Hidden Markov Models Feature extraction Omni font recognition Minimal Arabic script Bigram Statistical Language Model Optical character recognition (OCR) Statistical and analytical analysis
4	The Effect of Neighborhood Size and Morphology in the Chinese Language Nguyen, Long 01 January 2016 (has links) The Neighborhood Size Effect (NSE) refers to the effect on word determination of the words that can be formed by changing one orthographic aspect of a given word. The number of words that can be created through such a manipulation is called the neighborhood size (NS). A number of other factors, such as frequency (how often a word appears) and morphology (the combination of meaningful word units), have been suggested to have an overriding effect on the NSE. In addition, there is little research on the NSE in non-alphabetic languages such as Chinese, whose characters are composed of a multitude of semantic or phonetic markers. In this experiment, participants from mainland China were presented with 60 single characters and 59 morphologically constructed items, each made up of two characters that together form a single word. In both conditions, NS was manipulated by adjusting the semantic or phonetic radical within a character, and frequency was manipulated using a website that measures how frequently a character appears in the language. Frequency and NS had significant effects in both character conditions: characters with higher frequency and lower NS showed higher accuracy and lower reaction times. Among low-frequency single characters, those with a higher NS showed longer reaction times and lower accuracy. Among low-frequency morphologically constructed characters, a lower NS was associated with higher accuracy, but there was no significant effect on reaction time. Because the accuracy results differed between NS and character condition, it is suggested that further factors, such as morphological processing in single characters and bigram frequency in morphologically constructed characters, might affect word determination in conjunction with neighborhood size.
Thus, Chinese morphological processing may depend more on other factors than on neighborhood size. Chinese Characters Morphology Neighborhood Size Frequency Bigram Frequency Applied Behavior Analysis Chinese Studies Cognition and Perception Cognitive Psychology Experimental Analysis of Behavior Other Languages, Societies, and Cultures
5	Surface Realization Using a Featurized Syntactic Statistical Language Model Packer, Thomas L. 13 March 2006 (has links) An important challenge in natural language surface realization is the generation of grammatical sentences from incomplete sentence plans. Realization can be broken into a two-stage process: an over-generating rule-based module followed by a ranker that outputs the most probable candidate sentence according to a statistical language model. Thus far, an n-gram language model has been evaluated in this context, and more sophisticated syntactic knowledge is expected to improve such a ranker. In this thesis, a new language model based on featurized functional dependency syntax was developed and evaluated. The new language model did not outperform the comparison bigram language model in either generation accuracy or cross-entropy. natural language generation natural language processing NLP NLG Bayesian networks decision trees context specific independence realization statistical language model standard pipeline architecture n-gram (bigram) language model syntax features statistical model machine learning Computer Sciences
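Result 1 states that with HMMs, segmentation is a byproduct of recognition. The sketch below illustrates why, under stated assumptions: a toy left-to-right model with two hypothetical character states ("A", "B") and frame observations quantized to the symbols "x" and "y". None of these values come from the thesis. Viterbi decoding assigns every frame to a state, so runs of the same label mark where each character starts and ends, with no separate segmentation step.

```python
import math
from typing import Dict, List, Tuple


def viterbi(
    obs: List[str],
    states: List[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> Tuple[float, List[str]]:
    """Return (log-probability, most likely state path) for an observation sequence."""
    neg_inf = float("-inf")

    def log(p: float) -> float:
        # Work in log space to avoid underflow on long text lines
        return math.log(p) if p > 0 else neg_inf

    # best[t][s]: best log-probability of any path ending in state s at frame t
    best = [{s: log(start_p.get(s, 0.0)) + log(emit_p[s].get(obs[0], 0.0)) for s in states}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p].get(s, 0.0))) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log(emit_p[s].get(obs[t], 0.0))
            back[t][s] = prev
    # Trace back from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


def segments(path: List[str]) -> List[Tuple[str, int, int]]:
    """Collapse a frame-level state path into (label, first_frame, last_frame) runs."""
    runs = []
    start = 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            runs.append((path[start], start, t - 1))
            start = t
    return runs
```

With start probability 1.0 for "A", transitions A to A 0.6, A to B 0.4, B to B 1.0, and emissions A: x 0.9, y 0.1 and B: x 0.2, y 0.8, the frames x x y y y decode to A A B B B, which `segments` turns into ("A", 0, 1) and ("B", 2, 4). A real Arabic OCR system would use per-character models with several states and continuous feature vectors rather than two symbols.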
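Result 1 mentions word-level post-processing of the OCR output but does not describe the method. One common approach, assumed here, replaces each output word with the closest lexicon entry by Levenshtein edit distance and leaves it unchanged when no entry is close enough. The lexicon, the `max_dist` threshold, and the function names are hypothetical. The functions operate on any Unicode string, including Arabic.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = cur
    return prev[-1]


def correct_word(word: str, lexicon: Iterable[str], max_dist: int = 1) -> str:
    """Return the closest lexicon word within max_dist; otherwise return the word unchanged."""
    best, best_dist = word, max_dist + 1
    for entry in lexicon:
        d = levenshtein(word, entry)
        if d < best_dist:  # ties keep the earlier lexicon entry
            best, best_dist = entry, d
    return best if best_dist <= max_dist else word
```

For example, `levenshtein("كتاب", "كتب")` is 1 because one alif is deleted. A small threshold keeps the corrector conservative, since a large `max_dist` would rewrite correctly recognized rare words into frequent ones.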
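Result 2 models fundamental frequency (F0) and energy trajectories over single phonemes or syllables, but the abstract does not say how a trajectory is parameterized. One simple option, assumed here, summarizes each segment's contour by its mean and least-squares slope (per frame) and concatenates the F0 and energy summaries into one feature vector. The function names and contour values are hypothetical.

```python
from typing import List, Sequence, Tuple


def mean_and_slope(contour: Sequence[float]) -> Tuple[float, float]:
    """Mean and least-squares slope of a contour sampled at frames 0..n-1."""
    n = len(contour)
    if n == 0:
        raise ValueError("empty contour")
    mean_y = sum(contour) / n
    if n == 1:
        return mean_y, 0.0
    mean_x = (n - 1) / 2
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(contour))
    return mean_y, sxy / sxx


def segment_features(f0: Sequence[float], energy: Sequence[float]) -> List[float]:
    """Concatenate (mean, slope) of F0 and of energy for one phoneme or syllable segment."""
    f0_mean, f0_slope = mean_and_slope(f0)
    en_mean, en_slope = mean_and_slope(energy)
    return [f0_mean, f0_slope, en_mean, en_slope]
```

For a rising F0 contour of 100, 110, 120, 130 Hz, `mean_and_slope` returns a mean of 115.0 and a slope of 10.0 Hz per frame. Higher-order polynomial fits would capture curvature as well. Either way, these long-term features complement rather than replace short-term spectral features, consistent with the fusion result in the abstract.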
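Result 4 defines neighborhood size as the number of words that can be created by changing one orthographic element of a word, and manipulates it by swapping a character's semantic or phonetic radical. The sketch below assumes that each character is reduced to a two-part (semantic radical, phonetic component) pair. This is a simplification: real characters can have more complex structure, and the thesis's exact neighbor criterion may differ. The mini-lexicon and function names are illustrative.

```python
from typing import Dict, Sequence, Tuple

# Illustrative mini-lexicon: each character as (semantic radical, phonetic component)
LEXICON: Dict[str, Tuple[str, str]] = {
    "妈": ("女", "马"),
    "吗": ("口", "马"),
    "码": ("石", "马"),
    "她": ("女", "也"),
    "他": ("亻", "也"),
    "地": ("土", "也"),
}


def is_neighbor(a: Sequence[str], b: Sequence[str]) -> bool:
    """True if a and b have the same length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def neighborhood_size(item: str, lexicon: Dict[str, Tuple[str, str]]) -> int:
    """Count lexicon entries that differ from item in exactly one component."""
    target = lexicon[item]
    return sum(is_neighbor(target, comps) for other, comps in lexicon.items() if other != item)
```

Here 妈 has three neighbors (吗 and 码 share 马, and 她 shares 女), while 他 has two (她 and 地 share 也). The same `is_neighbor` test on plain strings gives the familiar alphabetic definition, since "cat" and "cot" are neighbors but "cat" and "cast" are not.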
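Results 1 and 5 both use a bigram statistical language model. In result 5, this model ranks the candidate sentences produced by an over-generating rule-based module. The sketch below assumes word-level bigrams with add-one (Laplace) smoothing and a tiny hypothetical training corpus. Neither thesis's smoothing method nor its data are stated here, and the class name is illustrative.

```python
import math
from collections import Counter
from typing import Iterable, List

BOS, EOS = "<s>", "</s>"


class BigramLM:
    """Word bigram model with add-one smoothing (an illustrative choice)."""

    def __init__(self, sentences: Iterable[List[str]]) -> None:
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for words in sentences:
            tokens = [BOS] + words + [EOS]
            self.unigrams.update(tokens[:-1])  # counts of each word as a history
            self.bigrams.update(zip(tokens, tokens[1:]))
        # Predictable tokens (includes EOS, excludes BOS) plus one slot for unseen words
        vocab = {w for (_, w) in self.bigrams}
        self.vocab_size = len(vocab) + 1

    def log_prob(self, words: List[str]) -> float:
        """Smoothed log-probability of a sentence, including the end-of-sentence token."""
        tokens = [BOS] + words + [EOS]
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            num = self.bigrams[(prev, cur)] + 1
            den = self.unigrams[prev] + self.vocab_size
            total += math.log(num / den)
        return total

    def rank(self, candidates: List[List[str]]) -> List[List[str]]:
        """Order candidate sentences from most to least probable."""
        return sorted(candidates, key=self.log_prob, reverse=True)
```

Trained on "the dog barks", "the cat sleeps", and "a dog sleeps", the model ranks "the dog barks" above the scrambled "dog the barks", which is the ranker's job in the two-stage pipeline. The same scoring idea applies at the character level when a bigram model re-scores OCR hypotheses.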

