Global ETD Search

1	Semantic prosody in Thai. Supanfai, Pornthip. January 2017.
Semantic prosody is an important concept and has become a primary research interest in corpus linguistics. This thesis undertakes the groundwork of fundamental research into semantic prosody in Thai, a language which has not been subject to studies of semantic prosody before, to set out the parameters for subsequent research in this area. In particular, it addresses three research questions: (1) What are the advantages and disadvantages of the major approaches to semantic prosody proposed in the literature for describing semantic prosody in Thai? (2) What variation in semantic prosodies across genres can be identified for Thai words? (3) To what extent are the semantic prosodies of words identified as translation-equivalents in widely-used bilingual dictionaries in Thai and English similar or different? The datasets employed in the analysis are the Thai National Corpus and the British National Corpus. To address each research question, a small number of Thai words are selected for the analysis. Two primary approaches, the polarity-oriented approach and the EUM-oriented approach, are employed to identify semantic prosody. Within the polarity-oriented approach, which is founded in work by Louw, Stubbs, and Partington, semantic prosody is identified based on collocates, and is restricted to the positive vs. negative opposition. Within the EUM-oriented approach, which is based in the studies of Sinclair, semantic prosody is identified by examining concordance lines for a pragmatic function or meaning that is spread across an extended unit of meaning. The results of the analysis show that the two primary approaches to semantic prosody do operate successfully with the Thai data. A range of semantic prosodies are identified for /kreeŋcay/ ‘considerate’, /kɔ̀ɔhâykə̀ət/ ‘cause’, and /chɔ̂ɔp/ ‘like’, the objects under study, by the two approaches.
The discussion of these semantic prosodies shows that the two approaches are useful for different purposes. The polarity-oriented approach is useful when one’s aim is to investigate a word’s tendency to co-occur with positive or negative words. In particular, it reveals the hidden evaluative potential of words whose evaluation is not obvious from their core semantics. The EUM-oriented approach is, by contrast, suitable for the examination of an extended unit of meaning and its pragmatic function in the Sinclairian sense. Both approaches also have some advantages and disadvantages in terms of practicality. On the issue of variation in semantic prosodies across genres, some variation is indeed found to exist. From the concordance analysis of 19 verbs, each in four different genres, namely academic writing, fiction, newspaper stories, and non-academic non-fiction, 21 different extended units of meaning are identified from 14 of the verbs. The level of variation in the use of these extended units of meaning across genres, which implies variation in semantic prosodies, is considerable with some extended units of meaning, but is limited with others. In particular, a notable contrast is identified between academic and fiction genres in terms of which extended units (and semantic prosodies) are common. Finally, the majority of the translation-equivalent pairs under study (36 out of 48) show the same semantic prosody; of these, most present a neutral semantic prosody. Among the pairs that show different semantic prosodies, there are no cases where one word in the pair shows a positive semantic prosody and the other shows a negative one. It is thus arguable that there is a relationship between semantic prosody in Thai and English: not a genetic or areal relationship, but one that arises from a functional basis, that is, the meanings that the pairs of words under study express in both languages. 495.9
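The collocate-based idea behind the polarity-oriented approach can be illustrated with a minimal sketch. It is not the procedure used in the thesis: the function name `collocate_polarity`, the window size, the toy English sentence, and the small positive and negative word lists are all assumptions made for illustration only.

```python
from collections import Counter


def collocate_polarity(tokens, node, positive, negative, window=4):
    """Count positive, negative, and neutral collocates of `node`.

    A collocate here is any token within `window` positions of an
    occurrence of `node` (a simplifying assumption; corpus studies
    usually also apply a statistical association measure).
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue  # skip the node word itself
            word = tokens[j]
            if word in positive:
                counts["positive"] += 1
            elif word in negative:
                counts["negative"] += 1
            else:
                counts["neutral"] += 1
    return counts


# Hypothetical toy data, not drawn from either corpus.
tokens = (
    "the storm may cause damage and the delay will cause problems "
    "for travellers"
).split()
result = collocate_polarity(
    tokens,
    node="cause",
    positive={"benefit", "joy"},
    negative={"damage", "problems", "delay", "storm"},
    window=2,
)
```

In this toy sentence the node word co-occurs only with negative and neutral collocates, which is the kind of skew the polarity-oriented approach looks for.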
2	A systematic study of offline recognition of Thai printed and handwritten characters. Sae-Tang, Sutat. January 2011.
Thai characters pose some unique problems, which differ from those of English and of other oriental scripts. The structure of Thai characters consists of small loops combined with curves, and there are no spaces between words or sentences. In each line, moreover, Thai characters can be composed on four levels, depending on the type of character being written. This research focuses on OCR for the Thai language: printed and offline handwritten character recognition. The main consideration is an attempt to overcome the problems by simple but effective methods. A printed OCR developed by the National Electronics and Computer Technology Center (NECTEC) uses Kohonen self-organising maps (SOMs) for rough classification and back-propagation neural networks for fine classification. An evaluation of the NECTEC OCR is performed on a printed dataset that contains over 0.6 million tokens. Comparisons of the classifier, with and without the aspect ratio, and with and without SOMs, yield small but statistically significant differences in recognition rate. A very straightforward classifier, the nearest neighbour, was examined to evaluate overall recognition performance and to compare with the NECTEC classifier. It shows a significant improvement in recognition rate (about 98%) over the NECTEC classifier (about 96%) on both the original and distorted data (rotated and noisy), but at the expense of longer recognition times. For offline handwritten character recognition, three different classifiers are evaluated on three different datasets that contain, on average, approximately 10,000 tokens each. The neural network and HMMs are more effective and give higher recognition rates than the nearest neighbour classifier on the three datasets. The best result obtained from the HMMs is 91.1% on the ThaiCAM dataset.
However, when evaluated on a different dataset, the recognition rates drop drastically, due to differences in many aspects of online and offline handwritten data. An improvement in classification rates was obtained by adjusting the stroke width of a character in the online handwritten dataset (12 percentage points) and by combining the training sets from the three datasets (7.6 percentage points). A boosting algorithm called AdaBoost yields a slight improvement in recognition rate (1.2 percentage points) over the original classifiers (without applying the AdaBoost algorithm). 495.9
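The nearest neighbour classifier mentioned in the abstract can be sketched in a few lines. This is a minimal illustration, not the thesis implementation: the Euclidean distance, the 3x3 binary "glyph" vectors, and the labels `ring` and `bar` are assumptions chosen for the example, and the abstract does not state which features or distance measure were used.

```python
import math


def nearest_neighbour(train, query):
    """Return the label of the training vector closest to `query`.

    `train` is a list of (feature_vector, label) pairs. Euclidean
    distance is an assumption made for this sketch.
    """
    best_label, best_dist = None, math.inf
    for vec, label in train:
        d = math.dist(vec, query)  # available in Python 3.8+
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


# Hypothetical 3x3 binary bitmaps, flattened row by row.
train = [
    ((1, 1, 1, 1, 0, 1, 1, 1, 1), "ring"),
    ((0, 1, 0, 0, 1, 0, 0, 1, 0), "bar"),
]
noisy = (1, 1, 1, 1, 0, 1, 1, 1, 0)  # "ring" with one pixel flipped
predicted = nearest_neighbour(train, noisy)
```

Each query is compared against every stored training example, so the cost of a single classification grows with the size of the training set. That is consistent with the longer recognition times the abstract reports for this classifier.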
