Processing Techniques for Written Taiwanese --Tone Sandhi and POS Tagging / 台語文處理技術:以變調及詞性標記為例

Doctoral dissertation / National Taiwan University / Graduate Institute of Computer Science and Information Engineering / 97 / Taiwan Southern Min (Taiwanese) is an important language that has received little attention worldwide. Written Taiwanese differs from Mandarin and English in several respects. This dissertation focuses on processing techniques for written Taiwanese.
POJ (Pe̍h-ōe-jī) is an important script for Taiwanese. We introduce the character encoding of POJ and describe numbered POJ as the interchange code among the various POJ encodings. We then propose a two-stage search strategy for POJ text search, together with POJ syllable query expansion. We also describe a display method for POJ, POJ word-processing utilities, and a word segmentation method for HR mixed script.
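To illustrate the idea of numbered POJ as an interchange form, the following is a minimal sketch, not the dissertation's implementation. It assumes the input is Unicode POJ with the usual combining tone diacritics, that syllables within a word are joined by hyphens, and that every token is a POJ syllable. The function names `syllable_to_numbered` and `word_to_numbered` are hypothetical.

```python
import unicodedata

# Combining tone diacritics in Unicode POJ, mapped to tone numbers.
# Tones 1 and 4 carry no diacritic and are inferred from the final consonant.
TONE_MARKS = {
    "\u0301": "2",  # acute accent
    "\u0300": "3",  # grave accent
    "\u0302": "5",  # circumflex
    "\u0304": "7",  # macron
    "\u030d": "8",  # vertical line above
}
STOP_FINALS = ("p", "t", "k", "h")


def syllable_to_numbered(syllable):
    """Convert one diacritic-marked POJ syllable to numbered form (assumed convention)."""
    if not syllable:
        return syllable
    # Decompose so that tone marks become separate combining characters.
    decomposed = unicodedata.normalize("NFD", syllable)
    tone = None
    kept = []
    for ch in decomposed:
        if ch in TONE_MARKS and tone is None:
            tone = TONE_MARKS[ch]
        else:
            kept.append(ch)
    base = unicodedata.normalize("NFC", "".join(kept))
    if tone is None:
        # Ignore a trailing nasal marker (superscript n) when checking the final.
        final = base.rstrip("\u207f")[-1:].lower()
        tone = "4" if final in STOP_FINALS else "1"
    return base + tone


def word_to_numbered(word):
    """Convert a hyphen-joined POJ word syllable by syllable."""
    return "-".join(syllable_to_numbered(s) for s in word.split("-"))


# "Pe̍h-ōe-jī" written with escapes so the combining mark is unambiguous.
print(word_to_numbered("Pe\u030dh-\u014de-j\u012b"))  # Peh8-oe7-ji7
```

Because the numbered form is plain text with the tone made explicit, differently encoded POJ sources could in principle be compared after being mapped to it.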
We propose a rule-based tone sandhi algorithm. We translate every word into Mandarin and obtain its POS information. Using the POS data and tone sandhi rules, we then tag each syllable with its post-sandhi tone marker. Finally, we implement a Taiwanese tone sandhi processing system, which achieves accuracy rates of 97.4% and 89.0% on the training and test data, respectively.
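The tagging step can be pictured with a small sketch of the general Taiwanese tone sandhi pattern, in which every syllable of a tone group except the last takes its sandhi tone. This is an illustration under stated assumptions, not the dissertation's algorithm: it takes tone-group boundaries as given (the dissertation derives its decisions from POS data and rules), it expects numbered syllables in plain ASCII such as `peh8`, and the function names and the `variety` parameter are hypothetical.

```python
# Sandhi tones for non-checked syllables (general pattern; tone 5 depends on variety).
OPEN_RULES = {"1": "7", "2": "1", "3": "2", "7": "3"}


def sandhi_tone(syllable, variety="southern"):
    """Return the syllable with its sandhi tone number (assumed numbered input)."""
    base, tone = syllable[:-1], syllable[-1]
    if tone in OPEN_RULES:
        return base + OPEN_RULES[tone]
    if tone == "5":
        # Tone 5 commonly becomes 7 in southern varieties and 3 in northern ones.
        return base + ("7" if variety == "southern" else "3")
    final = base[-1].lower() if base else ""
    if tone == "4":
        # Checked tone 4: -h finals go to 2, -p/-t/-k finals go to 8.
        return base + ("2" if final == "h" else "8")
    if tone == "8":
        # Checked tone 8: -h finals go to 3, -p/-t/-k finals go to 4.
        return base + ("3" if final == "h" else "4")
    return syllable  # anything else is left unchanged


def apply_sandhi(tone_group, variety="southern"):
    """Apply sandhi to all syllables except the last one in a tone group."""
    if not tone_group:
        return []
    return [sandhi_tone(s, variety) for s in tone_group[:-1]] + [tone_group[-1]]


print(apply_sandhi(["peh8", "oe7", "ji7"]))  # ['peh3', 'oe3', 'ji7']
```

The hard part, which this sketch leaves out, is deciding where each tone group ends; that is where the POS information described above comes in.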
Additionally, we propose a POS tagging method. We develop a word alignment checker to assist word alignment between the two Taiwanese scripts, select the most suitable Mandarin word using a hidden Markov probabilistic model, and finally tag the word using a maximum entropy Markov model (MEMM) classifier. We achieve an accuracy rate of 91.5% on Taiwanese POS tagging.
Over the past several years, we have built a number of useful online tools for written Taiwanese. We hope that these tools and our preliminary results will help promote research on written Taiwanese processing.

Identifier: oai:union.ndltd.org:TW/097NTU05392005
Date: January 2009
Creators: Un-Gian Iunn, 楊允言
Contributors: 高成炎
Source Sets: National Digital Library of Theses and Dissertations in Taiwan
Language: en_US
Detected Language: English
Type: 學位論文 ; thesis
Format: 139