
An empirical study on Chinese text compression: from character-based to word-based approach.

by Kwok-Shing Cheng.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1997.
Includes bibliographical references (leaves 114-120).

Abstract --- p.i
Acknowledgement --- p.iii
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Importance of Text Compression --- p.1
Chapter 1.2 --- Motivation of this Research --- p.2
Chapter 1.3 --- Characteristics of Chinese --- p.2
Chapter 1.3.1 --- Huge size of character set --- p.3
Chapter 1.3.2 --- Lack of word segmentation --- p.3
Chapter 1.3.3 --- Rich semantics --- p.3
Chapter 1.4 --- Different Coding Schemes for Chinese --- p.4
Chapter 1.4.1 --- Big5 Code --- p.4
Chapter 1.4.2 --- GB (Guo Biao) Code --- p.4
Chapter 1.4.3 --- HZ (Hanzi) Code --- p.5
Chapter 1.4.4 --- Unicode Code --- p.5
Chapter 1.5 --- Modeling and Coding for Chinese Text --- p.6
Chapter 1.6 --- Static and Adaptive Modeling --- p.6
Chapter 1.7 --- One-Pass and Two-Pass Modeling --- p.8
Chapter 1.8 --- Ordering of models --- p.9
Chapter 1.9 --- Two Sets of Benchmark Files and the Platform --- p.9
Chapter 1.10 --- Outline of the Thesis --- p.11
Chapter 2 --- A Survey of Chinese Text Compression --- p.13
Chapter 2.1 --- Entropy for Chinese Text --- p.14
Chapter 2.2 --- Weakness of Traditional Compression Algorithms on Chinese Text --- p.15
Chapter 2.3 --- Statistical Class Algorithms for Compressing Chinese --- p.16
Chapter 2.3.1 --- Huffman coding scheme --- p.17
Chapter 2.3.2 --- Arithmetic Coding Scheme --- p.22
Chapter 2.3.3 --- Restricted Variable Length Coding Scheme --- p.26
Chapter 2.4 --- Dictionary-based Class Algorithms for Compressing Chinese --- p.27
Chapter 2.5 --- Experiments and Results --- p.32
Chapter 2.6 --- Chapter Summary --- p.35
Chapter 3 --- Indicator Dependent Huffman Coding Scheme --- p.37
Chapter 3.1 --- Chinese Character Identification Routine --- p.37
Chapter 3.2 --- Reduction of Header Size --- p.39
Chapter 3.3 --- Semi-adaptive IDC for Chinese Text --- p.44
Chapter 3.3.1 --- Theoretical Analysis of Partition Technique for Compression --- p.48
Chapter 3.3.2 --- Experiments and Results of the Semi-adaptive IDC --- p.50
Chapter 3.4 --- Adaptive IDC for Chinese Text --- p.54
Chapter 3.4.1 --- Experiments and Results of the Adaptive IDC --- p.57
Chapter 3.5 --- Chapter Summary --- p.58
Chapter 4 --- Cascading LZ Algorithms with Huffman Coding Schemes --- p.59
Chapter 4.1 --- Variations of Huffman Coding Scheme --- p.60
Chapter 4.1.1 --- Analysis of EPDC and PDC --- p.60
Chapter 4.1.2 --- Analysis of PDC, 16Huff and IDC --- p.65
Chapter 4.1.3 --- Time and Memory Consumption --- p.71
Chapter 4.2 --- Cascading LZSS with PDC, 16Huff and IDC --- p.73
Chapter 4.2.1 --- Experimental Results --- p.76
Chapter 4.3 --- Cascading LZW with PDC, 16Huff and IDC --- p.79
Chapter 4.3.1 --- Experimental Results --- p.82
Chapter 4.4 --- Chapter Summary --- p.84
Chapter 5 --- Applying Compression Algorithms to Word-segmented Chinese Text --- p.85
Chapter 5.1 --- Background of word-based compression algorithms --- p.86
Chapter 5.2 --- Terminology and Benchmark Files for Word Segmentation Model --- p.88
Chapter 5.3 --- Word Segmentation Model --- p.88
Chapter 5.4 --- Chinese Entropy from Byte to Word --- p.91
Chapter 5.5 --- The Generalized Compression and Decompression Model for Word-segmented Chinese text --- p.92
Chapter 5.6 --- Applying Huffman Coding Scheme to Word-segmented Chinese text --- p.94
Chapter 5.7 --- Applying WLZSSHUF to Word-segmented Chinese text --- p.97
Chapter 5.8 --- Applying WLZWHUF to Word-segmented Chinese text --- p.102
Chapter 5.9 --- Match Ratio and Compression Ratio --- p.105
Chapter 5.10 --- Chapter Summary --- p.108
Chapter 6 --- Concluding Remarks --- p.110
Chapter 6.1 --- Conclusions --- p.110
Chapter 6.2 --- Contributions --- p.111
Chapter 6.3 --- Future Directions --- p.112
Chapter 6.3.1 --- Integrate Decremental Coding Scheme with IDC --- p.112
Chapter 6.3.2 --- Re-order the Character Sequences in the Sliding Window of LZSS --- p.113
Chapter 6.3.3 --- Multiple Huffman Trees for Word-based Compression --- p.113
Bibliography --- p.114

Identifier: oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_322712
Date: January 1997
Contributors: Cheng, Kwok-Shing; Chinese University of Hong Kong Graduate School, Division of Computer Science and Engineering
Source Sets: The Chinese University of Hong Kong
Language: English
Detected Language: English
Type: Text, bibliography
Format: print, x, 120 leaves : ill. ; 30 cm.
Rights: Use of this resource is governed by the terms and conditions of the Creative Commons "Attribution-NonCommercial-NoDerivatives 4.0 International" License (http://creativecommons.org/licenses/by-nc-nd/4.0/)
