Automatic text summarization can efficiently and effectively save users¡¦ time while reading text documents. The objective of automatic text summarization is to extract essential sentences that cover almost all the concepts of a document so that
users are able to comprehend the ideas the document tries to address by simply reading through the corresponding summary. This research focuses on developing a hybrid automatic text summarization
approach, KCS, to enhancing the quality of summaries.
This approach basically consists of two major components: first, it employs the K-mixture probabilistic model to calculate term weights in a statistical sense; it then identifies the term relationship
between nouns and nouns as well as nouns and verbs, which results in the connective strength (CS) of nouns. With the connective strengths available scores of sentences can be calculated and ranked to be extracted.
We conduct three experiments to justify the proposed approach. The quality of summary is examined by its capability of increasing accuracy of text classification,while the classifier employed, the Naïve Bayes classifier, is kept the same through all experiments. The results show that the K-mixture model is more contributive to document classification than traditional TFIDF weighting scheme. It, however, is still no better than CS, a more complex linguistic-based approach. More importantly, our proposed approach, KCS, performs best among all approaches considered. It implies that KCS can extract more representative sentences from the document and its feasibility in text summarization applications is thus justified.
Identifer | oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-1018107-100507 |
Date | 18 October 2007 |
Creators | Yuan, Li-An |
Contributors | Hsiao Wen Feng, Sun Pei Chen, Chang Te Min |
Publisher | NSYSU |
Source Sets | NSYSU Electronic Thesis and Dissertation Archive |
Language | Cholon |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-1018107-100507 |
Rights | unrestricted, Copyright information available at source archive |
Page generated in 0.0019 seconds