
A Study of Automatic Summarization Methods (自動摘要方法之研究與探討)

With the growth of the Internet, the amount of information available to readers keeps increasing, and over the past decade the rapidly growing number of online articles has made efficient information gathering a necessity. Automatic summarization aims to help users read efficiently. Its essential problems are how to identify the important information in an article and how to present it to readers. I compare and analyze three summarization methods, applied to English news articles.

The first method uses an ontology to build domain-dependent knowledge about the topics an article in that domain is likely to cover, gathers the main topics of an article with that ontology, and selects the desired proportion of paragraphs as the summary according to the gathered topical information.

The second method is similar to the approach taken by information-extraction systems. I organize semantic tags into an ontological structure, and the summarization system learns tag patterns (summary templates) from tagged data. To create a summary, the system extracts the needed information from a news article and replaces the semantic tags in the selected tag patterns with the extracted text.

The third method analyzes the effects of several previously proposed summarization features under different conditions. The most important observation is that the effectiveness of these features depends on the topic of the news article. Hence, I classify articles by topic, collect statistical information about the features for each possible news topic, and apply this conditional probabilistic information when extracting summaries.

The effectiveness of the proposed methods varies from case to case, but the experimental results suggest that each performs satisfactorily.
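
The idea behind the third method, scoring paragraphs with feature statistics that are conditioned on the article's topic, can be illustrated with a minimal sketch. This is not the implementation used in the thesis: the three binary features (`is_lead`, `has_title_word`, `is_long`), the naive-Bayes-style log-odds scoring, the add-one smoothing, and all function names are assumptions introduced here for illustration only.

```python
import math
from collections import defaultdict


def paragraph_features(index, paragraph, title_words):
    """Hypothetical binary features for one paragraph (not from the thesis)."""
    words = {w.strip(".,;:!?\"'").lower() for w in paragraph.split()}
    return {
        "is_lead": index == 0,                       # first paragraph of the article
        "has_title_word": bool(words & title_words),  # shares a word with the title
        "is_long": len(paragraph.split()) >= 8,       # at least 8 tokens
    }


def train(examples):
    """Estimate P(feature is true | label, topic) with add-one smoothing.

    examples: iterable of (topic, features, in_summary) triples.
    Returns a function prob(topic, label, feature_name).
    """
    true_counts = defaultdict(int)  # (topic, label, feature) -> times feature was True
    totals = defaultdict(int)       # (topic, label) -> number of paragraphs seen
    for topic, feats, label in examples:
        totals[(topic, label)] += 1
        for name, value in feats.items():
            if value:
                true_counts[(topic, label, name)] += 1

    def prob(topic, label, name):
        return (true_counts[(topic, label, name)] + 1) / (totals[(topic, label)] + 2)

    return prob


def score(prob, topic, feats):
    """Log-odds that a paragraph belongs in the summary, given the topic."""
    total = 0.0
    for name, value in feats.items():
        p_in = prob(topic, True, name)
        p_out = prob(topic, False, name)
        if not value:
            p_in, p_out = 1 - p_in, 1 - p_out
        total += math.log(p_in / p_out)
    return total


def summarize(prob, topic, title, paragraphs, proportion=0.3):
    """Keep the top-scoring proportion of paragraphs, in original order."""
    title_words = {w.lower() for w in title.split()}
    scored = [
        (score(prob, topic, paragraph_features(i, p, title_words)), i)
        for i, p in enumerate(paragraphs)
    ]
    keep = max(1, int(len(paragraphs) * proportion))
    # Highest score first; ties go to the earlier paragraph.
    best = sorted(scored, key=lambda t: (-t[0], t[1]))[:keep]
    return [paragraphs[i] for i in sorted(i for _, i in best)]
```

Because the probabilities are looked up per topic, the same article can yield different extracts depending on which topic it is assigned to: a feature such as paragraph position may be a strong signal for one topic and carry no weight for another, which is the kind of topic dependence the abstract reports.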

Identifier: oai:union.ndltd.org:CHENGCHI/G0090753010
Creators: 吳家威 (Wu, Chia-wei)
Publisher: National Chengchi University (國立政治大學)
Source Sets: National Chengchi University Libraries
Language: Chinese
Detected Language: English
Type: text
Rights: Copyright © nccu library on behalf of the copyright holders
