Incremental Aspect Model Learning on Streaming Documents

With the growth of the Internet, the sheer volume of online data drives users to rely on tools that help them find the information they need. Information retrieval (IR) techniques are among the main tools that ease users' information-processing load. However, most current IR models do not handle streaming information, which is an essential characteristic of today's Web environment. Rebuilding a model from the full data at hand every time new information arrives is impractical, inefficient, and costly.
Instead, IR models that can be adapted incrementally to streaming information should be considered in such a dynamic environment.
This research therefore proposes an IR technique, the incremental aspect model (ISM), which not only uncovers latent aspects from the collected documents but also adapts the aspect model to streaming documents chronologically.
ISM has two stages. In Stage I, we employ the probabilistic latent semantic indexing (PLSI) technique to build a primary aspect model. In Stage II, out-of-date data are removed and new data are folded in, and the aspect model is expanded using the derived spectral method if significant new aspects are present.
Three experiments are conducted to verify ISM. Results from the first two experiments show that ISM performs robustly in incremental text clustering tasks. In Experiment III, ISM performs storyline tracking on the 2010 Soccer World Cup event, which illustrates its incremental learning ability to discover different themes around the event at any time. The feasibility of the proposed approach in real applications is thus justified.
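The abstract names PLSI for Stage I and folding-in for Stage II but gives no formulas, so the following is only a minimal sketch under stated assumptions: it uses the standard asymmetric PLSI formulation, P(w|d) = sum over z of P(w|z) P(z|d), fitted by EM, and the usual folding-in procedure in which P(w|z) is held fixed while only the new documents' mixtures P(z|d) are re-estimated. The function names (`plsi_fit`, `fold_in`, `log_likelihood`), the toy count matrix, and all parameters are hypothetical and not taken from the thesis. Removal of out-of-date data and the spectral expansion step are not shown, since the abstract does not specify them.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)


def _normalize_rows(m):
    """Scale each row so that it sums to (approximately) one."""
    return m / (m.sum(axis=1, keepdims=True) + EPS)


def _posterior(p_z_d, p_w_z):
    """E-step: P(z | d, w), returned with shape (docs, aspects, words)."""
    joint = p_z_d[:, :, None] * p_w_z[None, :, :]
    return joint / (joint.sum(axis=1, keepdims=True) + EPS)


def plsi_fit(counts, n_aspects, n_iter=100, seed=0):
    """Stage I (assumed form): fit P(w|z) and P(z|d) by EM on a docs x words count matrix."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_w_z = _normalize_rows(rng.random((n_aspects, n_words)))
    p_z_d = _normalize_rows(rng.random((n_docs, n_aspects)))
    for _ in range(n_iter):
        weighted = counts[:, None, :] * _posterior(p_z_d, p_w_z)
        p_w_z = _normalize_rows(weighted.sum(axis=0))  # M-step for P(w|z)
        p_z_d = _normalize_rows(weighted.sum(axis=2))  # M-step for P(z|d)
    return p_w_z, p_z_d


def fold_in(new_counts, p_w_z, n_iter=100, seed=0):
    """Stage II (assumed form): estimate P(z|d) for new documents with P(w|z) fixed."""
    rng = np.random.default_rng(seed)
    shape = (new_counts.shape[0], p_w_z.shape[0])
    p_z_d = _normalize_rows(rng.random(shape))
    for _ in range(n_iter):
        weighted = new_counts[:, None, :] * _posterior(p_z_d, p_w_z)
        p_z_d = _normalize_rows(weighted.sum(axis=2))  # only the mixtures change
    return p_z_d


def log_likelihood(counts, p_w_z, p_z_d):
    """Sum over (d, w) of n(d, w) * log P(w|d)."""
    return float(np.sum(counts * np.log(p_z_d @ p_w_z + EPS)))


# Toy corpus (made up): two documents use words 0-2, two use words 3-5.
counts = np.array([
    [5, 4, 3, 0, 0, 0],
    [4, 5, 2, 0, 0, 0],
    [0, 0, 0, 3, 5, 4],
    [0, 0, 0, 4, 3, 5],
], dtype=float)

p_w_z, p_z_d = plsi_fit(counts, n_aspects=2)

# A newly arrived document is folded in without refitting the word distributions.
new_doc = np.array([[2.0, 1.0, 3.0, 0.0, 0.0, 0.0]])
print(fold_in(new_doc, p_w_z))
```

Folding-in is cheap because it never touches P(w|z), but for the same reason it cannot represent a theme that the existing aspects do not cover. That limitation is presumably what the expansion step in Stage II addresses.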

Identifier: oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0816110-145751
Date: 16 August 2010
Creators: Wu, Cheng-Wei
Contributors: Wen-Feng Hsiao, Te-Min Chang, San-Yih Hwang, Chien-Chin Chen
Publisher: NSYSU
Source Sets: NSYSU Electronic Thesis and Dissertation Archive
Language: English
Detected Language: English
Type: text
Format: application/pdf
Source: http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0816110-145751
Rights: off_campus_withheld, Copyright information available at source archive