1 |
基於概念飄移探勘的社群多媒體之熱門程度預測 / Popularity prediction of social multimedia based on concept drift mining. 鄭世宏 (Jheng, Shih Hong). Unknown date.
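In recent years, the rise of social media platforms has given people a simple and fast way to exchange all kinds of content with one another. Social multimedia refers to the multimedia content that users exchange on social media platforms. Compared with plain multimedia content, social multimedia additionally carries a large and valuable record of sharing interactions among platform users, as well as information about those users within the social network. This provides data along more dimensions and opens up more potential applications for social multimedia than for plain multimedia content.
A microblog is a platform on which users can freely share text messages in real time, such as their current mood, what they are seeing or hearing at the moment, or conversations with friends. Compared with social platforms used purely for sharing multimedia content (such as YouTube or Flickr), multimedia content on microblog platforms shows a pronounced pattern of sharing and propagation. The goal of this study is to use the characteristics and data of how multimedia content is shared and propagated on microblog platforms to predict the popularity of social multimedia content.
As time passes, predicting popularity with a single fixed rule may cause prediction accuracy to decline. Moreover, even at the same point in time, different multimedia content follows its own popularity trend, so different rules may still be needed for prediction; this is the so-called local concept drift phenomenon. We recast popularity prediction as a classification problem in data mining and, taking local concept drift into account, propose a popularity prediction method for multimedia content on microblog platforms. Experimental results show that the method that accounts for local concept drift is clearly more accurate than the GCD method (an average improvement of 4%) and the Baseline method (an average improvement of 10%). This indicates that our method is better suited to multimedia content on microblog platforms, and that both concept drift and local concept drift do exist. / In recent years, the rise of social media has offered an easy and fast way to exchange information. Social multimedia refers to the multimedia content that users share on social media. Unlike traditional multimedia, social multimedia contains both the multimedia content and information about user behavior on social media.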
Microblogs are one type of social media. Compared with other social media such as YouTube and Flickr, microblogs provide a friendlier environment for users to propagate social multimedia. The goal of this thesis is to use the characteristics and information of propagation on microblogs to predict the popularity of social multimedia.
A popularity prediction method based on concept drift mining is proposed. In particular, a local concept drift mechanism is employed to capture the local characteristics of social multimedia. By taking local concept drift into consideration, the task of popularity prediction is transformed into an ensemble classification problem. Experiments on social multimedia collected from Plurk show that the proposed approach performs well.
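To make the idea of ensemble classification under concept drift more concrete, the following is a minimal sketch of a generic accuracy-weighted chunk ensemble. It is not the method of the thesis, whose details the abstract does not give. The names `train_stump` and `ChunkEnsemble`, the single numeric feature, and the binary popular/unpopular label are all assumptions made for illustration.

```python
def train_stump(chunk):
    """Fit a one-feature threshold classifier on (x, label) pairs, label in {0, 1}.

    Hypothetical base learner, chosen only to keep the sketch self-contained.
    """
    pos = [x for x, y in chunk if y == 1]
    neg = [x for x, y in chunk if y == 0]
    if not pos or not neg:
        # Only one class present: fall back to a constant prediction.
        constant = 1 if pos else 0
        return lambda x: constant
    pos_mean = sum(pos) / len(pos)
    neg_mean = sum(neg) / len(neg)
    threshold = (pos_mean + neg_mean) / 2.0
    if pos_mean >= neg_mean:
        return lambda x: 1 if x > threshold else 0
    return lambda x: 1 if x < threshold else 0


class ChunkEnsemble:
    """Keeps one classifier per recent data chunk, weighted by recent accuracy."""

    def __init__(self, max_members=5):
        self.max_members = max_members
        self.members = []  # each entry is [classifier, weight]

    def add_chunk(self, chunk):
        if not chunk:
            raise ValueError("chunk must contain at least one example")
        # Train a new member on the newest chunk and drop the oldest if full.
        self.members.append([train_stump(chunk), 1.0])
        self.members = self.members[-self.max_members:]
        # Re-weight every member by its accuracy on the newest chunk, so that
        # members trained on an outdated concept lose influence. The newest
        # member is scored on its own training data, which is optimistic.
        for member in self.members:
            clf = member[0]
            hits = sum(1 for x, y in chunk if clf(x) == y)
            member[1] = hits / len(chunk)

    def predict(self, x):
        # Weighted vote: positive votes add weight, negative votes subtract it.
        score = sum(w if clf(x) == 1 else -w for clf, w in self.members)
        return 1 if score > 0 else 0
```

Under these assumptions, a member trained before a drift keeps voting, but its weight falls as soon as it stops agreeing with the newest chunk, so the ensemble follows the current concept without discarding history outright.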
|
2 |
串流資料分析在台灣股市指數期貨之應用 / An Application of Streaming Data Analysis on TAIEX Futures. 林宏哲 (Lin, Hong Che). Unknown date.
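Data stream mining is an important research field, because much important real-world data is generated or collected in the form of streams. Financial market data is often a data stream, and data of this kind is typically highly volatile by nature. In this thesis we apply data stream mining techniques to predict the rise and fall of TAIEX Futures. For a machine, predicting a data stream such as futures is not easy, and the difficulty depends on the type, degree, and frequency of concept drift. Concept drift means that the underlying distribution of the data changes, which causes prediction accuracy to drop sharply, so we focus on how to handle concept drift. First, based on experimental results, we infer that TAIEX Futures may exhibit high-frequency concept drift. The experimental results also indicate that using a concept drift detection algorithm can substantially improve prediction accuracy, with significant improvement even for algorithms that originally performed poorly. In this thesis we also survey algorithms designed to handle the various types of concept drift. In addition, we propose a multi-classifier algorithm that helps detect "reoccurring" concept drift. Compared with the version before improvement, its most distinctive feature is that the user does not need to set the sample size for each sub-classifier, a sample size that is one of the key factors affecting the algorithm. / Data stream mining is an important research field, because in many real-world cases data is generated and collected in the form of a stream. Financial market data is one such example: it is intrinsically dynamic and usually generated sequentially. In this thesis, we apply data stream mining techniques to the prediction of Taiwan Stock Exchange Capitalization Weighted Stock Index Futures, or TAIEX Futures. Our goal is to predict whether the futures will rise or fall. The prediction is difficult, and the difficulty is associated with concept drift, which indicates changes in the underlying data distribution. Therefore, we focus on concept drift handling. We first show, by referring to the results of an empirical study, that concept drift occurs frequently in the TAIEX Futures data. In addition, the results indicate that a concept drift detection method can improve prediction accuracy even when it is used with a data stream mining algorithm that does not perform well. Next, we explore methods that can help us identify the types of concept drift. The experimental results indicate that both sudden and reoccurring concept drift exist in the TAIEX Futures data. Moreover, we propose an ensemble-based algorithm for reoccurring concept drift. The most characteristic feature of the proposed algorithm is that it can adaptively determine the chunk size, which is an important parameter for other concept drift handling algorithms.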
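As an illustration of what a concept drift detection method can look like, here is a minimal sketch in the style of the well-known Drift Detection Method (DDM), which monitors a classifier's running error rate. The abstract does not say which detector the thesis uses, so this is an assumption for illustration only. The class name `ErrorRateDriftDetector` and its parameters are hypothetical.

```python
import math


class ErrorRateDriftDetector:
    """DDM-style detector: signals drift when the running error rate rises
    well above the lowest level observed so far."""

    def __init__(self, min_samples=30, drift_sigmas=3.0):
        self.min_samples = min_samples    # warm-up length before testing
        self.drift_sigmas = drift_sigmas  # how many deviations count as drift
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.best_p = float("inf")  # error rate at the best point so far
        self.best_s = float("inf")  # its standard deviation

    def update(self, mistake):
        """Feed one outcome (True if the prediction was wrong).

        Returns True when drift is signalled; the detector then restarts.
        """
        self.n += 1
        self.errors += int(mistake)
        p = self.errors / self.n
        s = math.sqrt(p * (1.0 - p) / self.n)
        if self.n < self.min_samples:
            return False
        # Remember the point where error rate plus deviation was lowest.
        if p + s < self.best_p + self.best_s:
            self.best_p, self.best_s = p, s
        # Drift: the current level exceeds the best level by several deviations.
        if p + s > self.best_p + self.drift_sigmas * self.best_s:
            self.reset()
            return True
        return False


def drift_positions(outcomes):
    """Return the stream indices at which the detector signals drift."""
    detector = ErrorRateDriftDetector()
    return [i for i, mistake in enumerate(outcomes) if detector.update(mistake)]
```

In a rise/fall prediction setting, each outcome would be whether the classifier's prediction for one time step was wrong. A drift signal would typically trigger retraining on recent data.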
|