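With the rapid growth of online e-commerce sites, music and video sites, and social sharing and recommendation sites, users face an explosively growing number of choices. This unprecedented volume of choices leads to the information overload problem: faced with a huge amount of information, users cannot effectively understand it and make a choice. Recommender systems are a key component in solving the information overload problem. Over the past few decades, recommender system techniques have advanced considerably, and the research focus has shifted from neighborhood-based methods to model-based methods. However, recommender systems are still not mature. In this thesis, we propose methods for improving recommender systems, motivated by problems encountered in real life. / First, we propose online learning algorithms for recommender systems. Traditional recommender systems are trained with batch learning algorithms. These methods are easy to understand and easy to implement. However, batch learning algorithms cannot cope effectively with the dynamic conditions that today's recommender systems face. New users and new items join the system continually. Under the batch learning framework, incorporating these new users and items requires re-learning from all the data. In addition, every step of a batch learning algorithm must process all the data, which is usually very time-consuming at the scale of today's recommender systems. Online learning algorithms address both problems by adjusting the model for each individual data point. / Second, we examine in depth an assumption made by a large number of recommender systems: that the distribution of the ratings a system has collected is identical to the distribution of the ratings it has not collected. Using data collected from a real recommender system, we show that this assumption is highly unlikely to hold. Using missing data theory, we propose a model that does not rely on this assumption and can produce unbiased recommendations. / Third, we study the spam user problem in recommender systems in detail. Ratings from spam users contaminate the data of normal users and degrade their experience. We propose using a user reputation system to record users' reputations and to use these reputations to distinguish spam users from normal users. We propose a framework for reputation estimation systems, of which many existing reputation estimation systems are instances. Based on this framework, we also propose a user reputation estimation system based on matrix factorization, which has an outstanding ability to identify spam users. / Finally, we combine content-based recommendation with collaborative filtering to alleviate or even solve the cold-start problem. The cold-start problem arises when the system has so little information about a user or an item that it cannot make effective recommendations for that user or item. Users' textual reviews usually contain a great deal of information about user preferences and item attributes, yet they are usually discarded. We propose an integrated recommendation model that uses a content-based method to process users' textual reviews and a collaborative filtering method to process users' ratings. Our model effectively reduces the impact of the cold-start problem and provides interpretable word labels for black-box collaborative filtering algorithms. These labels help the recommender system explain its recommendations. / In summary, this thesis solves practical problems faced by recommender systems and improves traditional recommender systems in several respects. Experiments on large amounts of real data verify the effectiveness and efficiency of the proposed methods. / With the rapid development of e-commerce websites, music and video streaming websites, and social sharing websites, users now face an explosion of choices. The presence of an unprecedentedly large number of choices leads to the information overload problem, which refers to the difficulty a user faces in understanding an issue and making decisions when too much information is present. Recommender systems learn users’ preferences from past behavior and make suggestions for them. These systems are a key component in alleviating and solving the information overload problem. Encouraging progress has been achieved in recommender systems research, which has moved from neighborhood-based methods to model-based methods. However, the recommender systems employed today are far from perfect.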
In this thesis, we propose to improve recommender systems from four perspectives, each motivated by a real-life problem. / First and foremost, we develop online algorithms for collaborative filtering methods, which are widely applicable to recommender systems. Traditionally, batch-training algorithms have been developed for collaborative filtering methods. They have the advantage of being easy to understand and simple to implement. However, batch-training algorithms fail to consider the dynamic scenario in which new users and new items join the system constantly. To make recommendations for these new users and on these new items, batch-training algorithms need to re-train the model from scratch. Moreover, during training, a batch-training algorithm has to process all the data in each iteration. This is prohibitively slow given the sheer number of users and items in a real recommender system. Online learning algorithms solve both problems by updating the model incrementally, one rating at a time. / Secondly, we question an assumption made implicitly by most recommender systems. Most existing recommender systems assume that the distribution of the collected ratings and that of the unobserved ratings are the same. Using data collected from a real-life recommender system, we show that this assumption is unlikely to be true. By employing missing data theory, we develop a model that drops this unrealistic assumption and makes unbiased predictions. / Thirdly, we examine the spam problem confronting recommender systems. The ratings assigned by spam users contaminate the data of a recommender system and lead to a deteriorated experience for normal users. We propose to use a reputation estimation system to keep track of users’ reputations and to identify spam users based on their reputations. We develop a unified framework for reputation estimation that subsumes a number of existing reputation estimation methods.
Based on the framework, we also develop a matrix-factorization-based method that demonstrates outstanding ability to discriminate spam users from normal users. / Lastly, we integrate content-based filtering with collaborative filtering to alleviate the cold-start problem. The cold-start problem refers to the situation where a system has too little information concerning a user or an item to make accurate recommendations. Review comments contain rich, readily available information, yet they are generally discarded; by exploiting them, we can alleviate the cold-start problem. Additionally, we can annotate the black-box collaborative filtering algorithm with interpretable tags that help a recommender system explain why items are being recommended. / In summary, this thesis solves some of the major problems faced by recommender systems and improves them from various perspectives. Extensive experiments on large-scale real-life datasets confirm the effectiveness and efficiency of the proposed models. / Detailed summary in vernacular field only. / Ling, Guang. / Thesis (Ph.D.) Chinese University of Hong Kong, 2015. / Includes bibliographical references (leaves 169-184). / Abstracts also in Chinese.
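The abstract describes online learning as updating the model incrementally from one rating at a time, so that new users and items can be absorbed without re-training from scratch. The sketch below illustrates that general idea with a plain stochastic gradient step on a matrix factorization model. It is not the thesis's algorithm: the class name `OnlineMF`, its parameters (`n_factors`, `lr`, `reg`), and the squared-error objective with L2 regularization are all assumptions chosen for illustration.

```python
import numpy as np


class OnlineMF:
    """Illustrative online matrix factorization (hypothetical, not the thesis's method)."""

    def __init__(self, n_factors=8, lr=0.05, reg=0.02, seed=0):
        self.k = n_factors
        self.lr = lr      # step size (assumed value)
        self.reg = reg    # L2 regularization weight (assumed value)
        self.rng = np.random.default_rng(seed)
        self.user_factors = {}  # user id -> latent vector
        self.item_factors = {}  # item id -> latent vector

    def _vec(self, table, key):
        # Unseen users or items get a small random vector on first contact,
        # so they join the model without any re-training.
        if key not in table:
            table[key] = self.rng.normal(0.0, 0.1, self.k)
        return table[key]

    def predict(self, user, item):
        p = self._vec(self.user_factors, user)
        q = self._vec(self.item_factors, item)
        return float(p @ q)

    def update(self, user, item, rating):
        # One gradient step on (rating - p.q)^2 plus L2 penalties,
        # touching only the two vectors involved in this rating.
        p = self._vec(self.user_factors, user)
        q = self._vec(self.item_factors, item)
        err = rating - p @ q
        p_old = p.copy()
        p += self.lr * (err * q - self.reg * p)
        q += self.lr * (err * p_old - self.reg * q)
        return float(err)


# Usage: feed ratings as they arrive; a brand-new user needs no special handling.
model = OnlineMF()
stream = [("alice", "item1", 5.0), ("bob", "item1", 3.0), ("carol", "item2", 4.0)]
for user, item, rating in stream:
    model.update(user, item, rating)
```

Each update costs time proportional to the number of latent factors, independent of how many ratings have been seen, which is the contrast with batch training that the abstract draws.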
Identifier | oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_1077680 |
Date | January 2015 |
Contributors | Ling, Guang (author.), Lyu, Michael R. (thesis advisor.), Chinese University of Hong Kong Graduate School. Division of Computer Science and Engineering, (degree granting institution.) |
Source Sets | The Chinese University of Hong Kong |
Language | English, Chinese |
Detected Language | English |
Type | Text, bibliography |
Format | electronic resource, remote, 1 online resource (xv, 184 leaves) : illustrations (some color), computer, online resource |
Rights | Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/) |