Collaborative filtering based recommender systems use information about a user's preferences to make personalized predictions about content, such as topics, people, or products, that they might find relevant. As the volume of accessible information and active users on the Internet continues to grow, it becomes increasingly difficult to compute recommendations quickly and accurately over a large dataset. In this study, we will introduce an algorithmic framework built on top of Apache Spark for parallel computation of the neighborhood-based collaborative filtering problem, which allows the algorithm to scale linearly with a growing number of users. We also investigate several different variants of this technique including user and item-based recommendation approaches, correlation and vector-based similarity calculations, and selective down-sampling of user interactions. Finally, we provide an experimental comparison of these techniques on the MovieLens dataset consisting of 10 million movie ratings.
Identifer | oai:union.ndltd.org:CLAREMONT/oai:scholarship.claremont.edu:cmc_theses-1914 |
Date | 01 January 2014 |
Creators | Casey, Walker Evan |
Publisher | Scholarship @ Claremont |
Source Sets | Claremont Colleges |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | CMC Senior Theses |
Rights | © 2014 Walker Evan Casey |
Page generated in 0.003 seconds