1 |
Leerec: A scalable product recommendation engine suitable for transaction data. Flodin, Anton. January 2018
We live in the era of the Internet of Things (IoT), in which devices are connected to the Internet and communicate with each other. The number of devices increases rapidly each year, which results in rapid growth of the data generated. This large amount of data, sometimes called Big Data, comes from many sources, such as logs of user behavior. These log files can be collected and analyzed in different ways, for example to create product recommendations. Product recommendations have been around since the late 1990s, when the amount of data collected was far smaller than it is today. The aim of this thesis was to investigate methods for processing data and creating product recommendations, to see how well they are suited to Big Data. This was done through three theoretical studies: how to process user events, how to make the product recommendation algorithm known as collaborative filtering scalable, and how to convert implicit feedback to explicit feedback (ratings). The result is a recommendation engine built on Apache Spark as the data processing system, which has three functions: reading multiple log files and concatenating them for each month, parsing the user-event log files to create explicit ratings from the transactions, and creating four types of recommendations. The NoSQL database MongoDB was chosen to store the different types of product recommendations. To retrieve recommendations from the engine and the database, a REST API was implemented that can be used by any third party. The results show that the implemented system is partially scalable: Apache Spark scaled for concatenating files, for parsing and creating ratings, and for creating recommendations with the alternating least squares (ALS) method. However, MongoDB was shown not to scale when handling more than 100 concurrent requests. Future work involves distributing the recommendation engine across a multi-node cluster to exploit Apache Spark's parallelization. Other recommendations include considering other NoSQL databases that might be more scalable than MongoDB.
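The abstract does not say how implicit feedback is turned into ratings, so the following is only a minimal sketch of one possible scheme, not the thesis's method. It assumes each transaction is a (user, product) pair, that repeat purchases signal stronger preference, and that ratings run from 1 to 5; the function name `transactions_to_ratings` and the sample data are hypothetical.

```python
from collections import Counter


def transactions_to_ratings(transactions, max_rating=5):
    """Map (user, product) purchase events to integer ratings.

    Assumed scheme (not from the thesis): purchase counts are scaled
    linearly against the user's most-purchased product, which always
    receives max_rating.
    """
    # (user, product) -> number of purchases
    counts = Counter(transactions)

    # Largest purchase count per user, used to normalise that user's counts.
    user_max = {}
    for (user, _product), n in counts.items():
        user_max[user] = max(user_max.get(user, 0), n)

    ratings = {}
    for (user, product), n in counts.items():
        scaled = 1 + (max_rating - 1) * n / user_max[user]
        ratings[(user, product)] = round(scaled)
    return ratings


# Hypothetical sample: u1 buys product "a" four times and "b" once.
events = [("u1", "a")] * 4 + [("u1", "b"), ("u2", "a")]
ratings = transactions_to_ratings(events)
```

Under this scheme a user with a single purchase rates that product at the maximum, which is one reason a real system would likely combine counts with other signals such as recency.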
|
2 |
三向資料的主成分分析 / 3-way data principal component analysis. 趙湘琪 (Chao, Hsiang Chi). Unknown Date
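Traditional principal component analysis (PCA) can only analyze 2-mode 2-way data. To handle 3-mode 3-way data, or data of even higher order, other methods are needed, for example averaging over one of the ways before the analysis. This approach is feasible, but it ignores correlations that may be hidden among the three ways. As social science research grows more complex, three-way data are encountered more often, and the relationships among the three ways may themselves be of interest. For this reason, in the 1960s and 1970s researchers began extending the principal component analysis model into models suited to three-way data. This thesis introduces the Tucker3 model used for three-way principal component analysis and its parameter estimation method, and uses 28 publicly listed companies as an empirical example to examine how the factors affecting capital structure changed across different groups of companies over five years (1989 to 1993).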
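For reference, the Tucker3 model named in this abstract is conventionally written as below. The symbols are the standard textbook ones and are assumptions here, since the abstract itself gives no notation: $x_{ijk}$ is the observation for unit $i$, variable $j$, and occasion $k$; $a_{ip}$, $b_{jq}$, $c_{kr}$ are the component loadings for the three modes; $g_{pqr}$ is the core array; and $e_{ijk}$ is the residual.

```latex
x_{ijk} = \sum_{p=1}^{P} \sum_{q=1}^{Q} \sum_{r=1}^{R}
          a_{ip}\, b_{jq}\, c_{kr}\, g_{pqr} + e_{ijk},
\qquad i = 1,\dots,I,\quad j = 1,\dots,J,\quad k = 1,\dots,K.
```

The parameters are typically chosen to minimize the sum of squared residuals $\sum_{i,j,k} e_{ijk}^2$, commonly by an alternating least squares procedure that updates one mode's loadings while holding the other two fixed.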
|
3 |
Recommender System for Gym Customers. Sundaramurthy, Roshni. January 2020
Recommender systems provide new opportunities for retrieving personalized information on the Internet. With the availability of big data, the fitness industry is now focusing on building efficient recommender systems for its end users. This thesis investigates the possibilities of building an efficient recommender system for gym users. BRP Systems AB provided the gym data for evaluation, which consists of approximately 896,000 customer interactions with 8 features. Four matrix factorization methods based on implicit feedback are applied to the data: latent semantic analysis using singular value decomposition, alternating least squares, Bayesian personalized ranking, and logistic matrix factorization. These methods decompose the implicit data matrix of interactions between users and gym group activities into the product of two lower-dimensional matrices. The factors are used to compute similarity scores between users and activities, and the top-k recommendations are provided based on these scores. The methods are evaluated with ranking metrics: Precision@k, mean average precision (MAP)@k, area under the curve (AUC), and normalized discounted cumulative gain (NDCG)@k. A qualitative analysis is also performed to evaluate the recommendations. For this dataset, the best method is found to be alternating least squares, which achieved around 90% AUC for the overall system and gave personalized recommendations to the users.
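Two of the ranking metrics named in this abstract can be sketched as follows. This is a minimal illustration using the common binary-relevance definitions, not the evaluation code of the thesis; the function names and the sample activities are hypothetical.

```python
import math


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def ndcg_at_k(recommended, relevant, k):
    """Normalized discounted cumulative gain with binary relevance."""
    # DCG: each relevant item at 0-based position `rank` contributes
    # 1 / log2(rank + 2), so later hits count for less.
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    # Ideal DCG: all relevant items (at most k) ranked first.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical example: four recommended activities, two of them relevant.
recommended = ["yoga", "spin", "box", "swim"]
relevant = {"yoga", "box"}
p_at_2 = precision_at_k(recommended, relevant, 2)
ndcg_at_4 = ndcg_at_k(recommended, relevant, 4)
```

Precision@k ignores the order within the top k, while NDCG@k rewards placing relevant items earlier, which is why the two are often reported together.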
|