
The study of analyzing smart handheld device App's clusters by using kNN text mining

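As smart handheld devices become more widespread, users' demand for Apps keeps growing, and businesses have used Apps to create a new, interactive form of marketing. The large business opportunity created by App downloads has also drawn many developers into App development. As a result, the number of Apps has grown explosively, and users facing so many kinds of Apps cannot choose among them efficiently. This study therefore applies text mining and kNN cluster analysis to App recommendation articles posted by Internet users and groups the Apps into clusters. By tuning the parameters and evaluating the results against quality measures, it aims to obtain the best-quality clustering, which can then serve as a reference for users choosing Apps.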
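To cluster a large number of Apps and so relieve users' "information overload," this study takes game Apps in the App Store as its subject. It collected 439 App recommendation articles and merged articles that recommend the same App, leaving 357 articles. Text mining was then used to convert the articles into a vector space model in which they can be compared with one another, and kNN cluster analysis was applied to group them. The parameter combination, namely the value of k and the document similarity threshold, was adjusted to obtain the best-quality clustering, and clustering quality was evaluated with measures such as the mean intra-cluster similarity. To further improve quality, the study uses "multi-stage clustering": after each stage, the number of articles in each cluster determines whether that cluster is clustered again or merged with other clusters.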
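The abstract names the pipeline (vector space model, then kNN clustering controlled by k and a document similarity threshold) but not its exact formulas. The sketch below is one possible reading, not the author's implementation: it assumes smoothed TF-IDF term weights, cosine similarity, and an incremental kNN clustering rule in which each document joins the majority cluster among its k most similar, already-assigned documents whose similarity reaches the threshold, or else opens a new cluster. The function names `tfidf_vectors`, `cosine`, and `knn_cluster` are hypothetical; only the values k = 10 and threshold = 0.025 come from the study.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Convert tokenized documents into sparse TF-IDF vectors (dict: term -> weight).

    Assumption: smoothed IDF, log((1 + n) / (1 + df)) + 1, so no weight is zero.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_cluster(vectors, k, threshold):
    """Hypothetical incremental kNN clustering over documents in input order.

    Each document looks at its k most similar already-assigned documents,
    keeps those with similarity >= threshold, and joins the cluster most of
    them belong to. If none pass the threshold, it opens a new cluster.
    """
    labels = []
    for i, v in enumerate(vectors):
        ranked = sorted(((cosine(v, vectors[j]), j) for j in range(i)), reverse=True)
        neighbours = [j for sim, j in ranked[:k] if sim >= threshold]
        if neighbours:
            votes = Counter(labels[j] for j in neighbours)
            labels.append(votes.most_common(1)[0][0])
        else:
            labels.append(max(labels, default=-1) + 1)  # open a new cluster
    return labels


if __name__ == "__main__":
    docs = [
        "baseball pitch batting game".split(),
        "baseball batting home run".split(),
        "tower defense castle enemy waves".split(),
        "castle tower defense strategy".split(),
    ]
    vecs = tfidf_vectors(docs)
    print(knn_cluster(vecs, k=10, threshold=0.025))  # prints [0, 0, 1, 1]
```

Because this variant processes documents in order, its result can depend on input order; raising the threshold makes clusters stricter (with a threshold of 0.9, every toy document above ends up alone).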
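The results show that the first clustering stage achieves the best quality with k = 10 and a document similarity threshold of 0.025. In later stages, each cluster contains fewer articles, so k was lowered and the document similarity threshold was gradually raised to keep producing useful clusters. After the second stage, keywords can be extracted from the clusters that have reached the stopping condition, yielding six App types such as "Baseball/Shooting" and "Throwing/Flying." Applying the same clustering rules in the subsequent stages yields 14 further types, such as "Castle/Tower Defense." When clustering ends, there are 36 clusters covering 20 App types. Throughout the process, the mean intra-cluster similarity rises steadily while the mean inter-cluster similarity falls, and the clustering quality measure improves from 12.65% after the first stage to 75.81% at the end of the fifth stage.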
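The two trends reported above can be computed as follows. This is a sketch under stated assumptions, since the abstract does not give the formulas: mean intra-cluster similarity is taken here as the average, over clusters with at least two members, of the mean pairwise cosine similarity among members; mean inter-cluster similarity is taken as the average cosine similarity between cluster centroids. The study's overall quality percentage is not defined in the abstract, so it is not reproduced. All function names are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between sparse vectors (dict: term -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def group_by_label(vectors, labels):
    """Collect vectors into lists, one per cluster label."""
    clusters = {}
    for vec, lab in zip(vectors, labels):
        clusters.setdefault(lab, []).append(vec)
    return list(clusters.values())


def centroid(members):
    """Average of sparse vectors."""
    total = {}
    for vec in members:
        for t, w in vec.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(members) for t, w in total.items()}


def mean_intra_similarity(vectors, labels):
    """Assumed definition: mean pairwise similarity inside each cluster, averaged."""
    scores = []
    for members in group_by_label(vectors, labels):
        pairs = list(combinations(members, 2))
        if pairs:  # singleton clusters have no pairs and are skipped
            scores.append(sum(cosine(a, b) for a, b in pairs) / len(pairs))
    return sum(scores) / len(scores) if scores else 0.0


def mean_inter_similarity(vectors, labels):
    """Assumed definition: mean similarity between all pairs of cluster centroids."""
    cents = [centroid(m) for m in group_by_label(vectors, labels)]
    pairs = list(combinations(cents, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
```

A better clustering should push the first value up and the second down, which is the direction the study reports across its stages.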
The study shows that, after clustering, highly similar Apps gradually gather into the same clusters, and the names given to the clusters can serve as a reference when users choose Apps. App developers can also learn from each cluster's keywords which game elements users care about, and improve their Apps to better meet users' needs. Building on these results, future research could improve clustering quality with a specialized domain lexicon, use document summarization to help users understand each cluster, or build an App recommendation system.
As smart handheld devices grow in popularity, demand for Apps is spreading. More and more developers are seizing this opportunity, and the total number of Apps is growing rapidly. Faced with this situation, users cannot efficiently choose the Apps they need. This research uses text mining and the kNN clustering technique to analyze App recommendation reviews written by netizens and then cluster the App recommendation articles. By adjusting the parameters and evaluating the measurement indicators, we aim to obtain the best-quality clusters as a basis for users selecting Apps.
To address information overload for users, we analyzed Apps in the "Games" category of the App Store and compiled 357 App recommendation articles as our analysis target. We then processed the articles with text mining and grouped them with kNN clustering analysis. At the same time, we tuned the clustering parameters (the value of k and the document similarity threshold) and used the measurement indicators to find the optimal clustering. This research uses a multi-phase clustering technique to ensure the quality of each cluster.
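The multi-phase idea can be sketched as a loop over a parameter schedule. The abstract says only that cluster size decides whether a cluster is clustered again or merged, and that later phases use a smaller k and a higher threshold; the size limits `max_size` and `min_size`, the pooling of small clusters as the "merge" step, and every later-phase parameter value below are hypothetical. Only the first phase (k = 10, threshold 0.025) comes from the study. Any clustering routine can be passed in as `cluster_fn`; `toy_bucket_cluster` is a stand-in that buckets integer ids by `id // k` (a smaller k gives finer groups) and is not kNN.

```python
def multi_stage(doc_ids, cluster_fn, schedule, max_size, min_size):
    """Hypothetical size-driven multi-phase clustering.

    cluster_fn(ids, k, threshold) must return a list of clusters (lists of ids).
    schedule is a list of (k, threshold) pairs, one per phase.
    """
    final, pending = [], [list(doc_ids)]
    for k, threshold in schedule:
        next_pending, leftovers = [], []
        for group in pending:
            for cluster in cluster_fn(group, k, threshold):
                if len(cluster) > max_size:
                    next_pending.append(cluster)  # too large: split again next phase
                elif len(cluster) < min_size:
                    leftovers.extend(cluster)  # too small: pool for merging
                else:
                    final.append(cluster)  # meets the stopping condition
        if leftovers:
            next_pending.append(leftovers)  # pooled small clusters get another try
        pending = next_pending
        if not pending:
            break
    return final + pending  # anything unresolved after the last phase


def toy_bucket_cluster(ids, k, threshold):
    """Stand-in clusterer for demonstration only: buckets ids by id // k."""
    buckets = {}
    for i in ids:
        buckets.setdefault(i // k, []).append(i)
    return list(buckets.values())


if __name__ == "__main__":
    schedule = [(10, 0.025), (5, 0.05)]  # phase 2 values are hypothetical
    print(multi_stage(range(23), toy_bucket_cluster, schedule, max_size=6, min_size=2))
```

Pooling undersized clusters and re-clustering them in the next phase is only one way to "merge"; assigning each small cluster to its most similar large cluster would be another.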
The clustering results yield 36 clusters and 20 categories. During clustering, the Mean of Intra-cluster Similarity increases gradually, while the Mean of Inter-cluster Similarity decreases. The "Cluster Quality" rises significantly, from 12.65% to 75.81%. In conclusion, similar Apps gradually cluster together by their similarity, and each cluster's name can serve as a reference for users. App developers can also learn, from the key phrases of each cluster, which game elements users pay the most attention to, and tailor their content to users' needs. Building on these results, future work could build a specialized terms database for Apps to improve clustering quality, use summarization techniques to strengthen users' understanding of each cluster, or build an App recommendation system.
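The abstract does not say how the key phrases of each cluster were chosen. As a minimal sketch, assuming the TF-IDF vectors from the clustering step, one simple option is to rank each cluster's terms by their summed weight across its member documents and keep the top few as the cluster's label candidates. The function name `cluster_keywords` and the `top_n` parameter are hypothetical.

```python
from collections import Counter


def cluster_keywords(vectors, labels, top_n=5):
    """Hypothetical keyword extraction: top terms by summed TF-IDF weight per cluster.

    vectors: sparse vectors (dict: term -> weight); labels: cluster id per vector.
    Returns {cluster_id: [term, ...]} with at most top_n terms each.
    """
    totals = {}
    for vec, lab in zip(vectors, labels):
        acc = totals.setdefault(lab, Counter())
        for term, weight in vec.items():
            acc[term] += weight  # accumulate the term's weight within the cluster
    return {lab: [t for t, _ in acc.most_common(top_n)] for lab, acc in totals.items()}


if __name__ == "__main__":
    vectors = [
        {"tower": 2.0, "castle": 1.0},
        {"tower": 1.5, "enemy": 0.5},
        {"baseball": 2.0, "pitch": 1.0},
    ]
    print(cluster_keywords(vectors, [0, 0, 1], top_n=2))
    # prints {0: ['tower', 'castle'], 1: ['baseball', 'pitch']}
```

Summed weight favors terms that are both frequent and distinctive within a cluster; a variant that subtracts each term's weight in other clusters would favor terms that separate clusters more sharply.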

Identifier: oai:union.ndltd.org:CHENGCHI/G0099356010
Creators: 曾國傑, Tseng, Kuo Chieh
Publisher: 國立政治大學 (National Chengchi University)
Source Sets: National Chengchi University Libraries
Language: Chinese
Detected Language: English
Type: text
Rights: Copyright © nccu library on behalf of the copyright holders
