1 |
A Research of Multi-dimensional and Multigranular Data Mining with In-memory Computing with yahoo user profile. 林洸儂, Lin, Guang-Nung. Unknown Date
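In recent years, advances in cloud computing and improvements in hardware performance have made it feasible to build cluster
computing systems by scaling out horizontally across large numbers of machines. Apache Hadoop is an open-source software
framework from the Apache Software Foundation. It is a distributed system modeled on Google's MapReduce and the Google File
System, and it can manage clusters of thousands of computers or more. Through its distributed file system, HDFS, Hadoop
provides storage at the petabyte scale and beyond. The MapReduce framework splits an application into small tasks that
run across the cluster's compute nodes.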
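To make the map/shuffle/reduce split concrete, here is a single-process Python simulation that counts item occurrences across transactions. It does not use Hadoop's Java API. The function names (`map_phase`, `shuffle`, `reduce_phase`) are hypothetical, and in a real cluster each phase would run as many tasks spread over different nodes.

```python
from collections import defaultdict
from itertools import chain

def map_phase(transaction):
    """Map: emit an (item, 1) pair for every item in one transaction."""
    return [(item, 1) for item in transaction]

def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: sum the partial counts for one key."""
    return key, sum(values)

def map_reduce_item_counts(transactions):
    mapped = chain.from_iterable(map_phase(t) for t in transactions)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

transactions = [["milk", "bread"], ["bread", "butter"], ["milk", "bread", "butter"]]
print(map_reduce_item_counts(transactions))  # {'milk': 2, 'bread': 3, 'butter': 2}
```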
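In addition, enterprises have accumulated enormous volumes of data, and processing and analyzing this structured and
unstructured data has become a popular research topic. Traditional data mining methods and algorithms must therefore be
adjusted and improved, and new methods developed, to fit cloud computing technologies and distributed frameworks.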
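Association rules capture implicit relationships among items in the large volumes of data held in a database. A common
application is market basket analysis. Association rules are usually mined within a specific dimension and a specific
granularity, but this cannot find rules that hold only within a narrower range. For example, mining a full year of
transaction data may not reveal rules among the items consumers buy to celebrate Christmas. Restricting the time range
to December, however, makes those rules discoverable.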
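The Christmas example can be illustrated numerically. The sketch below uses hypothetical toy transactions, not data from the thesis. With an assumed minimum support of 0.5, the pair {turkey, wine} fails over the whole year but passes easily when only December is considered.

```python
from datetime import date

# Hypothetical toy data: (transaction date, items bought).
transactions = [
    (date(2015, 3, 2), {"bread", "milk"}),
    (date(2015, 5, 9), {"bread", "milk"}),
    (date(2015, 7, 14), {"bread", "eggs"}),
    (date(2015, 9, 21), {"milk", "eggs"}),
    (date(2015, 11, 3), {"bread", "milk"}),
    (date(2015, 12, 20), {"turkey", "wine"}),
    (date(2015, 12, 23), {"turkey", "wine", "bread"}),
    (date(2015, 12, 24), {"turkey", "wine"}),
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    baskets = list(baskets)
    if not baskets:
        return 0.0
    return sum(itemset <= b for b in baskets) / len(baskets)

pair = {"turkey", "wine"}
year = [items for _, items in transactions]
december = [items for d, items in transactions if d.month == 12]

print(support(pair, year))      # 0.375 -> below an assumed min support of 0.5
print(support(pair, december))  # 1.0   -> frequent within the December slice
```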
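The Apriori algorithm is a well-known algorithm for mining association rules. It generates candidate itemsets and filters
them with a user-defined minimum support to obtain the frequent itemsets. It then filters with a minimum confidence to
obtain the association rules. With $k$ distinct single items there can be up to $2^k - 1$ candidate itemsets, and
computing the frequent itemsets requires scanning the entire database repeatedly. These two main steps of Apriori
consume a considerable amount of computing power.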
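The two steps above can be sketched as a compact, single-machine Python version of textbook Apriori. It covers the join, prune, and support-count levels, followed by confidence filtering. This is not the thesis code, and the function names are hypothetical. Support counting is a naive nested loop, so the sketch also shows the repeated full scans of the database that the paragraph identifies as costly.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def count_and_filter(candidates):
        # One full scan of the database per level, as in classic Apriori.
        counts = dict.fromkeys(candidates, 0)
        for basket in baskets:
            for cand in candidates:
                if cand <= basket:
                    counts[cand] += 1
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}

    frequent = count_and_filter({frozenset([i]) for b in baskets for i in b})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        frequent = count_and_filter(candidates)
        result.update(frequent)
        k += 1
    return result

def association_rules(frequent, min_confidence):
    """Yield (lhs, rhs, support, confidence) for rules lhs -> rhs meeting min_confidence."""
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = supp / frequent[lhs]  # every subset of a frequent itemset is frequent
                if conf >= min_confidence:
                    yield set(lhs), set(itemset - lhs), supp, conf

transactions = [{"milk", "bread"}, {"bread", "butter"},
                {"milk", "bread", "butter"}, {"milk", "bread"}]
freq = apriori(transactions, min_support=0.5)
for lhs, rhs, s, c in association_rules(freq, min_confidence=0.8):
    print(lhs, "->", rhs, f"support={s:.2f}", f"confidence={c:.2f}")
```

On these four toy transactions, the sketch finds five frequent itemsets. It reports two rules, milk → bread and butter → bread, each with confidence 1.0. The candidate {milk, bread, butter} is pruned without a scan because its subset {milk, butter} is infrequent.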
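This study therefore proposes an algorithm that partitions the database into multiple data blocks, mines association
rules in each block, and incrementally updates the results. This addresses the problem of association rules being lost
when mining over a large range. The program is implemented on the Spark distributed computing framework and runs in
parallel on a computer cluster to reduce association rule mining time. / Because cloud computing techniques have
improved and computer equipment has become more capable, it is now feasible to build clusters by horizontally scaling
out large numbers of computers. Apache Hadoop is open-source software from the Apache Software Foundation. It provides
cluster resource management, a distributed storage system named Hadoop Distributed File System (HDFS), and a parallel
processing technique called MapReduce.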
Enterprises have accumulated huge amounts of data, and processing and analyzing this structured and unstructured data
has become a hot research topic. Traditional data mining methods and algorithms must be adjusted and improved to suit
new cloud computing technologies and the concept of distributed frameworks.
Association rules describe relationships among items in a large database. In general, association rules are mined at a
fixed dimension and granularity of the database. However, this approach may miss infrequent association rules.
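Association rules are conventionally scored with two standard measures, support and confidence. In the notation assumed here (not taken from the thesis), $D$ is the transaction database and $X$, $Y$ are disjoint itemsets:

```latex
\operatorname{supp}(X) = \frac{\lvert \{\, t \in D : X \subseteq t \,\} \rvert}{\lvert D \rvert},
\qquad
\operatorname{conf}(X \Rightarrow Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}
```

A rule $X \Rightarrow Y$ is reported when $\operatorname{supp}(X \cup Y)$ meets the minimum support and $\operatorname{conf}(X \Rightarrow Y)$ meets the minimum confidence. Because $\lvert D \rvert$ is the denominator, an itemset that is common in one slice of $D$ can still fall below the threshold over the whole database. This is the "missing infrequent rules" problem mentioned above.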
The Apriori algorithm is a well-known algorithm for mining association rules. Two main steps of this algorithm consume
a lot of computing resources. The first step generates candidate itemsets, of which there can be up to $2^k - 1$ if
there are $k$ distinct items. The second step finds the frequent itemsets, which requires comparing candidates against
every transaction in the database.
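The $2^k - 1$ bound comes from counting the non-empty subsets of $k$ items with the binomial theorem:

```latex
\sum_{i=1}^{k} \binom{k}{i} \;=\; \sum_{i=0}^{k} \binom{k}{i} - \binom{k}{0} \;=\; (1 + 1)^k - 1 \;=\; 2^k - 1
```

For example, $k = 4$ items give $2^4 - 1 = 15$ possible non-empty itemsets, and each additional item doubles the space. Apriori's pruning of candidates with infrequent subsets usually keeps the number actually generated well below this bound. However, each level still needs a full pass over the transactions.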
This approach divides the database into segments, finds the association rules of each segment, and then combines the
rules of the segments. This solves the problem of missing infrequent itemsets. In addition, we implement the method in
Spark to reduce computing time.
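The segment-and-combine idea can be illustrated with a minimal sketch. It assumes time-based segments and uses a brute-force miner limited to itemsets of at most two items. It is not the thesis' algorithm, which incrementally updates results and runs on Spark. It only shows why per-segment mining keeps rules that a single global pass drops. The segment names, data, and `min_support` value are hypothetical. In Spark, each segment could plausibly be mined by a separate task (for example via `RDD.mapPartitions`), but that wiring is omitted here.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(baskets, min_support, max_size=2):
    """Brute-force frequent itemsets (up to max_size items) within one segment."""
    if not baskets:
        return {}
    counts = Counter()
    for basket in baskets:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(basket), size):
                counts[frozenset(combo)] += 1
    n = len(baskets)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

def mine_by_segment(segments, min_support):
    """Mine each segment on its own, then merge into itemset -> {segment: support}."""
    merged = {}
    for name, baskets in segments.items():
        for itemset, supp in frequent_itemsets(baskets, min_support).items():
            merged.setdefault(itemset, {})[name] = supp
    return merged

# Hypothetical toy segments keyed by month.
segments = {
    "2015-11": [{"bread", "milk"}, {"bread", "milk"}, {"bread", "eggs"}, {"milk", "eggs"}],
    "2015-12": [{"turkey", "wine"}, {"turkey", "wine", "bread"}, {"bread", "milk"}],
}
merged = mine_by_segment(segments, min_support=0.6)
all_baskets = [b for baskets in segments.values() for b in baskets]

print(merged[frozenset({"turkey", "wine"})])  # found only in the December segment
print(frozenset({"turkey", "wine"}) in frequent_itemsets(all_baskets, 0.6))  # False: 2/7 globally
```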
|
2 |
Construct a concept of cloud computing and data mining system with association rules and clustering analysis. 賴建佑. Unknown Date
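Cloud computing and data mining have become important directions of development in the twenty-first century. Many aspects of everyday life are gradually incorporating cloud computing technologies, so integrating cloud computing has become a trend. Simply put, cloud computing is a technology that lets users work more quickly, conveniently, and cost-effectively. Data mining, for its part, has expanded from mining mainly numerical data to mining diverse forms of data, such as text mining and image mining. Although data mining developed earlier than cloud computing, the two can complement each other. This study therefore develops a data mining analysis system that is convenient and easy for users to operate. It focuses on two specific data mining methods, association rules and cluster analysis. The system combines the Apriori algorithm and the K-means method with Microsoft Excel VBA and the R software.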
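For the cluster-analysis side, the K-means method mentioned above can be sketched as Lloyd's algorithm in plain Python. The thesis builds its system with Excel VBA and R. This Python version is only an illustration of the technique, and the function names, random initialization, and toy points are assumptions.

```python
import random

def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's K-means. Returns (centroids, labels) for a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct input points
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: _sq_dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, labels

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Initialization by sampling input points is the simplest choice. Because K-means only finds a local optimum, practical tools typically run several random starts or use a smarter seeding scheme.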
|
3 |
Data Mining Technique with an Application to the Real Estate Price Estimation. 高健維. Unknown Date
In today's information age, statistical and artificial intelligence techniques can uncover valuable hidden patterns in enterprises' large databases. Analyzing the data in depth to extract knowledge, and building different models for different business problems, gives enterprises a reference basis for decision-making. Data mining has been a very popular topic in database applications in recent years. Although it is a remarkable and fashionable technique, it is not a new discipline: before World War II, the U.S. government was already using data mining methods for the census and for military purposes. With advances in information technology, the emergence of new tools, and the development of network communications, data mining can often uncover relationships beyond the reach of simple induction. It can extract treasure from piles of data, and it has become part of business intelligence. This thesis applies association rules, a data mining technique, to real estate price analysis. It derives useful association rules and gives an integrated examination of the complex relationship between housing prices and surrounding environmental factors. Home buyers can then follow a suitable investment plan, and real estate agents can design sales plans for suitable target groups. / In today's technological climate, value can be extracted from corporations' large data sets by applying statistical knowledge and scientific techniques from artificial intelligence. Using these algorithms, a database can be analyzed and knowledge generated from it. In addition, data models can be organized according to different business issues, providing a reference for strategic decision-making. Further advantages include predicting future events and gauging how willing the public is to respond to new products or promotions. The probabilities of outcomes serve as useful references that help companies provide quality services at the right time. In other words, companies gain clues for understanding their customers' needs, wants, and behaviors, and can therefore deliver the best services for customer satisfaction. Data mining is commonly discussed in the field of database applications. Although the term is relatively new, the underlying analysis methods are not. Before World War II, the US government used these analysis techniques, particularly for census statistics and military affairs. Knowledge discovery has become part of business intelligence in today's corporations because these techniques are geared toward extracting explicit information rather than performing only simple analysis.
By applying association rules from knowledge discovery technology, this dissertation discusses price estimation in real estate. The discussion investigates how factors in the surrounding environment lead to differences in housing prices. By referring to these association rules, buyers can obtain information for their investment plans, while housing agents can gain knowledge for planning projects aimed at their target markets.
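Applying association rules to housing data usually means first encoding each listing as a set of discrete attribute=value items. Continuous quantities such as price must be binned into categories before they can appear in a rule. The attribute names, bins, and listings below are hypothetical; the thesis' actual variables are not given here. The sketch computes support and confidence for one candidate rule, near_mrt=yes ⇒ price=high.

```python
# Hypothetical listings, each encoded as a set of attribute=value items.
listings = [
    {"near_mrt=yes", "near_school=yes", "price=high"},
    {"near_mrt=yes", "near_school=no", "price=high"},
    {"near_mrt=yes", "near_school=yes", "price=mid"},
    {"near_mrt=no", "near_school=yes", "price=mid"},
    {"near_mrt=no", "near_school=no", "price=low"},
]

def rule_stats(lhs, rhs, records):
    """Return (support, confidence) of the rule lhs => rhs over `records`."""
    n = len(records)
    both = sum((lhs | rhs) <= r for r in records)  # records containing lhs and rhs
    left = sum(lhs <= r for r in records)          # records containing lhs
    support = both / n if n else 0.0
    confidence = both / left if left else 0.0
    return support, confidence

s, c = rule_stats({"near_mrt=yes"}, {"price=high"}, listings)
print(f"support={s:.2f} confidence={c:.2f}")  # support=0.40 confidence=0.67
```

How price is binned (e.g., fixed ranges versus quantiles) directly changes which rules clear the support and confidence thresholds, so that choice deserves as much attention as the mining step itself.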
|