Global ETD Search

1	Rule Generation for Datasets with Ordinal Class Attributes Gopal, Deepthi January 2015 (has links) No description available. Computer Science apriori algorithm
2	Identification of Discriminating Motifs in Heart Rate Time Series Data of Soccer Players Ravindranathan, Sampurna January 2018 (has links) No description available. Computer Science Wearable Devices Soccer Analytics Heart Rate Motifs Apriori Algorithm Feature Selection
3	一個基於記憶體內運算之多維度多顆粒度資料探勘之研究-以yahoo user profile為例 / A Research of Multi-dimensional and Multigranular Data Mining with In-memory Computing with yahoo user profile 林洸儂, Lin, Guang-Nung Unknown Date (has links) Recent advances in cloud computing and hardware performance have made it feasible to build cluster computing systems by horizontally scaling large numbers of machines. Apache Hadoop is an open-source software framework from the Apache foundation: a distributed system, modeled on Google's MapReduce and the Google File System, that can manage clusters of thousands of machines. Its distributed file system, HDFS, provides petabyte-scale storage, and the MapReduce framework splits an application into small tasks that run on the compute nodes of the cluster. Enterprises have also accumulated huge volumes of data, and how to process and analyze this structured and unstructured data has become a popular research topic. Traditional data mining methods and algorithms must therefore be adapted and improved to fit the new cloud computing technology and the concept of distributed frameworks. Association rules describe the implicit relations among items in a large database; a common application is market basket analysis. Rules are usually mined within a fixed dimension and a fixed granularity, but this misses rules that hold only in a narrower range. For example, mining a full year of transactions cannot reveal the rules among items that consumers buy to celebrate Christmas, whereas restricting the time range to December does. Apriori is a well-known algorithm for mining association rules. It generates candidate itemsets, filters them by a user-defined minimum support to obtain the frequent itemsets, and then filters by a minimum confidence to obtain the association rules. Two main steps of the algorithm consume a large amount of computation: with k distinct items there are at most 2^k − 1 candidate itemsets, and finding the frequent itemsets requires repeated scans of all transactions in the database. This study therefore divides the database into multiple data blocks, mines association rules in each block, and then incrementally combines the results, which solves the problem of rules being lost when mining over a broad range. The method is implemented on the Spark distributed computing framework and runs in parallel on a cluster to reduce the mining time. Association Rule Apriori Algorithm Data mining Hadoop Spark
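The Apriori steps named in the abstract above (candidate generation, filtering by minimum support, then filtering by minimum confidence) can be sketched in plain Python. This is a minimal single-machine illustration, not the thesis's Spark implementation; the function names `apriori` and `rules` and the toy transactions are assumptions made for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1: every single item is a candidate.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # One full database scan per level to count candidate occurrences.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune every
        # candidate that has an infrequent k-subset (the Apriori property).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, confidence) for rules meeting the threshold."""
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    yield antecedent, itemset - antecedent, confidence

# Hypothetical toy data, for illustration only.
toy = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}]
toy_frequent = apriori(toy, min_support=0.5)
toy_rules = list(rules(toy_frequent, min_confidence=0.6))
```

The pruning step is what keeps the search well below the 2^k − 1 worst case in practice: a candidate is counted only if all of its subsets were already found frequent.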
4	DARM: Distance-Based Association Rule Mining Icev, Aleksandar 06 May 2003 (has links) The main goal of this thesis work was to develop, implement and evaluate an algorithm that enables mining association rules from datasets that contain quantified distance information among the items. This was accomplished by extending and enhancing the Apriori algorithm, which is the standard algorithm to mine association rules. The Apriori algorithm is not able to mine association rules that contain distance information among the items that construct the rules. This thesis enhances the main Apriori property by requiring itemsets forming rules to "deviate properly" in addition to satisfying the minimal support threshold. We say that an itemset deviates properly if all combinations of pair-wise distances among the items are highly conserved in the dataset instances where these items occur. This thesis introduces the notion of proper deviation and provides the precise procedure and measures that characterize it. Integrating the notion of distance preserving frequent itemset and proper deviation into the standard Apriori algorithm leads to the construction of our Distance-Based Association Rule Mining (DARM) algorithm. DARM can be applied in data mining and knowledge discovery from genetic, financial, retail, time sequence data, or any domain where the distance information between items is of importance. This thesis chose the area of gene expression and regulation in eukaryotic organisms as the application domain. The data from the domain was used to produce DARM rules. Sets of those rules were used for building predictive models. The accuracy of those models was tested. In addition, predictive accuracies of the models built with and without distance information were compared. spatial data mining distance-based association rules distance-based Apriori algorithm Data mining Gene expression Data processing Eukaryotic cells
5	Mining Association Rules For Quality Related Data In An Electronics Company Kilinc, Yasemin 01 March 2009 (has links) (PDF) Quality has become a central concern as it has been observed that reducing defects will lower the cost of production. Hence, companies generate and store vast amounts of quality related data. Analysis of this data is critical in order to understand the quality problems and their causes, and to take preventive actions. In this thesis, we propose a methodology for this analysis based on one of the data mining techniques, association rules. The methodology is applied to quality related data of an electronics company. The Apriori algorithm used in this application generates an excessively large number of rules, most of which are redundant. Therefore we implement a three-phase elimination process on the generated rules to come up with a reasonably small set of interesting rules. The approach is applied to two different data sets of the company, one for production defects and one for raw material non-conformities. We then validate the resultant rules using a test data set for each problem type and analyze the final set of rules.
6	Data Mining For Rule Discovery In Relational Databases Toprak, Serkan 01 September 2004 (has links) (PDF) Data is mostly stored in relational databases today. However, most data mining algorithms are not capable of working directly on data stored in relational databases. Instead they require a preprocessing step that transforms the relational data into an algorithm-specific form. Moreover, several data mining algorithms provide solutions for single relations only. Therefore, valuable hidden knowledge involving multiple relations remains undiscovered. In this thesis, an implementation is developed for discovering multi-relational association rules in relational databases. The implementation is based on a framework providing a representation of patterns in relational databases, refinement methods for patterns, and primitives for obtaining the record counts needed from the database to calculate measures for patterns. The framework exploits the meta-data of relational databases to prune the search space of patterns. The implementation extends the framework by employing the Apriori algorithm to prune the search space further and to discover relational recursive patterns. The Apriori algorithm is used for finding large itemsets of tables, which are used to refine patterns. The Apriori algorithm is modified by changing the support calculation method for itemsets. A method for determining recursive relations is described and a solution is provided for handling recursive patterns using aliases. Additionally, continuous attributes of tables are discretized using equal-depth partitioning. The implementation is tested on the gene localization prediction task of KDD Cup 2001 and the results are compared to those of the winning approach.
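Equal-depth (equal-frequency) partitioning, mentioned in the abstract above as the discretization step for continuous attributes, chooses bin boundaries so that each bin holds roughly the same number of values. The following is a minimal sketch of the general technique, not the thesis's code; the helper names `equal_depth_bins` and `discretize` are hypothetical.

```python
def equal_depth_bins(values, n_bins):
    """Return the upper boundary of each of n_bins (nearly) equally filled bins."""
    if n_bins < 1 or not values:
        raise ValueError("need at least one bin and one value")
    ordered = sorted(values)
    n = len(ordered)
    edges = []
    for i in range(1, n_bins + 1):
        # Index of the last sorted element that belongs to bin i.
        cut = (i * n) // n_bins - 1
        edges.append(ordered[max(cut, 0)])
    return edges

def discretize(value, edges):
    """Map a continuous value to the index of the first bin whose edge covers it."""
    for index, upper in enumerate(edges):
        if value <= upper:
            return index
    return len(edges) - 1  # values above the largest edge go to the last bin

# Hypothetical attribute values, for illustration only.
sample_edges = equal_depth_bins([1, 2, 3, 4, 5, 6, 7, 8, 9], n_bins=3)
```

Unlike equal-width binning, the boundaries follow the data distribution, so skewed attributes do not leave most bins nearly empty. Heavily repeated values can still make bins uneven, which this sketch does not attempt to correct.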
7	Análise associativa: identificação de padrões de associação entre o perfil socioeconômico dos alunos do ensino básico e os resultados nas provas de matemática / Association analysis: identification of patterns related to the socioeconomic profiles Lyvia Aloquio 20 February 2014 (has links) Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Nowadays, most of the operations carried out by companies and organizations are stored in databases that researchers can explore to obtain useful information to support decision making. Because of the large volume involved, extracting and analyzing the data is not a simple task. The general process of converting raw data into useful information is called Knowledge Discovery in Databases (KDD). One step in this process is Data Mining, which consists of applying algorithms and statistical techniques to explore information contained implicitly in large databases. Many fields use the KDD process to help recognize patterns or models in their information bases. This work presents a practical application of the KDD process to the database of 9th-grade students in basic education in the State of Rio de Janeiro, made available on the INEP website, with the aim of discovering interesting patterns between a student's socioeconomic profile and his or her performance in Mathematics in the Prova Brasil 2011. Using the tool Weka (Waikato Environment for Knowledge Analysis), the data mining task known as association was applied, and rules were extracted with the Apriori algorithm. The study found, for example, that students who have already failed a grade once tend to score lower on the mathematics test, while students who have never failed a grade performed better. Other factors, such as the student's future aspirations, the parents' schooling, the student's liking for mathematics, the ethnic group to which the student belongs, and whether the student frequently reads websites, also influence learning positively or negatively. An analysis by the infrastructure of the student's school was also carried out, and it showed that the discovered patterns occur regardless of whether the students attend schools with good or poor infrastructure. The results can be used to outline profiles of students with better or worse performance in mathematics and to develop public education policies aimed at elementary education. Learning of Mathematics Association rules Apriori Algorithm Data mining Applied Mathematics
9	Development of a data-driven marketing strategy for an online pharmacy Holmér, Gelaye Worku, Gamage, Ishara H. January 2022 (has links) The term electronic commerce (e-commerce) refers to a business model that allows companies and individuals to buy and sell goods and services over the internet. The focus of this thesis is on online pharmacies, a segment of the e-commerce market. Although online pharmacies remain subject to the same stringent rules imposed on all pharmacies, which limit the scope for market growth, the segment has grown notably over the past decades. The main goal of this thesis is to develop a data-driven marketing strategy based on the daily sales data of a Sweden-based online pharmacy. The methodology of the data analysis includes exploratory data analysis (EDA) and market basket analysis (MBA) using the Apriori algorithm, together with the application of marketing frameworks and theories from a data-driven standpoint. In addition to the data analysis, this thesis proposes a conceptual framework for a digital marketing strategy based on the RACE framework (reach, act, convert, and engage). The analysis led to the following data-driven marketing strategy: special attention should be paid to association rules with a high lift ratio; products in a high gross profit margin percentile (GPMP) should have a volume-based marketing strategy that focuses on lower prices on subsequent items; and price bundling is the best marketing strategy for low-GPMP products. Practical ideas mentioned in the thesis include optimizing keyword search for a high-GPMP product type and sending reminder emails and push alerts to avoid cart abandonment. Online pharmacies can use the findings and recommendations to extract knowledge that supports decisions ranging from raising overall order size and planning marketing campaigns to increasing the sales of products with a high gross profit margin.
Online pharmacies data-driven marketing strategies market basket analysis exploratory analysis Apriori algorithm Computer and Information Sciences
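The lift ratio that the abstract above recommends watching is, in its standard definition, the confidence of a rule A → B divided by the overall support of B; values above 1 mean that B appears with A more often than chance would suggest. The sketch below computes support, confidence and lift for one rule. The function name `rule_metrics` and the basket contents are hypothetical and are not taken from the thesis's sales data.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    n_a = sum(1 for b in baskets if antecedent <= b)
    n_c = sum(1 for b in baskets if consequent <= b)
    n_both = sum(1 for b in baskets if (antecedent | consequent) <= b)
    if n == 0 or n_a == 0 or n_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = n_both / n            # share of baskets containing A and B
    confidence = n_both / n_a       # share of A-baskets that also contain B
    lift = confidence / (n_c / n)   # confidence relative to B's base rate
    return support, confidence, lift

# Hypothetical baskets, for illustration only.
baskets = [
    {"painkiller", "vitamin"},
    {"painkiller", "vitamin"},
    {"painkiller"},
    {"bandage"},
]
support, confidence, lift = rule_metrics(baskets, {"painkiller"}, {"vitamin"})
```

In this toy data the rule's lift is above 1, so the two products co-occur more often than their individual frequencies alone would predict.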
10	應用記憶體內運算於多維度多顆粒度資料探勘之研究―以醫療服務創新為例 / A Research Into In-memory Computing In Multidimensional, Multi-granularity Data Mining ― With Healthcare Services Innovation 朱家棋, Chu, Chia Chi Unknown Date (has links) Under the pressure of global population aging and continued population growth, demand for healthcare services keeps rising. In healthcare, association rule analysis is commonly used to mine the knowledge hidden in large medical databases in order to support clinical decisions or to create new healthcare services. As new healthcare services and applications emerge (such as electronic health records and mobile health), and as medical institutions must retain large volumes of patient data for long periods to comply with government policy, the field faces the problem of how to process huge amounts of data effectively. Traditional association rule mining algorithms are severely limited in performance. Many studies have therefore proposed running association rule algorithms in a distributed environment, using the Hadoop MapReduce framework to parallelize the processing of huge amounts of data, and this does give a substantial speedup over single-node computation. In practice, however, MapReduce is poorly suited to association rule algorithms, which require intensive iterative computation. This study uses the Spark in-memory computing framework to implement parallel mining of multidimensional, multi-granularity association rules on a distributed cluster. The experimental results can be summed up in three points. First, when the data is small, parallelization brings little benefit, because it splits the data flow into Map and Reduce stages. Second, when the data is large, the parallel strategy and the stand-alone version differ markedly, with overall running times differing by as much as a factor of 100; moreover, when the number of items exceeds 10,000, the stand-alone version cannot run because of insufficient memory, while the parallel strategy still can. Third, although Spark is slightly slower than the stand-alone version on small-scale data, its running time still compares favorably with Hadoop's, by a reported factor of four, and on large-scale data Spark also outperforms the Hadoop version. Therefore, for large-scale data, Spark is the best solution in terms of both computational performance and scalability. Data Mining Multidimensional Association Analysis In-Memory Computing Innovative Healthcare Service Applications Apriori Algorithm
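The split into Map and Reduce stages that the abstract above refers to can be illustrated for a single Apriori counting pass: each data partition counts candidate itemsets locally (map), and the partial counts are then added together (reduce). This sketch is plain single-process Python and uses no Hadoop or Spark API; the names `map_partition` and `count_support` and the toy partitions are assumptions made for the example.

```python
from collections import Counter
from functools import reduce

def map_partition(partition, candidates):
    """Map phase: count, within one partition, how often each candidate occurs."""
    counts = Counter()
    for transaction in partition:
        items = frozenset(transaction)
        for candidate in candidates:
            if candidate <= items:
                counts[candidate] += 1
    return counts

def count_support(partitions, candidates):
    """Reduce phase: add the per-partition counts into global counts."""
    partial = (map_partition(p, candidates) for p in partitions)
    return reduce(lambda a, b: a + b, partial, Counter())

# Hypothetical data split into two partitions, for illustration only.
partitions = [
    [{"a", "b"}, {"a"}],
    [{"a", "b"}, {"b"}],
]
candidates = [frozenset("a"), frozenset("ab")]
totals = count_support(partitions, candidates)
```

Apriori repeats a pass like this once per itemset size, which is why the cost of each iteration matters so much: the fixed overhead of a distributed pass can outweigh its benefit on small data, consistent with the first experimental point in the abstract.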
