1 |
New approaches to weighted frequent pattern mining. Yun, Unil, 25 April 2007.
Researchers have proposed frequent pattern mining algorithms that are more efficient than their predecessors and that generate fewer but more important patterns. Many techniques have been developed for frequent pattern mining, including depth-first and breadth-first search, tree-based and other data structures, top-down and bottom-up traversal, and vertical and horizontal data formats. Most frequent pattern mining algorithms use a support measure to prune the combinatorial search space. However, support-based pruning is not enough once the characteristics of real datasets are taken into account. Additionally, after a dataset has been mined for frequent patterns, there is no way to adjust the number of frequent patterns through user feedback, except by changing the minimum support.
Alternative measures for mining frequent patterns have been suggested to address these
issues. One of the main limitations of the traditional approach for mining frequent
patterns is that all items are treated uniformly when, in reality, items have different
importance. For this reason, weighted frequent pattern mining algorithms have been
suggested that give different weights to items according to their significance. The main
focus in weighted frequent pattern mining concerns satisfying the downward closure
property. In this research, frequent pattern mining approaches with weight constraints are
suggested. Our main approach is to push weight constraints into the pattern growth
algorithm while maintaining the downward closure property. We develop WFIM
(Weighted Frequent Itemset Mining with a weight range and a minimum weight),
WLPMiner (Weighted frequent pattern mining with length-decreasing constraints), WIP
(Weighted Interesting Pattern mining with a strong weight and/or support affinity),
WSpan (Weighted Sequential pattern mining with a weight range and a minimum
weight), and WIS (Weighted Interesting Sequential pattern mining with a similar level of
support and/or weight affinity).
Extensive performance analysis shows that the suggested approaches are
efficient and scalable for weighted frequent pattern mining.
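The key to maintaining the downward closure property under weights, which these algorithms share, is to prune with an anti-monotone upper bound on weighted support rather than with weighted support itself. Below is a minimal sketch of that pruning idea, using a level-wise enumeration for brevity where the thesis pushes the constraints into a pattern-growth algorithm; the weighted-support definition (support times average item weight), the transactions, and the thresholds are illustrative assumptions.

```python
def weighted_frequent_itemsets(transactions, weights, min_wsup, max_len=3):
    """Mine itemsets whose weighted support (support * average item weight)
    meets min_wsup. Candidates are pruned with support * max(all weights),
    an anti-monotone upper bound that preserves downward closure even
    though weighted support itself is not anti-monotone."""
    n = len(transactions)
    max_w = max(weights.values())
    items = sorted(weights)
    results, frontier = {}, [()]          # itemsets kept as sorted tuples
    for _ in range(max_len):
        next_frontier = []
        for base in frontier:
            start = items.index(base[-1]) + 1 if base else 0
            for item in items[start:]:
                cand = base + (item,)
                support = sum(1 for t in transactions if set(cand) <= t) / n
                # Upper-bound prune: if even the maximal weight cannot lift
                # this candidate over the threshold, no superset qualifies.
                if support * max_w < min_wsup:
                    continue
                wsup = support * (sum(weights[i] for i in cand) / len(cand))
                if wsup >= min_wsup:
                    results[cand] = wsup
                next_frontier.append(cand)   # keep extending survivors
        frontier = next_frontier
    return results

# Toy run: weights encode item importance (e.g., profit per item).
transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}]
weights = {"a": 0.9, "b": 0.6, "c": 0.3}
print(weighted_frequent_itemsets(transactions, weights, min_wsup=0.4))
```

Note how an itemset such as ("c",) can fail the weighted-support test yet remain in the frontier: only the upper bound decides pruning, which is exactly what keeps the search complete.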
|
2 |
Efficient frequent pattern mining from big data and its applications. Jiang, Fan, January 2016.
Frequent pattern mining is an important research area in data mining. Since its introduction, it has drawn the attention of many researchers, and many algorithms have been proposed. Popular algorithms include level-wise Apriori-based algorithms, tree-based algorithms, and hyperlinked-array-structure-based algorithms. While these algorithms are popular and beneficial due to some desirable properties, they also suffer from drawbacks such as multiple database scans, recursive tree constructions, or multiple hyperlink adjustments. In the current era of big data, high volumes of a wide variety of valuable data of different veracities can easily be collected or generated at high velocity in various real-life applications. Among these 5Vs of big data, I focus on handling high volumes in my Ph.D. thesis. Specifically, I design and implement a new efficient frequent pattern mining algorithmic technique called B-mine, which overcomes some of the aforementioned drawbacks and achieves better performance than existing algorithms. I also extend my B-mine algorithm into a family of algorithms that can perform big data mining efficiently. Moreover, I design four different frameworks that apply this family of algorithms to the real-life application of social network mining. Evaluation results show the efficiency and practicality of all these algorithms.
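The abstract does not spell out B-mine's internals, but the "multiple database scans" drawback it targets is easy to see in the classic level-wise Apriori baseline it is compared against. The sketch below is standard textbook Apriori, not B-mine; every level costs one more full pass over the database.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Classic level-wise Apriori over transactions given as sets of items.
    Each level requires a full scan of the database, illustrating the
    'multiple database scans' drawback noted above."""
    n = len(transactions)
    counts = {}
    for t in transactions:                    # scan 1: count single items
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s for s, c in counts.items() if c / n >= min_support}
    all_frequent, k = set(frequent), 2
    while frequent:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k}
        # Prune step (downward closure): all (k-1)-subsets must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        counts = {c: 0 for c in candidates}
        for t in transactions:                # scan k: another full pass
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {c for c, v in counts.items() if v / n >= min_support}
        all_frequent |= frequent
        k += 1
    return all_frequent

# Toy usage on market-basket style data.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
print(apriori(db, min_support=0.6))
```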
|
3 |
Using Association Analysis for Medical Diagnoses. Nunna, Shinjini, 01 January 2016.
To fully examine the application of association analysis to medical data for the purpose of deriving medical diagnoses, we survey classical association analysis and its approaches, the current challenges faced by medical association analysis and their proposed solutions, and finally culminate this knowledge in a proposal for applying medical association analysis to the identification of food intolerances. Classical association analysis has been well studied since its introduction in the seminal paper on market basket research in the 1990s. While the theory itself is relatively simple, the brute-force approach is prohibitively expensive, so creative approaches using various data structures and strategies must be explored for efficiency. Medical association analysis is a burgeoning field with various focuses, including diagnosis systems and gene analysis. The field faces a number of challenges, primarily stemming from the complexity, volume, and high dimensionality of medical data. We examine the challenges faced in the pre-processing, analysis, and post-processing phases, along with corresponding solutions. Additionally, we survey proposed measures for ensuring that the results of medical association analysis will hold up to medical diagnosis standards. Finally, we explore how medical association analysis can be used to identify food intolerances. The proposed analysis system is based on a current method of diagnosis used by medical professionals, and seeks to eliminate manual analysis while more efficiently and intelligently identifying interesting, less obvious patterns between patients' food consumption and symptoms in order to propose a food intolerance diagnosis.
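As a concrete illustration of the classical measures such a system builds on, the sketch below computes support and confidence for a food-to-symptom rule; the record format and all item names are invented for illustration and are not taken from the thesis.

```python
def rule_metrics(records, antecedent, consequent):
    """Support and confidence of an association rule A -> C, where each
    patient record is a set of observed foods and symptoms."""
    n = len(records)
    a_count = sum(1 for r in records if antecedent <= r)
    both = sum(1 for r in records if antecedent <= r and consequent <= r)
    support = both / n                                  # fraction of all records
    confidence = both / a_count if a_count else 0.0     # empirical P(C | A)
    return support, confidence

# Hypothetical patient diary entries: foods eaten and symptoms reported.
records = [
    {"milk", "bread", "bloating"},
    {"milk", "cheese", "bloating"},
    {"bread", "eggs"},
    {"milk", "bloating", "fatigue"},
]
sup, conf = rule_metrics(records, {"milk"}, {"bloating"})
print(f"milk -> bloating: support={sup:.2f}, confidence={conf:.2f}")
# On this toy data: support=0.75, confidence=1.00
```

A diagnosis-oriented system would mine many such rules and keep only those passing thresholds on these (and stricter, medically motivated) measures.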
|
4 |
Discovering Co-Location Patterns and Rules in Uncertain Spatial Datasets. Adilmagambetov, Aibek, date unknown.
No description available.
|
5 |
Frequent Pattern Mining among Weighted and Directed Graphs. Cederquist, Aaron, January 2009.
No description available.
|
6 |
A Classification System for the Problem of Protein Subcellular Localization. Alay, Gokcen, 01 September 2007.
The focus of this study is predicting the subcellular localization of a protein. Subcellular localization information is important for protein function annotation, a fundamental problem in computational biology. For this problem, a classification system is built with two main parts: a predictor based on a feature mapping technique that extracts biologically meaningful information from protein sequences, and a client/server architecture for searching and predicting subcellular localizations. In the first part of the thesis, we describe a feature mapping technique based on frequent patterns: frequent patterns in a protein sequence dataset are identified using a search technique based on the Apriori property, and the distribution of these patterns over a new sample is used as its feature vector for classification. The effect of a number of feature selection methods on classification performance is investigated and the best one is applied. The method is assessed on the subcellular localization prediction problem with four compartments (endoplasmic reticulum (ER) targeted, cytosolic, mitochondrial, and nuclear), using the same dataset as P2SL. Our method improves the overall accuracy to 91.71%, compared with the 81.96% originally achieved by P2SL. In the second part of the thesis, a client/server architecture is designed and implemented based on Simple Object Access Protocol (SOAP) technology, providing a user-friendly interface for accessing the protein subcellular localization predictions. The client is a Cytoscape plug-in used for functional enrichment of biological networks. Rather than using subcellular localization information for individual proteins, this plug-in lets biologists analyze a set of genes/proteins from a system-level view.
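A minimal sketch of the feature mapping idea described above, under two simplifying assumptions: the mined frequent patterns are treated as contiguous substrings, and the feature vector is their normalized occurrence counts. The pattern list and the sequence are illustrative placeholders, not the thesis's actual features.

```python
def pattern_feature_vector(sequence, patterns):
    """Map a protein sequence to the normalized occurrence counts of a
    fixed list of frequent patterns; the resulting vector can feed any
    standard classifier. 'patterns' would come from an Apriori-style
    search over training sequences."""
    counts = []
    for p in patterns:
        c, i = 0, sequence.find(p)
        while i != -1:                     # count overlapping occurrences
            c += 1
            i = sequence.find(p, i + 1)
        counts.append(c)
    total = sum(counts) or 1               # avoid division by zero
    return [c / total for c in counts]

# Hypothetical frequent amino-acid patterns and a toy sequence.
patterns = ["KKR", "LL", "PEST"]
print(pattern_feature_vector("MKKRLLKKRPESTLL", patterns))
# -> [0.4, 0.4, 0.2]
```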
|
7 |
Frequent pattern analysis for decision making in big data. Pragarauskaitė, Julija, 01 July 2013.
Huge amounts of digital information are stored in the world today, and the amount is increasing by quintillions of bytes every day. Approximate data mining algorithms are very important for dealing efficiently with such amounts of data, given the computation speed required by various real-world applications, whereas exact data mining methods tend to be slow and are best employed where precise results are of the highest importance.
This thesis focuses on several data mining tasks related to analysis of big data: frequent pattern mining and visual representation.
For mining frequent patterns in big data, three novel approximate methods are proposed and evaluated on real and artificial databases:
• Random Sampling Method (RSM) creates a random sample of the original database and classifies sequences as frequent or rare based on the analysis of that sample. A significant benefit is a theoretical estimate of the classification errors made by this method, obtained using standard statistical methods (a minimal sketch of the sampling idea appears after this abstract).
• Multiple Re-sampling Method (MRM) is an improved version of RSM that draws several random samples of the original database, decreasing the probability of incorrectly classifying sequences as frequent or rare.
• Markov Property Based Method (MPBM) reads the original database several times (as many times as the order of the Markov process) and then calculates the empirical frequencies using the Markov property.
For visual representation of big data, online buyers' behaviour data were used and analyzed using... [to full text]
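A minimal sketch of the RSM idea referenced above, using a standard normal-approximation confidence interval in place of the thesis's exact error analysis; the database, sample size, and pattern predicate are toy assumptions.

```python
import math
import random

def sample_frequency_estimate(database, pattern_pred, sample_size, z=1.96):
    """Estimate a pattern's support by scanning only a random sample of
    the database. Returns the estimate and the half-width of an
    approximate 95% confidence interval (normal approximation)."""
    sample = random.sample(database, min(sample_size, len(database)))
    hits = sum(1 for record in sample if pattern_pred(record))
    p = hits / len(sample)
    half_width = z * math.sqrt(p * (1 - p) / len(sample))
    return p, half_width

# Toy usage: 10,000 records, 75% of which contain item "a"; a sample of
# 1,000 records estimates the support to within a few percentage points.
db = [["a", "b", "c"], ["a", "c"], ["b"], ["a", "b"]] * 2500
p, hw = sample_frequency_estimate(db, lambda r: "a" in r, 1000)
print(f"estimated support: {p:.3f} +/- {hw:.3f}")
```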
|
8 |
New techniques for efficiently discovering frequent patterns. Jin, Ruoming, 01 August 2005.
No description available.
|
9 |
Efficient Temporal Synopsis of Social Media Streams. Abouelnagah, Younes, January 2013.
Search and summarization of streaming social media, such as Twitter, requires the ongoing analysis of large volumes of data with dynamically changing characteristics. Tweets are short and repetitious -- lacking context and structure -- making it difficult to generate a coherent synopsis of events within a given time period. Although some established algorithms for frequent itemset analysis might provide an efficient foundation for synopsis generation, the unmodified application of standard methods produces a complex mass of rules, dominated by common language constructs and many trivial variations on topically related results. Moreover, these results are not necessarily specific to events within the time period of interest. To address these problems, we build upon the Linear time Closed itemset Mining (LCM) algorithm, which is particularly suited to the large and sparse vocabulary of tweets. LCM generates only closed itemsets, providing an immediate reduction in the number of trivial results. To reduce the impact of function words and common language constructs, we apply a filtering step that preserves these terms only when they may form part of a relevant collocation. To further reduce trivial results, we propose a novel strengthening of the closure condition of LCM to retain only those results that exceed a threshold of distinctiveness. Finally, we perform temporal ranking, based on information gain, to identify results that are particularly relevant to the time period of interest. We evaluate our work over a collection of tweets gathered in late 2012, exploring the efficiency and filtering characteristics of each processing step, both individually and collectively. Based on our experience, the resulting synopses from various time periods provide understandable and meaningful pictures of events within those periods, with potential application to tasks such as temporal summarization and query expansion for search.
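A minimal sketch of the temporal-ranking step, assuming a standard information-gain formulation over the split "pattern present" versus "tweet falls in the window of interest"; it illustrates the idea rather than reproducing the thesis's exact scoring function.

```python
import math

def entropy(p):
    """Binary entropy in bits, defined as 0 at p = 0 or p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def temporal_info_gain(in_window, outside, pattern):
    """Information gain of 'tweet contains pattern' with respect to
    'tweet is inside the window of interest'. Higher gain means the
    pattern is more specific to that window."""
    n_in, n_out = len(in_window), len(outside)
    n = n_in + n_out
    hit_in = sum(1 for t in in_window if pattern <= t)
    hit_out = sum(1 for t in outside if pattern <= t)
    hits, misses = hit_in + hit_out, n - hit_in - hit_out
    cond = 0.0                         # conditional entropy of the split
    if hits:
        cond += hits / n * entropy(hit_in / hits)
    if misses:
        cond += misses / n * entropy((n_in - hit_in) / misses)
    return entropy(n_in / n) - cond

# Toy tweets as term sets: "storm" is concentrated inside the window.
in_window = [{"storm", "power", "out"}, {"storm", "wind"}, {"storm"}]
outside = [{"lunch", "pizza"}, {"storm", "movie"}, {"cats"}, {"game"}]
print(temporal_info_gain(in_window, outside, frozenset({"storm"})))
```

Itemsets would be ranked by this gain, so that patterns common in every period (function words, greetings) fall to the bottom while window-specific events rise to the top.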
|