1 |
A Study on Interestingness Measures for Associative Classifiers
Jalali Heravi, Mojdeh 11 1900
Associative classification is a rule-based approach to classifying data that relies on association rule mining to discover associations between a set of features and a class label. Support and confidence are the de facto interestingness measures used for discovering relevant association rules. The support-confidence framework has also been used in most, if not all, associative classifiers. Although support and confidence are appropriate measures for building a strong model in many cases, they are not ideal: they can generate a huge set of rules, which can hinder the classifier's effectiveness, and in such cases other measures could be better suited.
There are many other rule interestingness measures already used in machine learning, data mining and statistics. This work focuses on using 53 different objective measures for associative classification rules. A wide range of UCI datasets are used to study the impact of different interestingness measures on different phases of associative classifiers based on the number of rules generated and the accuracy obtained. The results show that there are interestingness measures that can significantly reduce the number of rules for almost all datasets while the accuracy of the model is hardly jeopardized or even improved. However, no single measure can be introduced as an obvious winner.
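As a small illustration of the support-confidence framework discussed above, the following sketch computes both measures for a single class association rule. The record layout, the function name `support_confidence` and the toy data are assumptions made for this example and are not taken from the thesis.

```python
from typing import FrozenSet, List, Tuple

# Each record is (set of feature items, class label). This layout is an
# assumption for illustration only.
Record = Tuple[FrozenSet[str], str]


def support_confidence(records: List[Record],
                       antecedent: FrozenSet[str],
                       label: str) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent -> label."""
    n = len(records)
    # Labels of all records whose features contain the antecedent.
    covered = [lbl for feats, lbl in records if antecedent <= feats]
    # Records that match the antecedent and carry the predicted label.
    hits = sum(1 for lbl in covered if lbl == label)
    support = hits / n if n else 0.0
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence


records = [
    (frozenset({"outlook=sunny", "windy=no"}), "play"),
    (frozenset({"outlook=sunny", "windy=yes"}), "stay"),
    (frozenset({"outlook=rain", "windy=no"}), "play"),
    (frozenset({"outlook=sunny", "windy=no"}), "play"),
]
sup, conf = support_confidence(records, frozenset({"outlook=sunny"}), "play")
```

Here two of the four records match both the antecedent and the label, and three records match the antecedent, so the rule has support 0.5 and confidence 2/3.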
|
2 |
Measuring interestingness of documents using variability
KONDI CHANDRASEKARAN, PRADEEP KUMAR 01 February 2012
The amount of data we deal with is being generated at an astronomical pace. With rapid technological advances in data storage techniques, storing and transmitting copious amounts of data has become easy and hassle-free. However, exploring such abundant data and finding the interesting items has always been a major challenge and a cumbersome process for people in all industrial sectors. A model that ranks data by interest will help save the time spent on large amounts of data. In this research we concentrate specifically on ranking the text documents in corpora according to "interestingness".
We design a state-of-the-art empirical model to rank documents according to "interestingness". The model is cost-efficient, fast and automated to the extent that it requires minimal human intervention. We identify different categories of documents based on their word-usage patterns, which in turn classify them as interesting, mundane or anomalous. The model is a novel approach that does not depend on the semantics of the words used in the document but is based on the repetition of words and the rate of introduction of new words in the document. The model is a generic design that can be applied to a document corpus of any size from any domain, and it can be used to rank new documents introduced into the corpus. We formulate a couple of normalization techniques that can be used to neutralize the impact of variable document length.
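To make the idea of a "rate of introduction of new words" concrete, here is a minimal sketch that tracks how many distinct words have appeared after each token. The function name `new_word_curve` and the whitespace tokenization are assumptions for illustration; the thesis may define and normalize the rate differently.

```python
from typing import List


def new_word_curve(tokens: List[str]) -> List[int]:
    """Cumulative number of distinct words seen after each token."""
    seen = set()
    curve = []
    for tok in tokens:
        seen.add(tok)            # repeated words do not grow the set
        curve.append(len(seen))
    return curve


tokens = "a b a c b d".split()
curve = new_word_curve(tokens)
# Final fraction of tokens that introduced a new word; a crude
# length-normalized summary, assumed here for illustration.
new_word_rate = curve[-1] / len(tokens)
```

A curve that flattens early indicates heavy repetition, while a curve that keeps climbing indicates that the document continues to introduce new vocabulary.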
We use three approaches, namely dictionary-based data compression, analysis of the rate of new word occurrences and Singular Value Decomposition (SVD). To test the model we use a variety of corpora, namely US Diplomatic Cable releases by Wikileaks, US Presidents' State of the Union Addresses, the Open American National Corpus and 20 Newsgroups articles. The techniques have various pre-processing steps, which are fully automated. We compare the results of the three techniques and examine the level of agreement between pairs of techniques using a statistical measure called the Jaccard coefficient. This approach can also be used to detect unusual and anomalous documents within the corpus.
The results also contradict the assumptions made by Simon and Yule in deriving an equation for a general text generation model. / Thesis (Master, Computing) -- Queen's University, 2012-01-31 15:28:04.177
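The agreement check between pairs of techniques can be sketched with the Jaccard coefficient, the size of the intersection of two sets divided by the size of their union. Comparing the top-k documents of each ranking, and the scores and document names below, are assumptions for illustration rather than the thesis's actual procedure.

```python
from typing import Dict, Iterable, List


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard coefficient |A & B| / |A | B| of two collections."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # convention chosen here: two empty sets agree fully
    return len(set_a & set_b) / len(union)


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """The k documents with the highest score, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical interestingness scores from two techniques.
compression_scores = {"d1": 0.9, "d2": 0.4, "d3": 0.8, "d4": 0.1}
new_word_scores = {"d1": 0.7, "d2": 0.9, "d3": 0.6, "d4": 0.2}

agreement = jaccard(top_k(compression_scores, 2), top_k(new_word_scores, 2))
```

In this toy case the two top-2 lists share one document out of three distinct ones, so the agreement is 1/3.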
|
3 |
A Study on Interestingness Measures for Associative Classifiers
Jalali Heravi, Mojdeh Unknown Date
No description available.
|
4 |
MARAS: Multi-Drug Adverse Reactions Analytics System
Kakar, Tabassum 29 April 2016
Adverse Drug Reactions (ADRs) are a major cause of morbidity and mortality worldwide. Clinical trials, which are extremely costly, labor intensive and specific to controlled human subjects, are ineffective at uncovering all ADRs related to a drug. There is thus a growing need for computing-supported methods that facilitate the automated detection of drug-related ADRs from large report data sets, especially ADRs that are left undiscovered during clinical trials but arise later due to drug-drug interactions or prolonged usage. For this purpose, big data sets available through drug-surveillance programs and social media provide a wealth of longevity information and thus a huge opportunity. In this research, we design a system that uses machine learning techniques to discover severe unknown ADRs triggered by a combination of drugs, also known as a drug-drug interaction. Our proposed Multi-drug Adverse Reaction Analytics System (MARAS) adopts and adapts an association rule mining-based methodology by incorporating contextual information to detect, highlight and visualize interesting drug combinations that are strongly associated with a set of ADRs. MARAS extracts non-spurious associations that are true representations of the combination of drugs taken and reported by patients. We demonstrate the utility of MARAS via case studies from the medical literature, and the usability of the MARAS system via a user study using real-world medical data extracted from the FDA Adverse Event Reporting System (FAERS).
|
5 |
New Probabilistic Interest Measures for Association Rules
Hahsler, Michael, Hornik, Kurt January 2006 (PDF)
Mining association rules is an important technique for discovering meaningful patterns in transaction databases. Many different measures of interestingness have been proposed for association rules. However, these measures fail to take the probabilistic properties of the mined data into account. In this paper, we start by presenting a simple probabilistic framework for transaction data which can be used to simulate transaction data when no associations are present. We use such data and a real-world database from a grocery outlet to explore the behavior of confidence and lift, two popular interest measures used for rule mining. The results show that confidence is systematically influenced by the frequency of the items in the left-hand side of rules and that lift performs poorly at filtering random noise in transaction data. Based on the probabilistic framework we develop two new interest measures, hyper-lift and hyper-confidence, which can be used to filter or order mined association rules. The new measures show significantly better performance than lift for applications where spurious rules are problematic. / Series: Research Report Series / Department of Statistics and Mathematics
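The contrast between lift and a probabilistic alternative can be sketched as follows. The abstract does not define hyper-lift, so this example assumes one plausible form: the observed co-occurrence count divided by a high quantile of a hypergeometric null model in which the two itemsets occur independently. The function names and the quantile level `delta=0.99` are assumptions for illustration, not the paper's confirmed definitions.

```python
from math import comb


def hypergeom_pmf(r: int, m: int, c_x: int, c_y: int) -> float:
    """P(co-occurrence count = r) when the c_x transactions containing X
    are drawn at random from m transactions, c_y of which contain Y."""
    return comb(c_y, r) * comb(m - c_y, c_x - r) / comb(m, c_x)


def hypergeom_quantile(delta: float, m: int, c_x: int, c_y: int) -> int:
    """Smallest count r whose cumulative probability reaches delta."""
    lo = max(0, c_x + c_y - m)   # smallest feasible co-occurrence count
    hi = min(c_x, c_y)           # largest feasible co-occurrence count
    cumulative = 0.0
    for r in range(lo, hi + 1):
        cumulative += hypergeom_pmf(r, m, c_x, c_y)
        if cumulative >= delta:
            return r
    return hi


def lift(c_xy: int, m: int, c_x: int, c_y: int) -> float:
    """Observed co-occurrence count over its expectation under independence."""
    return c_xy * m / (c_x * c_y)


def hyper_lift(c_xy: int, m: int, c_x: int, c_y: int,
               delta: float = 0.99) -> float:
    """Observed count over the delta-quantile of the null model (assumed form)."""
    q = hypergeom_quantile(delta, m, c_x, c_y)
    return c_xy / q if q > 0 else float("inf")


# Toy counts: 1000 transactions, X and Y each in 100, together in 30.
plain = lift(30, 1000, 100, 100)
hyper = hyper_lift(30, 1000, 100, 100)
```

Because the quantile lies above the expected count, the quantile-based ratio is more conservative than lift, which is the property that would help suppress rules arising from random co-occurrence.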
|
6 |
EFFECTS OF SEDUCTIVE AND BORING DETAILS ON READERS' COMPREHENSION OF EXPLANATORY TEXTS
Johnston, Gregory Scott 01 January 2002
Two experiments were conducted that examined the effects of tangential information on readers' comprehension of explanatory texts. Participants were recruited from Introduction to Psychology courses. They were assigned to read one of three versions of a text (i.e., a base-text version, a base-text plus seductive details version or a base-text plus boring details version) about the process of lightning or the lifecycle of a white dwarf star. In Experiment 1, participants were told they had to write down everything they could remember from their passage when they finished reading. The base-text group recalled more of the core content than either of the other two groups. Lengthening a text by adding tangential information interfered with readers' ability to recall the information. More interestingly, the boring details group recalled more core content than the seductive details group. The degree of interestingness of the tangential information had an independent effect on readers' memory. Reading times were also recorded and analyzed. The seductive details group spent less time reading the core content of the passage than either the base-text or the boring details group, which did not differ from each other. The presence of seductive details reduced the amount of attention readers allocated to processing the core content of the passage. In Experiment 2, readers were told that they had to verify whether or not certain sentences were presented in the passage they had just finished reading. Reading times did not differ among the three groups. A post-hoc analysis of reading times across experiments revealed that participants in Experiment 1 spent more time processing the passages than those in Experiment 2. This suggests that changing the memory task from free recall to a recognition-based task may have altered readers' online processing.
In the sentence verification task, there was a tendency for participants who read a passage that included detail sentences to respond faster but less accurately. The presence of detail sentences led readers to perform more poorly at identifying whether or not sentences were actually in the passage they read, compared to readers of the same passage without details.
|
7 |
Web Service Mining
Zheng, George 30 March 2009
In this dissertation, we present a novel approach for Web service mining. Web service mining is a new research discipline. It is different from conventional top down service composition approaches that are driven by specific search criteria. Web service mining starts with no such criteria and aims at the discovery of interesting and useful compositions of existing Web services. Web service mining requires the study of three main research topics: semantic description of Web services, efficient bottom up composition of composable services, and interestingness and usefulness evaluation of composed services. We first propose a Web service ontology to describe and organize the constructs of a Web service. We introduce the concept of Web service operation interface for the description of shared Web service capabilities and use Web service domains for grouping Web service capabilities based on these interfaces. We take clues from how Nature solves the problem of molecular composition and introduce the notion of Web service recognition to help devise efficient bottom up service composition strategies. We introduce several service recognition mechanisms that take advantage of the domain-based categorization of Web service capabilities and ontology-based description of operation semantics. We take clues from the drug discovery process and propose a Web service mining framework to group relevant mining activities into a progression of phases that would lead to the eventual discovery of useful compositions. Based on the composition strategies that are derived from recognition mechanisms, we propose a set of algorithms in the screening phase of the framework to automatically identify leads of service compositions. We propose objective interestingness and usefulness measures in the evaluation phase to narrow down the pool of composition leads for further exploration. 
To demonstrate the effectiveness of our framework and to address challenges faced by existing biological data representation methodologies, we have applied relevant techniques presented in this dissertation to the field of biological pathway discovery. / Ph. D.
|
8 |
Measuring Interestingness in Outliers with Explanation Facility using Belief Networks
Masood, Adnan 01 January 2014
This research explores the potential of improving the explainability of outliers using Bayesian Belief Networks as background knowledge. Outliers are deviations from the usual trends of data. Mining outliers may help discover potential anomalies and fraudulent activities. Meaningful outliers can be retrieved and analyzed by using domain knowledge. Domain knowledge (or background knowledge) is represented using probabilistic graphical models such as Bayesian belief networks. Bayesian networks are a graph-based representation used to model and encode mutual relationships between entities. Due to their probabilistic graphical nature, Belief Networks are an ideal way to capture the sensitivity, causal inference, uncertainty and background knowledge in real-world data sets. Bayesian Networks effectively present the causal relationships between different entities (nodes) using conditional probability. This probabilistic relationship shows the degree of belief between entities. A quantitative measure which computes changes in this degree of belief acts as a sensitivity measure.
The first contribution of this research is to improve the performance of sensitivity measurement relative to earlier research work, the Interestingness Filtering Engine Miner algorithm. The algorithm developed (IBOX - Interestingness based Bayesian outlier eXplainer) provides progressive improvement in the performance and sensitivity scoring of earlier works. Earlier approaches compute sensitivity by measuring the divergence between the conditional probabilities of training and test data, while using only a couple of probabilistic interestingness measures, such as mutual information and support, to calculate belief sensitivity. Grounded in the literature and supported by quantitative evidence, IBOX provides a framework for using multiple interestingness measures, resulting in better performance and improved sensitivity analysis. The results show improved performance, and therefore improved explainability of rare class entities. This research quantitatively validated probabilistic interestingness measures as an effective sensitivity analysis technique in rare class mining. This results in a novel, original, and progressive research contribution to the areas of probabilistic graphical models and outlier analysis.
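Mutual information, one of the probabilistic interestingness measures named above, can be estimated from a table of joint counts. The sketch below is a generic textbook estimate and not the IBOX implementation; the function name and the count-table layout are assumptions for illustration.

```python
import math
from typing import Dict, Hashable, Tuple


def mutual_information(joint: Dict[Tuple[Hashable, Hashable], int]) -> float:
    """Mutual information in bits, estimated from joint counts {(x, y): n}."""
    total = sum(joint.values())
    # Marginal counts for each variable.
    count_x: Dict[Hashable, int] = {}
    count_y: Dict[Hashable, int] = {}
    for (x, y), n in joint.items():
        count_x[x] = count_x.get(x, 0) + n
        count_y[y] = count_y.get(y, 0) + n
    mi = 0.0
    for (x, y), n in joint.items():
        if n == 0:
            continue  # a zero cell contributes nothing to the sum
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with counts.
        mi += (n / total) * math.log2(n * total / (count_x[x] * count_y[y]))
    return mi


independent = {(0, 0): 25, (0, 1): 25, (1, 0): 25, (1, 1): 25}
dependent = {(0, 0): 50, (1, 1): 50}
mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
```

Two independent binary variables give a value of zero bits, while two perfectly coupled ones give one bit, which is why a change in this quantity between data sets can serve as a signal of changed belief.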
|
9 |
An Analysis Of Peculiarity Oriented Interestingness Measures On Medical Data
Aldas, Cem Nuri 01 September 2008 (PDF)
Peculiar data are regarded as patterns that are significantly distinguishable from other records and relatively few in number, and they are accepted as one of the most striking aspects of the interestingness concept. In the clinical domain, peculiar records are likely signals of malignancy or a disorder that requires immediate intervention. Investigating the rules and mechanisms that lie behind these records will be a meaningful contribution to improved clinical decision support systems.
In order to discover the most interesting records and patterns, many peculiarity oriented interestingness measures, each fulfilling a specific requirement, have been developed. In this thesis, three well-known peculiarity oriented interestingness measures, Local Outlier Factor (LOF), Cluster Based Local Outlier Factor (CBLOF) and Record Peculiar Factor (RPF), are compared. The insights derived from the theoretical foundations of the algorithms were evaluated through experiments on synthetic and real-world medical data. The results are discussed from the interestingness perspective, and some departure points for building a more developed methodology for knowledge discovery in databases are proposed.
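Of the three measures compared, Local Outlier Factor is the simplest to sketch. The version below is a simplified illustration, not the thesis's code: it uses exactly the k nearest neighbors (ignoring distance ties) and assumes there are no duplicate points, which would otherwise make a local density infinite. The function name and the toy data are assumptions.

```python
import math
from typing import List, Sequence


def lof_scores(points: List[Sequence[float]], k: int) -> List[float]:
    """Simplified Local Outlier Factor for every point."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    # For each point, its k nearest other points as (distance, index).
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j)
                       for j, q in enumerate(points) if j != i)
        neighbors.append(dists[:k])
    # k-distance: distance to the k-th nearest neighbor.
    k_distance = [nbrs[-1][0] for nbrs in neighbors]
    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], d) for d, j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean density of the neighbors relative to the point's own density.
    return [sum(lrd[j] for _, j in neighbors[i]) / (k * lrd[i])
            for i in range(n)]


# Five clustered points and one distant point (the last one).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (8, 8)]
scores = lof_scores(points, k=2)
```

Points inside the cluster score close to 1, while the distant point scores far higher because its neighbors are much denser than it is, which matches the notion of a record that is "peculiar" relative to its surroundings.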
|
10 |
Itemset size-sensitive interestingness measures for association rule mining and link prediction
Aljandal, Waleed A. January 1900
Doctor of Philosophy / Department of Computing and Information Sciences / William H. Hsu
Association rule learning is a data mining technique that can capture relationships between pairs of entities in different domains. The goal of this research is to discover factors from data that can improve the precision, recall, and accuracy of association rules found using interestingness measures and frequent itemset mining. Such factors can be calibrated using validation data and applied to rank candidate rules in domain-dependent tasks such as link existence prediction. In addition, I use interestingness measures themselves as numerical features to improve link existence prediction. The focus of this dissertation is on developing and testing an analytical framework for association rule interestingness measures, to make them sensitive to the relative size of itemsets. I survey existing interestingness measures and then introduce adaptive parametric models for normalizing and optimizing these measures, based on the size of itemsets containing a candidate pair of co-occurring entities. The central thesis of this work is that in certain domains, the link strength between entities is related to the rarity of their shared memberships (i.e., the size of itemsets in which they co-occur), and that a data-driven approach can capture such properties by normalizing the quantitative measures used to rank associations. To test this hypothesis under different levels of variability in itemset size, I develop several test bed domains, each containing an association rule mining task and a link existence prediction task. The definitions of itemset membership and link existence in each domain depend on its local semantics.
My primary goals are: to capture quantitative aspects of these local semantics in normalization factors for association rule interestingness measures; to represent these factors as quantitative features for link existence prediction; to apply them to significantly improve precision and recall in several real-world domains; and to build an experimental framework for measuring this improvement, using information theory and classification-based validation.
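The central idea, that co-occurrence in a small itemset is stronger evidence of a link than co-occurrence in a large one, can be illustrated with a purely hypothetical weighting. In the sketch below each shared itemset contributes 1/size^alpha to a pair's score; this functional form, the parameter `alpha` and the function name are assumptions for illustration and are not the dissertation's parametric models.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def size_weighted_cooccurrence(itemsets: Iterable[List[str]],
                               alpha: float = 1.0
                               ) -> Dict[Tuple[str, str], float]:
    """Score each pair by summing 1 / size**alpha over shared itemsets.

    alpha = 0 reduces to a plain co-occurrence count; larger alpha
    discounts large itemsets more strongly (hypothetical normalization).
    """
    scores: Dict[Tuple[str, str], float] = defaultdict(float)
    for items in itemsets:
        members = sorted(set(items))
        if len(members) < 2:
            continue  # no pairs to score
        weight = 1.0 / (len(members) ** alpha)
        for a, b in combinations(members, 2):
            scores[(a, b)] += weight
    return dict(scores)


# Hypothetical group memberships: one small group and one larger group.
groups = [["ann", "bob"], ["ann", "cat", "dan", "eve"]]
weighted = size_weighted_cooccurrence(groups, alpha=1.0)
raw = size_weighted_cooccurrence(groups, alpha=0.0)
```

Both pairs below co-occur exactly once, so their raw counts tie, but the pair that shares the two-member group receives twice the weighted score of the pair that shares the four-member group. In a data-driven setting, a parameter such as `alpha` would be calibrated on validation data.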
|