1 |
Rough Set Based Rule Evaluations and Their Applications. Li, Jiye. January 2007.
Knowledge discovery is an important process in data analysis, data
mining and machine learning. Typically, knowledge is presented in the
form of rules. However, knowledge discovery systems often generate a
huge number of rules, and one of the challenges we face is how to
automatically discover interesting and meaningful knowledge among
them. It is infeasible for human beings to select important and
interesting rules manually, so our focus is on providing measures
that evaluate the quality of rules and thereby make data mining
results easier to understand. In this thesis, we present a series of
rule evaluation techniques designed to facilitate the knowledge
understanding process. These techniques help not only to reduce the
number of rules, but also to extract higher quality rules. Empirical
studies on both artificial and real-world data sets demonstrate how
such techniques can contribute to practical systems such as those for
medical diagnosis and web personalization.
In the first part of this thesis, we discuss several rule evaluation
techniques proposed for rule postprocessing. We show how properly
defined rule templates can be used as a rule evaluation approach, and
we propose two rough set based measures, a Rule Importance Measure
and a Rules-As-Attributes Measure, to rank important and interesting
rules. In the second part, we show how data preprocessing can help
with rule evaluation: because well-preprocessed data is essential for
generating important rules, we propose a new approach for handling
missing attribute values that enhances the generated rules. In the
third part, a rough set based rule evaluation system is demonstrated
to show the effectiveness of the proposed measures. Furthermore, a
new user-centric web personalization system is used as a case study
to demonstrate how the proposed evaluation measures can be used in an
actual application.
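One plausible reading of a reduct-based Rule Importance Measure is to score each rule by how often it reappears across the rule sets generated from different reducts. The sketch below is a minimal illustration of that idea, not the thesis's actual implementation; the function name and toy rule labels are invented:

```python
from typing import List, Set


def rule_importance(reduct_rule_sets: List[Set[str]]) -> dict:
    """Score each rule by the fraction of reduct-derived rule sets
    that contain it; rules generated by more reducts score higher.
    (A hypothetical rendering of a Rule Importance Measure.)"""
    all_rules = set().union(*reduct_rule_sets)
    n = len(reduct_rule_sets)
    return {rule: sum(rule in rs for rs in reduct_rule_sets) / n
            for rule in all_rules}


# Toy example: rule sets generated from three different reducts.
scores = rule_importance([{"r1", "r2"}, {"r1", "r3"}, {"r1", "r2"}])
# "r1" appears in all three rule sets, so it ranks highest.
```

A rule with score 1.0 survives every choice of reduct, which is the intuition behind treating reduct frequency as a proxy for importance.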
|
3 |
A Spam Filter Based on Reinforcement and Collaboration. Yang, Chih-Chin. 07 August 2008.
The growing volume of spam mail has not only decreased people's productivity but has also become a security threat on the Internet. Mail servers should be able to filter out spam mail precisely, even as its characteristics change over time, and to manage the growing body of spam rules that the servers generate automatically. Most papers focus on a single aspect of spam prevention (especially spam rule generation). In the real world, however, spam prevention is not just a matter of applying a data mining algorithm for rule generation; many other issues must be considered to filter out spam correctly.
In this paper, we integrate three modules into a complete anti-spam system: a spam rule generation module, a spam rule reinforcement module, and a spam rule exchange module. A rule-based data mining approach is used to generate exchangeable spam rules, user feedback is used to reinforce those rules, and rules are distributed among mail servers in a machine-readable XML format. The experimental results support the following conclusions: (1) the spam filter can filter out Chinese mail by analyzing header characteristics; (2) rules exchanged among mail servers improve their spam recall and accuracy; and (3) rule reinforcement improves the effectiveness of the spam rules.
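A machine-readable XML rule exchange of the kind described above could be sketched as follows. The element names, attributes, and sample rule are invented for illustration; the thesis's actual schema is not specified here:

```python
import xml.etree.ElementTree as ET


def rules_to_xml(rules):
    """Serialise spam rules into an XML document that other mail
    servers could parse (illustrative schema, not the paper's)."""
    root = ET.Element("spamRules")
    for rule in rules:
        r = ET.SubElement(root, "rule", id=str(rule["id"]))
        ET.SubElement(r, "condition").text = rule["condition"]
        ET.SubElement(r, "action").text = rule["action"]
    return ET.tostring(root, encoding="unicode")


# A hypothetical header-based rule being prepared for exchange.
xml_doc = rules_to_xml([
    {"id": 1, "condition": "subject contains 'lottery'", "action": "block"},
])
```

The receiving server would parse the document with the same schema and merge the rules into its local rule base, which is the collaboration step the abstract describes.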
|
4 |
Using optimisation techniques to granulise rough set partitions. Crossingham, Bodie. 26 January 2009.
Rough set theory (RST) is concerned with the formal approximation of crisp sets
and is a mathematical tool for dealing with vagueness and uncertainty. RST can be
integrated into machine learning and used both to make forecasts and to
determine causal interpretations for a particular data set. The work performed
in this research uses various optimisation techniques to granulise
the rough set input partitions so as to achieve the highest forecasting accuracy
the rough set can produce. Forecasting accuracy is measured by the area
under the curve (AUC) of the receiver operating characteristic (ROC) curve. The
four optimisation techniques used are genetic algorithms, particle swarm optimisation,
hill climbing and simulated annealing. This newly proposed method is tested
on two data sets, namely the human immunodeficiency virus (HIV) data set and
the militarised interstate dispute (MID) data set. The results obtained from this
granulisation method are compared to two static granulisation methods,
namely equal-width-bin and equal-frequency-bin partitioning. All of the
proposed optimised methods produce higher forecasting accuracies
than the two static methods. For the HIV data set, the hill climbing
approach produced the highest accuracy, 69.02%, in 12,624 minutes;
for the MID data set, the genetic algorithm approach produced
the highest accuracy, 95.82%, in 420 minutes.
The rules generated from the rough set are linguistic and easy to interpret, but
this comes at the expense of accuracy lost in the discretisation process, where
the granularity of the variables is decreased.
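The two static baselines, equal-width-bin and equal-frequency-bin partitioning, can be sketched in a few lines. This is a minimal illustration with invented toy data, not the thesis's code:

```python
def equal_width_bins(values, k):
    """Cut points that split the value range into k equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # Bin boundaries depend only on the range, not the distribution.
    return [lo + i * width for i in range(1, k)]


def equal_frequency_bins(values, k):
    """Cut points chosen so each bin holds roughly the same count."""
    s = sorted(values)
    step = len(s) / k
    # Boundaries follow the data's quantiles instead of its range.
    return [s[int(i * step)] for i in range(1, k)]


data = [1, 2, 3, 4, 10, 20, 30, 40]
width_cuts = equal_width_bins(data, 4)       # [10.75, 20.5, 30.25]
freq_cuts = equal_frequency_bins(data, 4)    # [3, 10, 30]
```

The optimised methods in the thesis replace these fixed cut points with ones searched for by a genetic algorithm, particle swarm optimisation, hill climbing, or simulated annealing, using AUC as the objective.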
|
5 |
Uncertainty Management of Intelligent Feature Selection in Wireless Sensor Networks. Mal-Sarkar, Sanchita. January 2009.
No description available.
|
6 |
High range resolution radar target classification: A rough set approach. Nelson, Dale E. January 2001.
No description available.
|
7 |
Data Classification System Based on Combination Optimized Decision Tree: A Study on Missing Data Handling, Rough Set Reduction, and FAVC Set Integration. Lu, Xuechun. January 2023.
Data classification is a novel data analysis technique that involves extracting valuable, potentially useful information from databases. It has found extensive applications in various domains, including finance, insurance, government, education, transportation, and defense. Several methods are available for data classification, with decision tree algorithms being among the most widely used. These algorithms are based on instance-based inductive learning and offer advantages such as rule extraction, low computational complexity, and the ability to highlight important decision attributes, leading to high classification accuracy; according to statistics, decision tree algorithms[1] are among the most widely utilized data mining algorithms. In this work, a decision tree algorithm is employed to solve classification problems. However, existing decision tree algorithms exhibit limitations such as low computational efficiency and multi-valued[2] bias. Therefore, a data classification system based on an optimized decision tree algorithm written in Python, together with a data storage system based on PostgreSQL, was developed. The proposed algorithm surpasses traditional classification algorithms in terms of dimensionality reduction, attribute selection, and scalability. Ultimately, a combined optimization decision tree classifier system is introduced, which exhibits superior performance compared to the widely used ID3[3] algorithm. The improved decision tree algorithm has both theoretical and practical significance for data mining applications.
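The ID3 baseline mentioned above selects split attributes by information gain. A minimal sketch of that criterion follows, with invented toy data; it is an illustration of the standard measure, not the thesis's optimized variant:

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())


def information_gain(rows, labels, attr_index):
    """ID3-style gain: entropy reduction from splitting on one attribute."""
    n = len(rows)
    by_value = {}
    for row, label in zip(rows, labels):
        by_value.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(part) / n * entropy(part)
                    for part in by_value.values())
    return entropy(labels) - remainder


# Toy data: attribute 0 separates the classes perfectly, attribute 1 not at all.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["yes", "yes", "no", "no"]
gain0 = information_gain(rows, labels, 0)  # 1.0
gain1 = information_gain(rows, labels, 1)  # 0.0
```

ID3's bias toward multi-valued attributes arises because gain tends to grow with the number of distinct attribute values, which is one of the limitations the thesis's combined optimization targets.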
|
8 |
Automatic message annotation and semantic interface for context aware mobile computing. Al-Sultany, Ghaidaa Abdalhussein Billal. January 2012.
In this thesis, the concept of mobile messaging awareness has been investigated by designing and implementing a framework that annotates short text messages with a context ontology for semantic reasoning and classification purposes. Text message keywords are identified and annotated with concepts, entities and knowledge drawn from the ontology, without the need for a learning process, and the proposed framework supports semantic-reasoning-based message awareness for categorization. The first stage of the research develops a framework for facilitating mobile communication with short annotated text messages (SAMS), which annotates short text messages with part-of-speech tags augmented with internal and external metadata. In the SAMS framework, annotation is carried out automatically when a message is composed: metadata is collected from the device's file system and the message header, then accumulated with the message's tagged keywords to form an XML file. The annotation process helps the framework identify the tagged keywords during search and retrieval, and Semantic Web technologies are utilised to improve the reasoning mechanism. The framework is then extended into Contextual Ontology based Short Text Message reasoning (SOIM). SOIM enhances the search capabilities of SAMS by combining short text message annotation with semantic reasoning over a domain ontology, which is modelled as a set of ontological knowledge modules that capture features of contextual entities and of particular events or situations. Fundamentally, SOIM relies on hierarchical semantic distance to compute an approximate degree of match between a new set of relevant keywords and their corresponding abstract class in the domain ontology.
Adopting a contextual ontology improves the framework's text comprehension and message categorization. Fuzzy set and rough set theory have been integrated with SOIM to improve its inference capabilities and efficiency. Since SOIM chooses the pattern that best matches a message by degree of similarity, the issue of choosing the best retrieved pattern arises at the decision-making stage. A fuzzy reasoning classifier with rules based on fuzzy set theory has therefore been applied on top of the SOIM framework to increase the accuracy of the classification process and produce clearer decisions. The issue of uncertainty in the system has been addressed using rough set theory, whereby irrelevant and indecisive properties that would negatively affect the framework's efficiency are ignored during the matching process.
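A hierarchical semantic distance of the kind SOIM relies on can be illustrated as an edge count through the lowest common ancestor of two concepts. The mini-ontology and function names below are invented for illustration; SOIM's actual distance definition is not reproduced here:

```python
def ancestors(concept, parent):
    """Chain of concepts from `concept` up to the hierarchy root."""
    chain = [concept]
    while concept in parent:
        concept = parent[concept]
        chain.append(concept)
    return chain


def semantic_distance(a, b, parent):
    """Edge count between two concepts via their lowest common
    ancestor (a simple stand-in for a hierarchical distance)."""
    up_a = ancestors(a, parent)
    up_b = ancestors(b, parent)
    common = next(c for c in up_a if c in up_b)
    return up_a.index(common) + up_b.index(common)


# Hypothetical mini-ontology, stored as child -> parent links.
parent = {"email": "message", "sms": "message", "message": "artifact"}
d_siblings = semantic_distance("email", "sms", parent)      # 2
d_child = semantic_distance("email", "message", parent)     # 1
```

Smaller distances indicate closer concepts, so a keyword set can be matched to the abstract class whose concepts minimise the accumulated distance.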
|
9 |
Anti-Spam Study: an Alliance-based Approach. Chiu, Yu-fen. 12 September 2006.
The growing problem of spam has generated a need for reliable anti-spam filters. Many filtering techniques, along with machine learning and data mining, are used to reduce the amount of spam. Such algorithms can achieve very high accuracy, but with some trade-off in false positives, which are generally prohibitively expensive in the real world. Much work has been done to improve specific algorithms for the task of detecting spam, but less work has been reported on leveraging multiple algorithms in email analysis. This study presents an alliance-based approach to classifying, discovering and exchanging interesting information on spam. The spam filter in this study is built on a mixture of rough set theory (RST), genetic algorithms (GA) and the XCS classifier system.
RST has the ability to process imprecise and incomplete data such as spam. GA can speed up the search for an optimal solution (i.e. the rules used to block spam). The reinforcement learning of XCS is a good mechanism for suggesting the appropriate classification for an email. The results of spam filtering with the alliance-based approach are evaluated by several statistical methods and show strong performance. Two main conclusions can be drawn from this study: (1) the rules exchanged from other mail servers indeed help the filter block more spam than before; and (2) a combination of algorithms improves accuracy while reducing false positives for the problem of spam detection.
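RST's handling of imprecise data rests on lower and upper approximations of a target set under an indiscernibility partition. The sketch below illustrates that core construction on invented toy data; it is not the study's filter:

```python
def approximations(equiv_classes, target):
    """Rough-set lower and upper approximations of a target set.

    `equiv_classes` partitions the universe by indiscernibility.
    The lower approximation keeps classes fully inside the target
    (certainly spam); the upper keeps any class that overlaps it
    (possibly spam). The gap between them is the boundary region.
    """
    lower, upper = set(), set()
    for cls in equiv_classes:
        if cls <= target:
            lower |= cls
        if cls & target:
            upper |= cls
    return lower, upper


# Toy partition of six mails into indiscernible groups.
classes = [{1, 2}, {3, 4}, {5, 6}]
spam = {1, 2, 3}  # group {3, 4} only partly overlaps -> boundary region
lo, up = approximations(classes, spam)
# lo == {1, 2}; up == {1, 2, 3, 4}
```

Mails in the boundary region (here 3 and 4) are exactly the imprecise cases that motivate combining RST with GA-optimised rules and XCS reinforcement.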
|
10 |
Toward Better Website Usage: Leveraging Data Mining Techniques and Rough Set Learning to Construct Better-to-use Websites. Khasawneh, Natheer Yousef. 23 September 2005.
No description available.
|