91

Augmenting Dynamic Query Expansion in Microblog Texts

Khandpur, Rupinder P. 17 August 2018 (has links)
Dynamic query expansion is a method of automatically identifying terms relevant to a target domain based on an incomplete query input. With the explosive growth of online media, such tools are essential for efficiently refining search results to track emerging themes in noisy, unstructured text streams. They are crucial for large-scale predictive analytics and decision-making systems, which use open source indicators to find meaningful information rapidly and accurately. The problems of information overload and semantic mismatch are systemic in the Information Retrieval (IR) tasks undertaken by such systems. In this dissertation, we develop dynamic query expansion algorithms that can improve the efficacy of such systems using only a small set of seed queries, requiring no training or labeled samples. We primarily investigate four significant problems related to the retrieval and assessment of event-related information, viz. (1) How can we adapt the query expansion process to support rank-based analysis when tracking a fixed set of entities? A scalable framework is essential to allow relative assessment of emerging themes such as airport threats. (2) What visual knowledge discovery framework can incorporate users' feedback into the search result refinement process? This is a crucial step toward efficiently integrating real-time `situational awareness' when monitoring specific themes using open source indicators. (3) How can we contextualize query expansions? We focus on capturing semantic relatedness between a query and reference text so that the expansion can quickly adapt to different target domains. (4) How can we synchronously perform knowledge discovery and characterization (unstructured to structured) during the retrieval process? We mainly aim to model high-order, relational aspects of event-related information in microblog texts. / Ph. D. / Analysis of real-time social media can provide critical insights into ongoing societal events, whose consequences and implications include monetary losses, threats to critical infrastructure and national security, disruptions to daily life, and potential loss of life and physical property. Developing good ‘ground truth’, i.e., an authoritative record of events reported in the media cataloged alongside important dimensions, is imperative for building adequate data-driven information systems. Availability of high-quality ground truth events can support various analytic efforts, e.g., identifying precursors of attacks, developing predictive indicators using surrogate data sources, and tracking the progression of events over space and time. Dynamic search result refinement is useful for expanding a general set of user queries into a more relevant collection. The challenges of information overload and misalignment of context between the user query and the retrieved results can overwhelm both human and machine. In this dissertation, we focus our efforts on these specific challenges. With the ever-increasing volume of user-generated data, large-scale analysis is a tedious task. Our first focus is to develop a scalable model that dynamically tracks and ranks evolving topics as they appear in social media. Then, to simplify the cognitive tasks involved in sense-making of evolving themes, we take a visual approach to retrieving situationally critical and emergent information effectively.
This visual analytics approach learns from the user’s interactions during the exploratory process and then generates a better representation of the data, thus improving the situational understanding and usability of the underlying data models. Such features are crucial for big-data-based decision-support systems. To make the event-focused retrieval process more robust, we developed a context-rich procedure that adds new relevant key terms to the user’s original query by utilizing the linguistic structures in the text. This context-awareness allows the algorithm to retrieve the relevant characteristics that help users gain adequate information from social media about real-world events. Online social commentary about events is very informal and can be incomplete. To get the complete picture and adequately describe these events, we develop an approach that models the underlying relatedness of information and iteratively extracts meaning and denotations from event-related texts. We learn how to express the high-order relationships between events and entities and group them to identify the attributes that best explain the events the user is trying to uncover. In all the augmentations we develop, our strategy is to allow only minimal human supervision, using just a small set of seed event triggers and requiring no training or labeled samples. We present a comprehensive evaluation of these augmentations on real-world domains (threats on airports, cyber attacks, and protests) and demonstrate their applicability to real-time analysis, where vital event characteristics and contextually consistent information can be a beneficial aid for emergency responders.
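
The core expansion loop can be illustrated with a small, hypothetical sketch: starting from a few seed terms, documents matching the current query are retrieved, candidate terms are scored by co-occurrence with the query, and the highest-scoring terms are folded back into the query. This is a generic, pseudo-relevance-feedback style illustration of dynamic query expansion, not the specific algorithms developed in the dissertation; the scoring function and toy corpus are assumptions.

```python
from collections import Counter

def expand_query(seed_terms, documents, rounds=3, add_per_round=2):
    """Iteratively grow a seed query over a collection of tokenized microblog texts.

    documents: list of lists of lowercase tokens.
    Returns the expanded set of query terms.
    """
    query = set(t.lower() for t in seed_terms)
    for _ in range(rounds):
        # pseudo-relevant documents: those containing at least one query term
        relevant = [doc for doc in documents if query & set(doc)]
        if not relevant:
            break
        # score candidate terms by how often they co-occur with the current query
        scores = Counter()
        for doc in relevant:
            for term in set(doc) - query:
                scores[term] += 1
        # fold the strongest co-occurring terms back into the query
        for term, _ in scores.most_common(add_per_round):
            query.add(term)
    return query

# toy usage
docs = [["airport", "security", "breach", "delay"],
        ["security", "checkpoint", "threat", "airport"],
        ["weather", "delay", "flight"]]
print(expand_query(["airport"], docs))
```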
92

Minimalism Yields Maximum Results: Deep Learning with Limited Resource

Haoyu Wang (19193416) 22 July 2024 (has links)
Deep learning models have demonstrated remarkable success across diverse domains, including computer vision and natural language processing. These models rely heavily on resources: annotated data, computational power, and storage. However, mobile devices often face constraints on computing power, and in scenarios such as medical or multilingual contexts, ample data annotation is prohibitively expensive. Developing deep learning models for such resource-constrained scenarios presents a formidable challenge. Our primary goal is to enhance the efficiency of state-of-the-art neural network models tailored for resource-limited scenarios. We are committed to crafting algorithms that not only mitigate annotation requirements but also reduce computational complexity and alleviate storage demands. The dissertation focuses on two key areas: parameter-efficient learning and data-efficient learning. In Part 1, we present our studies on parameter-efficient learning. This approach targets the creation of lightweight models for efficient storage or inference. The proposed solutions are tailored for diverse tasks, including text generation, text classification, and text/image retrieval. In Part 2, we showcase our proposed methods for data-efficient learning, concentrating on cross-lingual and multilingual text classification applications.
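
As a rough illustration of parameter-efficient learning (not the specific methods proposed in the dissertation), the sketch below wraps a frozen linear layer with a trainable low-rank adapter in the style of LoRA, so that only a small number of extra parameters are updated during fine-tuning; the class name, rank, and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class LowRankAdapter(nn.Module):
    """Frozen base projection plus a trainable low-rank update (LoRA-style sketch)."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # keep the pretrained weights fixed
            p.requires_grad = False
        self.down = nn.Linear(base.in_features, rank, bias=False)   # trainable
        self.up = nn.Linear(rank, base.out_features, bias=False)    # trainable
        nn.init.zeros_(self.up.weight)         # start out equivalent to the base layer

    def forward(self, x):
        return self.base(x) + self.up(self.down(x))

# toy usage: only the adapter's parameters would be handed to the optimizer
layer = LowRankAdapter(nn.Linear(768, 768), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")
```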
93

Improving RDF data with data mining

Abedjan, Ziawasch January 2014 (has links)
Linked Open Data (LOD) comprises numerous, often large public data sets and knowledge bases. Those datasets are mostly represented in the RDF triple structure of subject, predicate, and object, where each triple represents a statement or fact. Unfortunately, the heterogeneity of available open data requires significant integration steps before it can be used in applications. Meta information, such as ontological definitions and exact range definitions of predicates, is desirable and ideally provided by an ontology. However, in the context of LOD, ontologies are often incomplete or simply not available. Thus, it is useful to automatically generate meta information such as ontological dependencies, range definitions, and topical classifications. Association rule mining, which was originally applied for sales analysis on transactional databases, is a promising and novel technique to explore such data. We designed an adaptation of this technique for mining RDF data and introduce the concept of “mining configurations”, which allows us to mine RDF data sets in various ways. Different configurations enable us to identify schema and value dependencies that in combination result in interesting use cases. To this end, we present rule-based approaches for auto-completion, data enrichment, ontology improvement, and query relaxation. Auto-completion remedies the problem of inconsistent ontology usage by providing an editing user with a sorted list of commonly used predicates. A combination of different configurations extends this approach to create completely new facts for a knowledge base. We present two approaches for fact generation: a user-based approach, where a user selects the entity to be amended with new facts, and a data-driven approach, where an algorithm discovers entities that have to be amended with missing facts. As knowledge bases constantly grow and evolve, another approach to improve the usage of RDF data is to improve existing ontologies. Here, we present an association rule based approach to reconcile ontology and data. Interlacing different mining configurations, we derive an algorithm to discover synonymously used predicates. Those predicates can be used to expand query results and to support users during query formulation. We provide a wide range of experiments on real-world datasets for each use case. The experiments and evaluations show the added value of association rule mining for the integration and usability of RDF data and confirm the appropriateness of our mining configuration methodology. / Linked Open Data (LOD) umfasst viele und oft sehr große öffentliche Datensätze und Wissensbanken, die hauptsächlich in der RDF-Triplestruktur bestehend aus Subjekt, Prädikat und Objekt vorkommen. Dabei repräsentiert jedes Triple einen Fakt. Unglücklicherweise erfordert die Heterogenität der verfügbaren öffentlichen Daten signifikante Integrationsschritte, bevor die Daten in Anwendungen genutzt werden können. Meta-Daten wie ontologische Strukturen und Bereichsdefinitionen von Prädikaten sind zwar wünschenswert und idealerweise durch eine Wissensbank verfügbar. Jedoch sind Wissensbanken im Kontext von LOD oft unvollständig oder einfach nicht verfügbar. Deshalb ist es nützlich, automatisch Meta-Informationen wie ontologische Abhängigkeiten, Bereichs- und Domänendefinitionen und thematische Assoziationen von Ressourcen generieren zu können.
Eine neue und vielversprechende Technik, um solche Daten zu untersuchen, basiert auf dem Entdecken von Assoziationsregeln, das ursprünglich für Verkaufsanalysen in transaktionalen Datenbanken angewendet wurde. Wir haben eine Adaptierung dieser Technik auf RDF-Daten entworfen und stellen das Konzept der Mining-Konfigurationen vor, welches uns befähigt, in RDF-Daten auf unterschiedliche Weisen Muster zu erkennen. Verschiedene Konfigurationen erlauben uns, Schema- und Wertbeziehungen zu erkennen, die für interessante Anwendungen genutzt werden können. In dem Sinne stellen wir assoziationsbasierte Verfahren für ein Prädikatvorschlagsverfahren, Datenvervollständigung, Ontologieverbesserung und Anfrageerleichterung vor. Das Vorschlagen von Prädikaten behandelt das Problem der inkonsistenten Verwendung von Ontologien, indem einem Benutzer, der einen neuen Fakt einem RDF-Datensatz hinzufügen will, eine sortierte Liste von passenden Prädikaten vorgeschlagen wird. Eine Kombination von verschiedenen Konfigurationen erweitert dieses Verfahren, sodass automatisch komplett neue Fakten für eine Wissensbank generiert werden. Hierbei stellen wir zwei Verfahren vor: ein nutzergesteuertes Verfahren, bei dem ein Nutzer die Entität aussucht, die erweitert werden soll, und einen datengesteuerten Ansatz, bei dem ein Algorithmus selbst die Entitäten aussucht, die mit fehlenden Fakten erweitert werden. Da Wissensbanken stetig wachsen und sich verändern, ist ein anderer Ansatz, um die Verwendung von RDF-Daten zu erleichtern, die Verbesserung von Ontologien. Hierbei präsentieren wir ein Assoziationsregeln-basiertes Verfahren, das Daten und zugrundeliegende Ontologien zusammenführt. Durch die Verflechtung von unterschiedlichen Konfigurationen leiten wir einen neuen Algorithmus her, der gleichbedeutende Prädikate entdeckt. Diese Prädikate können benutzt werden, um Ergebnisse einer Anfrage zu erweitern oder einen Nutzer während einer Anfrage zu unterstützen. Für jede unserer vorgestellten Anwendungen präsentieren wir eine große Auswahl an Experimenten auf Realweltdatensätzen. Die Experimente und Evaluierungen zeigen den Mehrwert der Assoziationsregel-Generierung für die Integration und Nutzbarkeit von RDF-Daten und bestätigen die Angemessenheit unserer konfigurationsbasierten Methodologie, um solche Regeln herzuleiten.
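
One of the mining configurations described above treats each subject as a transaction and its predicates as items; frequent predicate combinations then yield rules such as "entities with predicate X usually also have predicate Y", which is the basis of the predicate auto-completion use case. The sketch below is a simplified, hypothetical illustration of that idea (a brute-force rule search on a toy set of triples), not the dissertation's actual algorithm; the triples and thresholds are assumptions.

```python
from collections import defaultdict
from itertools import combinations

triples = [
    ("Berlin", "type", "City"), ("Berlin", "country", "Germany"), ("Berlin", "population", "3.6M"),
    ("Paris", "type", "City"), ("Paris", "country", "France"), ("Paris", "population", "2.1M"),
    ("Rhine", "type", "River"), ("Rhine", "country", "Germany"),
]

# configuration: subject = transaction, predicate = item
transactions = defaultdict(set)
for s, p, o in triples:
    transactions[s].add(p)

def support(itemset):
    return sum(1 for items in transactions.values() if itemset <= items) / len(transactions)

# brute-force single-antecedent rules p -> q, filtered by support and confidence
predicates = {p for items in transactions.values() for p in items}
for p, q in combinations(sorted(predicates), 2):
    for a, b in ((p, q), (q, p)):
        conf = support({a, b}) / support({a})
        if conf >= 0.8 and support({a, b}) >= 0.5:
            print(f"{a} -> {b}  (support={support({a, b}):.2f}, confidence={conf:.2f})")
```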
94

NEW ARTIFACTS FOR THE KNOWLEDGE DISCOVERY VIA DATA ANALYTICS (KDDA) PROCESS

Li, Yan 01 January 2014 (has links)
Recently, interest in the business application of analytics and data science has increased significantly. The popularity of data analytics and data science comes from the clear articulation of business problem solving as an end goal. To address limitations in the existing literature, this dissertation provides four novel design artifacts for Knowledge Discovery via Data Analytics (KDDA). The first artifact is a Snail Shell KDDA process model that extends existing knowledge discovery process models and addresses many of their limitations. At the top level, the KDDA process model highlights the iterative nature of KDDA projects and adds two new phases, namely Problem Formulation and Maintenance. At the second level, generic tasks of the KDDA process model are presented in a comparative manner, highlighting the differences between the new KDDA process model and traditional knowledge discovery process models. Two case studies are used to demonstrate how the KDDA process model can guide real-world KDDA projects. The second artifact, a methodology for theory building based on quantitative data, is a novel application of the KDDA process model. The methodology is evaluated using a theory building case from the public health domain. It is not only an instantiation of the Snail Shell KDDA process model, but also makes theoretical contributions to theory building: it demonstrates how analytical techniques can be used as quantitative gauges to assess important construct relationships during the formative phase of theory building. The third artifact is a data mining ontology, the DM3 ontology, which bridges the semantic gap between business users and KDDA experts and facilitates analytical model maintenance and reuse. The DM3 ontology is evaluated using both a criteria-based and a task-based approach. The fourth artifact is a decision support framework for MCDA software selection. The framework enables users to choose relevant MCDA software based on a specific decision making situation (DMS). A DMS modeling framework is developed to structure the DMS based on the decision problem and the user's decision preferences. The framework is implemented in a decision support system and evaluated using application examples from the real-estate domain.
95

Catégorisation des comportements de conduite en termes de consommation en carburant : une méthode de découverte de connaissances contextuelles à partir des traces d’interactions / Categorization of driving behavior in terms of fuel consumption

Traoré, Assitan 19 January 2017 (has links)
Cette thèse propose une méthode d'ingénierie des connaissances contextuelles qui permet la modélisation et l'identification du contexte explicatif d'un critère observé. Le contexte est constitué de connaissances explicatives situées permettant une représentation élicitée valide d'un objet dans la situation visée. Ces connaissances sont généralement découvertes lors de l'observation de la réalisation de l'activité dans laquelle cet objet est impliqué. Elles sont donc difficiles à décrire en début d'analyse d'une activité. Toutefois, elles restent nécessaires pour la définition, l'explication et la compréhension efficace d'une activité selon un critère observé caractérisant cette dernière. Cette thèse propose la définition progressive du contexte satisfaisant pour expliquer un critère observé lors de l'observation d'une activité. Cette recherche mobilise les traces d'interaction de l'activité analysée, précise la notion de contexte et exploite les méthodes de fouille de données pour réaliser la catégorisation et la classification d'un critère observé en distinguant les paramètres contextuels et non contextuels. L'environnement développé sur les principes des traces d'interaction, permet d'assister la découverte du contexte explicatif par une approche interactive, à l'aide des connaissances de l'analyste, de distinguer ce qui est contexte de ce qui ne l'est pas. Nous montrons qu'il est possible de construire un contexte valide, en le « découvrant » et en le formulant sous une forme générique, telle que proposée dans la littérature. Une application de la méthode a été effectuée en situation de conduite automobile pour modéliser et identifier le contexte explicatif de la consommation en carburant. En s'appuyant sur les connaissances existantes du domaine, la validation de la méthode est effectuée en étudiant qualitativement les connaissances produites sur la consommation réelle en carburant. La méthode est validée quantitativement en appliquant les règles de classifications établies sur des données collectées de l'activité de conduite. Cette illustration de l'analyse de l'activité de conduite automobile avec la méthode de découverte de connaissances contextuelles, pour déterminer le contexte explicatif de la consommation en carburant, a été effectuée à l'Ifsttar sur des données réelles collectées lors de l'activité de conduite en situation naturelle. Les expérimentations menées montrent des résultats encourageants et permettent d'envisager l'intégration de la méthode de découverte de connaissances contextuelles dans les pratiques des analystes de l'Ifsttar / This thesis proposes a contextual knowledge engineering method that allows the identification and modelling of the explanatory context of an observed criterion. The context consists of situated explanatory knowledge allowing a valid representation of an object in the situation of interest. This knowledge is generally elicited when observing the performance of the activity in which the object is involved. It is therefore difficult to describe at the beginning of an activity analysis, but it is necessary for the definition, explanation and effective understanding of an activity according to an observed criterion characterizing this activity. This thesis proposes a progressive definition of a context adequate to explain a criterion observed during activity observation.
The research mobilizes interaction traces of the analysed activity, clarifies the notion of context, and uses data mining methods for the classification or categorization of an observed criterion by distinguishing contextual from non-contextual parameters. The developed environment, based on interaction trace principles, assists the discovery of the explanatory context through an interactive approach, using the analyst's knowledge of the context. We demonstrate that it is possible to build a valid context by discovering it and formulating it in a generic form as proposed in the literature. The method was applied in a driving situation to identify and model the explanatory context of fuel consumption. The method is validated by studying the knowledge produced about fuel consumption, qualitatively by relying on existing domain knowledge and quantitatively by applying classification rules established from data collected during driving activity. This illustration of driving activity analysis with the contextual knowledge discovery method, used to determine the explanatory context of fuel consumption, was conducted at Ifsttar on real data collected during driving activity in natural driving situations. The experiments show encouraging results and allow considering the integration of the contextual knowledge discovery method into Ifsttar analysts' practices.
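
As a rough, hypothetical illustration of separating contextual from non-contextual parameters for an observed criterion such as fuel consumption (not the method developed in the thesis), one can fit an interpretable classifier on trace-derived parameters and keep only those that actually influence the criterion; the feature names, the synthetic data, and the importance threshold below are all assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 500
# candidate parameters extracted from interaction traces (hypothetical)
slope = rng.normal(0, 5, n)           # road slope (%)
speed_var = rng.exponential(2, n)     # speed variability
radio_on = rng.integers(0, 2, n)      # presumably irrelevant to consumption
X = np.column_stack([slope, speed_var, radio_on])
names = ["slope", "speed_var", "radio_on"]

# observed criterion: high vs. low fuel consumption (synthetic ground truth)
y = (0.4 * slope + 0.8 * speed_var + rng.normal(0, 1, n) > 2).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
for name, imp in zip(names, tree.feature_importances_):
    tag = "contextual" if imp > 0.05 else "non-contextual"
    print(f"{name:10s} importance={imp:.2f} -> {tag}")
```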
96

Agrupamento de dados fuzzy colaborativo / Collaborative fuzzy clustering

Coletta, Luiz Fernando Sommaggio 19 May 2011 (has links)
Nas últimas décadas, as técnicas de mineração de dados têm desempenhado um importante papel em diversas áreas do conhecimento humano. Mais recentemente, essas ferramentas têm encontrado espaço em um novo e complexo domínio, no qual os dados a serem minerados estão fisicamente distribuídos. Nesse domínio, alguns algoritmos específicos para agrupamento de dados podem ser utilizados - em particular, algumas variantes do algoritmo amplamente utilizado Fuzzy C-Means (FCM), as quais têm sido investigadas sob o nome de agrupamento fuzzy colaborativo. Com o objetivo de superar algumas das limitações encontradas em dois desses algoritmos, cinco novos algoritmos foram desenvolvidos nesse trabalho. Esses algoritmos foram estudados em dois cenários específicos de aplicação que levam em conta duas suposições sobre os dados (i.e., se os dados são de uma mesma população ou de diferentes populações). Na prática, tais suposições e a dificuldade em se definir alguns dos parâmetros (que possam ser requeridos) podem orientar a escolha feita pelo usuário entre os algoritmos disponíveis. Nesse sentido, exemplos ilustrativos destacam as diferenças de desempenho entre os algoritmos estudados e desenvolvidos, permitindo derivar algumas conclusões que podem ser úteis ao aplicar agrupamento fuzzy colaborativo na prática. Análises de complexidade de tempo, espaço e comunicação também foram realizadas. / Data mining techniques have played an important role in several areas of human knowledge. More recently, these techniques have found space in a new and complex setting in which the data to be mined are physically distributed. In this setting, algorithms for data clustering can be used, such as some variants of the widely used Fuzzy C-Means (FCM) algorithm that support clustering data distributed across different sites. Those methods have been studied under different names, such as collaborative and parallel fuzzy clustering. In this study, we offer some augmentations of two FCM-based clustering algorithms used to cluster distributed data, arriving at constructive ways of determining essential parameters of the algorithms (including the number of clusters) and forming a set of systematically structured guidelines for selecting the specific algorithm depending upon the nature of the data environment and the assumption being made about the number of clusters. A thorough complexity analysis covering space, time, and communication aspects is reported. A series of detailed numeric experiments is used to illustrate the main ideas discussed in the study.
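
For reference, the sketch below implements the standard single-site Fuzzy C-Means updates (weighted centroid step and membership step); the collaborative variants studied in the thesis extend this objective with terms that exchange information between data sites, which is not shown here.

```python
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Plain Fuzzy C-Means on a single data site (a sketch, not the collaborative variant)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)               # fuzzy memberships, rows sum to 1
    for _ in range(max_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]                  # weighted centroids
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-10)                                           # avoid division by zero
        exp = 2.0 / (m - 1.0)
        U_new = (d ** -exp) / np.sum(d ** -exp, axis=1, keepdims=True)  # membership update
        if np.linalg.norm(U_new - U) < tol:
            return centers, U_new
        U = U_new
    return centers, U

# toy usage on two well-separated blobs
X = np.vstack([np.random.default_rng(1).normal(0, 0.3, (50, 2)),
               np.random.default_rng(2).normal(3, 0.3, (50, 2))])
centers, U = fuzzy_c_means(X, c=2)
print(np.round(centers, 2))
```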
97

Visualização de operações de junção em sistemas de bases de dados para mineração de dados. / Visualization of join operations in DBMS for data mining.

Barioni, Maria Camila Nardini 13 June 2002 (has links)
Nas últimas décadas, a capacidade das empresas de gerar e coletar informações aumentou rapidamente. Essa explosão no volume de dados gerou a necessidade do desenvolvimento de novas técnicas e ferramentas que pudessem, além de processar essa enorme quantidade de dados, permitir sua análise para a descoberta de informações úteis, de maneira inteligente e automática. Isso fez surgir um proeminente campo de pesquisa para a extração de informação em bases de dados denominado Knowledge Discovery in Databases – KDD – no qual, em geral, técnicas de mineração de dados – DM – têm um papel preponderante. A obtenção de bons resultados na etapa de mineração de dados depende fortemente de quão adequadamente o preparo dos dados é realizado. Sendo assim, a etapa de extração de conhecimento (DM) no processo de KDD é normalmente precedida de uma etapa de pré-processamento, onde os dados que porventura devam ser submetidos à etapa de DM são integrados em uma única relação. Um problema importante enfrentado nessa etapa é que, na maioria das vezes, o usuário ainda não tem uma idéia muito precisa dos dados que devem ser extraídos. Levando em consideração a grande habilidade de exploração da mente humana, este trabalho propõe uma técnica de visualização de dados armazenados em múltiplas relações de uma base de dados relacional, com o intuito de auxiliar o usuário na preparação dos dados a serem minerados. Esta técnica permite que a etapa de DM seja aplicada sobre múltiplas relações simultaneamente, trazendo as operações de junção para serem parte desta etapa. De uma maneira geral, a adoção de junções em ferramentas de DM não é prática, devido ao alto custo computacional associado às operações de junção. Entretanto, os resultados obtidos nas avaliações de desempenho da técnica proposta neste trabalho mostraram que ela reduz esse custo significativamente, tornando possível a exploração visual de múltiplas relações de uma maneira interativa. / In the last decades, the capacity for information generation and accumulation has increased quickly. With the explosive growth in the volume of data, new techniques and tools are being sought to process it and to automatically discover useful information from it, leading to techniques known as Knowledge Discovery in Databases – KDD – where, in general, data mining – DM – techniques play an important role. The results of applying data mining techniques on datasets are highly dependent on proper data preparation. Therefore, in traditional DM processes, data goes through a pre-processing step that results in just one table that is submitted to mining. An important problem faced during this step is that, most of the time, the analyst doesn’t have a clear idea of what portions of data should be mined. This work relies on the strong ability of human beings to interpret data represented in graphical format to develop a technique for visualizing data from multiple tables, helping human analysts prepare data for DM. This technique allows the data mining process to be applied over multiple relations at once, making the join operations part of this process. In general, the use of multiple tables in DM tools is not practical, due to the high computational cost required to explore them. Experimental evaluation of the proposed technique shows that it reduces this cost significantly, making it possible to visually explore data from multiple tables in an interactive way.
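
To make the conventional pre-processing step concrete, the hypothetical snippet below shows the usual approach the thesis seeks to improve on: materializing a single joined relation before mining. The table and column names are assumptions; the thesis's contribution is a visualization technique that instead lets the join happen interactively as part of the mining step.

```python
import pandas as pd

# two relations of a hypothetical sales database
customers = pd.DataFrame({"cust_id": [1, 2, 3],
                          "segment": ["retail", "corporate", "retail"]})
orders = pd.DataFrame({"order_id": [10, 11, 12, 13],
                       "cust_id": [1, 1, 2, 3],
                       "amount": [120.0, 80.0, 950.0, 40.0]})

# conventional preparation: join everything into one mining table, then analyse it
mining_table = orders.merge(customers, on="cust_id", how="inner")
print(mining_table.groupby("segment")["amount"].mean())
```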
98

Reálná úloha dobývání znalostí / Actual role of knowledge discovery in databases

Pešek, Jiří January 2012 (has links)
The thesis "Actual role of knowledge discovery in databases" is concerned with churn prediction in mobile telecommunications. The task is based on real data from a telecommunications company and covers all steps of the data mining process. In accordance with the CRISP-DM methodology, the work looks thoroughly at the following stages: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. As the system for knowledge discovery in databases, the tool IBM SPSS Modeler was selected. The introductory chapter of the theoretical part familiarises the reader with the issue of so-called churn management, which frames the given assignment; the basic concepts related to data mining are defined in the chapter as well. Attention is also given to the basic types of knowledge discovery tasks and to the algorithms pertinent to the selected assignment (decision trees, regression, neural networks, Bayesian networks, and SVM). The methodology describing the phases of knowledge discovery in databases is covered in a separate chapter, wherein CRISP-DM is examined in greater detail, since it represents the foundation for the solution of the practical assignment. The conclusion of the theoretical part also surveys commercial and freely available systems for knowledge discovery in databases.
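
A minimal, hypothetical sketch of the modeling phase described above (a decision tree classifier on a tiny synthetic churn table) is shown below; the thesis itself uses IBM SPSS Modeler rather than Python, and the feature names and data here are invented.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

# synthetic stand-in for the operator's customer table
data = pd.DataFrame({
    "monthly_minutes": [120, 40, 300, 15, 220, 10, 180, 25],
    "contract_months": [24, 3, 36, 1, 12, 2, 24, 1],
    "complaints":      [0, 2, 0, 3, 1, 4, 0, 2],
    "churned":         [0, 1, 0, 1, 0, 1, 0, 1],
})
X, y = data.drop(columns="churned"), data["churned"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0, stratify=y)

model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, model.predict(X_te)))
```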
99

Interestingness Measures for Association Rules in a KDD Process : PostProcessing of Rules with ARQAT Tool

Huynh, Xuan-Hiep 07 December 2006 (has links) (PDF)
This work takes place in the framework of Knowledge Discovery in Databases (KDD), often called "Data Mining". This domain is both a main research topic and an application field in companies. KDD aims at discovering previously unknown and useful knowledge in large databases. In the last decade much research has been published on association rules, which are frequently used in data mining. Association rules, which are implicative tendencies in data, have the advantage of being an unsupervised model. But, on the other hand, they often deliver a large number of rules. As a consequence, a postprocessing task is required by the user to help him understand the results. One way to reduce the number of rules - to validate or to select the most interesting ones - is to use interestingness measures adapted to both his/her goals and the dataset studied. Selecting the right interestingness measures is an open problem in KDD. A lot of measures have been proposed to extract the knowledge from large databases, and many authors have introduced interestingness properties for selecting a suitable measure for a given application. Some measures are adequate for some applications but others are not. In our thesis, we propose to study the set of interestingness measures available in the literature, in order to evaluate their behavior according to the nature of the data and the preferences of the user. The final objective is to guide the user's choice towards the measures best adapted to his/her needs and, in fine, to select the most interesting rules. For this purpose, we propose a new approach implemented in a new tool, ARQAT (Association Rule Quality Analysis Tool), in order to facilitate the analysis of the behavior of about 40 interestingness measures. In addition to elementary statistics, the tool allows a thorough analysis of the correlations between measures using correlation graphs based on the coefficients suggested by Pearson, Spearman and Kendall. These graphs are also used to identify clusters of similar measures. Moreover, we propose a series of comparative studies on the correlations between interestingness measures on several datasets. We discovered a set of correlations not very sensitive to the nature of the data used, which we called stable correlations. Finally, 14 graphical and complementary views structured on 5 levels of analysis (ruleset analysis, correlation and clustering analysis, most interesting rules analysis, sensitivity analysis, and comparative analysis) are illustrated in order to show the interest of both the exploratory approach and the use of complementary views.
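
As a small hypothetical illustration of what such a post-processing tool computes, the snippet below evaluates two classical interestingness measures (confidence and lift) for candidate rules over a toy transaction set and ranks the rules by each; comparing such rankings across measures (e.g., with Spearman correlation) is the kind of analysis ARQAT automates, though this code is not part of the tool and the data are invented.

```python
from itertools import combinations
from scipy.stats import spearmanr

transactions = [{"bread", "butter"}, {"bread", "butter", "jam"},
                {"bread", "jam"}, {"butter", "jam"}, {"bread", "butter"}]
n = len(transactions)
items = sorted(set().union(*transactions))

def support(itemset):
    return sum(1 for t in transactions if itemset <= t) / n

rules, conf, lift = [], [], []
for a, b in combinations(items, 2):
    for x, y in ((a, b), (b, a)):
        s_xy, s_x, s_y = support({x, y}), support({x}), support({y})
        if s_xy == 0:
            continue
        rules.append(f"{x} -> {y}")
        conf.append(s_xy / s_x)          # confidence
        lift.append(s_xy / (s_x * s_y))  # lift

# how similarly do the two measures rank the rules?
rho, _ = spearmanr(conf, lift)
for r, c, l in sorted(zip(rules, conf, lift), key=lambda t: -t[1]):
    print(f"{r:18s} confidence={c:.2f} lift={l:.2f}")
print(f"Spearman correlation between rankings: {rho:.2f}")
```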
100

FCART: A New FCA-based System for Data Analysis and Knowledge Discovery

Neznanov, Alexey A., Ilvovsky, Dmitry A., Kuznetsov, Sergei O. 28 May 2013 (has links) (PDF)
We introduce a new software system called Formal Concept Analysis Research Toolbox (FCART). Our goal is to create a universal integrated environment for knowledge and data engineers. FCART is constructed upon an iterative data analysis methodology and provides a built-in set of research tools based on Formal Concept Analysis techniques for working with object-attribute data representations. The provided toolset allows for the fast integration of extensions on several levels: from internal scripts to plugins. FCART was successfully applied in several data mining and knowledge discovery tasks. Examples of applying the system in medicine and criminal investigations are considered.
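
To make the Formal Concept Analysis foundation concrete, the minimal sketch below enumerates the formal concepts of a tiny object-attribute context by closing attribute subsets; it is a naive, exponential-time illustration of the core FCA operation, not FCART's implementation, and the example context is invented.

```python
from itertools import combinations

# toy object-attribute context
context = {
    "duck":   {"flies", "swims", "lays_eggs"},
    "eagle":  {"flies", "lays_eggs"},
    "beaver": {"swims"},
}
attributes = sorted(set().union(*context.values()))

def extent(attrs):
    """Objects having all the given attributes."""
    return {o for o, a in context.items() if set(attrs) <= a}

def intent(objs):
    """Attributes shared by all the given objects."""
    return set.intersection(*(context[o] for o in objs)) if objs else set(attributes)

# a formal concept is a pair (extent, intent) closed under the two derivation operators
concepts = set()
for r in range(len(attributes) + 1):
    for attrs in combinations(attributes, r):
        objs = extent(attrs)
        concepts.add((frozenset(objs), frozenset(intent(objs))))

for objs, attrs in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(objs), "|", sorted(attrs))
```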
