11 |
Analysis Guided Visual Exploration of Multivariate Data
Yang, Di 04 May 2007
Visualization systems traditionally focus on graphical representation of information. They tend not to provide integrated analytical services that could aid users in tackling complex knowledge discovery tasks. Users' exploration in such environments is usually impeded by several problems: 1) Valuable information is hard to discover when too much data is visualized on the screen. 2) Users have to manage and organize their discoveries offline, because no systematic discovery management mechanism exists. 3) Discoveries based on visual exploration alone may lack accuracy. 4) Users have no convenient access to the important knowledge learned by other users. To tackle these problems, it has been recognized that analytical tools must be introduced into visualization systems. In this paper, we present a novel analysis-guided exploration system, called the Nugget Management System (NMS). It combines human comprehension with machine computation to facilitate users' visual exploration process. Specifically, NMS first extracts the valuable information (nuggets) hidden in datasets based on the interests of users. Given that similar nuggets may be rediscovered by different users, NMS consolidates the nugget candidate set by clustering the nuggets based on their semantic similarity. To solve the problem of inaccurate discoveries, data mining techniques are applied to refine the nuggets so that they best represent the patterns existing in the datasets. Lastly, the resulting well-organized nugget pool is used to guide users' exploration. To evaluate the effectiveness of NMS, we integrated NMS into XmdvTool, a freeware multivariate visualization system. User studies were performed to compare users' efficiency and accuracy in finishing tasks on real datasets, with and without the help of NMS. Our user studies confirmed the effectiveness of NMS. Keywords: Visual Analytics, Visual Knowledge
|
12 |
Profiling topics on the Web for knowledge discovery
Sehgal, Aditya Kumar 01 January 2007
The availability of large-scale data on the Web motivates the development of automatic algorithms to analyze topics and to identify relationships between topics. Various approaches have been proposed in the literature. Most focus on specific topics, mainly those representing people, with little attention to topics of other kinds. They are also less flexible in how they represent topics.
In this thesis we study existing methods and describe a different approach, based on profiles, for representing topics. A Topic Profile is analogous to a synopsis of a topic and consists of different types of features. Profiles are flexible, in that different combinations of features can be emphasized, and extensible, in that new features can be incorporated without changing the underlying logic.
More generally, topic profiles provide an abstract framework that can be used to create different types of concrete representations for topics. Different options regarding the number of documents considered for a topic or types of features extracted can be decided based on requirements of the problem as well as the characteristics of the data. Topic profiles also provide a framework to explore relationships between topics.
We compare different methods for building profiles and evaluate them in terms of their information content and their ability to predict relationships between topics. We contribute new methods in term weighting and for identifying relevant text segments in web documents.
In this thesis, we present an application of our profile-based approach to explore social networks of US senators generated from web data and compare with networks generated from voting data. We consider both general networks as well as issue-specific networks. We also apply topic profiles for identifying and ranking experts given topics of interest, as part of the 2007 TREC Expert Search task.
Overall, our results show that topic profiles provide a strong foundation for exploring different topics and for mining relationships between topics using web data. Our approach can be applied to a wide range of web knowledge discovery problems, in contrast to existing approaches that are mostly designed for specific problems.
|
13 |
A Belief Theoretic Approach for Automated Collaborative Filtering
Wickramarathne, Thanuka Lakmal 01 January 2008
WICKRAMARATHNE, T. L. (M.S., Electrical and Computer Engineering). A Belief Theoretic Approach for Automated Collaborative Filtering (May 2008). Abstract of a thesis at the University of Miami. Thesis supervised by Professor Kamal Premaratne. No. of pages in text: 84.
Automated Collaborative Filtering (ACF) is one of the most successful strategies available for recommender systems. The application of ACF in more sensitive and critical domains, however, has been hampered by the absence of mechanisms that accommodate the imperfections (ambiguities and uncertainties in ratings, missing ratings, etc.) inherent in user preference ratings and propagate them throughout the decision-making process. Thus one is compelled to make various "assumptions" regarding the user preferences, giving rise to predictions that lack sufficient integrity. With its Dempster-Shafer belief theoretic basis, CoFiDS, the automated Collaborative Filtering algorithm proposed in this thesis, can (a) represent a wide variety of data imperfections; (b) propagate the partial knowledge that such data imperfections generate throughout the decision-making process; and (c) conveniently incorporate contextual information from multiple sources. The "soft" predictions that CoFiDS generates provide substantial flexibility to the domain expert. Depending on the associated DS theoretic belief-plausibility measures, the domain expert can either render a "hard" decision or narrow down the possible set of predictions to as small a set as necessary. With its capability to accommodate data imperfections, CoFiDS widens the applicability of ACF from the more popular domains, such as movie and book recommendations, to more sensitive and critical problem domains, such as medical expert support systems, homeland security and surveillance. We use a benchmark movie dataset and a synthetic dataset to validate CoFiDS and compare it to several existing ACF systems.
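For readers unfamiliar with the Dempster-Shafer measures the abstract mentions, the following is a minimal sketch of generic belief, plausibility, and Dempster's rule of combination over a five-point rating scale. It is not the CoFiDS algorithm; the function names, the rating frame, and the example mass values are assumptions made purely for illustration.

```python
from itertools import product

# Hypothetical frame of discernment: a 1-5 rating scale (an assumption).
FRAME = frozenset({1, 2, 3, 4, 5})


def combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + wa * wb
        else:
            # Mass that falls on the empty set measures the conflict.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalize by the non-conflicting mass.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def belief(m, hypothesis):
    """Total mass committed to subsets of the hypothesis."""
    return sum(w for s, w in m.items() if s <= hypothesis)


def plausibility(m, hypothesis):
    """Total mass not contradicting the hypothesis."""
    return sum(w for s, w in m.items() if s & hypothesis)


# Illustrative evidence: one source says "4 or 5", another says "5";
# the remaining mass on FRAME expresses ignorance (e.g. an uncertain rating).
source_a = {frozenset({4, 5}): 0.8, FRAME: 0.2}
source_b = {frozenset({5}): 0.6, FRAME: 0.4}
fused = combine(source_a, source_b)

print(belief(fused, frozenset({5})), plausibility(fused, frozenset({5})))
```

The gap between belief and plausibility for a rating is what allows a "soft" prediction: a domain expert can commit to a single rating when the interval is narrow, or keep a set of candidate ratings when it is wide.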
|
14 |
A Randomness Based Analysis on the Data Size Needed for Removing Deceptive Patterns
IBARAKI, Toshihide, BOROS, Endre, YAGIURA, Mutsunori, HARAGUCHI, Kazuya 01 March 2008
No description available.
|
15 |
Using Fuzzy Rule Induction for Mining Classification Knowledge
Chen, Kun-Hsien 02 August 2000
With the computerization of businesses, more and more data are generated and stored in databases for many business applications. Finding interesting patterns among those data may lead to useful knowledge that provides competitive advantage in business. Knowledge discovery in databases has thus become an important means of helping businesses acquire knowledge that assists managerial and operational work. Among many types of knowledge, classification knowledge is widely used. Most classification rules learned by induction algorithms are in crisp form. Fuzzy linguistic representation of rules, however, is much closer to the way humans reason. The objective of this research is to propose a method to mine classification knowledge from databases with fuzzy descriptions. The procedure contains five steps, from data preparation to rule pruning. A rule induction algorithm, RITIO, is employed to generate the classification rules. A fuzzy inference mechanism that includes fuzzy matching and output reasoning is specified to yield the output class. An experiment is conducted using several databases to show the advantages of this work. The proposed method is justified by good system performance. It can be easily implemented in various business applications involving classification tasks.
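The idea of fuzzy matching followed by output reasoning can be sketched as below. This is a generic illustration under stated assumptions, not the thesis's procedure and not RITIO: the triangular membership functions, the attribute names, the linguistic terms, the two rules, and the min/max inference scheme are all hypothetical.

```python
def triangular(x, left, peak, right):
    """Degree (0..1) to which x belongs to a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical linguistic terms per attribute: (left, peak, right).
TERMS = {
    "income": {"low": (0, 20, 50), "high": (30, 80, 130)},
    "debt": {"low": (0, 10, 40), "high": (20, 60, 100)},
}

# Hypothetical fuzzy rules: antecedent terms -> class label.
RULES = [
    ({"income": "high", "debt": "low"}, "approve"),
    ({"income": "low", "debt": "high"}, "reject"),
]


def classify(record):
    """Fuzzy matching (min over antecedents), then pick the strongest class."""
    scores = {}
    for antecedent, label in RULES:
        strength = min(
            triangular(record[attr], *TERMS[attr][term])
            for attr, term in antecedent.items()
        )
        # Several rules may vote for the same class; keep the strongest.
        scores[label] = max(scores.get(label, 0.0), strength)
    return max(scores, key=scores.get), scores


label, scores = classify({"income": 70, "debt": 15})
print(label, scores)
```

Unlike a crisp rule, which either fires or does not, each rule here fires to a degree, so a record near a boundary contributes partial support to more than one class.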
|
16 |
Feature Construction, Selection And Consolidation For Knowledge Discovery
Li, Jiexun January 2007
With the rapid advance of information technologies, human beings increasingly rely on computers to accumulate, process, and make use of data. Knowledge discovery techniques have been proposed to automatically search large volumes of data for patterns. Knowledge discovery often requires a set of relevant features to represent the specific domain. My dissertation presents a framework of feature engineering for knowledge discovery, including feature construction, feature selection, and feature consolidation. Five essays in my dissertation present novel approaches to construct, select, or consolidate features in various applications. Feature construction is used to derive new features when relevant features are unknown. Chapter 2 focuses on constructing informative features from a relational database. I introduce a probabilistic relational model-based approach to construct personal and social features for identity matching. Experiments on a criminal dataset showed that social features can improve the matching performance. Chapter 3 focuses on identifying good features for knowledge discovery from text. Four types of writeprint features are constructed and shown to be effective for authorship analysis of online messages. Feature selection is aimed at identifying a subset of significant features from a high-dimensional feature space. Chapter 4 presents a framework of feature selection techniques. This essay focuses on identifying marker genes for microarray-based cancer classification. Our experiments on gene array datasets showed excellent performance for optimal search-based gene subset selection. Feature consolidation is aimed at integrating features from diverse data sources or in heterogeneous representations. Chapter 5 presents a Bayesian framework to integrate gene functional relations extracted from heterogeneous data sources such as gene expression profiles, biological literature, and genome sequences.
Chapter 6 focuses on kernel-based methods to capture and consolidate information in heterogeneous data representations. I design and compare different kernels for relation extraction from biomedical literature. Experiments show good performance of tree kernels and composite kernels for biomedical relation extraction. These five essays together compose a framework of feature engineering and present different techniques to construct, select, and consolidate relevant features. This feature engineering framework contributes to the domain of information systems by improving the effectiveness, efficiency, and interpretability of knowledge discovery.
|
17 |
Computer-Enhanced Knowledge Discovery in Environmental Science
Fukuda, Kyoko January 2009
Developing new computer algorithms, and introducing little-known ones, for use on environmental science problems is a significant contribution: it provides knowledge discovery tools that extract new aspects of results and yield insights beyond those available from general statistical methods. Conducting analysis with methods chosen appropriately in terms of quality of performance and results, computation time, flexibility and applicability to data of various natures will help decision making in the policy development and management process for environmental studies. This thesis has three fundamental aims and motivations. The first is to develop a flexibly applicable attribute selection method, Tree Node Selection (TNS), and a decision tree assessment tool, Tree Node Selection for assessing decision tree structure (TNS-A), both of which use decision trees pre-generated by the widely used C4.5 decision tree algorithm as their information source, to identify important attributes from data. TNS supports cost-effective and efficient data collection and policy making by selecting fewer, but important, attributes, and TNS-A provides a tool to assess the decision tree structure and extract information on the relationship between attributes and decisions. The second is to introduce the use of new, theoretical or little-known computer algorithms, such as the K-Maximum Subarray Algorithm (K-MSA) and Ant-Miner, by adjusting and maximizing their applicability and practicality so that they bring new insights to environmental science problems. Additionally, an advanced statistical and mathematical method, Singular Spectrum Analysis (SSA), is demonstrated as a data pre-processing method that helps improve C4.5 results on noisy measurements. The third is to encourage environmental scientists to use the ideas and methods developed in this thesis.
The methods were tested with benchmark data and various real environmental science problems: sea container contamination, the Weed Risk Assessment model and weed spatial analysis for New Zealand Biosecurity, air pollution, climate and health, and defoliation imagery. The intended outcome of this thesis is to introduce the concepts and techniques of data mining, a process of knowledge discovery from databases, to environmental science researchers in New Zealand and overseas through collaboration on future research, with the goal of maintaining and sustaining, together with future policy and management, a healthy environment to live in.
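The K-Maximum Subarray Algorithm named above generalizes the classic maximum subarray problem to the K largest sums. As background only, the sketch below shows the one-dimensional, K = 1 case using Kadane's algorithm; it is not the thesis's K-MSA, and the function name and sample values are assumptions for illustration.

```python
def max_subarray(values):
    """Return (best_sum, start, end) of the contiguous run with the largest sum.

    Indices are inclusive. This is Kadane's algorithm, i.e. the K = 1,
    one-dimensional special case, shown here for orientation only.
    """
    if not values:
        raise ValueError("values must be non-empty")
    best_sum = current_sum = values[0]
    best_start = best_end = current_start = 0
    for i in range(1, len(values)):
        if current_sum < 0:
            # A negative running sum can only hurt; restart the run here.
            current_sum = values[i]
            current_start = i
        else:
            current_sum += values[i]
        if current_sum > best_sum:
            best_sum = current_sum
            best_start, best_end = current_start, i
    return best_sum, best_start, best_end


# Hypothetical deviations of a measurement from its baseline.
deviations = [-2, 1, -3, 4, -1, 2, 1, -5, 4]
print(max_subarray(deviations))
```

Applied to deviations from a baseline, the returned index range marks the stretch where the measured quantity is most strongly elevated overall.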
|
18 |
Ontology based personalized modeling for chronic disease risk evaluation and knowledge discovery: an integrated approach
Verma, Anju January 2009
Populations are aging and the prevalence of chronic disease, persisting for many years, is increasing. The most common non-communicable chronic diseases in developed countries are cardiovascular disease (CVD), type 2 diabetes, obesity, arthritis and specific cancers. Chronic diseases such as cardiovascular disease, type 2 diabetes and obesity have high prevalence and develop over the course of life due to a number of interrelated factors including genetic predisposition, nutrition and lifestyle. With the development and completion of human genome sequencing, we are able to trace genes responsible for proteins and metabolites that are linked with these diseases. A computerized model focused on organizing knowledge related to genes, nutrition and three chronic diseases, namely cardiovascular disease, type 2 diabetes and obesity, has been developed for the Ontology-Based Personalized Risk Evaluation for Chronic Disease Project. This model is a Protégé-based ontological representation which has been developed for entering and linking concepts and data for these three chronic diseases. The model facilitates the identification of interrelationships between concepts. The ontological representation provides the framework into which information on individual patients, disease symptoms, gene maps, diet and life history can be input, and from which risks, profiles, and recommendations can be derived. Personal genome and health data could provide a guide for designing and building a medical health administration system for taking relevant annual medical tests, e.g. gene expression level changes for health surveillance. One method, the transductive neuro-fuzzy inference system with weighted data normalization, is used to evaluate personalized risk of chronic disease. This personalized approach has been used for two different chronic diseases: predicting the risk of cardiovascular disease and predicting the risk of type 2 diabetes.
For predicting the risk of cardiovascular disease, the National Nutrition Health Survey 97 data from the New Zealand population has been used. This data contains clinical, anthropometric and nutritional variables. For predicting the risk of type 2 diabetes, data from the Italian population with clinical and genetic variables has been used. It was found that the genes responsible for causing type 2 diabetes differ between male and female samples. A framework integrating the personalized model with the chronic disease ontology is also developed, with the aim of supporting further discovery and of building an expert system for genes of interest and relevant dietary components.
|