11 |
Mapeamento semiautomático por meio de padrão espectro-temporal de áreas agrícolas e alvos permanentes com evi/modis no Paraná / Semiautomatic mapping of agricultural areas and targets permanent by profile spectrum-temporary of evi / modis in Parana
Verica, Weverton Rodrigo 16 February 2018
Submitted by Neusa Fagundes (neusa.fagundes@unioeste.br) on 2018-09-06T19:38:50Z
Made available in DSpace on 2018-09-06T19:38:50Z (GMT).
Previous issue date: 2018-02-16 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES
Knowing the location and extent of agricultural areas and of native or planted forests helps public managers base their decisions on reliable data. In addition, part of the ICMS revenue from the Municipal Participation Fund (FPM) depends on agricultural production data, the number of rural properties and the environmental factor. The objective of this research was to design an objective, semiautomatic methodology to map agricultural areas and permanent targets, and then to identify areas of soybean, first- and second-crop corn, winter crops, semi-perennial agriculture, forests and other permanent targets in the state of Paraná for the crop years 2013/14 to 2016/17, using time series of the EVI/MODIS vegetation index. The proposed methodology follows the steps of Knowledge Discovery in Databases (KDD): metrics were derived from the spectro-temporal profile of each pixel, and the classification task was performed by the Random Forest algorithm. The mappings were validated with samples extracted from Landsat-8 images, yielding overall accuracy above 84.37% and kappa indices between 0.63 and 0.98, so the mappings were considered to have good or excellent spatial accuracy. The mapped municipal areas of soybean, first-crop corn, second-crop corn and winter crops were compared with official statistics, giving linear correlation coefficients between 0.61 and 0.9, which indicates moderate or strong correlation with the official data. The proposed semiautomatic methodology was therefore successful in the mapping and in automating the computation of the metrics, producing a script in the R software that facilitates future mappings with low processing time.
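The two validation figures quoted above (overall accuracy and the kappa index) are both derived from a confusion matrix of reference samples against mapped classes. The following is a minimal sketch of that standard calculation, not the thesis's R script; the function name and the example matrix are hypothetical and chosen only for illustration.

```python
def overall_accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] is the number of validation samples whose reference class
    is i and whose mapped class is j. The layout is an assumption made for
    this illustration.
    """
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    # Chance agreement: product of row and column marginals, per class.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical two-class example (e.g. "soybean" vs "other"), made-up counts.
example = [[45, 5],
           [10, 40]]
accuracy, kappa = overall_accuracy_and_kappa(example)
print(f"overall accuracy = {accuracy:.2f}, kappa = {kappa:.2f}")
```

Kappa discounts the agreement expected by chance, which is why it can be noticeably lower than the overall accuracy for the same map.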
|
12 |
Data mining em banco de dados de eletrocardiograma / Data mining in electrocardiogram databases
José Alves Ferreira 23 April 2014
This study proposes the exploration of a database of electrocardiogram (ECG) exam information used by the Tele-ECG system of the Dante Pazzanese Institute of Cardiology, applying data mining to find patterns that could contribute, in the future, to the acquisition of knowledge in electrocardiogram analysis. The proposed methodology uses data mining to search the data for patterns without using the ECG trace. Three open source software packages (Weka, Orange and R-Project) were used, each containing a set of algorithm implementations and a variety of data mining techniques, and all of them public domain software. Known rules were found (and confirmed by a medical specialist in electrocardiogram analysis), which supports the validity of the methodology.
|
13 |
Konzeption eines Auswahlverfahrens zur Datenanalyse im Einzelhandel am Beispiel einer Einkaufsverhaltensanalyse im Lebensmitteleinzelhandel
Lohaus, Daniela 12 March 2012
Changes in the shopping behavior of retail customers make it necessary to adapt Knowledge Discovery in Databases (KDD) projects.
Because theory and practice are insufficiently aligned with current developments in retail, this study aims to help identify methods for shopping behavior analysis that ensure the efficient and effective execution of a KDD project. To this end, the candidate methods are narrowed down and parameters for context-specific method selection are identified on a theoretical basis. The parameters then feed into a selection procedure, which is evaluated empirically.
|
14 |
Tools and techniques for knowledge discovery
Howard, Craig M. January 2001
No description available.
|
15 |
Desarrollo y evaluación de metodologías para la aplicación de regresiones logísticas en modelos de comportamiento bajo supuesto de independencia
Biron Lattes, Miguel Ignacio January 2012
Ingeniero Civil Industrial (Industrial Civil Engineer) / This document aims to develop and evaluate a methodology for building logistic regressions for behavioral scoring that accounts for the assumption of independent observations inherent in maximum likelihood estimation.
Because they are easy to interpret and perform well, logistic regressions are widely used in the financial industry to estimate probability-of-default models, which in turn serve multiple purposes: from credit origination, through debt provisioning, to the pre-approval of loans and of credit line and card limits. Given this widespread use, it is considered necessary to study whether failing to meet the theoretical assumptions of model construction can affect the quality of the resulting scorings.
Four data selection mechanisms that ensure independence of observations were created, to be compared against the method that uses all customer observations (the base algorithm). They were then implemented on a database of a financial institution's consumer loan portfolio, within the KDD data mining methodology.
The results show that the implemented models have good discriminatory power, exceeding a KS of 74% on the validation set. However, none of the proposed methods outperforms the base algorithm, possibly because the data selection methods reduce the number of observations available for training, which in turn limits the ability to build more complex models (with more variables) that would ultimately perform better.
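The KS figure reported above is the Kolmogorov-Smirnov statistic commonly used to measure how well a score separates defaulting from non-defaulting customers: the largest gap between the two groups' cumulative score distributions. A minimal sketch of that calculation follows, assuming binary labels (1 = default, 0 = non-default) and at least one observation in each group; the function name and sample data are hypothetical and are not taken from the thesis.

```python
def ks_statistic(scores, labels):
    """Maximum gap between the cumulative score distributions of the two groups.

    scores: model scores, one per customer.
    labels: 1 for default, 0 for non-default (an assumed encoding).
    """
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    pairs = sorted(zip(scores, labels))
    cum_bad = cum_good = 0
    ks = 0.0
    i = 0
    while i < len(pairs):
        # Consume every observation that shares this score before comparing,
        # so tied scores do not create an artificial gap.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            j += 1
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
        i = j
    return ks


# Hypothetical scores for four customers, two of whom defaulted.
print(ks_statistic([0.1, 0.2, 0.3, 0.4], [1, 0, 1, 0]))
```

A value of 1.0 means the score separates the two groups perfectly, and a value near 0 means the two distributions are almost indistinguishable.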
|
16 |
[en] INTELLIGENT ASSISTANCE FOR KDD-PROCESS ORIENTATION / [pt] ASSISTÊNCIA INTELIGENTE À ORIENTAÇÃO DO PROCESSO DE DESCOBERTA DE CONHECIMENTO EM BASES DE DADOS
RONALDO RIBEIRO GOLDSCHMIDT 15 December 2003
[en] The well-known complexity inherent in the KDD (Knowledge
Discovery in Databases) process stems essentially from
aspects related to the control and conduct of this process
(Fayyad et al., 1996b; Hellerstein et al., 1999).
Generally speaking, such aspects involve difficulties in
perceiving innumerable facts whose origin and levels of
detail are highly diverse and diffused, in adequately
interpreting these facts, in dynamically conjugating such
interpretations, and in deciding which actions must be
performed in order to obtain good results. How are the
objectives of the process to be identified in a precise
manner? How is one among the countless existing data mining
and preprocessing algorithms to be selected? And most
importantly, how can the selected algorithms be put to
suitable use in each different situation? These are but
a few examples of the complex and recurrent questions that
are posed when KDD processes are performed. Human analysts
must cope with the arduous task of orienting the execution
of KDD processes. To this end, in face of each different
scenario, humans resort to their previous experiences,
their knowledge, and their intuition in order to interpret
and combine the facts and therefore be able to decide on
the strategy to be adopted (Fayyad et al., 1996a, b; Wirth
et al., 1998). Although the existing computational
alternatives have proved to be useful and desirable, few of
them are designed to help humans to perform KDD processes
(Engels, 1996; Amant and Cohen, 1997; Livingston, 2001;
Bernstein et al., 2002; Brazdil et al., 2003). In
association with the above-mentioned fact, the demand for
KDD applications in several different areas has increased
dramatically in the past few years (Buchanan, 2000). Quite
commonly, the number of available practitioners with
experience in KDD is not sufficient to satisfy this growing
demand (Piatetsky-Shapiro, 1999). Within such a context,
the creation of intelligent tools that aim to assist humans
in controlling KDD processes proves to be even more
opportune (Brachman and Anand, 1996; Mitchell, 1997).
Such being the case, the objectives of this thesis were to
investigate, propose, develop, and evaluate an Intelligent
Machine for KDD-Process Orientation that is basically
intended to serve as a teaching tool to be used in
professional specialization courses in the area of
Knowledge Discovery in Databases. The basis for
formalization of the proposed machine was the Planning
Theory for Problem-Solving (Russell and Norvig, 1995) in
Artificial Intelligence. Its implementation was based on
the integration of assistance functions that are used at
different KDD process control levels: Goal Definition, KDD
Action-Planning, KDD Action Plan Execution, and Knowledge
Acquisition and Formalization. The Goal Definition
Assistant aims to assist humans in identifying KDD
tasks that are potentially executable in KDD applications.
This assistant was inspired by the detection of a certain
type of similarity between the intensional levels presented
by certain databases. The observation of this fact helps
humans to mine the type of knowledge that must be
discovered since data sets with similar structures tend to
arouse similar interests even in distinct KDD applications.
Concepts from the Theory of Attribute Equivalence in
Databases (Larson et al., 1989) make it possible to use a
common structure in which any database may be represented.
In this manner, when databases are represented in the new
structure, it is possible to map them into KDD tasks that
are compatible with such a structure. Topological space
concepts and ANN resources as described in Topological
Spaces (Lipschutz, 1979) and Artificial Neural Nets
(Haykin, 1999) have been employed so as to allow mapping
between heterogeneous patterns. After the goals have been
defined in a KDD application, it is necessary to decide how
such goals are to be achieved. The first step involves
selecting the most appropriate data mining algorithm for
the problem at hand. The KDD Action-Planning Assistant
helps humans to make this choice. To this end, it makes
use of a methodology for ordering the mining algorithms
that is based on the
previous performance of these algorithms in similar problems (Soares et al., 2001;
Brazdil et al., 2003). Algorithm ordering criteria based on database similarity at
the intensional and extensional levels were proposed, described and evaluated.
The data mining algorithm or algorithms having been selected, the next step
involves selecting the way in which data preprocessing is to be performed. Since
there is a large variety of preprocessing algorithms, many are the alternatives for
combining them (Bernstein et al., 2002). The KDD Action-Planning Assistant also
helps humans to formulate and to select the KDD action plan or plans to be
adopted. To this end, it makes use of concepts contained in the Planning Theory
for Problem-Solving.
Once a KDD action plan has been chosen, it is necessary to execute it.
Executing a KDD action plan involves the ordered execution of the KDD
algorithms that have been anticipated in the plan. Executing a KDD algorithm
requires knowledge about it. The KDD Action Plan Execution Assistant provides
specific guidance on KDD algorithms. In addition, this assistant is equipped with
mechanisms that provide specialized assistance for performing the KDD
algorithm execution process and for analyzing the results obtained. Some of these
mechanisms have been described and evaluated.
The execution of the Knowledge Acquisition and Formalization Assistant
is an operational requirement for running the proposed machine. The objective of
this assistant is to acquire knowledge about KDD and to make such knowledge
available by representing and organizing it in a way that makes it possible to process
the above-mentioned assistance functions. A variety of knowledge acquisition
resources and techniques were employed in the conception of this assistant.
|
17 |
Using knowledge discovery to identify potentially useful patterns of health promotion behavior of 10-12 year old Icelandic children
Orlygsdottir, Brynja 01 January 2008
Icelandic children can expect to live a long and healthy life and have the right to the highest possible standard of health. Despite this, as in other Western countries, the prevalence of psychosocial complaints and long term conditions in Icelandic children is growing and they are struggling with increased levels of preventable health conditions.
The purposes of this cross sectional, secondary analysis were to perform a psychometric evaluation on the instrument School-Children Health Promotion; to describe self-reported health promotion behavior of 10-12 year old Icelandic school children, and to predict novel and potentially useful patterns of health promotion behavior of 10-12 year old Icelandic school children using data mining methods. Existing data from 480 10-12 year old Icelandic school children and 911 parents were analyzed.
Analysis of the instrument School-Children Health Promotion indicates that it is, in general, a valid and reliable instrument for measuring health promotion behavior of 10-12 year old Icelandic children. Five factors emerged from the 21 item instrument, which were labeled: "Positive Thinking," "Diet and Sleep Pattern," "Seek Psycho-social Support," "Coping Behavior," and "Health Habits." The results indicated that girls use more positive health promotion behavior than boys; however, differences in health promotion behavior between 5th and 6th grade students were not obvious. The data mining analyses used two classifiers, decision tree (J48) and logistic regression (Logistic), to predict health promotion behavior; both performed better with the subsets of the five factors and the overall instrument than with the full dataset of 199 items. For the subsets, the logistic regression models performed better than the decision trees, with AUC ranging from 0.71 to 0.80. The strongest predictors of health promotion behaviors were validation and caring in friendship, intimate disclosure between friends, and quality of life.
Results of this secondary analysis indicate that friendship is of vital importance with regards to health promotion behavior. Therefore, further studies on the effect friendship has on health promotion behavior of Icelandic children in the 10-12 year old age group are clearly needed.
|
18 |
Investigating the Process of Developing a KDD Model for the Classification of Cases with Cardiovascular Disease Based on a Canadian Database
Liu, Chenyu January 2012
Medicine and health are information-intensive domains whose data volume is constantly increasing. To make full use of these data, Knowledge Discovery in Databases (KDD) has been developed as a comprehensive pathway to discover valid and unsuspected patterns and trends that are both understandable and useful to data analysts.
The present study aimed to investigate the entire KDD process of developing a classification model for cardiovascular disease (CVD) from a Canadian dataset, for the first time. The data source was the Canadian Heart Health Database, which contains 265 easily collected variables and 23,129 instances from ten Canadian provinces. Many practical issues involved in the different steps of the integrated process were addressed, and possible solutions were suggested based on the experimental results. Five learning schemes representing five distinct KDD approaches were employed, as they had never been compared with one another. In addition, two improvement approaches, cost-sensitive learning and ensemble learning, were examined. The performance of the developed models was measured in many aspects.
The data set was prepared through data cleaning and missing value imputation. Three pairs of experiments demonstrated that dataset balancing and outlier removal had a positive influence on the classifier, but variable normalization was not helpful. Three combinations of subset generation method and evaluation function were tested in the variable subset selection phase; the combination of Best-First search and Correlation-based Feature Selection showed comparable goodness and was retained for its other benefits.
Among the five learning schemes investigated, the C4.5 decision tree achieved the best performance on the classification of CVD, followed by Multilayer Feed-forward Network, K-Nearest Neighbor, Logistic Regression, and Naïve Bayes. Cost-sensitive learning, exemplified by the MetaCost algorithm, failed to outperform the single C4.5 decision tree when the cost matrix was varied from 5:1 to 1:7. In contrast, the models developed with ensemble modeling, especially the AdaBoost M1 algorithm, outperformed the other models.
Although the model with the best performance might be suitable for CVD screening in the general Canadian population, it is not ready for use in practice. I propose some criteria to improve the further evaluation of the model. Finally, I describe some of the limitations of the study and propose potential solutions to address them throughout the KDD process. Such possibilities should be explored in further research.
|
20 |
Expanding Data Mining Theory for Industrial Applications
January 2012
abstract: The field of Data Mining is widely recognized and accepted for its applications in many business problems to guide decision-making processes based on data. However, in recent times, the scope of these problems has grown and the methods are under scrutiny for applicability and relevance to real-world circumstances. At the crossroads of innovation and standards, it is important to examine and understand whether the current theoretical methods for industrial applications (which include KDD, SEMMA and CRISP-DM) encompass all possible scenarios that could arise in practical situations. Do the methods require changes or enhancements? As part of the thesis I study the current methods, delineate their ideas and illuminate the shortcomings that posed challenges during practical implementation. Based on the experiments conducted and the research carried out, I propose an approach which illustrates the business problems with higher accuracy and provides a broader view of the process. It is then applied to different case studies highlighting the different aspects of this approach. / Dissertation/Thesis / M.S. Computer Science 2012
|