11

Mapeamento semiautomático por meio de padrão espectro-temporal de áreas agrícolas e alvos permanentes com evi/modis no Paraná / Semiautomatic mapping of agricultural areas and targets permanent by profile spectrum-temporary of evi / modis in Parana

Verica, Weverton Rodrigo 16 February 2018
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / Knowledge of the location and extent of areas used for agriculture or for native or planted forests is relevant so that public managers can base their decisions on reliable data. In addition, part of the ICMS revenues from the Municipal Participation Fund (FPM) depends on agricultural production data, the number of rural properties and the environmental factor. The objective of this research was to design an objective and semiautomatic methodology to map agricultural areas and permanent targets, and then to identify areas of soybean, first- and second-crop corn, winter crops, semi-perennial agriculture, forests and other permanent targets in the state of Paraná for the harvest years 2013/14 to 2016/17, using time series of EVI/MODIS vegetation indices. The proposed methodology follows the steps of the Knowledge Discovery in Databases (KDD) process: metrics were extracted from the spectral-temporal profile of each pixel, and the classification task was performed by the Random Forest algorithm. For validation of the mappings, samples extracted from Landsat-8 images were used, giving overall accuracy greater than 84.37% and a kappa index ranging from 0.63 to 0.98, so the mappings are considered to have good or excellent spatial accuracy.
The mapped municipal areas of soybean, first-crop corn, second-crop corn and winter crops were compared with official statistics, giving linear correlation coefficients between 0.61 and 0.9, which indicates moderate or strong correlation with the official data. The proposed semiautomatic methodology was therefore successful both in the mapping and in automating the computation of the metrics, producing a script in R that facilitates future mappings with low processing time.
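The accuracy figures reported in this abstract (overall accuracy and the kappa index) are both derived from a confusion matrix of validation samples. The following is a minimal sketch of that calculation in Python, not the thesis's R script; the function name and the 2x2 matrix values are hypothetical and chosen only for illustration.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] = number of validation samples whose reference class is i
    and whose mapped (predicted) class is j.
    """
    n = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / n
    # Agreement expected by chance, from the row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical validation counts (not taken from the thesis).
accuracy, kappa = overall_accuracy_and_kappa([[45, 5], [10, 40]])
print(accuracy, kappa)
```

Kappa discounts the agreement that would occur by chance, which is why it can be much lower than the overall accuracy when one class dominates the validation sample.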
12

Data mining em banco de dados de eletrocardiograma / Data mining in electrocardiogram databases

José Alves Ferreira 23 April 2014
In this study, the exploration of a database of electrocardiogram (ECG) examination records, used by the Tele-ECG system of the Dante Pazzanese Institute of Cardiology, is proposed, applying data mining to find patterns that may, in the future, contribute to the acquisition of knowledge in electrocardiogram analysis. The proposed methodology uses data mining to search the data for patterns without using the ECG traces. Three open source software packages (Weka, Orange and R-Project) were used, each containing a set of algorithm implementations and a variety of data mining techniques. Known rules were found (confirmed by a medical specialist in electrocardiogram analysis), which supports the validity of the methodology.
13

Konzeption eines Auswahlverfahrens zur Datenanalyse im Einzelhandel am Beispiel einer Einkaufsverhaltensanalyse im Lebensmitteleinzelhandel

Lohaus, Daniela 12 March 2012
Changes in the shopping behavior of retail customers make it necessary to adapt Knowledge Discovery in Databases (KDD) projects. Because theory and practice are insufficiently aligned with current developments in retail, this study aims to help identify methods for shopping behavior analysis that ensure the KDD project is carried out efficiently and effectively. To this end, the candidate methods are narrowed down and parameters for context-specific method selection are identified on a theoretical basis. The parameters are then incorporated into a selection procedure, which is evaluated empirically.
14

Tools and techniques for knowledge discovery

Howard, Craig M. January 2001
No description available.
15

Desarrollo y evaluación de metodologías para la aplicación de regresiones logísticas en modelos de comportamiento bajo supuesto de independencia

Biron Lattes, Miguel Ignacio January 2012
Ingeniero Civil Industrial / This document aims to develop and evaluate a methodology for building logistic regressions for behavioral scoring that accounts for the assumption of independent observations inherent in maximum likelihood estimation. Because they are easy to interpret and perform well, logistic regressions are widely used in the financial industry to estimate probability-of-default models, which in turn serve multiple purposes: from credit origination, through debt provisioning, to the pre-approval of loans and of limits on credit lines and cards. Given this widespread use, it is considered necessary to study whether violating the theoretical assumptions of model construction can affect the quality of the resulting scorings. Four data selection mechanisms that ensure independence of observations were generated to be compared against the method that uses all customer observations (the base algorithm); these were then implemented on a database of a consumer loan portfolio from a financial institution, within the framework of the KDD data mining methodology. The results show that the implemented models have good discriminatory power, exceeding a KS of 74% on the validation set. However, none of the proposed methods outperforms the base algorithm, possibly because the data selection methods reduce the observations available for training, which in turn reduces the possibility of building more complex models (with more variables) that would ultimately deliver better performance.
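The KS figure quoted in this abstract is the Kolmogorov-Smirnov statistic commonly used to measure how well a scoring model separates two groups of customers. The following is a minimal Python sketch of how such a value can be computed from model scores; the function name and the scores are hypothetical and do not come from the thesis.

```python
def ks_statistic(scores_good, scores_bad):
    """Maximum gap between the empirical score CDFs of two groups.

    scores_good: model scores of customers who did not default.
    scores_bad:  model scores of customers who defaulted.
    Returns a value in [0, 1]; higher means better separation.
    """
    thresholds = sorted(set(scores_good) | set(scores_bad))
    ks = 0.0
    for t in thresholds:
        # Share of each group with a score at or below the threshold.
        cdf_good = sum(s <= t for s in scores_good) / len(scores_good)
        cdf_bad = sum(s <= t for s in scores_bad) / len(scores_bad)
        ks = max(ks, abs(cdf_good - cdf_bad))
    return ks


# Hypothetical scores (higher score = lower estimated risk).
good = [0.6, 0.7, 0.8, 0.9]
bad = [0.1, 0.2, 0.3, 0.7]
print(ks_statistic(good, bad))
```

This quadratic-time version favors readability; for large portfolios a sort-and-sweep implementation would be the usual choice.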
16

[en] INTELLIGENT ASSISTANCE FOR KDD-PROCESS ORIENTATION / [pt] ASSISTÊNCIA INTELIGENTE À ORIENTAÇÃO DO PROCESSO DE DESCOBERTA DE CONHECIMENTO EM BASES DE DADOS

RONALDO RIBEIRO GOLDSCHMIDT 15 December 2003
[en] The well-known complexity inherent in the KDD (Knowledge Discovery in Databases) process stems essentially from aspects related to the control and conduct of this process (Fayyad et al., 1996b; Hellerstein et al., 1999). Generally speaking, such aspects involve difficulties in perceiving innumerable facts whose origin and levels of detail are highly diverse and diffuse, in adequately interpreting these facts, in dynamically combining such interpretations, and in deciding which actions must be performed in order to obtain good results. How are the objectives of the process to be identified in a precise manner? How is one among the countless existing data mining and preprocessing algorithms to be selected? And most importantly, how can the selected algorithms be put to suitable use in each different situation? These are but a few examples of the complex and recurrent questions that are posed when KDD processes are performed. Human analysts must cope with the arduous task of orienting the execution of KDD processes. To this end, in the face of each different scenario, humans resort to their previous experiences, their knowledge, and their intuition in order to interpret and combine the facts and therefore be able to decide on the strategy to be adopted (Fayyad et al., 1996a, b; Wirth et al., 1998). Although such tools are recognized as useful and desirable, few computational alternatives exist that are designed to help humans conduct KDD processes (Engels, 1996; Amant and Cohen, 1997; Livingston, 2001; Bernstein et al., 2002; Brazdil et al., 2003). In addition, the demand for KDD applications in several different areas has increased dramatically in the past few years (Buchanan, 2000).
Quite commonly, the number of available practitioners with experience in KDD is not sufficient to satisfy this growing demand (Piatetsky-Shapiro, 1999). Within such a context, the creation of intelligent tools that aim to assist humans in controlling KDD processes proves to be even more opportune (Brachman and Anand, 1996; Mitchell, 1997). Such being the case, the objectives of this thesis were to investigate, propose, develop, and evaluate an Intelligent Machine for KDD-Process Orientation that is basically intended to serve as a teaching tool to be used in professional specialization courses in the area of Knowledge Discovery in Databases. The basis for formalization of the proposed machine was the Planning Theory for Problem-Solving (Russell and Norvig, 1995) in Artificial Intelligence. Its implementation was based on the integration of assistance functions that are used at different KDD process control levels: Goal Definition, KDD Action-Planning, KDD Action Plan Execution, and Knowledge Acquisition and Formalization. The Goal Definition Assistant aims to assist humans in identifying KDD tasks that are potentially executable in KDD applications. This assistant was inspired by the detection of a certain type of similarity between the intensional levels presented by certain databases. The observation of this fact helps humans to mine the type of knowledge that must be discovered since data sets with similar structures tend to arouse similar interests even in distinct KDD applications. Concepts from the Theory of Attribute Equivalence in Databases (Larson et al., 1989) make it possible to use a common structure in which any database may be represented. In this manner, when databases are represented in the new structure, it is possible to map them into KDD tasks that are compatible with such a structure. 
Topological space concepts (Lipschutz, 1979) and artificial neural network resources (Haykin, 1999) have been employed to allow mapping between heterogeneous patterns. After the goals have been defined in a KDD application, it is necessary to decide how such goals are to be achieved. The first step involves selecting the most appropriate data mining algorithm for the problem at hand. The KDD Action-Planning Assistant helps humans to make this choice. To this end, it makes use of a methodology for ordering the mining algorithms that is based on the previous performance of these algorithms in similar problems (Soares et al., 2001; Brazdil et al., 2003).
Algorithm ordering criteria based on database similarity at the intensional and extensional levels were proposed, described and evaluated. Once the data mining algorithm or algorithms have been selected, the next step involves selecting the way in which data preprocessing is to be performed. Since there is a large variety of preprocessing algorithms, there are many alternatives for combining them (Bernstein et al., 2002). The KDD Action-Planning Assistant also helps humans to formulate and to select the KDD action plan or plans to be adopted. To this end, it makes use of concepts contained in the Planning Theory for Problem-Solving. Once a KDD action plan has been chosen, it is necessary to execute it. Executing a KDD action plan involves the ordered execution of the KDD algorithms specified in the plan. Executing a KDD algorithm requires knowledge about it. The KDD Action Plan Execution Assistant provides specific guidance on KDD algorithms. In addition, this assistant is equipped with mechanisms that provide specialized assistance for performing the KDD algorithm execution process and for analyzing the results obtained. Some of these mechanisms have been described and evaluated. The execution of the Knowledge Acquisition and Formalization Assistant is an operational requirement for running the proposed machine. The objective of this assistant is to acquire knowledge about KDD and to make such knowledge available by representing and organizing it in a way that makes it possible to process the above-mentioned assistance functions. A variety of knowledge acquisition resources and techniques were employed in the conception of this assistant.
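The idea of ordering mining algorithms by their previous performance on similar problems can be illustrated with a small sketch. The Python code below is only one plausible reading, assuming each past dataset is summarized by a numeric vector of characteristics and that similarity is Euclidean distance; the function name, the meta-features and the accuracy values are hypothetical, and this is not the thesis's actual ordering criterion.

```python
import math


def rank_algorithms(target_meta, history, k=2):
    """Order algorithms by their mean rank on the k most similar past datasets.

    target_meta: tuple of numeric characteristics of the new dataset.
    history: list of (meta_features, {algorithm_name: past_accuracy}) pairs;
             every entry is assumed to cover the same set of algorithms.
    """
    # Pick the k past datasets closest to the new one.
    nearest = sorted(history, key=lambda h: math.dist(target_meta, h[0]))[:k]
    rank_sums = {}
    for _, performance in nearest:
        # Rank 1 goes to the best-performing algorithm on that dataset.
        ordered = sorted(performance, key=performance.get, reverse=True)
        for rank, algorithm in enumerate(ordered, start=1):
            rank_sums[algorithm] = rank_sums.get(algorithm, 0) + rank
    # Lower mean rank means better expected suitability.
    return sorted(rank_sums, key=lambda a: rank_sums[a] / len(nearest))


# Hypothetical history: (number of rows, number of attributes) -> accuracies.
history = [
    ((100, 5), {"tree": 0.90, "bayes": 0.80, "knn": 0.70}),
    ((120, 6), {"tree": 0.85, "knn": 0.88, "bayes": 0.60}),
    ((10000, 50), {"bayes": 0.95, "tree": 0.50, "knn": 0.60}),
]
print(rank_algorithms((110, 5), history))
```

In practice the characteristics would need scaling before distances are compared, since raw row counts would otherwise dominate the similarity measure.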
17

Using knowledge discovery to identify potentially useful patterns of health promotion behavior of 10-12 year old Icelandic children

Orlygsdottir, Brynja 01 January 2008
Icelandic children can expect to live a long and healthy life and have the right to the highest possible standard of health. Despite this, as in other Western countries, the prevalence of psychosocial complaints and long-term conditions in Icelandic children is growing and they are struggling with increased levels of preventable health conditions. The purposes of this cross-sectional, secondary analysis were to perform a psychometric evaluation of the instrument School-Children Health Promotion; to describe self-reported health promotion behavior of 10-12 year old Icelandic school children; and to predict novel and potentially useful patterns of health promotion behavior of 10-12 year old Icelandic school children using data mining methods. Existing data from 480 10-12 year old Icelandic school children and 911 parents were analyzed. Analysis of the instrument School-Children Health Promotion indicates that it is, in general, a valid and reliable instrument for measuring health promotion behavior of 10-12 year old Icelandic children. Five factors emerged from the 21-item instrument, which were labeled "Positive Thinking," "Diet and Sleep Pattern," "Seek Psycho-social Support," "Coping Behavior," and "Health Habits." The results indicated that girls use more positive health promotion behavior than boys; however, differences in health promotion behavior between 5th and 6th grade students were not obvious. The results of data mining analyses, using the classifiers decision tree (J48) and logistic regression (Logistic) to predict health promotion behavior, showed better performance with the subsets of the five factors and the overall instrument than with the full dataset of 199 items. For the subsets, the logistic regression models performed better than the decision trees, with AUC ranging from 0.71 to 0.80. The strongest predictors of health promotion behaviors were validation and caring in friendship, intimate disclosure between friends, and quality of life.
Results of this secondary analysis indicate that friendship is of vital importance with regard to health promotion behavior. Therefore, further studies on the effect friendship has on health promotion behavior of Icelandic children in the 10-12 year old age group are clearly needed.
18

Investigating the Process of Developing a KDD Model for the Classification of Cases with Cardiovascular Disease Based on a Canadian Database

Liu, Chenyu January 2012
Medicine and health are information-intensive fields in which data volume has been increasing constantly. To make full use of the data, Knowledge Discovery in Databases (KDD) has been developed as a comprehensive pathway to discover valid and unsuspected patterns and trends that are both understandable and useful to data analysts. The present study aimed to investigate, for the first time, the entire KDD process of developing a classification model for cardiovascular disease (CVD) from a Canadian dataset. The data source was the Canadian Heart Health Database, which contains 265 easily collected variables and 23,129 instances from ten Canadian provinces. Many practical issues involved in the different steps of the integrated process were addressed, and possible solutions were suggested based on the experimental results. Five specific learning schemes representing five distinct KDD approaches were employed, as they had never been compared with one another. In addition, two improvement approaches, cost-sensitive learning and ensemble learning, were also examined. The performance of the developed models was measured in many aspects. The data set was prepared through data cleaning and missing value imputation. Three pairs of experiments demonstrated that dataset balancing and outlier removal had a positive influence on the classifier, but variable normalization was not helpful. Three combinations of subset generation method and evaluation function were tested in the variable subset selection phase, and the combination of Best-First search and Correlation-based Feature Selection showed comparable goodness and was retained for its other benefits. Among the five learning schemes investigated, the C4.5 decision tree achieved the best performance on the classification of CVD, followed by Multilayer Feed-forward Network, K-Nearest Neighbor, Logistic Regression, and Naïve Bayes.
Cost-sensitive learning, exemplified by the MetaCost algorithm, failed to outperform the single C4.5 decision tree when the cost matrix was varied from 5:1 to 1:7. In contrast, the models developed through ensemble modeling, especially the AdaBoost M1 algorithm, outperformed the other models. Although the best-performing model might be suitable for CVD screening in the general Canadian population, it is not ready for use in practice. I propose some criteria to improve the further evaluation of the model. Finally, I describe some of the limitations of the study and propose potential solutions to address them throughout the KDD process. Such possibilities should be explored in further research.
20

Expanding Data Mining Theory for Industrial Applications

January 2012
The field of Data Mining is widely recognized and accepted for its applications to many business problems, guiding decision-making processes based on data. However, in recent times, the scope of these problems has grown and the methods are under scrutiny for applicability and relevance to real-world circumstances. At the crossroads of innovation and standards, it is important to examine and understand whether the current theoretical methods for industrial applications (which include KDD, SEMMA and CRISP-DM) encompass all possible scenarios that could arise in practical situations. Do the methods require changes or enhancements? In this thesis I study the current methods, delineate their ideas and highlight the shortcomings that posed challenges during practical implementation. Based on the experiments conducted and the research carried out, I propose an approach that describes the business problems with higher accuracy and provides a broader view of the process. It is then applied to different case studies that highlight different aspects of this approach. / Dissertation/Thesis / M.S. Computer Science 2012
