Global ETD Search

1	Investigating the Process of Developing a KDD Model for the Classification of Cases with Cardiovascular Disease Based on a Canadian Database Liu, Chenyu January 2012 Medicine and health are information-intensive fields in which data volume has been increasing constantly. To make full use of these data, Knowledge Discovery in Databases (KDD) has been developed as a comprehensive pathway for discovering valid and unsuspected patterns and trends that are both understandable and useful to data analysts. The present study investigated, for the first time, the entire KDD process of developing a classification model for cardiovascular disease (CVD) from a Canadian dataset. The data source was the Canadian Heart Health Database, which contains 265 easily collected variables and 23,129 instances from ten Canadian provinces. Many practical issues involved in the different steps of the integrated process were addressed, and possible solutions were suggested based on the experimental results. Five learning schemes representing five distinct KDD approaches were employed, as they had never been compared with one another. In addition, two improvement approaches, cost-sensitive learning and ensemble learning, were examined. Model performance was measured in many aspects. The dataset was prepared through data cleaning and missing-value imputation. Three pairs of experiments demonstrated that dataset balancing and outlier removal had a positive influence on the classifier, but variable normalization was not helpful. Three combinations of subset generation method and evaluation function were tested in the variable subset selection phase; the combination of Best-First search and Correlation-based Feature Selection showed comparable goodness and was retained for its other benefits.
Among the five learning schemes investigated, the C4.5 decision tree achieved the best performance on the classification of CVD, followed by the Multilayer Feed-forward Network, K-Nearest Neighbor, Logistic Regression, and Naïve Bayes. Cost-sensitive learning, exemplified by the MetaCost algorithm, failed to outperform the single C4.5 decision tree when the cost matrix was varied from 5:1 to 1:7. In contrast, the models developed through ensemble modeling, especially the AdaBoost M1 algorithm, outperformed the other models. Although the best-performing model might be suitable for CVD screening in the general Canadian population, it is not ready for use in practice. I propose some criteria to improve further evaluation of the model. Finally, I describe some of the limitations of the study and propose potential solutions to address them throughout the KDD process. Such possibilities should be explored in further research. classification learning; KDD process; CVD; Canadian database; Health Studies and Gerontology
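The cost-sensitive experiment described in this abstract can be illustrated with the minimum-expected-cost decision rule, which is the rule MetaCost applies when relabelling training cases. The following is only a sketch under stated assumptions, not the thesis's implementation: the function names, the probability inputs, and the reading of a cost ratio as (false-negative cost):(false-positive cost) are all hypothetical.

```python
def min_cost_label(p_positive, cost_fn, cost_fp):
    """Return 1 (positive) or 0 (negative), whichever has the lower expected cost.

    p_positive: estimated probability that the case is positive (assumed input).
    cost_fn:    cost of predicting negative for a truly positive case.
    cost_fp:    cost of predicting positive for a truly negative case.
    Correct predictions are assumed to cost nothing.
    """
    expected_cost_if_negative = p_positive * cost_fn
    expected_cost_if_positive = (1.0 - p_positive) * cost_fp
    return 1 if expected_cost_if_positive < expected_cost_if_negative else 0


def cost_threshold(cost_fn, cost_fp):
    """Probability above which predicting positive becomes the cheaper choice."""
    return cost_fp / (cost_fp + cost_fn)
```

Under the assumed reading of the ratio, a 5:1 cost matrix makes a positive prediction cheaper once the estimated probability exceeds 1/6, whereas a 1:7 matrix requires a probability above 7/8.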
3	SISTEMA INTEGRADO DE MONITORAMENTO E CONTROLE DA QUALIDADE DE COMBUSTÍVEL / INTEGRATED SYSTEMS OF TRACKING AND QUALITY CONTROL OF FUEL Marques, Delano Brandes 27 February 2004 This work presents studies aimed at implementing an Integrated System that, besides allowing better, more efficient, and more practical monitoring, makes it possible to control and optimize problems related to the oil industry. To guarantee fuel quality and standardization, it is indispensable to develop efficient tools that allow fuel to be monitored from any location and for any type of fuel.
Given the variety of criteria, decision making should be based on the evaluation of a wide range of spatial and non-spatial data. To this end, the Knowledge Discovery in Databases process is used, with emphasis on the Data Warehouse and Data Mining steps combined with a Geographic Information System. The system aims to cover several fuel monitoring regions. Based on a survey and analysis of the different information used in the ANP databases, a Data Warehouse model was proposed. Data Mining techniques (Principal Component Analysis, Clustering Analysis, and Multiple Regression) were then applied in order to obtain knowledge (patterns). Fuel Analysis; KDD process; Data Warehouse; Data Mining; Geographic Information Systems
4	Influence of restraint systems during an automobile crash: prediction of injuries for frontal impact sled tests based on biomechanical data mining / Influence des systèmes de retenue lors d'un accident automobile : Prédiction des blessures de l'occupant lors d'essais catapultés frontaux basées sur le data mining Cridelich, Carine Caroline 17 December 2015 Safety is one of the most important considerations when buying a new car. Before it can be sold in a country, a car has to pass the crash tests defined by that country's legislation, which drives the development of restraint systems such as airbags and seat belts. Additionally, ratings such as EURO NCAP and US NCAP provide an independent evaluation of car safety. Frontal sled tests are carried out, among other tests, to confirm the protection level of the vehicle, and the results are mainly based on injury assessment reference values derived from physical parameters measured in dummies.
This doctoral thesis presents an approach for processing the input data (i.e. parameters of the restraint systems defined by experts), followed by a classification of frontal sled tests according to those parameters. The study is based only on data from the passenger side, because the data collected for the driver were not complete enough to produce satisfactory results. The main objective is to create a model that evaluates the influence of the input parameters on injury severity and helps engineers predict sled test results according to the chosen legislation or rating. The dummy biomechanical values (the outputs of the model) were grouped into clusters in order to define injury levels. The model and the various algorithms were implemented in a Graphical User Interface for better practical daily use. Passive safety; Restraint systems; Frontal sled test; Dummy biomechanical values; Data mining methods; KDD process; GK algorithm; Classification trees; 623
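The clustering step mentioned in this abstract, grouping the dummy's biomechanical output values into injury levels, can be sketched as follows. This is an illustrative assumption only: it swaps in plain one-dimensional k-means rather than the GK algorithm listed in the keywords, and the function name, the scores, and the choice of three levels are hypothetical.

```python
def kmeans_1d(values, k, max_iterations=100):
    """Cluster scalar values into k groups; return (centres, labels)."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: pick k evenly spaced values from the sorted data.
    centres = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(max_iterations):
        # Assignment step: each value joins its nearest centre.
        labels = [min(range(k), key=lambda j: abs(v - centres[j])) for v in values]
        # Update step: each centre moves to the mean of its members.
        new_centres = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centres.append(sum(members) / len(members) if members else centres[j])
        if new_centres == centres:
            break  # Centres stopped moving, so the clustering has converged.
        centres = new_centres
    return centres, labels


# Hypothetical normalised injury scores, one per sled test (not thesis data).
scores = [0.10, 0.20, 0.15, 0.50, 0.55, 0.90, 1.00, 0.95]
centres, levels = kmeans_1d(scores, k=3)
```

Because the starting centres are taken in ascending order, the returned labels 0, 1, and 2 can be read as increasing injury levels for these example scores.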
