  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Parameter learning and support vector reduction in support vector regression

Yang, Chih-cheng 21 July 2006
The selection and learning of kernel functions is an important but rarely studied problem in the field of support vector learning, yet the kernel function of a support vector regression model has a great influence on its performance. The kernel function projects the dataset from the original data space into the feature space, so problems that cannot be solved in the low-dimensional space may become solvable in a higher-dimensional space through this transformation. This work makes two main contributions. First, we introduce the gradient descent method to the learning of kernel functions: we derive learning rules for the parameters that determine the shape and distribution of the kernel functions, and thus obtain better kernel functions by training their parameters with respect to the risk minimization principle. Second, in order to reduce the number of support vectors, we use the orthogonal least squares method: by choosing the most representative support vectors, we can remove the less important ones from the support vector regression model. Experimental results show that our approach derives better kernel functions than other methods, has better generalization ability, and effectively reduces the number of support vectors.
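A minimal sketch of the kernel-parameter-learning idea, assuming an RBF kernel whose width is tuned by (numerical) gradient descent on a held-out error estimate; the synthetic data, step sizes, and variable names are illustrative and not the thesis's actual algorithm.

```python
# Hypothetical sketch: tune the RBF kernel width of an SVR by numerical
# gradient descent on a validation loss, in the spirit of learning kernel
# parameters with respect to a risk-minimization criterion.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sinc(X).ravel() + 0.1 * rng.standard_normal(200)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

def val_loss(log_gamma):
    model = SVR(kernel="rbf", gamma=np.exp(log_gamma), C=10.0)
    model.fit(X_tr, y_tr)
    return mean_squared_error(y_val, model.predict(X_val))

log_gamma, lr, eps = 0.0, 0.5, 1e-2
for _ in range(30):
    # central-difference estimate of d(loss)/d(log gamma)
    grad = (val_loss(log_gamma + eps) - val_loss(log_gamma - eps)) / (2 * eps)
    log_gamma -= lr * grad

print(f"selected gamma ~ {np.exp(log_gamma):.3f}, val MSE = {val_loss(log_gamma):.4f}")
```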
2

SVM Classification and Analysis of Margin Distance on Microarray Data

Shaik Abdul, Ameer Basha 16 June 2011
No description available.
3

Multicategory psi-learning and support vector machine

Liu, Yufeng 18 June 2004
No description available.
4

Methodologies for remaining useful life estimation with multiple sensors in rotating machinery / Μεθοδολογίες εκτίμησης της εναπομένουσας ζωής περιστρεφόμενων συστημάτων μεταφοράς ισχύος με χρήση πολλαπλών αισθητήρων

Δημήτριος, Ρούλιας 13 January 2015
The focus of this thesis is the development of failure prognosis methods (prognostics) for rotating machinery using multiple sensors, digital signal processing, and machine learning techniques. The motivation stems from the gap in the literature concerning prognostics for meshing gearboxes; work on bearing prognosis exists but remains inconclusive. Few research groups have studied multi-hour gear fatigue experiments, and a series of such experiments is one of the contributions of this thesis. The study also goes beyond vibration monitoring alone by adding an oil debris monitoring (ODM) probe and acoustic emission (AE) sensing. AE monitoring is, once again, proposed as a robust technique for failure prognosis, being better correlated with the gear pitting level than the classic vibration monitoring technique. Judging from the ODM recordings, gear pitting comprises two phases: i) a linear phase with an almost constant pitting rate, and ii) a very short nonlinear phase in which the pitting rate increases exponentially, an explicit indication of critical failure. Multi-hour gear experiments close to real-scale applications are very demanding in both time and invested capital. To bypass this shortfall, a phenomenological simulation that reproduces gear-failure-like time histories, built on the real fatigue experiments, is proposed and used to assess a number of data-driven remaining useful life (RUL) estimation techniques, namely i) the Proportional Hazards Model (PHM), ii) ε-Support Vector Regression (ε-SVR), and iii) exponential extrapolation based on bootstrap sampling. A feature extraction scheme for prognosis is proposed and assessed, based on time-domain and frequency-domain statistical features and wavelet packet (WP) energies derived from the AE and vibration recordings, combined with data fusion by principal and independent component analysis. ICA is proposed as the preferable fusion technique for gear failure prognostics, as applying it for feature fusion clearly improved the bootstrap extrapolation technique presented earlier. Bearings are also considered, since they are closely connected to gearboxes: a wavelet denoising method is proposed for bearing vibration recordings, aiming to improve the diagnostic and prognostic potential of vibration. Finally, the importance of data fusion is highlighted for bearings: combining multiple features and vibration sensors with an ε-SVR model provides a probabilistic RUL estimate, and the feature extraction scheme generalizes the application of prognostics even in cases where the RMS alone shows no clear degradation trend.
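A hedged sketch of the data-driven RUL idea above, assuming simple time-domain features (RMS, kurtosis) extracted from synthetic run-to-failure vibration histories and mapped to remaining life with an ε-SVR; the signal model, feature set, and parameters are illustrative only, not the thesis's experimental setup.

```python
# Hypothetical sketch: map condition-monitoring features to remaining useful
# life (RUL) with an epsilon-SVR, trained on one simulated run-to-failure
# history and evaluated on another.
import numpy as np
from scipy.stats import kurtosis
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)

def run_to_failure(life=100, fs=1024):
    """Simulate one degradation history: one feature row per 'hour'."""
    rows, rul = [], []
    for hour in range(life):
        severity = (hour / life) ** 3                       # damage grows nonlinearly
        x = rng.standard_normal(fs) * (1 + 3 * severity)    # broadband vibration
        x += severity * np.sign(np.sin(2 * np.pi * 30 * np.arange(fs) / fs))  # impacts
        rows.append([np.sqrt(np.mean(x**2)), kurtosis(x)])  # RMS, kurtosis
        rul.append(life - hour)
    return np.array(rows), np.array(rul, dtype=float)

X_train, y_train = run_to_failure(life=100)
X_test, y_test = run_to_failure(life=80)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=50.0, epsilon=2.0))
model.fit(X_train, y_train)
err = np.mean(np.abs(model.predict(X_test) - y_test))
print(f"mean absolute RUL error on the unseen history: {err:.1f} hours")
```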
5

Méthodes de classifications dynamiques et incrémentales : application à la numérisation cognitive d'images de documents / Incremental and dynamic learning for document image : application for intelligent cognitive scanning of documents

Ngo Ho, Anh Khoi 19 March 2015
This research contributes to the field of dynamic learning and classification in stationary and non-stationary environments. The goal of this PhD is to define a new classification framework that can cope with very small training sets at the beginning of the process and adjust its models to the variability of the incoming data stream. For that purpose, we propose a solution in which independent one-class SVM classifiers coexist, each with its own incremental learning procedure; consequently, no classifier is subject to crossed influences emanating from the configuration of the other classifiers' models. The originality of our proposal lies in exploiting the former knowledge kept in the SVM models (the history of each SVM, represented by its set of support vectors) and combining it with the knowledge brought by new data as they arrive. The proposed classification model (mOC-iSVM) is explored through three variants that differ in how the model history is exploited at each time step. Our contribution addresses a state of the art in which no existing solution handles, at the same time, concept drift, the addition or deletion of concepts, and the fusion or division of concepts, while also offering a privileged framework for interaction with the user. Within the ANR DIGIDOC project, our approach was applied to several image-stream classification scenarios that can arise in real digitization campaigns. These scenarios validated an interactive use of our incremental classification solution for classifying images arriving in a stream, in order to improve the quality of the digitized images.
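A minimal sketch of an mOC-iSVM-like update, assuming each concept is modeled by a one-class SVM that is refit on its stored support vectors (its "history") plus each newly arrived batch; the data, class structure, and nu/gamma values are illustrative, not the thesis's actual configuration.

```python
# Hypothetical sketch: incremental one-class SVM that keeps only its support
# vectors between updates and combines them with new data from the stream.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(3)

class IncrementalOneClassSVM:
    def __init__(self, nu=0.1, gamma=0.5):
        self.params = dict(kernel="rbf", nu=nu, gamma=gamma)
        self.model = None

    def partial_update(self, batch):
        # Combine the new batch with the support vectors kept from the past.
        if self.model is not None:
            batch = np.vstack([self.model.support_vectors_, batch])
        self.model = OneClassSVM(**self.params).fit(batch)

    def score(self, X):
        return self.model.decision_function(X)

clf = IncrementalOneClassSVM()
for step in range(5):                       # a stream of small image-feature batches
    drift = 0.3 * step                      # the concept slowly drifts
    clf.partial_update(rng.normal(loc=drift, scale=1.0, size=(30, 2)))

print("in-concept score:", clf.score(np.array([[1.2, 1.2]]))[0])
print("outlier score:   ", clf.score(np.array([[8.0, 8.0]]))[0])
```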
6

Mineração de dados para modelagem de risco de metástase em tumor de próstata / Data mining for the modeling of metastasis risk on prostate tumor

Chahine, Gabriel Jorge, 1982- 23 August 2018
Advisors: Laercio Luis Vendite, Stanley Robson de Medeiros Oliveira / Master's dissertation - Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica / Issue date: 2013 / Of all the cancers of the urinary tract, the most common are prostate and bladder cancer, the former being the most common cause of cancer death and the most common carcinoma in men. Our goal in this work is to develop classification models that determine whether a given tumor will grow and invade other organs or whether it presents no such risk and will remain contained. To do this, we collected data from patients with prostate cancer and assessed which variables most influence the occurrence of metastasis. We then built classification models that, given the data of a particular patient, detect whether or not distant metastasis will occur. The simulations were carried out with data provided by Prof. Dr. Ubirajara Ferreira, responsible for the Urology discipline of Unicamp's Faculty of Medical Sciences, Hospital das Clínicas - UNICAMP. / Master's degree / Applied and Computational Mathematics
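A hedged illustration of the kind of classification model described above: a tree-based classifier trained on synthetic, made-up clinical variables (PSA, Gleason score, age) to flag metastasis risk and rank variable importance. None of this reflects the actual patient data, the variables collected, or the model chosen in the dissertation.

```python
# Hypothetical sketch: classify metastasis risk from synthetic clinical
# variables and inspect which variables drive the prediction.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)
n = 500
psa = rng.gamma(shape=2.0, scale=6.0, size=n)          # ng/mL, made up
gleason = rng.integers(6, 11, size=n)                  # scores 6..10
age = rng.normal(67, 8, size=n)
# Synthetic ground truth: risk driven mostly by PSA and Gleason score.
logit = 0.08 * psa + 0.9 * (gleason - 6) - 4.0
metastasis = rng.random(n) < 1 / (1 + np.exp(-logit))

X = np.column_stack([psa, gleason, age])
X_tr, X_te, y_tr, y_te = train_test_split(X, metastasis, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("AUC:", round(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]), 3))
for name, imp in zip(["PSA", "Gleason", "age"], clf.feature_importances_):
    print(f"{name}: importance {imp:.2f}")
```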
7

Comparative Study of Methods for Linguistic Modeling of Numerical Data

Visa, Sofia January 2002
No description available.
8

Robust boosting via convex optimization

Rätsch, Gunnar January 2001
In this work we consider statistical learning problems. A learning machine extracts information from a set of training examples so that it can predict the associated label of previously unseen examples. We consider the case where the resulting classification or regression rule is a combination of simple rules, also called base hypotheses. The so-called boosting algorithms iteratively build a weighted linear combination of base hypotheses that predicts well on unseen data. The thesis addresses the following issues:

o The statistical learning theory framework for analyzing boosting methods. We study learning-theoretic guarantees on the prediction performance for unseen examples. Recently, large-margin classification techniques emerged as a practical result of this theory of generalization, in particular boosting and support vector machines. A large margin implies good generalization performance; hence we analyze how large the margins in boosting are and propose an improved algorithm that efficiently generates maximum-margin solutions.

o How can boosting methods be related to techniques from convex optimization? To analyze the properties of the resulting classification or regression rules, it is essential to understand whether and under which conditions iterative algorithms such as boosting converge. We show that such algorithms can be used to solve large-scale constrained optimization problems whose solutions are well characterizable. To this end, we establish and exploit connections to the field of convex optimization and derive convergence guarantees for a quite general family of boosting-like algorithms.

o How can boosting be made robust against measurement errors and outliers in the data? One problem of existing boosting methods is their relatively high sensitivity to measurement noise and errors in the training sample. To address this, the soft-margin idea already used in support vector learning is transferred to boosting, leading to theoretically well-motivated, regularized algorithms that exhibit a high degree of noise robustness.

o How can the applicability of boosting be extended to regression problems? Boosting methods were originally designed for classification. To extend them to regression, we use the previous convergence results and relations to semi-infinite programming to design new boosting-like (leveraging) algorithms for regression, and show that these algorithms have good theoretical and practical properties.

o Is boosting useful in practice? The theoretical results are accompanied by simulation results, either to illustrate properties of the proposed algorithms or to show that they work well in practice and can be applied directly. The practical relevance of the developed methods is illustrated in the analysis of chaotic time series and in industrial applications such as a non-intrusive power monitoring system and a drug discovery process.

Note: the author received the Michelson Prize, awarded by the Faculty of Mathematics and Natural Sciences of the University of Potsdam, for the best dissertation of 2001/2002.
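A compact sketch of the boosting idea discussed above: AdaBoost with decision stumps, i.e. an iteratively built weighted combination of base hypotheses in which misclassified examples are re-emphasized each round. This is the textbook algorithm on synthetic data, not the regularized/soft-margin or regression variants developed in the thesis.

```python
# Minimal AdaBoost with decision stumps as base hypotheses.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 2))
y = np.where(X[:, 0] + X[:, 1] + 0.3 * rng.standard_normal(300) > 0, 1, -1)

n_rounds, w = 25, np.full(len(y), 1 / len(y))
stumps, alphas = [], []
for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    err = np.clip(np.sum(w[pred != y]), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)           # weight of this base hypothesis
    w *= np.exp(-alpha * y * pred)                  # emphasize misclassified examples
    w /= w.sum()
    stumps.append(stump)
    alphas.append(alpha)

F = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
print("single stump accuracy:   ", np.mean(stumps[0].predict(X) == y))
print("boosted ensemble accuracy:", np.mean(np.sign(F) == y))
```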
9

Máquina de vetores de suporte aplicada a dados de espectroscopia NIR de combustíveis e lubrificantes para o desenvolvimento de modelos de regressão e classificação / Support vectors machine applied to NIR spectroscopy data of fuels and lubricants for development of regression and classification models

Alves, Julio Cesar Laurentino, 1978- 19 August 2018
Advisor: Ronei Jesus Poppi / Doctoral thesis - Universidade Estadual de Campinas, Instituto de Química / Issue date: 2012 / Linear regression and classification models sometimes perform poorly on near-infrared spectroscopy data of petroleum products. Support vector machines (SVM), based on statistical learning theory, enable the development of nonlinear regression and classification models that can model such data better, yet they are still little explored for solving problems in analytical chemistry. This work demonstrates the use of SVM for treating near-infrared spectroscopy data of fuels and lubricants. SVM was used to solve regression and classification problems, and its results were compared with the reference algorithms PLS and SIMCA. The following analytical problems related to process control and quality control were studied: (i) determination of diesel-oil quality parameters used to optimize the in-line blending process; (ii) determination of quality parameters of the diesel oil that is the feedstock of the HDT unit, for control and optimization of that unit's process conditions; (iii) quantification of the biodiesel content in blends with diesel oil; (iv) classification of the different streams that make up the refinery's diesel pool, enabling the identification of adulteration and supporting quality control; (v) classification of lubricants by naphthenic-oil content and/or the presence of vegetable oil. The SVM models outperformed those developed with the reference (linear) chemometric methods. Fast, low-cost analytical methods for process and quality control, built on more accurate regression and classification models, allow quality to be monitored more effectively and efficiently, contributing to increased profitability in the production and commercialization of the petroleum derivatives studied. / Doctorate / Analytical Chemistry / Doctor of Sciences
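A hedged sketch of the comparison described above: PLS regression versus an RBF-kernel SVR on simulated NIR-like spectra whose property of interest depends nonlinearly on the analyte concentration. The spectra, band positions, and hyperparameters are invented for illustration; no real fuel data or the thesis's actual models are involved.

```python
# Hypothetical sketch: compare a linear chemometric model (PLS) with a
# nonlinear one (epsilon-SVR) on synthetic NIR-like spectra.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
wavelengths = np.linspace(0, 1, 200)
conc = rng.uniform(0, 1, size=120)                          # analyte concentration
peak = np.exp(-((wavelengths - 0.4) ** 2) / 0.002)          # analyte band
base = np.exp(-((wavelengths - 0.7) ** 2) / 0.01)           # matrix band
X = np.outer(conc, peak) + np.outer(1 - conc, base)
X += 0.02 * rng.standard_normal(X.shape)                    # instrument noise
y = conc + 0.5 * conc**2                                    # nonlinear property

pls = PLSRegression(n_components=5)
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.01))
for name, model in [("PLS", pls), ("SVR", svr)]:
    rmse = -cross_val_score(model, X, y, cv=5,
                            scoring="neg_root_mean_squared_error").mean()
    print(f"{name}: cross-validated RMSE = {rmse:.4f}")
```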
10

Modelos de classificação : aplicações no setor bancário / Classification models : applications in banking sector

Caetano, Mateus, 1983- 02 June 2015
Advisors: Antonio Carlos Moretti, Márcia Aparecida Gomes Ruggiero / Master's dissertation - Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica / Issue date: 2015 / Techniques for solving classification problems have applications in many areas, such as credit granting, image recognition, and SPAM detection. It is an area of intense research, for which many methods have been and continue to be developed. Since no single method performs best for every type of application, different methods need to be compared in order to find the best fit for each particular application. In this work we studied six methods applied to supervised classification problems (where a known response is available for model training): Logistic Regression, Decision Tree, Naive Bayes, KNN (k-Nearest Neighbors), Neural Networks, and Support Vector Machine. We applied these methods to three data sets concerning credit granting and customer selection for a banking marketing campaign. The data were pre-processed to handle missing observations and unbalanced classes. We used data-partitioning techniques and several metrics, such as accuracy, F1, and the ROC curve, to evaluate the performance of the methods. For each problem, we compared the performance of the different methods on the selected metrics. The results obtained by the best models for each application were consistent with other studies that used the same data sets. / Master's degree / Applied Mathematics
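A minimal sketch of the model comparison described above: several of the cited classifiers evaluated with cross-validation on a synthetic, imbalanced credit-like dataset, using accuracy, F1, and ROC AUC. The generated data and hyperparameters are illustrative, not those of the banking datasets used in the dissertation.

```python
# Hypothetical sketch: compare classifiers with cross-validation on an
# imbalanced synthetic dataset, scoring accuracy, F1 and ROC AUC.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=1500, n_features=15, weights=[0.9, 0.1],
                           random_state=0)          # ~10% positive ("default") class

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=5),
    "Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=7),
    "Neural Network": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000),
    "SVM": SVC(),
}

for name, clf in models.items():
    pipe = make_pipeline(StandardScaler(), clf)
    scores = cross_validate(pipe, X, y, cv=5,
                            scoring=["accuracy", "f1", "roc_auc"])
    print(f"{name:20s} acc={scores['test_accuracy'].mean():.3f} "
          f"f1={scores['test_f1'].mean():.3f} "
          f"auc={scores['test_roc_auc'].mean():.3f}")
```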
