71 |
Faktory ovlivňující profitabilitu zákazníka: empirický výzkum v nesmluvním prostředí / Factors influencing customer profitability: an empirical examination in noncontractual settings
Hanuska, Norbert (January 2014)
Understanding how to manage relationships with customers has become an important topic for both academics and practitioners in recent years. The effectiveness of a business can be greatly improved by identifying the drivers of the most profitable customers and using them to target the right customers. In this study we identify exchange characteristics, such as the amount of money spent per purchase, the duration of the customer's relationship with the firm, and the ratio of cross-buying, together with demographic characteristics, such as age and gender, as important drivers of the most profitable customers. The results of the study have important implications for academics in understanding what drives the most profitable customers in noncontractual settings, and they help practitioners design more effective marketing strategies. Moreover, the results of knowledge discovery about customers obtained with different data mining techniques help researchers assess the feasibility of these methods.
|
72 |
Predikce prodejností magazínů / Magazine sales prediction
Rajčan, Šimon (January 2013)
Today, many magazine publishing houses face the problem of predicting future sales of their products. In many cases, these predictions are made by employees based on their personal experience and guesses. The problems with this approach are the high expenses of making the predictions and the increased expenses when those predictions are wrong. The aim of this work is to study existing regression methods for automatic prediction and to create a solution for predicting magazine sales at the Russian publishing house Burda.
|
73 |
On the automatic design of decision-tree induction algorithms / Sobre o projeto automático de algoritmos de indução de árvores de decisão
Barros, Rodrigo Coelho (06 December 2013)
Decision-tree induction is one of the most widely used methods for extracting knowledge from data. There are several distinct strategies for inducing decision trees from data, each one presenting advantages and disadvantages according to its corresponding inductive bias. These strategies have been continuously improved by researchers over the last 40 years. This thesis, following recent breakthroughs in the automatic design of machine learning algorithms, proposes to automatically generate decision-tree induction algorithms. Our proposed approach, namely HEAD-DT, is based on the evolutionary algorithms paradigm, which improves solutions based on metaphors of biological processes. HEAD-DT works over several manually designed decision-tree components and combines the most suitable components for the task at hand. It can operate according to two different frameworks: i) evolving algorithms tailored to one single data set (specific framework); and ii) evolving algorithms from multiple data sets (general framework). The specific framework aims at generating one decision-tree algorithm per data set, so the resulting algorithm does not need to generalise beyond its target data set. The general framework has a more ambitious goal, which is to generate a single decision-tree algorithm capable of being effectively applied to several data sets. The specific framework is tested over 20 UCI data sets, and results show that HEAD-DT's specific algorithms outperform algorithms like CART and C4.5 with statistical significance. The general framework, in turn, is executed under two different scenarios: i) designing a domain-specific algorithm; and ii) designing a robust domain-free algorithm. The first scenario is tested over 35 microarray gene expression data sets, and results show that HEAD-DT's algorithms consistently outperform C4.5 and CART in different experimental configurations.
The second scenario is tested over 67 UCI data sets, and HEAD-DT's algorithms were shown to be competitive with C4.5 and CART. Nevertheless, we show that HEAD-DT is prone to a special case of overfitting when it is executed under the second scenario of the general framework, and we point to possible alternatives for solving this problem. Finally, we perform an extensive experiment for evaluating the best single-objective fitness function for HEAD-DT, combining 5 classification performance measures with three aggregation schemes. We evaluate the 15 fitness functions in 67 UCI data sets, and the best of them are employed to generate algorithms tailored to balanced and imbalanced data. Results show that the automatically designed algorithms outperform CART and C4.5 with statistical significance, indicating that HEAD-DT is also capable of generating custom algorithms for data with a particular kind of statistical profile.
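The abstract does not name the five performance measures or the three aggregation schemes. As a minimal sketch only, assuming the fitness of a candidate algorithm is an aggregate of its scores on several data sets, the Python below uses the arithmetic mean, the median, and the harmonic mean as stand-in schemes. The function name `aggregate_fitness` and the sample scores are hypothetical and are not taken from the thesis.

```python
from statistics import harmonic_mean, mean, median

# Hypothetical aggregation schemes. The three schemes used in the thesis are
# not named in the abstract, so these are illustrative stand-ins.
AGGREGATORS = {
    "mean": mean,
    "median": median,
    "harmonic_mean": harmonic_mean,
}


def aggregate_fitness(scores, scheme="mean"):
    """Collapse per-data-set scores (each in [0, 1]) into one fitness value."""
    if not scores:
        raise ValueError("at least one per-data-set score is required")
    return AGGREGATORS[scheme](scores)


if __name__ == "__main__":
    # Made-up scores of one candidate algorithm on four data sets.
    scores = [0.90, 0.80, 0.85, 0.40]
    for name in AGGREGATORS:
        print(name, round(aggregate_fitness(scores, name), 4))
```

Under these assumptions, the harmonic mean penalises a single weak data set more heavily than the arithmetic mean does, which is one reason the choice of aggregation scheme can change which candidate algorithm is judged fittest.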
|
74 |
Classification techniques for noisy and imbalanced data
(Unknown date)
Machine learning techniques allow useful insight to be distilled from the increasingly massive repositories of data being stored. As these data mining techniques can only learn patterns actually present in the data, it is important that the desired knowledge be faithfully and discernibly contained therein. Two common data quality issues that often affect important real-life classification applications are class noise and class imbalance. Class noise, where dependent attribute values are recorded erroneously, misleads a classifier and reduces predictive performance. Class imbalance occurs when one class represents only a small portion of the examples in a dataset, and, in such cases, classifiers often display poor accuracy on the minority class. The reduction in classification performance becomes even worse when the two issues occur simultaneously. To address the magnified difficulty caused by this interaction, this dissertation performs thorough empirical investigations of several techniques for dealing with class noise and imbalanced data. Comprehensive experiments are performed to assess the effects of the classification techniques on classifier performance, as well as how the level of class imbalance, the level of class noise, and the distribution of class noise among the classes affect results. An empirical analysis of classifier-based noise detection efficiency appears first. Subsequently, an intelligent data sampling technique, based on noise detection, is proposed and tested. Several hybrid classifier ensemble techniques for addressing class noise and imbalance are introduced. Finally, a detailed empirical investigation of classification filtering is performed to determine best practices. / by Amri Napolitano. / Thesis (Ph.D.)--Florida Atlantic University, 2009. / Includes bibliography. / Electronic reproduction. Boca Raton, Fla., 2009. Mode of access: World Wide Web.
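The point about poor minority-class accuracy can be illustrated with a small sketch. The Python below is not from the dissertation; it assumes a toy data set with 95 majority and 5 minority examples and a trivial classifier that always predicts the majority class. The function names and labels are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of all examples that were labelled correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall_per_class(y_true, y_pred):
    """For each class, the fraction of its examples that were labelled correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Toy imbalanced data: 95 "neg" (majority) and 5 "pos" (minority) examples.
y_true = ["neg"] * 95 + ["pos"] * 5
# A trivial classifier that always predicts the majority class.
y_pred = ["neg"] * 100

if __name__ == "__main__":
    print("accuracy:", accuracy(y_true, y_pred))
    print("recall per class:", recall_per_class(y_true, y_pred))
```

Overall accuracy looks high here because the majority class dominates the count, while the per-class view shows that no minority example is recognised. This is why studies of imbalanced data typically report class-wise measures alongside or instead of plain accuracy.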
|
76 |
Arbre de décision temporel multi-opérateur / Multi-operator Temporal Decision Trees
Shalaeva, Vera (30 November 2018)
Today, owing to the growing number of sensors and, more generally, of data produced by connected devices, many fields of activity are interested in the automatic classification of time series. Beyond theoretical research into new machine learning algorithms capable of handling such complex data, it is important to provide users with methods that can build predictive models efficiently, while also focusing on the explainability of the generated models and the transparency of the processes involved. In this way, users who do not necessarily have a background in learning theory can get to grips with these methods more quickly and, above all, validate the quality of the learned knowledge against their own domain of expertise. In this doctoral work, we focused on generating decision trees from temporal data, an approach likely to produce models that are fairly easy for a "non-expert" user to interpret. We sought to improve the various methods in the literature by focusing on three aspects of how the tree's nodes are built. First, we introduced the notion of the multi-operator temporal decision tree (MTDT), which uses several competing methods to build each node. On the one hand, this improves the predictive power of the trees by capturing the best discriminative geometric structures for each class and each level of the tree. On the other hand, this approach improves model readability by significantly reducing the size of the trees produced. Second, we sought to reduce the complexity of the algorithms by using a local search to explore the node-construction operators. This search relies on defining bounds in the metrics used. Finally, we developed and compared different automatic methods for weighting subsequences of the time series so as to maximize the accuracy of the resulting decision trees. / Rising interest in mining and analyzing time series data in many domains motivates the design of machine learning (ML) algorithms that are capable of tackling such complex data. Beyond the need to modify, improve, and create novel ML algorithms that were initially designed for static data, criteria of interpretability, accuracy, and computational efficiency have to be fulfilled. For a domain expert, it is crucial to extract knowledge from data, and appealing when the resulting model is transparent and interpretable, so that no prior knowledge of ML is required to read and understand the results. Indeed, as emphasized by many recent works, domain experts increasingly need a transparent and interpretable model from the learning tool, allowing them to use it even if they have little knowledge of ML theory. The decision tree is an algorithm that focuses on providing an interpretable and fairly accurate classification model. More precisely, in this research we address the problem of interpretable time series classification with the decision tree (DT) method. Firstly, we present the Temporal Decision Tree, a modification of the classical DT algorithm. The gist of this change is the definition of a node's split. Secondly, we propose an extension of the modified algorithm for temporal data, called the Multi-operator Temporal Decision Tree (MTDT), which is able to capture different geometrical class structures. The resulting algorithm improves model readability while preserving classification accuracy. Furthermore, we explore two complementary issues: the computational efficiency of the extended algorithm and its classification accuracy. We suggest that the computational cost can be reduced by using a local search approach to build nodes, and that accuracy can be preserved by discovering and weighting discriminative time stamps of the time series.
|
77 |
Comparing Compound and Ordinary Diversity measures Using Decision Trees.
Gangadhara, Kanthi; Reddy Dubbaka, Sai Anusha (January 2011)
An ensemble of classifiers succeeds in improving the accuracy of the whole when the component classifiers are both diverse and accurate. Diversity is required to ensure that the classifiers make uncorrelated errors. Theoretical and experimental approaches from previous research show very low correlation between ensemble accuracy and diversity measures. The compound diversity functions proposed by Albert Hung-Ren Ko and Robert Sabourin (2009), which combine the diversities and performances of individual classifiers, exhibit strong correlations between diversity and accuracy. To be consistent with existing arguments, compound diversity measures are evaluated and compared with traditional diversity measures on different problems. Evaluating the diversity of errors and comparing it with these measures is a significant part of this study. The results show that compound diversity measures are better than ordinary diversity measures. The results further explain the evaluation of the diversity of errors on the available data. / Program: Magisterutbildning i informatik (Master's programme in Informatics)
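The abstract does not list which ordinary diversity measures were compared. As one illustration only, the sketch below computes the pairwise disagreement measure, a commonly used diversity measure defined as the fraction of instances on which exactly one of two classifiers is correct. The function name and the sample correctness vectors are hypothetical, and the compound measures of Ko and Sabourin are not reproduced here.

```python
def disagreement(correct_a, correct_b):
    """Pairwise disagreement between two classifiers.

    Each argument is a sequence of booleans, one per instance, that is True
    when the classifier labelled that instance correctly. The result is the
    fraction of instances on which exactly one classifier is correct.
    """
    if len(correct_a) != len(correct_b) or len(correct_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    differing = sum(a != b for a, b in zip(correct_a, correct_b))
    return differing / len(correct_a)


if __name__ == "__main__":
    # Made-up correctness vectors for two classifiers on four instances.
    clf_a = [True, True, False, False]
    clf_b = [True, False, True, False]
    print(disagreement(clf_a, clf_b))
```

A value of 0 means the two classifiers succeed and fail on exactly the same instances, and larger values mean their errors overlap less. Note that this measure ignores how accurate each classifier is, which is the gap that compound measures combining diversity with individual performance are meant to address.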
|
78 |
Designing a decision tree for cross-device communication technology aimed at iOS and android developers
Chioino, Jamil; Contreras, Ivan; Barrientos, Alfredo; Vives, Luis (09 April 2018)
The full text of this work is not available in the Repositorio Académico UPC because of restrictions imposed by the publisher. / This analysis proposes a decision tree for selecting cross-device communication technologies for iOS and Android mobile devices. The tree accelerates the selection of cross-device technologies by taking into account known use cases of interaction. Five communication technologies were tested (Real-time Multiplayer, Nearby Messages, PeerJS, iBeacon and Eddystone) by means of 13 proof-of-concept applications distributed across both operating systems (Android-iOS, iOS-iOS, Android-Android) and the design of 20 architecture diagrams of three types: sequence (connection to services and message sending), deployment and component. The decision tree was validated by mobile development experts, resulting in a reduction of up to 30 days in technology selection research. According to the results obtained from the experts, the effectiveness of the tree as a tool is 60%, its usefulness 80% and its ease of comprehension 90%. / Peer reviewed
|
79 |
Tree Restructuring Approach to Mapping Problem in Cellular Architecture FPGAs
Ramineni, Narahari (10 February 1995)
This thesis presents a new technique for mapping combinational circuits to Fine-Grain Cellular-Architecture FPGAs. We represent the netlist as a binary tree with decision variables associated with each node of the tree. The functionality of the tree nodes is chosen based on the target FPGA architecture. The proposed tree restructuring algorithms preserve local connectivity and allow direct mapping of the trees to the cellular array, thus eliminating the traditional routing phase. Predictability of the signal delays is another very important advantage of the developed approach. The developed bus-assignment algorithm efficiently utilizes the medium-distance routing resources (buses). The method is general and can be used for any Fine-Grain CA-type FPGA. To demonstrate our techniques, the ATMEL 6000 series FPGA was used as the target architecture. An area and delay comparison between our methods and commercial tools is presented using a set of MCNC benchmarks. Final layouts of the implemented designs are included. Results show that the proposed techniques outperform the available commercial tools for ATMEL 6000 FPGAs in both area and delay optimization.
|
80 |
Fault Classification and Location Identification on Electrical Transmission Network Based on Machine Learning Methods
Venkatesh, Vidya (01 January 2018)
The power transmission network is the most important link in a country’s energy system, as it carries large amounts of power at high voltages from generators to substations. A modern power system is a complex network and requires a high-speed, precise, and reliable protective system. Faults in a power system are unavoidable, and overhead transmission line faults are generally more frequent than faults in other major components. They not only affect the reliability of the system but also have a widespread impact on end users. Additionally, the difficulty of protecting transmission lines increases as their configurations become more complex. Therefore, predicting faults (type and location) with high accuracy increases the operational stability and reliability of the power system and helps to avoid large-scale power failures. Furthermore, proper operation of the protective relays (e.g., reclosing relays) requires the fault type to be determined correctly and as quickly as possible.
With the advent of the smart grid, digital technology allows sensors to be deployed along transmission lines to collect live fault data, which contain useful information for analyzing disturbances that occur on those lines. In this thesis, the application of machine learning algorithms to fault classification and location identification on the transmission line is explored. These algorithms can “learn” from the data without being explicitly programmed and can adapt independently when exposed to new data. The work presented makes the following contributions:
1) Two different architectures are proposed that adapt to any N-terminal transmission line.
2) The proposed models do not require a large dataset or a high sampling frequency. Additionally, they can be trained quickly and generalize well to the problem.
3) The first architecture is based on decision trees, chosen for their simplicity and easy visualization, which have not been used earlier. The fault location method uses a traveling wave-based approach. The method performs better than the expected accuracy, and the fault location error is less than ±1%.
4) The second architecture uses a single support vector machine to classify ten types of shunt faults and a regression model for fault location, which eliminates manual work. The architecture was tested on real data and proved to be better than the first architecture. The regression model has a fault location error of less than ±1% for both three-terminal and two-terminal lines.
5) Both architectures are tested on real fault data, which provides substantial evidence of their applicability.
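The abstract does not say whether the traveling wave-based location method is single-ended or two-ended. As a hedged sketch, assuming a two-terminal line with time-synchronised recorders at terminals A and B, the standard two-ended relation is d_A = (L + v (t_A - t_B)) / 2, where L is the line length, v the wave propagation speed, and t_A, t_B the arrival times of the first wavefront. The function name, the 100 km line, and the wave speed of 300,000 km/s below are illustrative assumptions, not values from the thesis.

```python
def fault_distance_from_a(line_length_km, wave_speed_km_s, t_a_s, t_b_s):
    """Two-ended traveling-wave estimate of the fault distance from terminal A.

    Assumes synchronised clocks at both terminals. t_a_s and t_b_s are the
    arrival times (in seconds) of the first wavefront at terminals A and B.
    """
    if line_length_km <= 0 or wave_speed_km_s <= 0:
        raise ValueError("line length and wave speed must be positive")
    return (line_length_km + wave_speed_km_s * (t_a_s - t_b_s)) / 2.0


if __name__ == "__main__":
    # Illustrative values only: a 100 km line and an assumed wave speed close
    # to the speed of light (real overhead lines are somewhat slower).
    length_km = 100.0
    speed_km_s = 300_000.0
    true_distance_km = 30.0

    # Simulated arrival times for a fault 30 km from A, occurring at t = 0.
    t_a = true_distance_km / speed_km_s
    t_b = (length_km - true_distance_km) / speed_km_s

    estimate = fault_distance_from_a(length_km, speed_km_s, t_a, t_b)
    error_percent = abs(estimate - true_distance_km) / length_km * 100.0
    print(f"estimated distance from A: {estimate:.3f} km")
    print(f"location error: {error_percent:.6f} % of line length")
```

Expressing the error as a percentage of line length is one common convention and is assumed here; the thesis may define its ±1% figure differently.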
|