11 |
The art of forecasting – an analysis of predictive precision of machine learning models
Kalmár, Marcus, Nilsson, Joel January 2016
Forecasting is used for decision making and unreliable predictions can instill a false sense of confidence. Traditional time series modelling is a statistical art form rather than a science and errors can occur due to limitations of human judgment. In minimizing the risk of falsely specifying a process the practitioner can make use of machine learning models. In an effort to find out if there's a benefit in using models that require less human judgment, the machine learning models Random Forest and Neural Network have been used to model a VAR(1) time series. In addition, the classical time series models AR(1), AR(2), VAR(1) and VAR(2) have been used as comparative foundation. The Random Forest and Neural Network are trained and ultimately the models are used to make predictions evaluated by RMSE. All models yield scattered forecast results except for the Random Forest that steadily yields comparatively precise predictions. The study shows that there is definitive benefit in using Random Forests to eliminate the risk of falsely specifying a process and do in fact provide better results than a correctly specified model.
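To make the evaluation setup in this abstract concrete, the sketch below simulates a bivariate VAR(1) series, fits an AR(1) benchmark to one component by least squares, and scores one-step-ahead forecasts with RMSE. It is an illustration only, not the authors' code: the coefficient matrix, sample sizes, seed, and all function names are assumptions, and no Random Forest or Neural Network is fitted here.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def simulate_var1(n, a, seed=0):
    """Simulate a bivariate VAR(1) process x_t = A x_{t-1} + Gaussian noise."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    series = []
    for _ in range(n):
        # The right-hand side is evaluated with the previous (x, y) pair.
        x, y = (a[0][0] * x + a[0][1] * y + rng.gauss(0, 1),
                a[1][0] * x + a[1][1] * y + rng.gauss(0, 1))
        series.append((x, y))
    return series


def fit_ar1(values):
    """Least-squares estimate of phi in x_t = phi * x_{t-1} + e_t (no intercept)."""
    num = sum(prev * cur for prev, cur in zip(values, values[1:]))
    den = sum(prev * prev for prev in values[:-1])
    return num / den


# Hypothetical coefficient matrix, chosen only so that the process is stationary.
A = [[0.5, 0.2], [0.1, 0.4]]
data = [x for x, _ in simulate_var1(300, A)]
train, test = data[:250], data[249:]           # test keeps one overlap point as first lag
phi = fit_ar1(train)
one_step = [phi * prev for prev in test[:-1]]  # one-step-ahead forecasts
print("AR(1) one-step RMSE:", rmse(test[1:], one_step))
```

Because the AR(1) model ignores the second component of the series, it is deliberately misspecified here, which is the kind of specification risk the abstract discusses.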
|
12 |
Churn model and retention policies in a home improvement company / Modelo de fuga y políticas de retención en una empresa de mejoramiento del hogar
Castillo Beldaño, Ana Isabel January 2014
Thesis submitted for the degree of Industrial Civil Engineer / The recent dynamism of the home improvement industry has forced the companies involved to understand their customers' purchasing behaviour, since they must focus their resources and strategies not only on capturing new customers but also on retaining existing ones.
The objective of this work is to estimate customer churn at a home improvement company in order to design retention strategies. To that end, churn criteria are defined and probabilities are estimated so that action can be taken on a fraction of the customers likely to churn.
To achieve these objectives, the work considers only customers who belong to a salesperson's portfolio and uses the following tools: descriptive statistics, the RFM technique, and a comparison of the predictive models Decision Tree and Random Forest, where the main difference between the two is the number of variables and trees built to predict the churn probabilities.
The results yield three churn criteria, such that a customer is classified as churned when exceeding any of the maximum bounds: 180 days for recency, 20 for R/F, or a variation in amount below -80%. The sample is thus composed of 53.9% churned customers versus 46.1% active customers. Regarding the predictive models, the Decision Tree delivers better accuracy, 84.1% versus 74.7% for the Random Forest, so the former was chosen, and its churn probabilities yield four customer types: Loyal (37.9%), Normal (7.8%), Likely to churn (15.6%) and Churned (38.7%).
The causes of churn are long periods of inactivity, delays in purchase cycles, and a decrease in the amounts and number of transactions, together with an increase in the amount of negative transactions, which correspond directly to returns and credit notes. The main retention actions would therefore be promotions, a loyalty club, personalized discounts, and better management of deliveries and stock levels so that the customer makes another purchase sooner.
Finally, this work concludes that retaining 5% of the customers with churn probabilities between 0.5 and 0.75 who are also within the top 50% of transaction amounts yields revenue of USD 205 thousand over 6 months, representing 5.5% of the customers. It is proposed to validate this work on new customers, conduct a satisfaction survey, and improve salesperson performance through portfolio optimization.
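The three churn criteria reported in this abstract can be written down as a simple rule. The sketch below is only an illustration under stated assumptions: the `Customer` fields, the constant names, and the reading of "exceeds" as a strict inequality are hypothetical, and the thesis's actual variable definitions may differ.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    recency_days: float            # days since the last purchase
    recency_over_frequency: float  # the abstract's R/F ratio
    amount_change: float           # relative change in spend, e.g. -0.85 for -85%


# Bounds quoted in the abstract; strictness of the comparison is an assumption.
RECENCY_LIMIT_DAYS = 180
RF_LIMIT = 20
AMOUNT_CHANGE_LIMIT = -0.80


def is_churned(customer):
    """A customer counts as churned if any one of the three bounds is crossed."""
    return (customer.recency_days > RECENCY_LIMIT_DAYS
            or customer.recency_over_frequency > RF_LIMIT
            or customer.amount_change < AMOUNT_CHANGE_LIMIT)


print(is_churned(Customer(recency_days=200, recency_over_frequency=5, amount_change=0.0)))
print(is_churned(Customer(recency_days=30, recency_over_frequency=5, amount_change=-0.1)))
```

Labels produced by a rule of this kind would then serve as the target for the Decision Tree and Random Forest comparison described above.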
|
13 |
The stability of host-pathogen multi-strain models
Hawkins, Susan January 2017
Previous multi-strain mathematical models have elucidated that the degree of cross-protective responses between similar strains, acting as a form of immune selection, generates different behavioural states of the pathogen population. This thesis explores these multi-strain dynamic states, to examine their robustness and stability in the face of pathogenic intrinsic phenotypic variation, and the extrinsic force of immune selection. This is achieved in two main ways: Chapter 2 introduces phenotypic variation in pathogen transmissibility, testing the robustness of a stable pathogen population to the emergence of an introduced strain of higher transmission potential; and Chapter 3 introduces a new model with a possibility of immunity to both strain-specific and cross-strain (conserved) determinants, to investigate how heterogeneity in the specificity of a host immune response alters the pathogen population structure. A final investigation in Chapter 4 develops a method of reverse-pattern oriented modelling using a machine learning algorithm to determine which intrinsic properties of the pathogen, and their combinations, lead to particular disease-like population patterns. This research offers novel techniques to complement previous and ongoing work on multi-strain modelling, with direct applications to a range of infectious agents such as Plasmodium falciparum, influenza A, and rotavirus, but also with a wider potential for other multi-strain systems.
|
14 |
Designing energy-efficient computing systems using equalization and machine learning
Takhirov, Zafar 20 February 2018
As technology scaling slows down in the nanometer CMOS regime and mobile computing becomes more ubiquitous, designing energy-efficient hardware for mobile systems is becoming increasingly critical and challenging. Although various approaches like near-threshold computing (NTC), aggressive voltage scaling with shadow latches, etc. have been proposed to get the most out of limited battery life, there is still no “silver bullet” for the increasing power-performance demands of mobile systems. Moreover, given that a mobile system could operate in a variety of environmental conditions, like different temperatures, have varying performance requirements, etc., there is a growing need for designing tunable/reconfigurable systems in order to achieve energy-efficient operation. In this work we propose to address the energy-efficiency problem of mobile systems using two different approaches: circuit tunability and distributed adaptive algorithms.
Inspired by the communication systems, we developed feedback equalization based digital logic that changes the threshold of its gates based on the input pattern. We showed that feedback equalization in static complementary CMOS logic enabled up to 20% reduction in energy dissipation while maintaining the performance metrics. We also achieved 30% reduction in energy dissipation for pass-transistor digital logic (PTL) with equalization while maintaining performance. In addition, we proposed a mechanism that leverages feedback equalization techniques to achieve near optimal operation of static complementary CMOS logic blocks over the entire voltage range from near threshold supply voltage to nominal supply voltage. Using energy-delay product (EDP) as a metric we analyzed the use of the feedback equalizer as part of various sequential computational blocks. Our analysis shows that for near-threshold voltage operation, when equalization was used, we can improve the operating frequency by up to 30%, while the energy increase was less than 15%, with an overall EDP reduction of ≈10%. We also observe an EDP reduction of close to 5% across entire above-threshold voltage range.
On the distributed adaptive algorithm front, we explored energy-efficient hardware implementation of machine learning algorithms. We proposed an adaptive classifier that leverages the wide variability in data complexity to enable energy-efficient data classification operations for mobile systems. Our approach takes advantage of varying classification hardness across data to dynamically allocate resources and improve energy efficiency. On average, our adaptive classifier is ≈100× more energy efficient but has ≈1% higher error rate than a complex radial basis function classifier and is ≈10× less energy efficient but has ≈40% lower error rate than a simple linear classifier across a wide range of classification data sets. We also developed a field of groves (FoG) implementation of random forests (RF) that achieves an accuracy comparable to Convolutional Neural Networks (CNN) and Support Vector Machines (SVM) under tight energy budgets. The FoG architecture takes advantage of the fact that in random forests a small portion of the weak classifiers (decision trees) might be sufficient to achieve high statistical performance. By dividing the random forest into smaller forests (Groves), and conditionally executing the rest of the forest, FoG is able to achieve much higher energy efficiency levels for comparable error rates. We also take advantage of the distributed nature of the FoG to achieve a high level of parallelism. Our evaluation shows that at maximum achievable accuracies FoG consumes ≈1.48×, ≈24×, ≈2.5×, and ≈34.7× lower energy per classification compared to conventional RF, SVM-RBF, Multi-Layer Perceptron Network (MLP), and CNN, respectively. FoG is 6.5× less energy efficient than SVM-LR, but achieves 18% higher accuracy on average across all considered datasets.
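The idea of splitting a forest into groves and running the remaining groves only when needed can be sketched in software as follows. This is a loose illustration, not the hardware design described in the abstract: the stopping rule (a vote-share threshold), the `stump` helper, and the toy groves are all assumptions introduced here.

```python
def classify_with_groves(sample, groves, threshold=0.8):
    """Evaluate groves one at a time; stop once the leading class holds
    at least `threshold` of the votes cast so far.

    Returns (label, number_of_groves_evaluated).
    """
    if not groves or any(len(grove) == 0 for grove in groves):
        raise ValueError("need at least one non-empty grove")
    votes = {}
    total = 0
    best = None
    for used, grove in enumerate(groves, start=1):
        for tree in grove:
            label = tree(sample)
            votes[label] = votes.get(label, 0) + 1
            total += 1
        best = max(votes, key=votes.get)
        if votes[best] / total >= threshold:
            return best, used  # confident enough: skip the remaining groves
    return best, len(groves)


def stump(feature, cut):
    """Hypothetical one-split 'tree': class 1 if the feature exceeds the cut."""
    return lambda x: int(x[feature] > cut)


groves = [
    [stump(0, 0.5), stump(1, 0.5)],
    [stump(0, 0.4), stump(1, 0.6)],
    [stump(0, 0.6), stump(1, 0.4)],
]
print(classify_with_groves((0.9, 0.9), groves))  # trees agree, so one grove suffices
print(classify_with_groves((0.9, 0.1), groves))  # trees disagree, so all groves run
```

Samples on which the trees agree stop after the first grove, which is where an energy saving of the kind the abstract reports would come from; harder samples fall through to the full forest.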
|
15 |
A random forest approach to segmenting and classifying gestures
Joshi, Ajjen Das 12 March 2016
This thesis investigates a gesture segmentation and recognition scheme that employs a random forest classification model. A complete gesture recognition system should localize and classify each gesture from a given gesture vocabulary, within a continuous video stream. Thus, the system must determine the start and end points of each gesture in time, as well as accurately recognize the class label of each gesture. We propose a unified approach that performs the tasks of temporal segmentation and classification simultaneously. Our method trains a random forest classification model to recognize gestures from a given vocabulary, as presented in a training dataset of video plus 3D body joint locations, as well as out-of-vocabulary (non-gesture) instances. Given an input video stream, our trained model is applied to candidate gestures using sliding windows at multiple temporal scales. The class label with the highest classifier confidence is selected, and its corresponding scale is used to determine the segmentation boundaries in time. We evaluated our formulation in segmenting and recognizing gestures from two different benchmark datasets: the NATOPS dataset of 9,600 gesture instances from a vocabulary of 24 aircraft handling signals, and the CHALEARN dataset of 7,754 gesture instances from a vocabulary of 20 Italian communication gestures. The performance of our method compares favorably with state-of-the-art methods that employ Hidden Markov Models or Hidden Conditional Random Fields on the NATOPS dataset. We conclude with a discussion of the advantages of using our model.
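The multi-scale sliding-window step described in this abstract can be sketched as below. It is a simplified illustration under stated assumptions: `score_window` stands in for the trained random forest, and the toy scorer, frame counts, scales, and stride are hypothetical values chosen for the example.

```python
def segment_best_window(n_frames, scales, stride, score_window):
    """Slide windows of several lengths over the stream and keep the one
    whose classifier confidence is highest.

    Returns (start, end, label, confidence) with `end` exclusive.
    """
    best = None
    for scale in scales:
        for start in range(0, n_frames - scale + 1, stride):
            label, conf = score_window(start, start + scale)
            if best is None or conf > best[3]:
                best = (start, start + scale, label, conf)
    return best


# Toy stand-in for a classifier: confidence is the overlap (intersection over
# union) between the window and a made-up true gesture spanning frames 20..49.
TRUE_START, TRUE_END = 20, 50


def toy_scorer(start, end):
    inter = max(0, min(end, TRUE_END) - max(start, TRUE_START))
    union = (end - start) + (TRUE_END - TRUE_START) - inter
    conf = inter / union
    return ("gesture" if inter > 0 else "non-gesture"), conf


print(segment_best_window(100, scales=(10, 30, 60), stride=5, score_window=toy_scorer))
```

The winning window's scale fixes the segment boundaries, which is how classification and temporal segmentation are obtained from the same pass.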
|
16 |
Machine learning and statistical analysis of complex mathematical models : an application to epilepsy
Ferrat, L. January 2019
The electroencephalogram (EEG) is a commonly used tool for studying the emergent electrical rhythms of the brain. It has wide utility in psychology, as well as bringing a useful diagnostic aid for neurological conditions such as epilepsy. It is of growing importance to better understand the emergence of these electrical rhythms and, in the case of diagnosis of neurological conditions, to find mechanistic differences between healthy individuals and those with a disease. Mathematical models are an important tool that offer the potential to reveal these otherwise hidden mechanisms. In particular Neural Mass Models (NMMs), which describe the macroscopic activity of large populations of neurons, are increasingly used to uncover large-scale mechanisms of brain rhythms in both health and disease. The dynamics of these models depend upon the choice of parameters, and therefore it is crucial to be able to understand how dynamics change when parameters are varied. Although they are considered low-dimensional in comparison to micro-scale neural network models, with regards to understanding the relationship between parameters and dynamics NMMs are still prohibitively high dimensional for classical approaches such as numerical continuation. We need alternative methods to characterise the dynamics of NMMs in high dimensional parameter spaces. The primary aim of this thesis is to develop a method to explore and analyse the high dimensional parameter space of these mathematical models. We develop an approach based on statistics and machine learning methods called decision tree mapping (DTM). This method is used to analyse the parameter space of a mathematical model by studying all the parameters simultaneously. With this approach, the parameter space can efficiently be mapped in high dimension. We have used measures linked with this method to determine which parameters play a key role in the output of the model.
This approach recursively splits the parameter space into smaller subspaces with an increasing homogeneity of dynamics. The concepts of decision tree learning, random forest, measures of importance, statistical tests and visual tools are introduced to explore and analyse the parameter space. We introduce formally the theoretical background and the methods with examples. The DTM approach is used in three distinct studies to:
• Identify the role of parameters in the dynamic model. For example, which parameters have a role in the emergence of seizure dynamics?
• Constrain the parameter space, such that regions of the parameter space which give implausible dynamics are removed.
• Compare the parameter sets that fit different groups. How does the thalamocortical connectivity of people with and without epilepsy differ?
We demonstrate that classical studies have not taken into account the complexity of the parameter space. DTM can easily be extended to other fields using mathematical models. We advocate the use of this method in the future to constrain high dimensional parameter spaces in order to enable more efficient, person-specific model calibration.
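One step of the recursive splitting that this abstract describes can be sketched as a search for the axis-aligned cut that makes the two resulting subspaces most homogeneous. The example below is an assumption-laden illustration: it uses Gini impurity as the homogeneity measure, a made-up two-parameter model, and hypothetical function names; the thesis's DTM method may use different criteria.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly homogeneous)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(points, labels):
    """Return (weighted_impurity, dimension, threshold) of the best single cut."""
    best = None
    n = len(points)
    for dim in range(len(points[0])):
        values = sorted({p[dim] for p in points})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [lab for p, lab in zip(points, labels) if p[dim] <= thr]
            right = [lab for p, lab in zip(points, labels) if p[dim] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, dim, thr)
    return best


def toy_model(a, b):
    """Made-up model: the dynamics depend on parameter a only, never on b."""
    return "oscillatory" if a > 0.55 else "steady"


points = [(i / 10, j / 10) for i in range(11) for j in range(11)]
labels = [toy_model(a, b) for a, b in points]
print(best_split(points, labels))
```

The chosen dimension identifies which parameter controls the change in dynamics, and applying `best_split` again inside each half would give the recursive map of the parameter space.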
|
17 |
In silico modeling for uncertain biochemical data
Gusenleitner, Daniel January 2009
Analyzing and modeling data is a well established research area and a vast variety of different methods have been developed over the last decades. Most of these methods assume fixed positions of data points; only recently has uncertainty in data caught attention as a potentially useful source of information. In order to provide a deeper insight into this subject, this thesis concerns itself with the following essential question: Can information on uncertainty of feature values be exploited to improve in silico modeling? For this reason a state-of-the-art random forest algorithm is developed using Matlab®. In addition, three techniques of handling uncertain numeric features are presented and incorporated in different modified versions of random forests. To test the hypothesis, six real-world data sets were provided by AstraZeneca. The data describe biochemical features of chemical compounds, including the results of an Ames test, a widely used technique to determine the mutagenicity of chemical substances. Each of the datasets contains a single uncertain numeric feature, represented as an expected value and an error estimate. The modified algorithms are then applied to the six data sets in order to obtain classifiers able to predict the outcome of an Ames test. The hypothesis is tested using a paired t-test and the results reveal that information on uncertainty can indeed improve the performance of in silico models.
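The paired t-test mentioned in this abstract compares two methods evaluated on the same data sets by testing whether the mean per-data-set difference is zero. The sketch below computes the statistic with the standard library only. The score lists are placeholder values invented for illustration; they are not results from the thesis, and the function name is hypothetical.

```python
import math


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    if var == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean / math.sqrt(var / n), n - 1


# Placeholder scores for six data sets (illustrative only, not thesis results).
modified = [0.73, 0.74, 0.69, 0.78, 0.72, 0.73]
baseline = [0.70, 0.72, 0.68, 0.75, 0.71, 0.69]
t, df = paired_t_statistic(modified, baseline)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

With six data sets there are 5 degrees of freedom, so a two-sided test at the 5% level would compare |t| against the tabulated critical value of about 2.571.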
|
19 |
Classification of terrain using superpixel segmentation and supervised learning / Klassificering av terräng med superpixelsegmentering och övervakad inlärning
Ringqvist, Sanna January 2014
The usage of 3D-modeling is expanding rapidly. Modeling from aerial imagery has become very popular due to its increasing number of both civilian and military applications like urban planning, navigation and target acquisition. This master's thesis project was carried out at Vricon Systems at SAAB. The Vricon system produces high resolution geospatial 3D data based on aerial imagery from manned aircraft, unmanned aerial vehicles (UAV) and satellites. The aim of this work was to investigate to what degree superpixel segmentation and supervised learning can be applied to a terrain classification problem using imagery and digital surface models (DSM). The aim was also to investigate how the height information from the digital surface model may contribute compared to the information from the grayscale values. The goal was to identify buildings, trees and ground. Another task was to evaluate existing methods, and compare results. The approach for solving the stated goal was divided into several parts. The first part was to segment the image using superpixel segmentation, after which features were extracted. Then the classifiers were created and trained and finally the classifiers were evaluated. The classification method that obtained the best results in this thesis had approximately 90 % correctly labeled superpixels. The result was equal, if not better, compared to other solutions available on the market.
|
20 |
Forecasting GDP Growth, or How Can Random Forests Improve Predictions in Economics?
Adriansson, Nils, Mattsson, Ingrid January 2015
GDP is used to measure the economic state of a country, and accurate forecasts of it are therefore important. Using the Economic Tendency Survey we investigate forecasting quarterly GDP growth using the data mining technique Random Forest. Comparisons are made with a benchmark AR(1) and an ad hoc linear model built on the most important variables suggested by the Random Forest. Evaluation by forecasting shows that the Random Forest makes the most accurate forecast, supporting the theory that there are benefits to using Random Forests on economic time series.
|