1 |
THE POTENTIAL FOR MACHINE LEARNING IN MENTAL HEALTH POLICING: PREDICTING OUTCOMES OF MENTAL HEALTH RELATED CALLS FOR SERVICE
Pearson Hirdes, Daniel, January 2019
My objective was to predict outcomes following police interactions with persons with mental illness (PMIs) and to compare the predictive accuracy of logistic regression models and Random Forests learning algorithms. Additionally, I evaluated whether the predictive accuracy of Random Forests changed when applied to merged versus region-specific data. I conducted a retrospective cohort study of reports completed by police in 13 communities between 2015 and 2018; 13,058 reports were analyzed. Random Forests learning algorithms were compared against logistic regression models for predictive accuracy in a merged dataset (13 communities) and 3 regional datasets. The outcomes predicted were high risk of harm to self, high risk of harm to others, and high risk of failure to care for self, each within 24 and 72 hours following police contact. Random Forests learning algorithms were trained on merged and regional datasets and evaluated on merged and regional holdout datasets. Performance was compared by area under the curve (AUC). For the Random Forests learning algorithms, confusion matrix statistics were calculated for each outcome, and predictive utility was examined by calculating conditional probabilities.
Prediction accuracy was modest across all methods. Random Forests achieved better predictive accuracy than logistic regression. Random Forests accuracy varied between merged and regional holdout data. Sensitivity of the Random Forests learning algorithms was moderate (74% average, 6 outcomes, merged holdout set). Specificity was low (53% average, 6 outcomes, merged holdout set). Conditional probabilities were modestly improved by the use of the Random Forests learning algorithm. Because the target outcomes were rare, even predictions with moderate likelihood ratios had only modest predictive value. Though the Random Forests learning algorithms did outperform the logistic regression learning algorithms, the clinical significance of those benefits was limited when conditional probabilities were calculated. These findings are limited to the outcomes considered and may not apply to more common outcomes. / Thesis / Master of Health Sciences (MSc) / The study goal was to predict outcomes following police interactions with persons with mental illness (PMIs). Additionally, we compared the predictive validity of logistic regression and Random Forests learning algorithms. Classification approaches were applied to outcomes following police interactions with PMIs, including high risk of harm to self, high risk of harm to others, and high risk of failure to care for self within 24 hours and 72 hours of initial police contact. The study also sought to determine whether the predictive accuracy of Random Forests was sensitive to the police service community. Variation in predictive accuracy was assessed between a merged dataset (13 communities) and 3 community-specific datasets. The study found that the predictive accuracy of the classification approaches on outcomes was modest. Random Forests exhibited greater predictive validity than logistic regression. The performance of the Random Forests suggested that it was not sensitive to police service context.
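The point about rare outcomes can be made concrete with Bayes' theorem in odds form: post-test odds equal pre-test odds times the likelihood ratio. The Python sketch below uses the merged-holdout averages reported above (74% sensitivity, 53% specificity). The 5% outcome rate is a hypothetical value chosen for illustration, not a figure from the thesis, and the function names are likewise illustrative.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a binary classifier."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(prevalence: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = prevalence / (1.0 - prevalence)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


sensitivity = 0.74  # average reported for the merged holdout set
specificity = 0.53  # average reported for the merged holdout set
prevalence = 0.05   # HYPOTHETICAL outcome rate, for illustration only

lr_pos, lr_neg = likelihood_ratios(sensitivity, specificity)
ppv = post_test_probability(prevalence, lr_pos)               # P(outcome | flagged)
risk_if_negative = post_test_probability(prevalence, lr_neg)  # P(outcome | not flagged)

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"P(outcome | flagged)     = {ppv:.3f}")
print(f"P(outcome | not flagged) = {risk_if_negative:.3f}")
```

With these inputs LR+ is about 1.57, so a positive prediction raises the estimated risk from 5% to roughly 7.7%, while a negative prediction lowers it to about 2.5%. The shift is real but small in absolute terms, which is the pattern the abstract describes.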
|
2 |
Klasifikační metody pro data z mikročipů / Classification Methods for Microarray Data
Hudec, Vladimír, January 2011
This thesis discusses data obtained from gene chips (microarrays) and methods for their analysis. It analyzes several methods for such data, focusing on the "Random Forests" method, and presents the dataset used for the experiments. The methods are implemented in the R language environment, then tested, and the results are presented and compared. Results obtained with the "Random Forests" method are compared with other experiments on the same dataset.
|
3 |
Customer Relationship Management: from Conversion to Churn to Winback
Li, Ke, January 2013
Using a large CRM dataset provided by a large media company, this dissertation examines four categories of factors that could affect three stages of customer relationship management (CRM): customer acquisition, retention, and winback of lost customers. Specifically, using the random forests machine learning method and text mining techniques, this study identifies which factors play bigger roles than others at each of the three stages of CRM, among the following: customer heterogeneity (e.g., in usage of self-care service channels, duration of service, and responsiveness to marketing actions); the firm's marketing initiatives (e.g., the volume of marketing communications, the depth of promotions, the communication channels used, and marketing penetration in different geographical areas); customers' self-reported deactivation reasons; and call-center notes in text form. Furthermore, the study examines how the effects of these factors on customers' decisions (whether to convert to a paid customer, to churn, or to reactivate their service with the company) evolve across the three stages of CRM. The findings help managers better allocate their resources in the processes of acquiring, retaining, and winning back customers. / Business Administration/Marketing
|
4 |
Evaluation of Random Forests for Detection and Localization of Cattle Eyes
Sandsveden, Daniel, January 2015
As cattle herds grow ever larger, the need for automatic methods to detect diseases keeps increasing. One possible way to discover diseases is to use thermal images together with automatic head and eye detectors. In this thesis, an eye detector and a head detector are implemented using the Random Forests classifier. During implementation, the classifier is evaluated with three different descriptors: Histogram of Oriented Gradients, Local Binary Patterns, and a descriptor based on pixel differences. An alternative classifier, the Support Vector Machine, is also evaluated for comparison against Random Forests. The results show that Histogram of Oriented Gradients performs well as a description of cattle heads, while Local Binary Patterns performs well as a description of cattle eyes. The pixel-difference descriptor performs almost equally well in both cases. The results also show that Random Forests performs approximately as well as the Support Vector Machine when the Support Vector Machine is paired with Local Binary Patterns, for both heads and eyes. Finally, the results indicate that cattle heads are easier to detect and locate than cattle eyes. For eyes, combining a head detector and an eye detector is shown to give a better result than using an eye detector alone: heads are first detected in the images, and the eye detector is then applied to the areas classified as heads.
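For readers unfamiliar with Local Binary Patterns, the sketch below computes the basic 8-neighbour, radius-1 LBP code and a normalized code histogram in plain Python. The abstract does not state which LBP variant (radius, uniform patterns, cell grid) the thesis used, so this is the textbook form under that assumption, not the thesis's implementation; the function names are hypothetical.

```python
# Neighbour offsets (row, col), clockwise from the top-left pixel;
# neighbour i contributes bit i of the code.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image: list[list[int]], r: int, c: int) -> int:
    """8-bit LBP code of interior pixel (r, c): bit i is set if neighbour i >= centre."""
    centre = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if image[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(image: list[list[int]]) -> list[float]:
    """Normalized 256-bin histogram of LBP codes over all interior pixels."""
    rows, cols = len(image), len(image[0])
    hist = [0] * 256
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            hist[lbp_code(image, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * 256


patch = [[10, 20, 30],
         [40, 25, 60],
         [5, 26, 8]]
print(lbp_code(patch, 1, 1))  # 172: neighbours 30, 60, 26 and 40 are >= 25
```

In a detector, a histogram like this (often one per cell of a grid laid over the candidate window, concatenated) would serve as the feature vector passed to the Random Forest or Support Vector Machine.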
|
5 |
Propensity Score Estimation with Random Forests
January 2013
Random Forests is a statistical learning method that has been proposed for propensity score estimation models involving complex interactions among covariates, nonlinear relationships, or both. In this dissertation, I conducted a simulation study to examine the effects of three Random Forests model specifications in propensity score analysis. The results suggested that, depending on the nature of the data, optimal specification of (1) the decision rules used to select the covariate and its split value in a classification tree, (2) the number of covariates randomly sampled for selection, and (3) the method of estimating Random Forests propensity scores could potentially produce an unbiased average treatment effect estimate after propensity score weighting by the odds. Compared with the logistic regression estimation model using the true propensity score model, Random Forests had the additional advantage of producing unbiased standard error estimates and correct statistical inference for the average treatment effect. The relationship between balance on the covariates' means and the bias of the average treatment effect estimate was examined both within and between simulation conditions. Within conditions, across repeated samples, there was no noticeable correlation between the covariates' mean differences and the magnitude of bias of the average treatment effect estimate for covariates that were imbalanced before adjustment. Between conditions, small mean differences of covariates after propensity score adjustment were not sensitive enough to identify the optimal Random Forests model specification for propensity score analysis. / Dissertation/Thesis / Ph.D. Psychology 2013
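Propensity score weighting by the odds targets the average treatment effect on the treated: treated units keep weight 1 and each control unit is weighted by e/(1 - e), where e is its estimated propensity score. The sketch below assumes the scores have already been estimated (by Random Forests or otherwise); the data and function names are hypothetical and the code is not taken from the dissertation.

```python
def odds_weights(treated: list[int], pscore: list[float]) -> list[float]:
    """Weighting by the odds: treated units get weight 1, controls get e / (1 - e)."""
    return [1.0 if t == 1 else e / (1.0 - e) for t, e in zip(treated, pscore)]


def att_by_odds_weighting(y: list[float], treated: list[int], pscore: list[float]) -> float:
    """Treated mean minus the odds-weighted control mean (an ATT estimate)."""
    weights = odds_weights(treated, pscore)
    y_treated = [yi for yi, t in zip(y, treated) if t == 1]
    controls = [(yi, w) for yi, t, w in zip(y, treated, weights) if t == 0]
    weighted_control_mean = sum(yi * w for yi, w in controls) / sum(w for _, w in controls)
    return sum(y_treated) / len(y_treated) - weighted_control_mean


# Hypothetical data; in the dissertation's setting the propensity scores
# would come from a Random Forests (or logistic regression) model.
y = [5.0, 7.0, 3.0, 4.0, 2.0]
treated = [1, 1, 0, 0, 0]
pscore = [0.8, 0.6, 0.5, 0.75, 0.5]
print(att_by_odds_weighting(y, treated, pscore))  # about 2.6
```

The control with score 0.75 resembles the treated group and so counts three times as much as the controls with score 0.5; that reweighting is what makes the control mean stand in for the treated group's untreated outcome.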
|
6 |
Extensions and Improvements to Random Forests for Classification
Quach, Anna, 01 December 2017
My dissertation aims to address two weaknesses of Random Forests: first, the failure to detect genetic interactions between two single nucleotide polymorphisms (SNPs) in higher dimensions when both interacting SNPs have weak main effects; and second, the difficulty of interpretation compared with parametric methods such as logistic regression, linear discriminant analysis, and linear regression.
We focus on detecting pairwise SNP interactions in genome case-control studies. We determine the parameter settings that best optimize the detection of SNP interactions and improve the efficiency of Random Forests, and we present an efficient filtering method. Compared with leading methods, the filtering method is shown to be computationally faster while retaining good detection power.
Random Forests allows us to identify clusters, outliers, and important features for subgroups of observations through visualization of the proximities. We improve the interpretation of Random Forests through these proximities. The new proximities are asymmetric, so appropriate visualization requires an asymmetric model for interpretation. We propose a new visualization technique for asymmetric data and compare it with existing approaches.
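The proximities mentioned above are, in Breiman's standard definition, the fraction of trees in which two observations fall into the same terminal node. The sketch below computes that symmetric matrix from a table of terminal-node ids (laid out as the transpose of the array returned by scikit-learn's `RandomForestClassifier.apply`, for instance). It does not reproduce the dissertation's new asymmetric proximities, whose definition the abstract does not give; the input values are hypothetical.

```python
def proximity_matrix(leaves: list[list[int]]) -> list[list[float]]:
    """Standard (symmetric) Random Forest proximities.

    leaves[t][i] is the terminal-node id of observation i in tree t.
    Entry [i][j] of the result is the fraction of trees in which
    observations i and j end up in the same terminal node.
    """
    n_trees = len(leaves)
    n_obs = len(leaves[0])
    counts = [[0] * n_obs for _ in range(n_obs)]
    for tree in leaves:
        for i in range(n_obs):
            for j in range(n_obs):
                if tree[i] == tree[j]:
                    counts[i][j] += 1
    return [[c / n_trees for c in row] for row in counts]


# Hypothetical terminal-node assignments: 3 trees, 4 observations.
leaves = [[0, 0, 1, 1],
          [2, 2, 2, 3],
          [5, 4, 4, 5]]
prox = proximity_matrix(leaves)
dissimilarity = [[1.0 - p for p in row] for row in prox]  # e.g. input for an MDS plot
```

Here observations 0 and 1 share a leaf in two of the three trees, so `prox[0][1]` is 2/3. One minus proximity is a common dissimilarity for the multidimensional scaling plots used to look for clusters and outliers.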
|
7 |
Applications and extensions of Random Forests in genetic and environmental studies
Michaelson, Jacob, 10 January 2011
Transcriptional regulation refers to the molecular systems that control the concentration of mRNA species within the cell. Variation in these controlling systems is not only responsible for many diseases, but also contributes to the vast phenotypic diversity in the biological world. There are powerful experimental approaches to probe these regulatory systems, and the focus of my doctoral research has been to develop and apply effective computational methods that exploit these rich data sets more completely. First, I present a method for mapping genetic regulators of gene expression (expression quantitative trait loci, or eQTL) using Random Forests. This approach allows for flexible modeling and feature selection, and results in eQTL that are more biologically supportable than those mapped with competing methods. Next, I present a method that finds interactions between genes that in turn regulate the expression of other genes. This is accomplished by finding recurring decision motifs in the forest structure that represent dependencies between genetic loci. Third, I present a method to use distributional differences in eQTL data to establish the regulatory roles of genes relative to other disease-associated genes. Using this method, we found that genes that are master regulators of other disease genes are more likely to be consistently associated with the disease in genetic association studies. Finally, I present a novel application of Random Forests to determine the mode of regulation of toxin-perturbed genes, using time-resolved gene expression. The results demonstrate a novel approach to supervised weighted clustering of gene expression data.
|
8 |
Active learning via Transduction in Regression Forests
Hansson, Kim; Hörlin, Erik, January 2015
Context. The amount of training data required to build accurate models is a common problem in machine learning. Active learning is a technique that tries to reduce the amount of required training data by actively choosing which training data hold the greatest value.
Objectives. This thesis aims to design, implement and evaluate the Random Forests algorithm combined with active learning, suitable for predictive tasks with real-valued outcomes where the amount of training data is small. Machine learning algorithms traditionally require large amounts of training data to create a general model, and training data are in many cases sparse and expensive or difficult to create.
Methods. The research methods used in this thesis are implementation and scientific experiment. An approach to active learning was implemented based on previous work on classification problems. The approach uses the Mahalanobis distance to perform active learning via transduction. Evaluation was done on several data sets, where the decrease in prediction error was measured over several iterations. The results of the evaluation were then analyzed using nonparametric statistical testing.
Results. The statistical analysis of the evaluation results failed to detect a difference between our approach and a non-active learning approach, even though the proposed algorithm showed irregular performance. The evaluations of our tree-based traversal method and of the Mahalanobis distance for transduction both showed that these methods performed better than Euclidean distance and complete graph traversal.
Conclusions. We conclude that the proposed solution did not decrease the amount of required training data to a significant degree. However, the approach has potential, and future work could lead to a working active learning solution. Further work is needed on key areas of the implementation, such as the choice of instances for active learning through transduction uncertainty, as well as the choice of method for going from a transduction model to an induction model.
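The abstract credits the Mahalanobis distance with outperforming Euclidean distance for transduction. The plain-Python sketch below shows why the two can disagree: the Mahalanobis distance sqrt((x - y)^T S^-1 (x - y)), with S the covariance of the reference data, rescales each direction by how much the data vary along it. How the thesis uses the distance to choose instances is not described in the abstract, so the data and the "candidate" framing here are hypothetical.

```python
import math


def covariance(rows: list[list[float]]) -> list[list[float]]:
    """Sample covariance matrix (n - 1 denominator) of row-wise observations."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(d)] for a in range(d)]


def invert(matrix: list[list[float]]) -> list[list[float]]:
    """Gauss-Jordan inverse with partial pivoting (matrix must be non-singular)."""
    d = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(d)]
           for i, row in enumerate(matrix)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def mahalanobis(x: list[float], y: list[float], inv_cov: list[list[float]]) -> float:
    """sqrt((x - y)^T S^-1 (x - y)) for a precomputed inverse covariance S^-1."""
    diff = [a - b for a, b in zip(x, y)]
    d = len(diff)
    return math.sqrt(sum(diff[i] * inv_cov[i][j] * diff[j]
                         for i in range(d) for j in range(d)))


def euclidean(x: list[float], y: list[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical labelled data: wide spread on feature 0, narrow on feature 1.
labelled = [[0.0, 0.0], [10.0, 1.0], [20.0, 1.0], [30.0, 0.0]]
inv_cov = invert(covariance(labelled))
centre = [15.0, 0.5]
a = [25.0, 0.5]   # candidate 10 units away along the high-variance feature
b = [15.0, 10.5]  # candidate 10 units away along the low-variance feature
print(euclidean(centre, a), euclidean(centre, b))  # 10.0 10.0
print(mahalanobis(centre, a, inv_cov), mahalanobis(centre, b, inv_cov))
```

Both candidates are equally far in Euclidean terms, but candidate b lies far outside the spread of the labelled data along feature 1, so its Mahalanobis distance (about 17.3) dwarfs that of candidate a (about 0.77).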
|
9 |
Bias in Random Forest Variable Importance Measures: Illustrations, Sources and a Solution
Strobl, Carolin; Boulesteix, Anne-Laure; Zeileis, Achim; Hothorn, Torsten, January 2006
Variable importance measures for random forests have been receiving increased attention as a means of variable selection in many classification tasks in bioinformatics and related scientific fields, for instance to select a subset of genetic markers relevant for the prediction of a certain disease. We show that random forest variable importance measures are a sensible means for variable selection in many applications, but are not reliable in situations where potential predictor variables vary in their scale level or their number of categories. This is particularly important in genomics and computational biology, where predictors often include variables of different types. Simulation studies are presented illustrating that, when random forest variable importance measures are used with data of varying types, the results are misleading because suboptimal predictor variables may be artificially preferred in variable selection. The two mechanisms underlying this deficiency are biased variable selection in the individual classification trees used to build the random forest on the one hand, and effects induced by bootstrap sampling with replacement on the other. We propose to employ an alternative implementation of random forests which provides unbiased variable selection in the individual classification trees. When this method is applied using subsampling without replacement, the resulting variable importance measures can be used reliably for variable selection even in situations where the potential predictor variables vary in their scale level or their number of categories. The use of both random forest algorithms and their variable importance measures in the R system for statistical computing is illustrated and documented thoroughly in an application re-analysing data from a study on RNA editing. Therefore, the suggested method can be applied straightforwardly by scientists in bioinformatics research.
(author's abstract) / Series: Research Report Series / Department of Statistics and Mathematics
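The variable importance measure discussed here is usually the permutation importance: the drop in accuracy after randomly permuting one predictor, which breaks its association with the response. The model-agnostic sketch below shows the measure itself with a hypothetical fitted predictor. It does not reproduce the selection bias the authors describe, which arises while the trees are grown, and in a random forest the drop is normally computed per tree on out-of-bag observations rather than on a single dataset as here.

```python
import random
from typing import Callable, Sequence

Predictor = Callable[[Sequence[float]], int]


def accuracy(predict: Predictor, X: list[list[float]], y: list[int]) -> float:
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict: Predictor, X: list[list[float]], y: list[int],
                           n_repeats: int = 20, seed: int = 0) -> list[float]:
    """Mean drop in accuracy when one column of X at a time is randomly permuted."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical fitted model that only looks at feature 0.
def predict(row: Sequence[float]) -> int:
    return int(row[0] > 0.5)


X = [[0.1, 3.0], [0.9, 1.0], [0.2, 2.0], [0.8, 5.0],
     [0.3, 4.0], [0.7, 0.0], [0.4, 6.0], [0.6, 7.0]]
y = [0, 1, 0, 1, 0, 1, 0, 1]
print(permutation_importance(predict, X, y))  # feature 1 scores exactly 0.0
```

Because the hypothetical model ignores feature 1, permuting that column never changes a prediction and its importance is exactly zero, while permuting feature 0 destroys accuracy.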
|
10 |
Caracterización de la respuesta emocional ante estímulos visuales en registros electroencefalográficos / Characterization of the Emotional Response to Visual Stimuli in Electroencephalographic Recordings
Candia Rivera, Diego Andrés, January 2016
Ingeniero Civil Eléctrico (Electrical Civil Engineering degree) / This thesis aims to characterize the neural activity response of subjects exposed to visual stimuli with emotional content by analyzing time series from electroencephalographic (EEG) recordings. In particular, three emotional states are compared on the basis of their differences in valence and emotional arousal.
The hypothesis of this work is that the emotional response to visual stimuli can be characterized in EEG recordings in the dimensions of time, frequency, and scalp topography. To this end, a methodological approach is introduced in which individual EEG channels are analyzed after decomposition into frequency bands.
The dataset consists of recordings from nine subjects, which were preprocessed to remove noise and ocular artifacts. The proposed methodology consists of feature extraction and the construction of predictive models of emotion based on Support Vector Machines and Random Forests.
Of the nine subjects, six were used as the training set to build the predictive models, and the remaining three were used as the test set. The results showed complete discrimination between positive and negative emotions. When distinguishing among all three emotions at once, an accuracy of 2/3 was obtained. The 20 features used for classification include channels from different lobes of the brain and frequencies ranging from the delta band to the gamma band. A strong influence of alpha-band activity on the emotional states was also observed.
The results suggest that recording neural activity through EEG makes it possible to obtain signs of the emotional state in response to visual stimuli, but higher accuracy requires combining features from multiple channels and frequencies.
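The abstract describes decomposing individual EEG channels into frequency bands, from delta to gamma, before classification. A minimal way to turn one channel into per-band features is a periodogram summed over band ranges, sketched below with a naive DFT in plain Python. The band edges and sampling rate are conventional or hypothetical assumptions, not values taken from the thesis, and a practical pipeline would use an FFT-based estimator such as Welch's method.

```python
import math

# Conventional EEG band edges in Hz (half-open ranges); exact boundaries vary
# between studies and are an assumption here, not the thesis's definition.
BANDS = {"delta": (1.0, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 13.0),
         "beta": (13.0, 30.0), "gamma": (30.0, 45.0)}


def band_powers(signal: list[float], fs: float) -> dict[str, float]:
    """Sum of periodogram values |X_k|^2 / n within each band, for one channel."""
    n = len(signal)
    powers = {name: 0.0 for name in BANDS}
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        # Naive DFT coefficient X_k (O(n) per bin; fine for a short sketch).
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(signal))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(signal))
        power = (re * re + im * im) / n
        for name, (lo, hi) in BANDS.items():
            if lo <= freq < hi:
                powers[name] += power
    return powers


fs = 128.0                                                  # hypothetical sampling rate, Hz
t = [i / fs for i in range(128)]                            # one second of samples
alpha_wave = [math.sin(2 * math.pi * 10.0 * ti) for ti in t]  # 10 Hz test sinusoid
powers = band_powers(alpha_wave, fs)
print(max(powers, key=powers.get))  # alpha
```

Repeating this for each channel yields a channel-by-band feature table; the thesis's subject-wise split (six subjects for training, three for testing) would then be applied to those rows before fitting Support Vector Machines or Random Forests.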
|