1 |
THE POTENTIAL FOR MACHINE LEARNING IN MENTAL HEALTH POLICING: PREDICTING OUTCOMES OF MENTAL HEALTH RELATED CALLS FOR SERVICE
Pearson Hirdes, Daniel, January 2019
My objective was to predict outcomes following police interactions with persons with mental illness (PMIs), and to compare the predictive accuracy of logistic regression models and Random Forests learning algorithms. Additionally, I evaluated whether the predictive accuracy of Random Forests changed when applied to merged versus region-specific data. I conducted a retrospective cohort study of reports completed by police in 13 communities between 2015 and 2018; 13,058 reports were analyzed. Random Forests learning algorithms were compared against logistic regression models for predictive accuracy in a merged dataset (13 communities) and 3 regional datasets. Outcomes for prediction were high risk of harm to self, risk of harm to others, and risk of failure to care for self within 24 and 72 hours following police contact. Random Forests learning algorithms were trained on merged and regional datasets, and evaluated on merged and regional holdout datasets. Performance was compared by area under the curve. For Random Forests learning algorithms, confusion matrix statistics were calculated for each outcome, and predictive utility was examined by calculating conditional probabilities.
Prediction accuracy was modest across all methods. Random Forests achieved better predictive accuracy than logistic regression. Random Forests accuracy varied between merged and regional holdout data. Sensitivity of the Random Forests learning algorithms was moderate (74% average, 6 outcomes, merged holdout set). Specificity was low (53% average, 6 outcomes, merged holdout set). Conditional probabilities were modestly improved by the use of the Random Forests learning algorithm. The rareness of the target outcomes created a situation where even predictions with moderate likelihood ratios had only modest predictive value. Though the Random Forests learning algorithms did outperform the logistic regression models, the clinical significance of those benefits was limited when conditional probabilities were calculated. These findings are limited to the outcomes considered, and may not apply to more common outcomes.
/ Thesis / Master of Health Sciences (MSc) /
The study goal was to predict outcomes following police interactions with persons with mental illness (PMIs). Additionally, it compared the predictive validity of logistic regression and Random Forests learning algorithms. Classification approaches were applied to outcomes following police interactions with PMIs, including high risk of harm to self, high risk of harm to others, and high risk of failure to care for self within 24 hours and 72 hours of initial police contact. The study also sought to determine whether the predictive accuracy of Random Forests was sensitive to the police service community. Variation in predictive accuracy was assessed between a merged data set (13 communities) and 3 community-specific data sets. The study found that the predictive accuracy of the classification approaches on outcomes was modest. Random Forests exhibited greater predictive validity than logistic regression. The results suggested that Random Forests performance was not sensitive to police service context.
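The abstract's remark that rare outcomes limit predictive value can be illustrated with a short calculation. The sketch below uses the reported average sensitivity (74%) and specificity (53%); the prevalence values are hypothetical, since the abstract does not state how common each outcome was, and the function names are chosen here for illustration only.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = P(test positive | outcome) / P(test positive | no outcome)."""
    return sensitivity / (1.0 - specificity)


def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(outcome | test positive), obtained from Bayes' rule."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Reported in the abstract (averages over 6 outcomes, merged holdout set).
SENSITIVITY = 0.74
SPECIFICITY = 0.53

# ASSUMPTION: illustrative prevalences only; not taken from the thesis.
ASSUMED_PREVALENCES = [0.01, 0.05, 0.20]

lr_plus = positive_likelihood_ratio(SENSITIVITY, SPECIFICITY)
print(f"LR+ = {lr_plus:.2f}")
for prevalence in ASSUMED_PREVALENCES:
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.0%} -> P(outcome | flagged) = {ppv:.1%}")
```

Under these assumed prevalences, a positive prediction raises the probability of the outcome only slightly above its base rate, which is consistent with the modest predictive value described above.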
|
2 |
Klasifikační metody pro data z mikročipů / Classification Methods for Microarray Data
Hudec, Vladimír, January 2011
This paper discusses data obtained from gene chips and methods for analyzing them. It reviews several methods for analyzing these data and focuses on the Random Forests method. It describes the dataset used for the experiments. The methods are implemented in the R language environment, then tested, and the results are presented and compared. Results obtained with Random Forests are compared with those of other experiments on the same dataset.
|
3 |
Customer Relationship Management: from Conversion to Churn to Winback
Li, Ke, January 2013
Using a large CRM dataset granted by a large media company, this dissertation examines four categories of factors that could affect three stages of customer relationship management: customer acquisition, retention, and winback of lost customers. Specifically, with the aid of the random forests machine learning method and text mining techniques, this study identifies which factors play bigger roles than others during each of the three stages of CRM. The factors considered are customer heterogeneity (e.g. in usage of self-care service channels, duration of service, responsiveness to marketing actions), the firm's marketing initiatives (e.g. the volume of marketing communications, the depth of the promotion, the different communication channels used, and the marketing penetration in different geographical areas), customer self-reported deactivation reasons, and call center notes in text form. Furthermore, the authors examine how these factors evolve throughout the three stages of CRM in terms of their effects on customers' decisions about whether to convert to a paid customer, to churn, or to reactivate their service with the company. The findings help managers better allocate their resources in the processes of acquiring, retaining, and winning back customers. / Business Administration/Marketing
|
4 |
Propensity Score Estimation with Random Forests
January 2013
Random Forests is a statistical learning method that has been proposed for propensity score estimation models involving complex interactions among the covariates, nonlinear relationships, or both. In this dissertation I conducted a simulation study to examine the effects of three Random Forests model specifications in propensity score analysis. The results suggested that, depending on the nature of the data, optimal specification of (1) decision rules to select the covariate and its split value in a Classification Tree, (2) the number of covariates randomly sampled for selection, and (3) methods of estimating Random Forests propensity scores could potentially produce an unbiased average treatment effect estimate after propensity score weighting-by-the-odds adjustment. Compared to the logistic regression estimation model using the true propensity score model, Random Forests had an additional advantage in producing an unbiased estimated standard error and correct statistical inference of the average treatment effect. The relationship between the balance on the covariates' means and the bias of the average treatment effect estimate was examined both within and between conditions of the simulation. Within conditions, across repeated samples there was no noticeable correlation between the covariates' mean differences and the magnitude of bias of the average treatment effect estimate for the covariates that were imbalanced before adjustment. Between conditions, small mean differences of covariates after propensity score adjustment were not sensitive enough to identify the optimal Random Forests model specification for propensity score analysis. / Dissertation/Thesis / Ph.D. Psychology 2013
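As a rough illustration of the weighting-by-the-odds adjustment mentioned in the abstract above, the following sketch applies the standard form of that weighting: treated units keep weight 1 and comparison units receive e/(1 - e), where e is the estimated propensity score, which targets the effect among the treated. The propensity scores and outcomes below are made-up numbers; in the dissertation the scores would come from Random Forests, and the function names are hypothetical.

```python
def odds_weights(treated, propensity):
    """Weight 1 for treated units, e / (1 - e) for comparison units."""
    return [1.0 if t else e / (1.0 - e) for t, e in zip(treated, propensity)]


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def effect_weighted_by_odds(outcome, treated, propensity):
    """Treated mean minus odds-weighted comparison mean."""
    weights = odds_weights(treated, propensity)
    treated_rows = [(y, w) for y, t, w in zip(outcome, treated, weights) if t]
    control_rows = [(y, w) for y, t, w in zip(outcome, treated, weights) if not t]
    treated_mean = weighted_mean(*zip(*treated_rows))
    control_mean = weighted_mean(*zip(*control_rows))
    return treated_mean - control_mean


# ASSUMPTION: toy data for illustration, not from the dissertation.
treated = [1, 1, 0, 0]
propensity = [0.8, 0.6, 0.5, 0.2]   # hypothetical estimated scores
outcome = [10.0, 8.0, 6.0, 4.0]

estimate = effect_weighted_by_odds(outcome, treated, propensity)
print(f"estimated effect: {estimate:.2f}")
```

Comparison units that resemble the treated group (higher propensity scores) count more in the comparison mean, which is the intent of the adjustment.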
|
5 |
Extensions and Improvements to Random Forests for Classification
Quach, Anna, 01 December 2017
The motivation of my dissertation is to address two weaknesses of Random Forests: first, the failure to detect genetic interactions between two single nucleotide polymorphisms (SNPs) in higher dimensions when the interacting SNPs both have weak main effects; and second, the difficulty of interpretation in comparison to parametric methods such as logistic regression, linear discriminant analysis, and linear regression.
We focus on detecting pairwise SNP interactions in genome case-control studies. We determine the best parameter settings to optimize the detection of SNP interactions, improve the efficiency of Random Forests, and present an efficient filtering method. The filtering method is compared to leading methods and is shown to be computationally faster with good detection power.
Random Forests allows us to identify clusters, outliers, and important features for subgroups of observations through the visualization of the proximities. We improve the interpretation of Random Forests through the proximities. The resulting new proximities are asymmetric, and the appropriate visualization requires an asymmetric model for interpretation. We propose a new visualization technique for asymmetric data and compare it to existing approaches.
|
6 |
Applications and extensions of Random Forests in genetic and environmental studies
Michaelson, Jacob, 10 January 2011
Transcriptional regulation refers to the molecular systems that control the concentration of mRNA species within the cell. Variation in these controlling systems is not only responsible for many diseases, but also contributes to the vast phenotypic diversity in the biological world. There are powerful experimental approaches to probe these regulatory systems, and the focus of my doctoral research has been to develop and apply effective computational methods that exploit these rich data sets more completely. First, I present a method for mapping genetic regulators of gene expression (expression quantitative trait loci, or eQTL) using Random Forests. This approach allows for flexible modeling and feature selection, and results in eQTL that are more biologically supportable than those mapped with competing methods. Next, I present a method that finds interactions between genes that in turn regulate the expression of other genes. This is accomplished by finding recurring decision motifs in the forest structure that represent dependencies between genetic loci. Third, I present a method to use distributional differences in eQTL data to establish the regulatory roles of genes relative to other disease-associated genes. Using this method, we found that genes that are master regulators of other disease genes are more likely to be consistently associated with the disease in genetic association studies. Finally, I present a novel application of Random Forests to determine the mode of regulation of toxin-perturbed genes, using time-resolved gene expression. The results demonstrate a novel approach to supervised weighted clustering of gene expression data.
|
7 |
Active learning via Transduction in Regression Forests
Hansson, Kim; Hörlin, Erik, January 2015
Context. The amount of training data required to build accurate models is a common problem in machine learning. Active learning is a technique that tries to reduce the amount of required training data by making active choices about which training data holds the greatest value.
Objectives. This thesis aims to design, implement, and evaluate the Random Forests algorithm combined with active learning that is suitable for predictive tasks with real-valued outcomes where the amount of training data is small. Machine learning algorithms traditionally require large amounts of training data to create a general model, and training data is in many cases sparse and expensive or difficult to create.
Methods. The research methods used for this thesis are implementation and scientific experiment. An approach to active learning was implemented based on previous work for classification-type problems. The approach uses the Mahalanobis distance to perform active learning via transduction. Evaluation was done using several data sets, where the decrease in prediction error was measured over several iterations. The results of the evaluation were then analyzed using nonparametric statistical testing.
Results. The statistical analysis of the evaluation results failed to detect a difference between our approach and a non-active-learning approach, even though the proposed algorithm showed irregular performance. The evaluation of our tree-based traversal method and the evaluation of the Mahalanobis distance for transduction both showed that these methods performed better than Euclidean distance and complete graph traversal.
Conclusions. We conclude that the proposed solution did not decrease the amount of required training data on a significant level. However, the approach has potential, and future work could lead to a working active learning solution. Further work is needed on key areas of the implementation, such as the choice of instances for active learning through transduction uncertainty as well as the choice of method for going from transduction model to induction model.
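The abstract above contrasts the Mahalanobis distance with Euclidean distance. The sketch below shows only how the two distances differ on a small two-dimensional example; it does not reproduce the thesis's transduction procedure, which the abstract does not detail. The data points and function names are invented for illustration.

```python
import math


def covariance_2d(points):
    """Sample covariance of 2-D points, returned as (sxx, sxy, syy)."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    return sxx, sxy, syy


def mahalanobis_2d(a, b, cov):
    """sqrt(d^T S^-1 d) for d = a - b, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = a[0] - b[0], a[1] - b[1]
    quadratic_form = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(quadratic_form)


def euclidean_2d(a, b):
    """Ordinary straight-line distance."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# ASSUMPTION: toy data, widely spread along x and tightly spread along y.
data = [(0.0, 0.0), (10.0, 1.0), (20.0, 0.0), (30.0, 1.0), (40.0, 0.0)]
cov = covariance_2d(data)

origin = (0.0, 0.0)
step_x = (1.0, 0.0)
step_y = (0.0, 1.0)
print("Euclidean:  ", euclidean_2d(origin, step_x), euclidean_2d(origin, step_y))
print("Mahalanobis:", mahalanobis_2d(origin, step_x, cov), mahalanobis_2d(origin, step_y, cov))
```

Both unit steps are equally far in Euclidean terms, but the Mahalanobis distance treats the step along the tightly spread feature as much larger, because it rescales each direction by the data's covariance.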
|
8 |
Caracterización de la respuesta emocional ante estímulos visuales en registros electroencefalográficos / Characterization of the Emotional Response to Visual Stimuli in Electroencephalographic Recordings
Candia Rivera, Diego Andrés, January 2016
Ingeniero Civil Eléctrico (Civil Electrical Engineer) / The objective of this degree thesis is to characterize the neural activity response of subjects exposed to visual stimuli with emotional content, by means of time series analysis of electroencephalographic (EEG) recordings. In particular, three emotional states are compared on the basis of their differences in emotional valence and arousal values.
The hypothesis of this work is that the emotional response to visual stimuli can be characterized in EEG recordings along the dimensions of time, frequency, and scalp topography. To this end, a methodological approach is introduced in which individual EEG channels, decomposed into frequency bands, are analyzed.
The database used consists of nine subjects, whose recordings were pre-processed to remove noise and ocular artifacts. The proposed methodology consists of feature extraction and the construction of predictive models of emotions based on Support Vector Machines and Random Forests.
Of the nine subjects, six were used as the training set to build the predictive models and the remaining three were used as the test set. The results showed complete discrimination between positive and negative emotions. For distinguishing among all three emotions at once, an accuracy of 2/3 was obtained. The 20 features used for classification include channels from different lobes of the brain and frequencies ranging from the delta band to the gamma band. A strong influence of alpha band activity on emotional states was also observed.
The results suggest that recording neural activity through EEG makes it possible to obtain signs of the emotional state in response to visual stimuli, but that achieving higher accuracy requires combining features from multiple channels and frequencies.
|
9 |
Predicting Patient Satisfaction With Ensemble Methods
Rosales, Elisa Renee, 30 April 2015
Health plans are constantly seeking ways to assess and improve the quality of patient experience in various ambulatory and institutional settings. Standardized surveys are a common tool used to gather data about patient experience, and a useful measurement taken from these surveys is known as the Net Promoter Score (NPS). This score represents the extent to which a patient would, or would not, recommend his or her physician on a scale from 0 to 10, where 0 corresponds to "Extremely unlikely" and 10 to "Extremely likely". A large national health plan utilized automated calls to distribute such a survey to its members and was interested in understanding what factors contributed to a patient's satisfaction. Additionally, they were interested in whether or not NPS could be predicted using responses from other questions on the survey, along with demographic data. When the distribution of various predictors was compared between the less satisfied and highly satisfied members, there was significant overlap, indicating that not even the Bayes Classifier could successfully differentiate between these members. Moreover, the highly imbalanced proportion of NPS responses resulted in initial poor prediction accuracy. Thus, due to the non-linear structure of the data, and high number of categorical predictors, we have leveraged flexible methods, such as decision trees, bagging, and random forests, for modeling and prediction. We further altered the prediction step in the random forest algorithm in order to account for the imbalanced structure of the data.
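The abstract above does not say how the random forest prediction step was altered for the imbalanced data. One common adjustment, shown here purely as an assumption and not as the author's method, is to lower the share of tree votes required before the rarer class is predicted. The vote counts and names are hypothetical.

```python
def forest_predict(tree_votes, minority_threshold=0.5):
    """Predict the minority class (1) when its vote share reaches the threshold.

    tree_votes: one 0/1 vote per tree, where 1 means the minority class.
    minority_threshold: 0.5 corresponds to an ordinary majority vote.
    """
    minority_share = sum(tree_votes) / len(tree_votes)
    return 1 if minority_share >= minority_threshold else 0


# ASSUMPTION: a made-up forest of 100 trees, 30 of which vote for the rare class.
votes = [1] * 30 + [0] * 70

print(forest_predict(votes))                           # plain majority vote
print(forest_predict(votes, minority_threshold=0.25))  # lowered threshold
```

Lowering the threshold trades more false alarms for fewer missed minority cases, so the value would normally be tuned on held-out data rather than fixed in advance.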
|
10 |
To HAVE and to BE: Function Word Reduction in Child Speech, Child Directed Speech and Inter-adult Speech
Barth, Danielle, 23 February 2016
Function words are known to be shorter than content words. I investigate the function words BE and HAVE (with its content word homonym) and show that more reduction, operationalized as word shortening or contraction, is found in some grammaticalized meanings of these words. The difference between the words’ uses cannot be attributed to differences in frequency or semantic weight. Instead I argue that these words are often shortened and reduced when they occur in constructions in which they are highly predictable. This suggests that particular grammaticalized uses of a word are stored with their own exemplar clouds of context-specific phonetic realizations. The phonetics of any instance of a word are then jointly determined by the exemplar cloud for that word and the particular context. A given instance of an auxiliary can be reduced either because it is predictable in the current context or because that use of the auxiliary usually occurs in predictable contexts. The effects cannot be attributed to frequency or semantic weight.
The present study compares function word production in the speech of school-aged children and their caregivers and in inter-adult speech. The effects of predictability in context and average predictability across contexts are replicated across the datasets. However, I find that as children get older their function words shorten relative to content words, even when controlling for increasing speech rate, showing that as their language experience increases they spend less time where it is not needed for comprehensibility. Caregivers spend less time on function words with older children than younger children, suggesting that they expect function words to be more difficult for younger interlocutors to decode than for older interlocutors. Additionally, while adults use either word shortening or contraction to increase the efficiency of speech, children tend to either use contraction and word shortening or neither until age seven, where they start to use one strategy or the other like adults. Young children with better vocabulary employ an adult-like strategy earlier, suggesting earlier onset of efficient yet effective speech behavior, namely allocating less signal to function words when they are especially easy for the listener to decode.
|