11 |
Predicting Patient Satisfaction With Ensemble Methods
Rosales, Elisa Renee 30 April 2015
Health plans are constantly seeking ways to assess and improve the quality of patient experience in various ambulatory and institutional settings. Standardized surveys are a common tool used to gather data about patient experience, and a useful measurement taken from these surveys is known as the Net Promoter Score (NPS). This score represents the extent to which a patient would, or would not, recommend his or her physician on a scale from 0 to 10, where 0 corresponds to "Extremely unlikely" and 10 to "Extremely likely". A large national health plan utilized automated calls to distribute such a survey to its members and was interested in understanding what factors contributed to a patient's satisfaction. Additionally, they were interested in whether or not NPS could be predicted using responses from other questions on the survey, along with demographic data. When the distribution of various predictors was compared between the less satisfied and highly satisfied members, there was significant overlap, indicating that not even the Bayes Classifier could successfully differentiate between these members. Moreover, the highly imbalanced proportion of NPS responses initially resulted in poor prediction accuracy. Thus, due to the non-linear structure of the data and the high number of categorical predictors, we have leveraged flexible methods, such as decision trees, bagging, and random forests, for modeling and prediction. We further altered the prediction step in the random forest algorithm in order to account for the imbalanced structure of the data.
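The abstract does not say how the prediction step was altered. One common way to adapt a forest's vote to imbalanced classes is to replace the plain majority vote with class-specific cutoffs, so the rare class needs a smaller share of tree votes to win. The sketch below illustrates that idea only; the function name `forest_predict`, the class labels, and the cutoff values are assumptions made for the example, not the author's method.

```python
from collections import Counter


def forest_predict(tree_votes, cutoffs):
    """Pick the class whose vote share is largest relative to its cutoff.

    tree_votes: list of class labels, one per tree in the forest.
    cutoffs: dict mapping class label -> cutoff in (0, 1]; a smaller cutoff
             makes that class easier to predict.
    """
    counts = Counter(tree_votes)
    n_trees = len(tree_votes)
    # Ratio of vote share to cutoff. Plain majority voting is the special
    # case in which every class has the same cutoff.
    scores = {
        label: (counts.get(label, 0) / n_trees) / cutoff
        for label, cutoff in cutoffs.items()
    }
    return max(scores, key=scores.get)


# Hypothetical forest of 100 trees: 70 vote "promoter", 30 vote "detractor".
votes = ["promoter"] * 70 + ["detractor"] * 30
majority = forest_predict(votes, {"promoter": 0.5, "detractor": 0.5})
adjusted = forest_predict(votes, {"promoter": 0.8, "detractor": 0.2})
```

With equal cutoffs the majority class wins; lowering the cutoff for the rare class lets 30% of the votes carry the prediction. In practice the cutoffs would be tuned on held-out data.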
|
12 |
To HAVE and to BE: Function Word Reduction in Child Speech, Child Directed Speech and Inter-adult Speech
Barth, Danielle 23 February 2016
Function words are known to be shorter than content words. I investigate the function words BE and HAVE (with its content word homonym) and show that more reduction, operationalized as word shortening or contraction, is found in some grammaticalized meanings of these words. The difference between the words’ uses cannot be attributed to differences in frequency or semantic weight. Instead I argue that these words are often shortened and reduced when they occur in constructions in which they are highly predictable. This suggests that particular grammaticalized uses of a word are stored with their own exemplar clouds of context-specific phonetic realizations. The phonetics of any instance of a word are then jointly determined by the exemplar cloud for that word and the particular context. A given instance of an auxiliary can be reduced either because it is predictable in the current context or because that use of the auxiliary usually occurs in predictable contexts.
The present study compares function word production in the speech of school-aged children and their caregivers and in inter-adult speech. The effects of predictability in context and average predictability across contexts are replicated across the datasets. However, I find that as children get older their function words shorten relative to content words, even when controlling for increasing speech rate, showing that as their language experience increases they spend less time where it is not needed for comprehensibility. Caregivers spend less time on function words with older children than with younger children, suggesting that they expect function words to be more difficult for younger interlocutors to decode than for older interlocutors. Additionally, while adults use either word shortening or contraction to increase the efficiency of speech, children tend to use either both contraction and word shortening or neither until age seven, when they start to use one strategy or the other like adults. Young children with better vocabulary employ an adult-like strategy earlier, suggesting earlier onset of efficient yet effective speech behavior, namely allocating less signal to function words when they are especially easy for the listener to decode.
|
13 |
Dimensionality Reduction in the Creation of Classifiers and the Effects of Correlation, Cluster Overlap, and Modelling Assumptions.
Petrcich, William 31 August 2011
Discriminant analysis and random forests are used to create models for classification. The number of variables to be tested for inclusion in a model can be large. The goal of this work was to create an efficient and effective selection program. The first method used was based on the work of others. The resulting models were underperforming, so another approach was adopted. Models were built by adding the variable that maximized new-model accuracy. The two programs were used to generate discriminant-analysis and random forest models for three data sets. An existing software package was also used. The second program outperformed the alternatives. For the small number of runs produced in this study, it outperformed the method that inspired this work. The data sets were studied to identify determinants of performance. No definite conclusions were reached, but the results suggest topics for future study.
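The second approach described above, adding at each step the variable that maximizes the accuracy of the new model, is a form of greedy forward selection. A minimal sketch follows, assuming a caller-supplied `score` function that returns the accuracy of a model built on a given variable list; `forward_select`, `toy_score`, and the variable names are hypothetical and do not reproduce the thesis's program.

```python
def forward_select(candidates, score, max_vars=None):
    """Greedy forward selection.

    candidates: iterable of variable names.
    score: callable taking a list of variable names and returning accuracy.
    Stops when no remaining variable improves the score, or when max_vars
    variables have been selected.
    """
    selected = []
    remaining = list(candidates)
    best = float("-inf")
    while remaining and (max_vars is None or len(selected) < max_vars):
        # Score every one-variable extension of the current model.
        trial_scores = [(score(selected + [v]), v) for v in remaining]
        top_score, top_var = max(trial_scores, key=lambda t: t[0])
        if top_score <= best:
            break  # no candidate improves accuracy
        selected.append(top_var)
        remaining.remove(top_var)
        best = top_score
    return selected, best


def toy_score(variables):
    """Made-up accuracy: fixed gain per variable minus a small size penalty."""
    gains = {"x1": 0.20, "x2": 0.10, "x3": 0.0}
    return 0.5 + sum(gains[v] for v in variables) - 0.01 * len(variables)


chosen, accuracy = forward_select(["x1", "x2", "x3"], toy_score)
```

In a real run `score` would fit a discriminant-analysis or random forest model on the listed variables and return its cross-validated accuracy, which is the costly part of the loop.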
|
14 |
Improved detection and quantisation of keypoints in the complex wavelet domain
Gale, Timothy Edward January 2018
An algorithm which is able to consistently identify features in an image is a basic building block of many object recognition systems. Attaining sufficient consistency is challenging, because factors such as pose and lighting can dramatically change a feature’s appearance. Effective feature identification therefore requires both a reliable and accurate keypoint detector and a discriminative categoriser (or quantiser). The Dual Tree Complex Wavelet Transform (DTCWT) decomposes an image into oriented subbands at a range of scales. The resulting domain is arguably well suited for further image analysis tasks such as feature identification. This thesis develops feature identification in the complex wavelet domain, building on previous keypoint detection work and exploring the use of random forests for descriptor quantisation. Firstly, we extended earlier work on keypoint detection energy functions. Existing complex wavelet based detectors were observed to suffer from two defects: a tendency to produce keypoints on straight edges at particular orientations and sensitivity to small translations of the image. We introduced a new corner energy function based on the Same Level Product (SLP) transform. This function performed well compared to previous ones, combining competitive edge rejection and positional stability properties. Secondly, we investigated the effect of changing the resolution at which the energy function is sampled. We used the undecimated DTCWT to calculate energy maps at the same resolution as the original images. This revealed the presence of fine details which could not be accurately interpolated from an energy map at the standard resolution. As a result, doubling the resolution of the map along each axis significantly improved both the reliability and positional accuracy of detections. However, calculating the map using interpolated coefficients resulted in artefacts introduced by inaccuracies in the interpolation.
We therefore proposed a modification to the standard DTCWT structure which doubles its output resolution for a modest computational cost. Thirdly, we developed a random forest based quantiser which operates on complex wavelet polar matching descriptors, with optional rotational invariance. Trees were evaluated on the basis of how consistently they quantised features into the same bins, and several examples of each feature were obtained by means of tracking. We found that the trees produced the most consistent quantisations when they were trained with a second set of tracked keypoints. Detecting keypoints using the higher resolution energy maps also resulted in more consistent quantiser outputs, indicating the importance of the choice of detector on quantiser performance. Finally, we introduced a fast implementation of the DTCWT, keypoint detection and descriptor extraction algorithms for OpenCL-capable GPUs. Several aspects were optimised to enable it to run more efficiently on modern hardware, allowing it to process HD footage in faster than real time. This particularly aided the development of the detector algorithms by permitting interactive exploration of their failure modes using a live camera feed.
|
15 |
Automatic generation of hardware Tree Classifiers
Thanjavur Bhaaskar, Kiran Vishal 10 July 2017
Machine learning is growing in popularity and spreading across different fields for various applications. Due to this trend, machine learning algorithms are being run and tested on different hardware platforms to obtain high test accuracy and throughput. FPGAs are a well-suited hardware platform for machine learning because of their re-programmability and lower power consumption. However, programming FPGAs for machine learning algorithms requires substantial engineering time and effort compared to a software implementation. We propose a software-assisted design flow to program FPGAs for machine learning algorithms using our hardware library. The hardware library is highly parameterized and accommodates tree classifiers. As of now, our library consists of the components required to implement decision trees and random forests. The whole flow is automated by a Python script that takes the user from the first step of having a dataset and design choices to the last step of having hardware description code for the trained machine learning model.
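To make the last step of such a flow concrete, the sketch below turns an already trained decision tree into a small combinational Verilog module. It is an illustration under stated assumptions and not the thesis's hardware library or script: the tree encoding (nested dicts with `feature`, `threshold`, `left`, `right`), the function names, the port names `x0`, `x1`, ... and `class_out`, and the 8-bit width are all hypothetical.

```python
def tree_to_verilog(node):
    """Render a decision tree as a nested Verilog conditional expression.

    A leaf is an int class label. An internal node is a dict with keys
    'feature' (input index), 'threshold' (int), 'left' and 'right'; the left
    branch is taken when the feature value is <= threshold.
    """
    if isinstance(node, int):
        return str(node)
    cond = f"(x{node['feature']} <= {node['threshold']})"
    left = tree_to_verilog(node["left"])
    right = tree_to_verilog(node["right"])
    return f"({cond} ? {left} : {right})"


def emit_module(name, tree, n_features, width=8):
    """Wrap the tree expression in a combinational module declaration."""
    ports = ", ".join(f"input [{width - 1}:0] x{i}" for i in range(n_features))
    return (
        f"module {name}({ports}, output [{width - 1}:0] class_out);\n"
        f"  assign class_out = {tree_to_verilog(tree)};\n"
        f"endmodule\n"
    )


# Hypothetical trained tree over two integer features.
tree = {
    "feature": 0, "threshold": 5,
    "left": 0,
    "right": {"feature": 1, "threshold": 3, "left": 1, "right": 2},
}
expression = tree_to_verilog(tree)
module_text = emit_module("tree_classifier", tree, n_features=2)
```

A random forest could reuse the same generator by emitting one expression per tree and adding a voting stage. A single nested expression is the simplest mapping; a throughput-oriented design would more likely pipeline the comparisons.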
|
16 |
Prévision multi-échelle par agrégation de forêts aléatoires. Application à la consommation électrique. / Multi-scale forecasting by aggregation of random forests. Application to load forecasting.
Goehry, Benjamin 10 December 2019
This thesis has two objectives. The first concerns forecasting a total load in the context of Smart Grids, using approaches based on the bottom-up forecasting method. The second is the study of random forests when observations are dependent, more precisely for time series. In this setting, we extend the consistency results for Breiman’s original random forests, as well as the convergence rates for a simplified random forest, both of which had hitherto been established only for independent and identically distributed observations. The last contribution on random forests describes a new methodology that incorporates the dependence structure of the data into the construction of the forests and thereby yields a gain in performance for time series, illustrated with an application to forecasting the load of a building.
|
17 |
Application of Machine Learning and Statistical Learning Methods for Prediction in a Large-Scale Vegetation Map
Brookey, Carla M. 01 December 2017
Original analyses of a large vegetation cover dataset from Roosevelt National Forest in northern Colorado were carried out by Blackard (1998) and Blackard and Dean (1998; 2000). They compared the classification accuracies of linear and quadratic discriminant analysis (LDA and QDA) with artificial neural networks (ANN) and obtained an overall classification accuracy of 70.58% for a tuned ANN compared to 58.38% for LDA and 52.76% for QDA. Because there has been tremendous development of machine learning classification methods over the last 35 years in both computer science and statistics, as well as substantial improvements in the speed of computer hardware, I applied five modern machine learning algorithms to the data to determine whether significant improvements in the classification accuracy were possible using one or more of these methods. I found that only a tuned gradient boosting machine had a higher accuracy (71.62%) than the ANN of Blackard and Dean (1998), and the difference in accuracies was only about 1%. Of the other four methods, Random Forests (RF), Support Vector Machines (SVM), Classification Trees (CT), and adaboosted trees (ADA), a tuned SVM and RF had accuracies of 67.17% and 67.57%, respectively. The partition of the data by Blackard and Dean (1998) was unusual in that the training and validation datasets had equal representation of the seven vegetation classes, even though 85% of the data fell into classes 1 and 2. For the second part of my analyses I randomly selected 60% of the data for the training data and 20% for each of the validation data and test data. On this partition of the data a single classification tree achieved an accuracy of 92.63% on the test data and the accuracy of RF was 83.98%. Unsurprisingly, most of the gains in accuracy were in classes 1 and 2, the largest classes, which also had the highest misclassification rates under the original partition of the data.
By decreasing the size of the training data but maintaining the same relative occurrences of the vegetation classes as in the full dataset, I found that even for a training dataset of the same size as that of Blackard and Dean (1998) a single classification tree was more accurate (73.80%) than the ANN of Blackard and Dean (1998) (70.58%). The final part of my thesis was to explore the possibility that combining the predictions of several machine learning classifiers could result in higher predictive accuracies. In the analyses I carried out, the answer seems to be that increased accuracies do not occur with a simple voting of five machine learning classifiers.
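The simple voting scheme mentioned at the end of the abstract can be sketched as a per-observation majority vote over the classifiers' predicted labels. The abstract does not specify how the vote was implemented, so the function name, the tie-breaking rule (ties go to the classifier listed first), and the example labels below are assumptions for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine label predictions from several classifiers by simple voting.

    predictions: list of per-classifier label lists, all of the same length.
    Returns one combined label per observation.
    """
    combined = []
    for labels in zip(*predictions):
        # most_common(1) returns the most frequent label; among tied labels
        # it returns the one encountered first, i.e. the label from the
        # earliest classifier in the list.
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


# Hypothetical predictions from five classifiers on three observations.
preds = [
    [1, 2, 2],
    [1, 1, 2],
    [2, 1, 3],
    [1, 2, 3],
    [2, 2, 3],
]
voted = majority_vote(preds)
```

Voting helps mainly when the classifiers make different errors; if they tend to misclassify the same observations, the vote reproduces those errors, which is one possible reason for a result like the one reported here.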
|
18 |
Machine Learning Techniques as Applied to Discrete and Combinatorial Structures
Schwartz, Samuel David 01 August 2019
Machine Learning Techniques have been used on a wide array of input types: images, sound waves, text, and so forth. In articulating these input types to the almighty machine, there have been all sorts of amazing problems that have been solved for many practical purposes.
Nevertheless, there are some input types which don’t lend themselves nicely to the standard set of machine learning tools we have. Moreover, there are some provably difficult problems which are abysmally hard to solve within a reasonable time frame.
This thesis addresses several of these difficult problems. It frames these problems such that we can then attempt to marry the allegedly powerful utility of existing machine learning techniques to the practical solvability of said problems.
|
19 |
Mapping Coastal Great Lakes Wetlands and Adjacent Land Use Through Hybrid Optical-Infrared and Radar Image Classification Techniques: A Remote Sensing and Geographic Information Science Internship with Michigan Technological Research Institute
Endres, Sarah L. 14 August 2012
No description available.
|
20 |
Implementación de una clasificación Eco-Hidrológica de los ríos de Chile y su aplicación a la gestión ambiental / Implementation of an Eco-Hydrological classification of the rivers of Chile and its application to environmental management
Peredo Parada, Matías Manuel 12 September 2011
In Chile there is growing concern for protecting and conserving aquatic ecosystems because of their high ecological value and, above all, the high degree of endemism of their species. Unfortunately, human activities exert pressures on these ecosystems that have been degrading their habitats. Nevertheless, no tool has been developed to serve as a spatial framework for planning the conservation and protection of these ecosystems. In addition, the information available on aquatic species is insufficient, incomplete, fragmented and out of date.
It was therefore thought that a classification system that includes climatic, morphological and geological aspects, and that also allows work at various spatial scales, could be a suitable tool to make up for this lack of a spatial framework and scarcity of information.
There are several types of classification, most notably those developed by deducing the factors that control fluvial processes, also called a priori classifications, and those developed from data, called a posteriori classifications. The advantages of a priori classifications include the feasibility of implementing them in areas with little information, the interpretability of their classes, and the possibility of extrapolating their information to areas that lack it.
On the understanding that flow is the main variable driving the composition of fluvial ecosystems, this thesis develops the Eco-Hydrological classification of the rivers of Chile (REC-Chile), an a priori classification based on a hierarchical superposition of the factors that control the hydrological pattern. The classification is multi-scale, which gives it a versatility that allows river reaches to be classified, depending on the controlling factors selected, according to different fluvial patterns and at various spatial scales. / Peredo Parada, MM. (2010). Implementación de una clasificación Eco-Hidrológica de los ríos de Chile y su aplicación a la gestión ambiental [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/11515
|