  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Interpreting Random Forest Classification Models Using a Feature Contribution Method

Palczewska, Anna Maria, Palczewski, J., Marchese-Robinson, R.M., Neagu, Daniel 18 February 2014
Model interpretation is one of the key aspects of the model evaluation process. The relationship between model variables and outputs is relatively easy to explain for statistical models such as linear regression, thanks to the availability of model parameters and their statistical significance. For “black box” models such as random forests, this information is hidden inside the model structure. This work presents an approach for computing feature contributions for random forest classification models. It allows the influence of each variable on the model prediction to be determined for an individual instance. By analysing feature contributions for a training dataset, the most significant variables can be identified, and their typical contributions towards predictions made for individual classes, i.e., class-specific feature contribution “patterns”, can be discovered. These patterns represent the standard behaviour of the model and allow for an additional assessment of the model's reliability for new data. Interpretation of feature contributions for two UCI benchmark datasets shows the potential of the proposed methodology. The robustness of the results is demonstrated through an extensive analysis of feature contributions calculated for a large number of generated random forest models.
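To make the feature-contribution idea concrete, here is a minimal Python sketch assuming scikit-learn's tree internals (`tree_.value`, `tree_.feature`, `decision_path`). It follows the general scheme of crediting, at every split on an instance's path, the change in class proportions from parent to child node to the feature used at that split, and then averaging over trees. This is an illustrative reimplementation under those assumptions, not the authors' code; `tree_contributions` and `forest_contributions` are hypothetical names.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier


def tree_contributions(tree, X):
    """Return (bias, contributions) for one fitted DecisionTreeClassifier.

    bias: (n_samples, n_classes) class proportions at the root node.
    contributions: (n_samples, n_features, n_classes) per-feature changes.
    """
    t = tree.tree_
    values = t.value[:, 0, :]
    # Normalise to class proportions; works whether nodes store counts or fractions.
    probs = values / values.sum(axis=1, keepdims=True)
    paths = tree.decision_path(X)
    n_samples, n_features = X.shape
    contrib = np.zeros((n_samples, n_features, probs.shape[1]))
    for i in range(n_samples):
        # A child's node id is always larger than its parent's, so sorting gives path order.
        nodes = np.sort(paths.indices[paths.indptr[i]:paths.indptr[i + 1]])
        for parent, child in zip(nodes[:-1], nodes[1:]):
            # Credit the change in class proportions to the feature split on at the parent.
            contrib[i, t.feature[parent]] += probs[child] - probs[parent]
    bias = np.tile(probs[0], (n_samples, 1))
    return bias, contrib


def forest_contributions(forest, X):
    """Average the per-tree decomposition over all trees in the forest."""
    parts = [tree_contributions(est, X) for est in forest.estimators_]
    bias = np.mean([b for b, _ in parts], axis=0)
    contrib = np.mean([c for _, c in parts], axis=0)
    return bias, contrib


X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
bias, contrib = forest_contributions(forest, X[:5])
# The root proportions plus the summed contributions should match predict_proba.
print(np.allclose(bias + contrib.sum(axis=1), forest.predict_proba(X[:5])))
```

Because the per-node changes telescope, the root proportions plus the summed contributions reproduce each tree's leaf probabilities, and averaging over trees reproduces `predict_proba`. Class-specific patterns of the kind described above could then be explored by, for example, summarising `contrib[:, :, k]` over training instances predicted as class `k`; that summary step is an assumption, not taken from the paper.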
2

Machine Learning Classification of Vehicle Status for Reduction of Cost and Downtime in Public Transport: Applying Random Forest Classification to Vehicle Signals

Stopner, Julia, Willberg, Carl-Åke January 2022
As the modern, data-driven world continues to evolve, many institutions and corporations are eager to capitalize on their own data streams to optimize their operations. At the same time, a globalized world is searching for ways to serve an ever-increasing population that needs to move flexibly through sprawling cities, while mitigating the climate impact of that mobility.
The future of public transport lies at the intersection of these two trends, and there are many advantages to letting machine learning uncover previously unseen patterns and obstacles in daily operations. This study explores whether a Random Forest classifier trained on historical data offers an effective way of predicting and preventing vehicle downtime due to repairs. The implementation of the classifier resulted in an accuracy of 63.1% and a recall of 59.9%. The study concludes that the method has potential, although the quality and range of the available signal data need to improve to raise the model's effectiveness further. Given further research and improvements to the data stream and the company's technical infrastructure, a Random Forest model could bring measurable commercial benefits with regard to downtime and repair costs.
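The abstract reports accuracy and recall separately, and they answer different questions: accuracy is the share of all vehicle cases classified correctly, while recall is the share of vehicles that actually needed repair which the model flagged. A minimal Python sketch of both definitions, assuming a binary labelling where 1 means "repair needed" (the labelling and the toy data are hypothetical):

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Accuracy over all cases; recall over the cases that are truly positive."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    true_pos = sum(t == positive and p == positive for t, p in pairs)
    false_neg = sum(t == positive and p != positive for t, p in pairs)
    accuracy = correct / len(pairs)
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return accuracy, recall


# Hypothetical labels: 1 = vehicle needed repair, 0 = no repair needed.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1]
acc, rec = accuracy_and_recall(y_true, y_pred)
print(acc, rec)
```

When most cases need no repair, accuracy can stay high even if recall is low, so recall is the more direct measure of how many repair-related stoppages the model would catch.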
3

Email Mining Classifier: An Empirical Study on Combining Topic Modelling with Random Forest Classification

Halmann, Marju January 2017
Filtering and automatically replying to emails is of interest to many but is hard, owing to the complexity of the language and to the dependence on background information that is not present in the email itself. This paper investigates whether Latent Dirichlet Allocation (LDA) combined with a Random Forest classifier can be used for the more general email classification task, and how it compares with other existing email classifiers. The comparison is based on a literature study and on empirical experiments using two real-life datasets. First, a literature study is performed to gain insight into the accuracy of other available email classifiers. Second, the proposed model's accuracy is explored through experimentation. The literature study shows that the accuracy of more general email classifiers differs greatly between user sets. The proposed model's accuracy is within the reported accuracy range, but at its lower end, which indicates that the proposed model performs poorly compared with other classifiers. On average, the classifier's performance improves by 15 percentage points with additional information. This indicates that LDA combined with a Random Forest classifier is promising, but further studies are needed to explore the model and ways to increase its accuracy.
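One way to combine LDA with a Random Forest, sketched here with scikit-learn, is to turn each email into a bag-of-words vector, replace it with its LDA topic proportions, and train the forest on those proportions. The toy emails, labels and parameter values (`n_components=3`, `n_estimators=100`) are hypothetical, and the thesis's exact pipeline may differ.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical toy emails and labels; the thesis's real-life datasets are not reproduced.
emails = [
    "invoice attached please pay by friday",
    "payment reminder invoice overdue",
    "team lunch on friday at noon",
    "lunch menu and team social event",
    "server down please restart the service",
    "service outage restart required",
]
labels = ["billing", "billing", "social", "social", "it", "it"]

model = make_pipeline(
    CountVectorizer(),                                           # word counts per email
    LatentDirichletAllocation(n_components=3, random_state=0),   # topic proportions per email
    RandomForestClassifier(n_estimators=100, random_state=0),    # classify on topic proportions
)
model.fit(emails, labels)
predicted = model.predict(["please pay the overdue invoice"])
print(predicted)
```

Extra information, such as sender or thread metadata, could be appended to the topic proportions as additional columns; how the thesis added its "additional information" is not specified in the abstract.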
4

Mapping forest habitats in protected areas by integrating LiDAR and SPOT Multispectral Data

Alvarez, Manuela January 2016
KNAS (Continuous Habitat Mapping of Protected Areas) is a Metria AB project that produces vegetation and habitat maps of protected areas in Sweden. Vegetation and habitat mapping is challenging because vegetation is heterogeneous, spatially variable, and has a complex vertical and horizontal structure. Traditionally, multispectral data is used because it provides information about the horizontal structure of vegetation. LiDAR data contains information about the vertical structure of vegetation and can therefore improve classification accuracy when used together with spectral data. The objectives of this study are to integrate LiDAR and multispectral data for KNAS and to determine the contribution of LiDAR data to classification accuracy. To achieve these goals, two object-based classification schemes are proposed and compared: a spectral classification scheme and a spectral-LiDAR classification scheme. The spectral data consists of four SPOT-5 bands acquired in 2005 and 2006. The spectral-LiDAR data includes the same four SPOT-5 bands and nine LiDAR-derived layers produced from NH point cloud data from airborne laser scanning, acquired in 2011 and 2012, from The Swedish Mapping, Cadastral and Land Registration Authority. Processing of the point cloud data includes filtering, buffer and tile creation, height normalization, and rasterization. Due to the complexity of KNAS production, the classification schemes are based on a simplified KNAS workflow and a selection of KNAS forest classes. The classification schemes include segmentation, database creation, collection of training and validation areas, SVM classification, and accuracy assessment. Spectral-LiDAR data fusion is performed during segmentation in eCognition. The segmentation results are used to build a database of segmented objects with the mean values of the spectral or spectral-LiDAR data. The databases are used in Matlab to perform SVM classification with cross-validation.
Cross-validation accuracy, overall accuracy, the kappa coefficient, and producer’s and user’s accuracy are computed. Training and validation areas are common to both classification schemes. Results show an improvement in overall classification accuracy for the spectral-LiDAR classification scheme compared with the spectral classification scheme. Improvements of 21.9%, 11.0% and 21.1% are obtained for the study areas of Linköping, Örnsköldsvik and Vilhelmina, respectively.
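The accuracy measures listed above can all be derived from one confusion matrix. A minimal NumPy sketch, assuming rows hold the reference (validation) classes and columns the mapped classes; the 2 × 2 matrix is hypothetical, not a result from the thesis:

```python
import numpy as np


def accuracy_report(cm):
    """Overall accuracy, kappa, producer's and user's accuracy from a confusion matrix.

    Assumes rows are reference (validation) classes and columns are mapped classes.
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    diag = np.diag(cm)
    overall = diag.sum() / total
    # Chance agreement computed from the row and column marginals.
    expected = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / total**2
    kappa = (overall - expected) / (1 - expected)
    producers = diag / cm.sum(axis=1)  # 1 - omission error, per reference class
    users = diag / cm.sum(axis=0)      # 1 - commission error, per mapped class
    return overall, kappa, producers, users


# Hypothetical two-class matrix, e.g. coniferous vs deciduous forest objects.
overall, kappa, producers, users = accuracy_report([[50, 5], [10, 35]])
print(overall, round(kappa, 3), producers, users)
```

For this matrix, overall accuracy is 85/100 = 0.85, chance agreement is (55·60 + 45·40)/100² = 0.51, and kappa is (0.85 − 0.51)/(1 − 0.51) ≈ 0.694.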
5

Investigating the Performance of Random Forest Classification for Stock Trading

Nordfjell, Oscar, Ring, Gustav January 2023
We show that, with the implementation presented in this paper, the Random Forest Classification model was able to predict whether or not a stock would increase in value during the coming day with an accuracy higher than 50% for all stocks included in this study. Furthermore, we show that the active trading strategy presented in this paper generated higher returns and higher risk-adjusted returns than a passive investment in the stocks underlying the strategy. Therefore, we conclude (i) that a Random Forest Classification model can be used to provide valuable insight into publicly traded stocks, and (ii) that it is probably possible to create a profitable trading strategy based on a Random Forest Classifier, but that this requires a more sophisticated implementation than the one presented in this paper.
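A minimal sketch of the general setup, assuming next-day direction is predicted from the previous five daily returns, the data are split chronologically (no shuffling, so the model never trains on the future), and the active strategy simply holds the stock on days predicted to rise and stays in cash otherwise. The synthetic price series, lag count and forest size are all hypothetical; the paper's own features and strategy are not described in the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def lagged_features(returns, n_lags):
    """Row j holds returns j..j+n_lags-1; its label says whether return j+n_lags is positive."""
    n = len(returns) - n_lags
    X = np.column_stack([returns[i:i + n] for i in range(n_lags)])
    y = (returns[n_lags:] > 0).astype(int)
    return X, y


# Hypothetical synthetic prices (a random walk), not real market data.
rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(rng.normal(0.0005, 0.01, 1000)))
returns = np.diff(prices) / prices[:-1]

n_lags = 5
X, y = lagged_features(returns, n_lags)
split = int(0.8 * len(y))  # chronological split: train on the past, test on the future
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X[:split], y[:split])
pred = model.predict(X[split:])
accuracy = (pred == y[split:]).mean()

# Returns realised on the test days, aligned with the predictions.
next_returns = returns[n_lags:][split:]
strategy_return = np.prod(1 + next_returns * pred) - 1  # long if "up" predicted, else cash
buy_and_hold = np.prod(1 + next_returns) - 1
print(f"accuracy={accuracy:.3f} strategy={strategy_return:.3f} hold={buy_and_hold:.3f}")
```

On a synthetic random walk like this one there is nothing to learn, so accuracy should hover near 50%; the sketch only shows the mechanics. It also ignores transaction costs and risk adjustment, both of which matter when comparing an active strategy with passive holding.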
6

Gulf of Maine Land Cover and Land Use Change Analysis Utilizing Random Forest Classification: To Be Used in Hydrological and Ecological Modeling of Terrestrial Carbon Export to the Gulf of Maine via Riverine Systems

Mordini, Michael B. 14 August 2013
No description available.
7

Characterization of Some Taxonomically Related Turkish Oak (Quercus L.) Species in an Isolated Stand: A Morphometric Analysis Approach

Aktas, Caner 01 June 2010
The genus Quercus L. is represented by more than 400 species worldwide, 18 of which occur naturally in Turkey. Despite its taxonomic, phytogeographic and dendrological importance, Quercus is still one of the most taxonomically problematic woody genera in the Turkish flora. In this study, a multivariate morphometric approach was used to analyze oak specimens collected from an isolated forest (Beynam Forest, Ankara) where Quercus pubescens Willd., Q. infectoria Olivier subsp. boissieri (Reuter) O. Schwarz and Q. macranthera Fisch. & C. A. Mey. ex Hohen. subsp. syspirensis (C. Koch) Menitsky, taxa belonging to section Quercus sensu stricto (s.s.), are found. Additional oak specimens were included in the analysis for comparison. The morphometric study was based on 52 leaf characters, such as distances, angles and areas, as well as counted, descriptive and calculated variables. Morphometric variables were calculated automatically from landmark and outline data. The random forest classification method was used to select discriminating variables and to predict unidentified specimens from a pre-identified training group. The results of the random forest variable selection procedure and the principal component analysis (PCA) showed that the morphometric variables could distinguish the specimens of Q. pubescens and Q. macranthera subsp. syspirensis mostly on the basis of overall leaf size and the number of intercalary veins, while the specimens of Q. infectoria subsp. boissieri were separated from the others on the basis of lobe and lamina base shape. Finally, the abaxial lamina surface of selected specimens was examined by scanning electron microscopy (SEM); these micromorphological observations were found useful for differentiating, in particular, the specimens of Q. macranthera subsp. syspirensis and its putative hybrids from the other taxa.
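The two-step workflow described above, ranking variables with a random forest and then inspecting the selected variables with PCA, can be sketched in Python with scikit-learn. Everything here is an assumption for illustration: the six synthetic "leaf variables" (only the first two differ between taxa), the taxon labels, the use of impurity-based `feature_importances_` as the selection criterion, and keeping the top two variables.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_per_taxon, n_vars = 30, 6
# Hypothetical taxon centres on variables 0 and 1; variables 2-5 are pure noise.
centres = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])


def simulate(centre, n):
    """Hypothetical leaf measurements: 2 informative variables + 4 noise variables."""
    informative = rng.normal(centre, 1.0, size=(n, 2))
    noise = rng.normal(0.0, 1.0, size=(n, n_vars - 2))
    return np.column_stack([informative, noise])


X = np.vstack([simulate(c, n_per_taxon) for c in centres])
y = np.repeat(["taxon_a", "taxon_b", "taxon_c"], n_per_taxon)

forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
# Rank variables by impurity-based (Gini) importance and keep the top two.
ranking = np.argsort(forest.feature_importances_)[::-1]
selected = ranking[:2]

# Ordinate all specimens on the selected variables.
scores = PCA(n_components=2).fit_transform(X[:, selected])

# Predict "unidentified" specimens (new hypothetical measurements near taxon_b).
predicted = forest.predict(simulate(centres[1], 3))
print(selected, predicted)
```

Impurity-based importances can favour variables with many distinct values, so permutation importance is a common alternative when variables differ in scale or type; which criterion the thesis used is not stated in the abstract.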
8

Computational studies of biomolecules

Chen, Sih-Yu January 2017
In modern drug discovery, lead discovery is a term used to describe the overall process from hit discovery to lead optimisation, with the goal of identifying drug candidates. This process can be greatly facilitated by computer-aided (or in silico) techniques, which can reduce experimentation costs along the drug discovery pipeline. The range of relevant techniques includes molecular modelling to obtain structural information; molecular dynamics (covered in Chapter 2); activity or property prediction by means of quantitative structure-activity/property relationship models (QSAR/QSPR), where machine learning techniques are introduced (covered in Chapter 1); and quantum chemistry, used to explain chemical structure, properties and reactivity. This thesis is divided into five parts. Chapter 1 starts with an outline of the early stages of drug discovery, introducing the use of virtual screening for hit and lead identification. Such approaches may roughly be divided into structure-based (docking, by far the most often referred to) and ligand-based approaches, leading to a set of promising compounds for further evaluation. This is followed by an introduction to machine learning techniques, which recur throughout the thesis, and a brief review of the "no free lunch" theorem, which states that no learning algorithm can perform optimally on all problems. This implies that the predictive accuracy of multiple models must be validated to select the best one. As the dimensionality of the feature space increases, the issue referred to as "the curse of dimensionality" becomes a challenge. The last sections focus on supervised classification with Random Forests. Computer-based analyses are an integral part of drug discovery.
Chapter 2 begins with a discussion of molecular docking, including strategies for incorporating protein flexibility at global and local levels, followed by a specific focus on an automated docking program, AutoDock, which uses a Lamarckian genetic algorithm and an empirical binding free energy function. The second part of the chapter gives a brief introduction to molecular dynamics. Chapter 3 describes how we constructed a dataset of known binding sites with co-crystallised ligands, which was used to extract features characterising the structural and chemical properties of the binding pocket. A machine learning algorithm was adopted to create a three-way predictive model capable of assigning each case to one of three classes (regular, orthosteric and allosteric) for the in silico selection of allosteric sites, and a feature selection algorithm (Gini) was used to identify the descriptors most influential in classifying the binding pockets. In Chapter 4, we made use of structure-based virtual screening, focusing on docking a fluorescent sensor to a non-canonical DNA quadruplex structure. The preferred binding poses, binding site and interactions are scored, followed by the application of an ONIOM model to re-score the binding poses of some DNA-ligand complexes, focusing only on the best pose (the one with the lowest binding energy) from AutoDock. Schemes that use a conformational ensemble pre-generated by MD to account for receptor flexibility, followed by docking, are termed “relaxed complex” schemes. Chapter 5 concerns the BLUF domain photocycle. We focus on the conformational preferences of some critical residues in the flavin binding site after a charge redistribution has been introduced. This work provides an alternative activation model to address controversial features of the BLUF domain.
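The practical consequence of the "no free lunch" theorem noted in Chapter 1, that several models should be validated before one is chosen, can be sketched with scikit-learn cross-validation. The wine dataset stands in for binding-pocket descriptors (it happens to have three classes, like the regular/orthosteric/allosteric model), and the three candidate models are arbitrary choices, not those used in the thesis.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# A stand-in three-class dataset; the thesis's binding-pocket data are not reproduced.
X, y = load_wine(return_X_y=True)
candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "k_nearest_neighbours": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
# The same folds are reused for every model so the comparison is like-for-like.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
mean_scores = {
    name: cross_val_score(model, X, y, cv=cv).mean() for name, model in candidates.items()
}
best = max(mean_scores, key=mean_scores.get)
print(mean_scores, best)
```

Which model wins depends on the dataset, which is exactly the point of the theorem; a nested cross-validation would be needed if hyperparameters were also tuned.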
9

Evaluating Multitemporal Sentinel-2 data for Forest Mapping using Random Forest

Nelson, Marc January 2017
The mapping of land cover using remotely sensed data is most effective when a robust classification method is employed. Random forest is a modern machine learning algorithm that has recently gained interest in the field of remote sensing due to its non-parametric nature, which may make it better suited than conventional techniques to handling complex, high-dimensional data. In this study, the random forest method is applied to remote sensing data from the European Space Agency’s new Sentinel-2 satellite program, which was launched in 2015 yet remains relatively untested in the scientific literature using non-simulated data. In a study site of boreo-nemoral forest in Ekerö municipality, Sweden, a classification is performed for six forest classes based on CadasterENV Sweden, a multi-purpose land cover mapping and change monitoring program. The performance of Sentinel-2’s Multi-Spectral Imager is investigated with respect to time series that capture phenological conditions, optimal band combinations, and the influence of sample size and ancillary inputs. Using two images from the spring and summer of 2016, an overall map accuracy of 86.0% was achieved. The red edge, short-wave infrared and visible red bands were confirmed to be of high value. Important factors contributing to the result include the timing of image acquisition, the use of a feature reduction approach to decrease the correlation between spectral channels, and the addition of ancillary data that combines topographic and edaphic information. The results suggest that random forest is an effective classification technique that is particularly well suited to high-dimensional remote sensing data.
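The abstract credits a feature reduction step that lowers the correlation between spectral channels but does not say which method was used. As one hypothetical option, the Python sketch below drops any band whose absolute Pearson correlation with an already-kept band exceeds 0.95, then trains a random forest on the remaining bands of a two-date stack. The synthetic "spring" and "summer" bands, the near-duplicate band, and the 0.95 threshold are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def drop_correlated(X, threshold=0.95):
    """Greedily keep columns whose |Pearson r| with every kept column is <= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return keep


rng = np.random.default_rng(0)
n_pixels, n_bands = 600, 4
labels = rng.integers(0, 3, n_pixels)                               # hypothetical forest classes
spring = rng.normal(labels[:, None], 1.0, (n_pixels, n_bands))        # hypothetical spring bands
summer = rng.normal(0.5 * labels[:, None], 1.0, (n_pixels, n_bands))  # hypothetical summer bands
# A near-duplicate of the first spring band, mimicking strongly correlated channels.
duplicate = 1.01 * spring[:, :1] + rng.normal(0.0, 0.01, (n_pixels, 1))
X = np.hstack([spring, summer, duplicate])

keep = drop_correlated(X)
X_train, X_test, y_train, y_test = train_test_split(
    X[:, keep], labels, test_size=0.3, random_state=0, stratify=labels)
forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)
overall_accuracy = forest.score(X_test, y_test)
print(keep, round(overall_accuracy, 3))
```

Random forests tolerate correlated inputs reasonably well for prediction, but correlated bands split importance between themselves, so reducing them mainly helps when the goal is to judge which bands matter.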
10

Detecting Lumbar Muscle Fatigue Using Nanocomposite Strain Gauges

Billmire, Darci Ann 26 June 2023
Introduction: Muscle fatigue can contribute to acute flare-ups of lower back pain, with associated consequences such as pain, disability, lost work time, increased healthcare utilization, and increased opioid use and potential abuse. The SPINE Sense system is a wearable device with 16 high-deflection nanocomposite strain gauge sensors on kinesiology tape, which is adhered to the skin of the lower back. This device is used to correlate lumbar skin strains with the motion of the lumbar vertebrae and to phenotype lumbar spine motion. In this work, it was hypothesized that the SPINE Sense device can be used to detect differences in biomechanical movements resulting from muscle fatigue. A human subject study was completed with 30 subjects, who performed 14 functional movements before and after fatiguing their back muscles through the Biering-Sørensen endurance test, with the SPINE Sense device on their lower back collecting skin strain data. Various features were extracted from the strain gauge data and used as inputs to a random forest classification model. The accuracy of the model was assessed under two training/validation conditions, namely a hold-out method and a leave-one-out method. The random forest classification models achieved accuracies of up to 84.22% and 78.37% for the hold-out and leave-one-out methods, respectively. Additionally, a system usability study was performed by presenting the device to 32 potential users (clinicians and individuals with lower back pain). They received a scripted explanation of the use of the device and were then instructed to score it with the validated System Usability Scale (SUS). In addition, they were given the opportunity to voice concerns and questions and to offer any other feedback about the design and use of the device.
The average System Usability Score across all participants in the usability study was 72.03, with suggestions to improve the robustness of the electrical connections and to reduce the size of the accompanying electronics. Feedback from potential users was used to make the electrical connections more robust and the wires and electronics modules smaller. These improvements were achieved with a two-piece design: one piece contains the sensors on kinesiology tape that is attached directly to the patient, and the other contains wires sewn into stretch fabric to create stretchable electronic connections to the device. It is concluded that a machine learning model of the data from the SPINE Sense device can classify lumbar motion with sufficient accuracy for clinical utility. It is also concluded that the device is usable and intuitive.
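One plausible reason for the gap between the hold-out (84.22%) and leave-one-out (78.37%) accuracies is how the two schemes split the data; the abstract does not give the details. A minimal scikit-learn sketch, assuming "leave-one-out" means leaving one subject out at a time: a random hold-out split can put rows from the same subject in both the training and test sets, whereas leave-one-subject-out always tests on an unseen person. The synthetic features, subject offsets and fatigue effect size are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score, train_test_split

rng = np.random.default_rng(0)
n_subjects, n_movements, n_features = 10, 14, 8
# Hypothetical rows: one per subject x movement x state (0 = rested, 1 = fatigued).
subject = np.repeat(np.arange(n_subjects), n_movements * 2)
state = np.tile(np.repeat([0, 1], n_movements), n_subjects)
# Each subject gets a personal offset, so rows from one subject resemble each other.
subject_offset = rng.normal(0.0, 1.0, (n_subjects, n_features))[subject]
X = subject_offset + 0.8 * state[:, None] + rng.normal(0.0, 1.0, (len(state), n_features))

forest = RandomForestClassifier(n_estimators=200, random_state=0)

# Hold-out: a random split, so the same subject can appear on both sides.
X_tr, X_te, y_tr, y_te = train_test_split(X, state, test_size=0.2, random_state=0, stratify=state)
holdout_accuracy = forest.fit(X_tr, y_tr).score(X_te, y_te)

# Leave-one-subject-out: each fold tests on a subject never seen in training.
loso_scores = cross_val_score(forest, X, state, groups=subject, cv=LeaveOneGroupOut())
loso_accuracy = loso_scores.mean()
print(round(holdout_accuracy, 3), round(loso_accuracy, 3))
```

Leave-one-subject-out is the more conservative estimate of how the model would perform on a new patient, which is the relevant case for clinical use.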
