  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Interpreting Random Forest Classification Models Using a Feature Contribution Method

Palczewska, Anna Maria, Palczewski, J., Marchese-Robinson, R.M., Neagu, Daniel 18 February 2014
Model interpretation is one of the key aspects of the model evaluation process. The explanation of the relationship between model variables and outputs is relatively easy for statistical models, such as linear regressions, thanks to the availability of model parameters and their statistical significance. For “black box” models, such as random forest, this information is hidden inside the model structure. This work presents an approach for computing feature contributions for random forest classification models. It allows for the determination of the influence of each variable on the model prediction for an individual instance. By analysing feature contributions for a training dataset, the most significant variables can be determined and their typical contribution towards predictions made for individual classes, i.e., class-specific feature contribution “patterns”, are discovered. These patterns represent a standard behaviour of the model and allow for an additional assessment of the model reliability for new data. Interpretation of feature contributions for two UCI benchmark datasets shows the potential of the proposed methodology. The robustness of results is demonstrated through an extensive analysis of feature contributions calculated for a large number of generated random forest models.
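The idea of a per-instance feature contribution can be illustrated with a minimal sketch. It assumes a binary problem and a hand-built, hypothetical tree in which every node stores the fraction of positive-class training instances that reached it; the names `TREE`, `feature_contributions` and `forest_contributions` are illustrative and are not taken from the paper. Each step down the decision path credits the change in that fraction to the feature used for the split, and a forest averages these values over its trees.

```python
# Hypothetical hand-built tree (not from the paper). "p" is the fraction of
# positive-class training instances that reached the node.
TREE = {
    "p": 0.5, "feature": "x1", "threshold": 2.0,
    "left": {
        "p": 0.2, "feature": "x2", "threshold": 1.0,
        "left": {"p": 0.0},
        "right": {"p": 0.6},
    },
    "right": {"p": 0.9},
}


def feature_contributions(tree, instance):
    """Return (root prior, per-feature contributions) for one instance."""
    contributions = {}
    node = tree
    while "feature" in node:  # leaves carry no split feature
        name = node["feature"]
        child = node["left"] if instance[name] <= node["threshold"] else node["right"]
        # Credit the change in class fraction to the splitting feature.
        contributions[name] = contributions.get(name, 0.0) + child["p"] - node["p"]
        node = child
    return tree["p"], contributions


def forest_contributions(trees, instance):
    """Average the priors and contributions over all trees of a forest."""
    prior, total = 0.0, {}
    for tree in trees:
        p, contrib = feature_contributions(tree, instance)
        prior += p / len(trees)
        for name, value in contrib.items():
            total[name] = total.get(name, 0.0) + value / len(trees)
    return prior, total


prior, contrib = feature_contributions(TREE, {"x1": 1.0, "x2": 3.0})
# The prior plus all contributions reproduces the leaf value on the path.
prediction = prior + sum(contrib.values())
```

In this sketch the prediction always decomposes into the root prior plus the sum of the contributions, which is what makes the per-feature values readable as explanations for a single instance.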
2

Maskininlärningsklassificering av fordonsstatus för minskade reparationskostnader och avbrott inom kollektivtrafiken : Applicering av Random Forest-klassificering på fordonssignaler / Machine Learning Classification of Vehicle Status for Reduction of Cost and Downtime in Public Transport

Stopner, Julia, Willberg, Carl-Åke January 2022
As the modern, data-driven world continues to evolve, many institutions and corporations are eager to capitalize on their own data streams to optimize their operations. In tandem, an increasingly globalized world is searching for ways to reconcile a growing population's need to move flexibly through sprawling cities with the urgent need to mitigate the climate impact of that mobility. The future of public transport stands at the intersection of these two trends, and much can be gained from letting a machine learning algorithm uncover previously unseen patterns and obstacles in daily operations. This study explores whether a Random Forest classifier trained on historical data can be used to predict and prevent public transport downtime caused by vehicle repairs. The implementation of the classifier resulted in an accuracy of 63.1% and a recall of 59.9%. The study concludes that the method has potential, although the quality and range of the signal data need to improve to raise the effectiveness of the model. This implies that, given further research and improved internal data handling, a Random Forest model could have commercially measurable relevance with regard to downtime and repair costs.
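For readers unfamiliar with the two metrics quoted above, the following minimal sketch shows how accuracy and recall are derived from confusion-matrix counts. The counts used here are made up for illustration and are not the study's data; the function names are likewise hypothetical.

```python
def accuracy(tp, fp, tn, fn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def recall(tp, fn):
    """Share of actual positives (e.g. real breakdowns) that were detected."""
    return tp / (tp + fn)


# Made-up counts for illustration only (not the thesis results).
tp, fn, tn, fp = 6, 4, 7, 3
acc = accuracy(tp, fp, tn, fn)
rec = recall(tp, fn)
```

Recall is the more relevant figure when a missed positive, such as an unpredicted vehicle failure, is costlier than a false alarm.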
3

Email Mining Classifier : The empirical study on combining the topic modelling with Random Forest classification

Halmann, Marju January 2017
Filtering and automatically replying to emails is of interest to many, but it is hard because of the complexity of language and because of dependencies on background information that is not present in the email itself. This paper investigates whether Latent Dirichlet Allocation (LDA) combined with a Random Forest classifier can be used for the more general email classification task and how it compares to other existing email classifiers. The comparison is based on a literature study and on empirical experimentation using two real-life datasets. First, a literature study is performed to gain insight into the accuracy of other available email classifiers. Second, the proposed model's accuracy is explored through experimentation. The literature study shows that the accuracy of more general email classifiers differs greatly across user sets. The proposed model's accuracy is within the reported accuracy range, but in its lower part, which indicates that the proposed model performs poorly compared to other classifiers. On average, the classifier's performance improves by 15 percentage points with additional information. This indicates that LDA combined with a Random Forest classifier is promising; however, future studies are needed to explore the model and ways to further increase its accuracy.
4

Investigating the Performance of Random Forest Classification for Stock Trading

Nordfjell, Oscar, Ring, Gustav January 2023
We show that with the implementation presented in this paper, the Random Forest Classification model was able to predict whether or not a stock was going to increase in value during the coming day with an accuracy higher than 50% for all stocks included in this study. Furthermore, we show that the active trading strategy presented in this paper generated higher returns and higher risk-adjusted returns than the passive investment in the stocks underlying the strategy. Therefore, we conclude (i) that a Random Forest Classification model can be used to provide valuable insight on publicly traded stocks, and (ii) that it is probably possible to create a profitable trading strategy based on a Random Forest Classifier, but that this requires a more sophisticated implementation than the one presented in this paper.
5

GULF OF MAINE LAND COVER AND LAND USE CHANGE ANALYSIS UTILIZING RANDOM FOREST CLASSIFICATION: TO BE USED IN HYDROLOGICAL AND ECOLOGICAL MODELING OF TERRESTRIAL CARBON EXPORT TO THE GULF OF MAINE VIA RIVERINE SYSTEMS

Mordini, Michael B. 14 August 2013
No description available.
6

Characterization Of Taxonomically Related Some Turkish Oak (quercus L.) Species In An Isolated Stand: A Morphometric Analysis Approach

Aktas, Caner 01 June 2010
The genus Quercus L. is represented by more than 400 species in the world, and 18 of these species are found naturally in Turkey. Despite its taxonomical, phytogeographical and dendrological importance, Quercus is still one of the most taxonomically problematic woody genera in the Turkish flora. In this study, a multivariate morphometric approach was used to analyze oak specimens collected from an isolated forest (Beynam Forest, Ankara) where the taxa Quercus pubescens Willd., Q. infectoria Olivier subsp. boissieri (Reuter) O. Schwarz and Q. macranthera Fisch. & C. A. Mey. ex Hohen. subsp. syspirensis (C.Koch) Menitsky, all belonging to section Quercus sensu stricto (s.s.), are found. Additional oak specimens were included in the analysis for comparison. The morphometric study was based on 52 leaf characters, such as distance, angle and area, as well as counted, descriptive and calculated variables. Morphometric variables were calculated automatically from landmark and outline data. The random forest classification method was used to select discriminating variables and to predict unidentified specimens using a pre-identified training group. The results of the random forest variable selection procedure and the principal component analysis (PCA) showed that the morphometric variables could distinguish the specimens of Q. pubescens and Q. macranthera subsp. syspirensis mostly based on the overall leaf size and number of intercalary veins, while the specimens of Q. infectoria subsp. boissieri were separated from the others based on lobe and lamina base shape. Finally, micromorphological observations of the abaxial lamina surface were performed with a scanning electron microscope (SEM) on selected specimens; these were found useful for differentiating, in particular, the specimens of Q. macranthera subsp. syspirensis and its putative hybrids from other taxa.
7

Computational studies of biomolecules

Chen, Sih-Yu January 2017
In modern drug discovery, lead discovery is a term used to describe the overall process from hit discovery to lead optimisation, with the goal being to identify drug candidates. This can be greatly facilitated by the use of computer-aided (or in silico) techniques, which can reduce experimentation costs along the drug discovery pipeline. The range of relevant techniques includes: molecular modelling to obtain structural information; molecular dynamics (which will be covered in Chapter 2); activity or property prediction by means of quantitative structure activity/property models (QSAR/QSPR), where machine learning techniques are introduced (to be covered in Chapter 1); and quantum chemistry, used to explain chemical structure, properties and reactivity. This thesis is divided into five parts. Chapter 1 starts with an outline of the early stages of drug discovery, introducing the use of virtual screening for hit and lead identification. Such approaches may roughly be divided into structure-based (docking, by far the most often referred to) and ligand-based, leading to a set of promising compounds for further evaluation. The use of machine learning techniques, which will be encountered frequently, is then introduced, followed by a brief review of the "no free lunch" theorem, which states that no learning algorithm can perform optimally on all problems. This implies that validation of predictive accuracy in multiple models is required for optimal model selection. As the dimensionality of the feature space increases, the issue referred to as "the curse of dimensionality" becomes a challenge. In closing, the last sections focus on supervised classification with Random Forests. Computer-based analyses are an integral part of drug discovery.
Chapter 2 begins with discussions of molecular docking, including strategies incorporating protein flexibility at global and local levels, and then focuses specifically on an automated docking program, AutoDock, which uses a Lamarckian genetic algorithm and an empirical binding free energy function. In the second part of the chapter, a brief introduction to molecular dynamics is given. Chapter 3 describes how we constructed a dataset of known binding sites with co-crystallised ligands, used to extract features characterising the structural and chemical properties of the binding pocket. A machine learning algorithm was adopted to create a three-way predictive model, capable of assigning each case to one of the classes (regular, orthosteric and allosteric) for in silico selection of allosteric sites, together with a feature selection algorithm (Gini) to rationalize the selection of important descriptors, those most influential in classifying the binding pockets. In Chapter 4, we made use of structure-based virtual screening, and we focused on docking a fluorescent sensor to a non-canonical DNA quadruplex structure. The preferred binding poses, binding site, and the interactions are scored, followed by application of an ONIOM model to re-score the binding poses of some DNA-ligand complexes, focusing on only the best pose (with the lowest binding energy) from AutoDock. The use of a pre-generated conformational ensemble from MD to account for the receptor's flexibility, followed by docking, is termed a “relaxed complex” scheme. Chapter 5 concerns the BLUF domain photocycle. We focus on the conformational preference of some critical residues in the flavin binding site after a charge redistribution has been introduced. This work provides another activation model to address controversial features of the BLUF domain.
8

Detecting Lumbar Muscle Fatigue Using Nanocomposite Strain Gauges

Billmire, Darci Ann 26 June 2023
Introduction: Muscle fatigue can contribute to acute flare-ups of lower back pain with associated consequences such as pain, disability, lost work time, increased healthcare utilization, and increased opioid use and potential abuse. The SPINE Sense system is a wearable device with 16 high deflection nanocomposite strain gauge sensors on kinesiology tape which is adhered to the skin of the lower back. This device is used to correlate lumbar skin strains with the motion of the lumbar vertebrae and to phenotype lumbar spine motion. In this work it was hypothesized that the SPINE Sense device can be used to detect differences in biomechanical movements consequent to muscle fatigue. A human subject study was completed with 30 subjects who performed 14 functional movements before and after fatiguing their back muscles through the Biering-Sørensen endurance test, with the SPINE Sense device on their lower back collecting skin strain data. Various features from the strain gauge sensors were extracted from these data and were used as inputs to a random forest classification machine learning model. The accuracy of the model was assessed under two training/validation conditions, namely a hold-out method and a leave-one-out method. The random forest classification models were able to achieve up to 84.22% and 78.37% accuracies for the hold-out and leave-one-out methods, respectively. Additionally, a system usability study was performed by presenting the device to 32 potential users of the device (clinicians and individuals with lower back pain). They received a scripted explanation of the use of the device and were then instructed to score it with the validated System Usability Score. In addition, they were given the opportunity to voice concerns and questions and to offer any other feedback about the design and use of the device. The average System Usability Score from all participants in the system usability study was 72.03, with suggestions to improve the robustness of the electrical connections and to reduce the profile of the accompanying electronics. Feedback from the potential users of the device was used to make more robust electrical connections and smaller wires and electronics modules. These improvements were achieved by making a two-piece design: one piece contains the sensors on kinesiology tape that is directly attached to the patient, and the other contains the wires sewn into stretch fabric to create stretchable electronic connections to the device. It is concluded that a machine-learning model of the data from the SPINE Sense device can classify lumbar motion with sufficient accuracy for clinical utility. It is also concluded that the device is usable and intuitive to use.
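The difference between the two validation conditions named in this abstract can be sketched as index splits. This is a generic illustration under stated assumptions, not the thesis's procedure: the abstract does not say whether "leave-one-out" holds out one sample or one subject, and the function names, the 20% test fraction and the seed are all hypothetical.

```python
import random


def hold_out_split(n_samples, test_fraction=0.2, seed=0):
    """Single random split: one training set and one held-out test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def leave_one_out_splits(n_samples):
    """Yield n_samples splits, each holding out exactly one index."""
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        yield train, [held_out]


train_idx, test_idx = hold_out_split(10)
loo = list(leave_one_out_splits(4))
```

A hold-out split trains one model, while leave-one-out trains one model per held-out item and averages the results, so the two accuracies are not directly comparable.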
9

Spatial-temporal classification enhancement via 3-D iterative filtering for multi-temporal Very-High-Resolution satellite images

Li, Mao, Li 01 June 2018
No description available.
10

Analýza 3D CT obrazových dat se zaměřením na detekci a klasifikaci specifických struktur tkání / Analysis of 3D CT image data aimed at detection and classification of specific tissue structures

Šalplachta, Jakub January 2017
This thesis deals with the segmentation and classification of paraspinal muscle and subcutaneous adipose tissue in 3D CT image data, in order to use them subsequently as internal calibration phantoms for measuring the bone mineral density (BMD) of vertebrae. The chosen methods were tested and then evaluated in terms of classification correctness and overall suitability for the subsequent BMD calculation. The algorithms were tested in the Matlab® programming environment on a purpose-built patient database containing the lumbar spines of twelve patients. The following sections of this thesis contain a theoretical review of bone mineral density measurement and of segmentation and classification methods, and a description of the practical part of this work.
