Global ETD Search

131	Bank Customer Churn Prediction: A comparison between classification and evaluation methods Tandan, Isabelle, Goteman, Erika January 2020 This study assesses which supervised statistical learning method (random forest, logistic regression or K-nearest neighbor) is best at predicting bank customer churn. Additionally, the study evaluates which cross-validation approach, k-Fold cross-validation or leave-one-out cross-validation, yields the most reliable results. Predicting customer churn has increased in popularity since new technology, regulation and changed demand have led to increased competition among banks. Banks therefore have greater reason to acknowledge the importance of maintaining their customer base. The study finds that an unrestricted random forest model estimated using k-Fold is preferable in terms of performance measurements, computational efficiency and from a theoretical point of view. Although k-Fold cross-validation and leave-one-out cross-validation yield similar results, k-Fold cross-validation is preferable due to its computational advantages. For future research, methods that generate models with both good interpretability and high predictability would be beneficial, in order to combine knowledge of which customers end their engagement with an understanding of why. Moreover, interesting future research would be to analyze at which dataset size leave-one-out cross-validation and k-Fold cross-validation yield the same results. machine learning cross-validation k-fold leave-one-out random forest decision trees k-nearest neighbor logistic regression supervised learning supervised statistical learning binary classification customer churn bank customer churn. Probability Theory and Statistics Sannolikhetsteori och statistik
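The computational argument in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function names and the dataset size are hypothetical, and only the index splitting is shown. Leave-one-out is k-fold with k equal to the number of samples, so it requires one model fit per observation.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    # Distribute the remainder so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def leave_one_out_splits(n_samples):
    """Leave-one-out is the special case k = n_samples."""
    return k_fold_splits(n_samples, n_samples)


n = 1000  # hypothetical dataset size, not taken from the thesis
print(len(list(k_fold_splits(n, 10))))     # number of model fits for 10-fold
print(len(list(leave_one_out_splits(n))))  # number of model fits for leave-one-out
```

In practice the samples would normally be shuffled before splitting; that step is omitted here to keep the sketch short.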
132	Vyhledávání obrazu na základě podobnosti / Image search using similarity measures Harvánek, Martin January 2014 The following methods, all based on low-level image features, are implemented: circular sectors, color moments, color coherence vector and Gabor filters. The methods were evaluated after their optimal parameters were found. Optimal parameters were found by measuring the classification accuracy of learning operators, using the cross-validation operator on images in RapidMiner. The implemented methods are evaluated on the image categories ancient, beach, bus, dinosaur, elephant, flower, food, horse, mountain and natives, based on total average precision. An implemented modification of the original circular sectors method (HSB color space + the median as statistical function) increases classification accuracy by 8 %. A weighted combination of color moments, circular sectors and Gabor filters gives the best total average precision of all implemented methods, 70.48 %.
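As a rough illustration of one of the low-level features named above, the sketch below computes color moments in their commonly used form: the mean, standard deviation and skewness (cube root of the third central moment) of each color channel. The abstract does not give the thesis's exact definition, so this form, the function name and the sample pixels are assumptions.

```python
import math


def color_moments(pixels):
    """Return [mean, std, skewness] per channel for a list of pixel tuples."""
    n = len(pixels)
    features = []
    for channel in zip(*pixels):  # iterate over channels, e.g. H, S, B
        mean = sum(channel) / n
        variance = sum((v - mean) ** 2 for v in channel) / n
        third = sum((v - mean) ** 3 for v in channel) / n
        # Signed cube root keeps the third moment in the channel's own units.
        skew = math.copysign(abs(third) ** (1 / 3), third)
        features.extend([mean, math.sqrt(variance), skew])
    return features


# Hypothetical three-pixel image with three channels.
print(color_moments([(0, 10, 20), (0, 10, 40), (3, 10, 60)]))
```

Two images can then be compared by a distance between their feature vectors, for example a weighted sum of absolute differences.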
133	Klasifikace malých nekódujících RNA / Classification of Small Noncoding RNAs Žigárdi, Tomáš January 2015 This master's thesis describes a tool designed and implemented for classifying plant microRNA without a genome. It uses properties of the mature and star sequences in microRNA duplexes. The implemented method clusters RNA sequences (with CD-HIT), mainly to reduce their count. Selected representatives from each cluster are classified using a support vector machine. Classification performance exceeds 96% (based on cross-validation on the training data).
134	Automatic Flight Maneuver Identification Using Machine Learning Methods Bodin, Camilla January 2020 This thesis proposes a general approach to solve the offline flight-maneuver identification problem using machine learning methods. The purpose of the study was to provide means for the aircraft professionals at the flight test and verification department of Saab Aeronautics to automate the procedure of analyzing flight test data. The suggested approach succeeded in generating binary classifiers and multiclass classifiers that identified six flight maneuvers of different complexity from real flight test data. The binary classifiers solved the problem of identifying one maneuver from flight test data at a time, while the multiclass classifiers solved the problem of identifying several maneuvers from flight test data simultaneously. To achieve these results, the difficulties that this time series classification problem entailed were simplified by using different strategies. One strategy was to develop a maneuver extraction algorithm that used handcrafted rules. Another strategy was to represent the time series data by statistical measures. There was also an issue of an imbalanced dataset, where one class far outweighed others in number of samples. This was solved by using a modified oversampling method on the dataset that was used for training. Logistic Regression, Support Vector Machines with both linear and nonlinear kernels, and Artificial Neural Networks were explored, where the hyperparameters for each machine learning algorithm were chosen during model estimation by 4-fold cross-validation and solving an optimization problem based on important performance metrics. A feature selection algorithm was also used during model estimation to evaluate how the performance changes depending on how many features were used. The machine learning models were then evaluated on test data consisting of 24 flight tests. 
The results given by the test data set showed that the simplifications done were reasonable, but the maneuver extraction algorithm could sometimes fail. Some maneuvers were easier to identify than others and the linear machine learning models resulted in a poor fit to the more complex classes. In conclusion, both binary classifiers and multiclass classifiers could be used to solve the flight maneuver identification problem, and solving a hyperparameter optimization problem boosted the performance of the finalized models. Nonlinear classifiers performed the best on average across all explored maneuvers. Flight Aircraft Machine Learning Flight Dynamics Classification Supervised Learning Support Vector Machines Neural Networks Logistic Regression Feature Selection Recursive Feature Elimination Feature Representation k-fold cross-validation maneuvers flight maneuvers Control Engineering Reglerteknik
135	Machine Learning Applications for Downscaling Groundwater Storage Changes Integrating Satellite Gravimetry and Other Observations Agarwal, Vibhor January 2021 No description available. Geographic Information Science Geography Remote Sensing Geophysical Geological Machine Learning GRACE Downscaling Central Valley North China Plain Random Forest Artificial Neural Network Groundwater Depletion Groundwater Storage Iterative forward modeling Leakage correction cross-validation
136	Online Non-linear Prediction of Financial Time Series Patterns da Costa, Joel 11 September 2020 We consider a mechanistic non-linear machine learning approach to learning signals in financial time series data. A modularised and decoupled algorithm framework is established and is proven on daily sampled closing time-series data for JSE equity markets. The input patterns are based on input data vectors of data windows preprocessed into a sequence of daily, weekly and monthly or quarterly sampled feature measurement changes (log feature fluctuations). The data processing is split into a batch processed step where features are learnt using a Stacked AutoEncoder (SAE) via unsupervised learning, and then both batch and online supervised learning are carried out on Feedforward Neural Networks (FNNs) using these features. The FNN output is a point prediction of measured time-series feature fluctuations (log differenced data) in the future (ex-post). Weight initializations for these networks are implemented with restricted Boltzmann machine pretraining, and variance based initializations. The validity of the FNN backtest results is shown under a rigorous assessment of backtest overfitting using both Combinatorially Symmetrical Cross Validation and Probabilistic and Deflated Sharpe Ratios. Results are further used to develop a view on the phenomenology of financial markets and the value of complex historical data under unstable dynamics. online learning feedforward neural network restricted Boltzmann machine variance weight initialization stacked autoencoder pattern prediction JSE non-linear financial time series backtest overfitting deflated Sharpe ratio probabilistic Sharpe ratio
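The "log feature fluctuations" mentioned in the abstract above are log differences of a series sampled at different horizons. A minimal sketch follows, assuming a plain list of daily closing prices; the function name, the `step` parameter and the price values are hypothetical and not taken from the dissertation.

```python
import math


def log_differences(prices, step=1):
    """Return log(p[t] / p[t - step]) for every t >= step."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    return [math.log(prices[t] / prices[t - step])
            for t in range(step, len(prices))]


closes = [100.0, 101.0, 99.5, 102.0, 103.5, 104.0]  # hypothetical daily closes
daily = log_differences(closes, step=1)   # one-day fluctuations
weekly = log_differences(closes, step=5)  # five-trading-day fluctuations
print(daily)
print(weekly)
```

Log differences are additive over time, so the sum of the daily values across a window equals the log difference over the whole window.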
137	Machine Learning for Exploring State Space Structure in Genetic Regulatory Networks Thomas, Rodney H. 01 January 2018 Genetic regulatory networks (GRN) offer a useful model for clinical biology. Specifically, such networks capture interactions among genes, proteins, and other metabolic factors. Unfortunately, it is difficult to understand and predict the behavior of networks that are of realistic size and complexity. In this dissertation, behavior refers to the trajectory of a state, through a series of state transitions over time, to an attractor in the network. This project assumes asynchronous Boolean networks, implying that a state may transition to more than one attractor. The goal of this project is to efficiently identify a network's set of attractors and to predict the likelihood with which an arbitrary state leads to each of the network’s attractors. These probabilities will be represented using a fuzzy membership vector. Predicting fuzzy membership vectors using machine learning techniques may address the intractability posed by networks of realistic size and complexity. Modeling and simulation can be used to provide the necessary training sets for machine learning methods to predict fuzzy membership vectors. The experiments comprise several GRNs, each represented by a set of output classes. These classes consist of thresholds τ and ¬τ, where τ = [τlow, τhigh]; state s belongs to class τ if the probability of its transitioning to attractor Α belongs to the range [τlow, τhigh]; otherwise it belongs to class ¬τ. Finally, each machine learning classifier was trained with the training sets that were previously collected. The objective is to explore methods to discover patterns for meaningful classification of states in realistically complex regulatory networks. The research design took a GRN and a machine learning method as input and produced output class < Ατ > and its negation ¬ < Ατ >. 
For each GRN, attractors were identified, data was collected by sampling each state to create fuzzy membership vectors, and machine learning methods were trained to predict whether a state is in a healthy attractor or not. For T-LGL, SVMs had the highest accuracy in predictions (between 93.6% and 96.9%) and precision (between 94.59% and 97.87%). However, naive Bayesian classifiers had the highest recall (between 94.71% and 97.78%). The study showed that all experiments were highly significant (p-value < 0.0001). This research helps clinical biologists submit genetic states and get an initial result on their outcomes. For future work, this implementation could use other machine learning classifiers such as xgboost or deep learning methods. Another suggestion is to develop methods that improve the performance of state transitions, allowing larger training sets to be sampled. asynchronous Boolean networks attractors Boolean networks cross-validation decision trees fuzzy basins fuzzy membership vectors fuzzy vectors genetic regulatory networks Markov Chain Monte Carlo naïve Bayesian classifiers support vector machines Computer Sciences
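The labelling scheme described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the dissertation's implementation: the function names, the attractor labels, the sampled outcomes and the threshold values are all hypothetical. The membership vector is estimated as the fraction of sampled trajectories from a state that end in each attractor, and the state is assigned to the threshold class when its probability for the chosen attractor falls inside the range.

```python
from collections import Counter


def fuzzy_membership(outcomes, attractors):
    """Estimate the probability of reaching each attractor from sampled runs."""
    if not outcomes:
        raise ValueError("at least one sampled outcome is required")
    counts = Counter(outcomes)
    return [counts[a] / len(outcomes) for a in attractors]


def output_class(membership, attractor_index, tau_low, tau_high):
    """Return 'tau' if the probability lies in [tau_low, tau_high], else 'not tau'."""
    p = membership[attractor_index]
    return "tau" if tau_low <= p <= tau_high else "not tau"


attractors = ["A1", "A2"]            # hypothetical attractor labels
sampled = ["A1", "A1", "A2", "A1"]   # hypothetical end points of four runs
vector = fuzzy_membership(sampled, attractors)
print(vector)
print(output_class(vector, 0, tau_low=0.5, tau_high=1.0))
```

The resulting labels are what a classifier such as an SVM or naive Bayes model would then be trained to predict from the state alone.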
138	Systemic Identification of Radiomic Features Resilient to Batch Effects and Acquisition Variations for Diagnosis of Active Crohn's Disease on CT Enterography Pattiam Giriprakash, Pavithran 23 August 2021 No description available. Biomedical Engineering Biomedical Research Biology Medical Imaging Radiology
139	Algorithmic Methods for Multi-Omics Biomarker Discovery Li, Yichao January 2018 No description available. Bioinformatics Computer Science Motif Diabetes Transcription Factor HiC Set Cover Machine Learning Ensemble Learning HbA1C Glycated Peptide Motif Discovery Motif Pair 3D Genome Organization DREAM challenge Python Data Analytics Hist1 Clustering Analysis Cross Validation
140	Accuracy and Reproducibility of Laboratory Diffuse Reflectance Measurements with Portable VNIR and MIR Spectrometers for Predictive Soil Organic Carbon Modeling Semella, Sebastian, Hutengs, Christopher, Seidel, Michael, Ulrich, Mathias, Schneider, Birgit, Ortner, Malte, Thiele-Bruhn, Sören, Ludwig, Bernard, Vohland, Michael 09 June 2023 Soil spectroscopy in the visible-to-near infrared (VNIR) and mid-infrared (MIR) is a cost-effective method to determine the soil organic carbon content (SOC) based on predictive spectral models calibrated to analytically determined SOC reference data. The degree to which uncertainty in reference data and spectral measurements contributes to the estimated accuracy of VNIR and MIR predictions, however, is rarely addressed and remains unclear, in particular for current handheld MIR spectrometers. We thus evaluated the reproducibility of both the spectral reflectance measurements with portable VNIR and MIR spectrometers and the analytical dry combustion SOC reference method, with the aim of assessing how varying spectral inputs and reference values impact the calibration and validation of predictive VNIR and MIR models. Soil reflectance spectra and SOC were measured in triplicate, the latter by different laboratories, for a set of 75 finely ground soil samples covering a wide range of parent materials and SOC contents. Predictive partial least-squares regression (PLSR) models were evaluated in a repeated, nested cross-validation approach with systematically varied spectral inputs and reference data, respectively. We found that SOC predictions from both VNIR and MIR spectra were equally highly reproducible on average and similar to the dry combustion method, but MIR spectra were more robust to calibration sample variation. 
The contributions of spectral variation (ΔRMSE < 0.4 g·kg−1) and reference SOC uncertainty (ΔRMSE < 0.3 g·kg−1) to spectral modeling errors were small compared to the difference between the VNIR and MIR spectral ranges (ΔRMSE ~1.4 g·kg−1 in favor of MIR). For reference SOC, uncertainty was limited to the case of biased reference data appearing in either the calibration or validation. Given better predictive accuracy, comparable spectral reproducibility and greater robustness against calibration sample selection, the portable MIR spectrometer was considered overall superior to the VNIR instrument for SOC analysis. Our results further indicate that random errors in SOC reference values are effectively compensated for during model calibration, while biased SOC calibration data propagates errors into model predictions. Reference data uncertainty is thus more likely to negatively impact the estimated validation accuracy in soil spectroscopy studies where archived data, e.g., from soil spectral libraries, are used for model building, but it should be negligible otherwise. info:eu-repo/classification/ddc/620 ddc:620
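The ΔRMSE figures quoted in the abstract above are differences between root-mean-square errors of two sets of predictions. A minimal sketch of that arithmetic follows; the function names and all numbers are hypothetical and are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def delta_rmse(observed, predicted_a, predicted_b):
    """RMSE of model A minus RMSE of model B (positive means B is better)."""
    return rmse(observed, predicted_a) - rmse(observed, predicted_b)


# Hypothetical SOC values in g/kg for four validation samples.
reference = [12.0, 18.5, 25.0, 31.0]
vnir_pred = [14.0, 16.0, 27.5, 28.0]
mir_pred = [12.5, 18.0, 25.5, 30.0]
print(delta_rmse(reference, vnir_pred, mir_pred))
```

In a nested cross-validation such values would be computed on the outer-loop validation folds only and then averaged over repetitions.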
