  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Genetic Algorithm Representation Selection Impact on Binary Classification Problems

Maldonado, Stephen V 01 January 2022 (has links)
In this thesis, we explore the impact of problem representation on the ability of a genetic algorithm (GA) to evolve a binary prediction model that predicts whether a physical therapist is paid above or below the median amount from Medicare. We explore three problem representations: the vector GA (VGA), the binary GA (BGA), and the proportional GA (PGA). We find that all three representations can produce models with high accuracy and low loss that outperform Scikit-Learn’s logistic regression model, and that all three select the same features; however, the PGA tends to produce lower weights than the VGA and BGA. We also find that higher mutation rates widen the gap in accuracy between the individual with the best fitness (lowest binary cross-entropy loss) and the most accurate individual. We then explore potential biases in the PGA mapping functions that may encourage the lower values. We find that the PGA is biased in the values it can encode, depending on the mapping function; however, since we do not find a bias toward lower values for all tested mapping functions, it is more likely that the PGA has difficulty encoding extreme values, given that crossover tends to have an averaging effect on the PGA chromosome.
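The vector representation described above can be sketched in a few lines. This is a hypothetical illustration, not the thesis's code: chromosomes are real-valued weight vectors, fitness is accuracy of a linear threshold model on a toy dataset, and all parameter values (population size, mutation rate) are made up.

```python
import random

def predict(weights, features):
    # Linear threshold model: positive score -> class 1 (above median).
    return 1 if sum(w * f for w, f in zip(weights, features)) > 0 else 0

def fitness(weights, data):
    # Fraction of toy samples classified correctly.
    return sum(predict(weights, x) == y for x, y in data) / len(data)

def mutate(weights, rate=0.1):
    # Gaussian perturbation of each gene with probability `rate`.
    return [w + random.gauss(0, 0.5) if random.random() < rate else w
            for w in weights]

def crossover(a, b):
    # Single-point crossover on the weight vector.
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def evolve(data, n_features, pop_size=30, generations=50):
    pop = [[random.uniform(-1, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(ind, data), reverse=True)
        elite = pop[: pop_size // 2]  # elitism: keep the best half
        pop = elite + [mutate(crossover(random.choice(elite),
                                        random.choice(elite)))
                       for _ in range(pop_size - len(elite))]
    return max(pop, key=lambda ind: fitness(ind, data))

random.seed(0)
# Toy linearly separable data: label is 1 iff the first feature is positive.
data = [([random.uniform(-1, 1) for _ in range(3)], 0) for _ in range(40)]
data = [(x, 1 if x[0] > 0 else 0) for x, _ in data]
best = evolve(data, n_features=3)
print(round(fitness(best, data), 2))
```

A BGA would instead encode each weight as a bit string, and a PGA as a proportion passed through a mapping function; only the genotype-to-weights decoding step changes.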
12

Data driven driving evaluation : A supervised machine learning approach for classification of high frequency triaxial acceleration

Lundberg, Henrik January 2024 (has links)
The ability to navigate a continuously changing business landscape has been a success factor in keeping Scania competitive. Digitalization has enabled data to be collected from various sources, and the ability to embrace the possibilities that come with it, and turn them into an advantage, is crucial to ensuring that Scania drives the changing industry. Today, Scania is good at collecting and analyzing data, but there is room for improvement when it comes to utilizing the data for data-driven decision-making. This study investigates the possibility of learning more about users' driving behavior through data-driven driving evaluation. This is done with a machine learning approach in which a CNN-GRU neural network with an XGBoost classifier is created to classify triaxial acceleration data into normal or aggressive driving behavior. The findings show that this model architecture has a classification accuracy of 87.80%, and the result is discussed with respect to method implementation, quality of data, hyperparameter tuning, and future studies.
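A pipeline like the one described typically feeds the classifier fixed-length windows of the (x, y, z) acceleration stream. The sketch below is an assumed preprocessing step, not taken from the thesis; the window size and step are illustrative values.

```python
def make_windows(samples, window_size, step):
    # samples: list of (x, y, z) tuples from a triaxial accelerometer.
    # Returns overlapping windows, the usual input unit for a CNN-GRU model.
    return [samples[i:i + window_size]
            for i in range(0, len(samples) - window_size + 1, step)]

# Fake 10-sample stream standing in for high-frequency sensor data.
stream = [(0.0, 0.1 * i, 9.8) for i in range(10)]
windows = make_windows(stream, window_size=4, step=2)
print(len(windows), len(windows[0]))
```

Each window would then be labeled normal or aggressive and passed to the network; the overlap (step smaller than window size) is one common way to augment scarce driving data.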
13

Factors affecting the performance of trainable models for software defect prediction

Bowes, David Hutchinson January 2013 (has links)
Context. Reports suggest that defects in code cost the US in excess of $50 billion per year to put right. Defect prediction is an important part of software engineering. It allows developers to prioritise the code that needs to be inspected when trying to reduce the number of defects in code. A small change in the number of defects found will have a significant impact on the cost of producing software. Aims. The aim of this dissertation is to investigate the factors which affect the performance of defect prediction models. Identifying the causes of variation in the way that variables are computed should help to improve the precision of defect prediction models and hence improve the cost effectiveness of defect prediction. Methods. This dissertation is by published work. The first three papers examine variation in the independent variables (code metrics) and the dependent variable (number/location of defects). The fourth and fifth papers investigate the effect that different learners and datasets have on the predictive performance of defect prediction models. The final paper investigates the reported use of different machine learning approaches in studies published between 2000 and 2010. Results. The first and second papers show that independent variables are sensitive to the measurement protocol used; this suggests that the way data is collected affects the performance of defect prediction. The third paper shows that dependent variable data may be untrustworthy, as there is no reliable method for labelling a unit of code as defective or not. The fourth and fifth papers show that the dataset and learner used when producing defect prediction models have an effect on the performance of the models. The final paper shows that the approaches used by researchers to build defect prediction models vary, with good practices being ignored in many papers. Conclusions.
The measurement protocols for independent and dependent variables used for defect prediction need to be clearly described so that results can be compared like with like. It is possible that one research group's predictive results score higher than another's because of the way they calculated the metrics, rather than because of the method used to build the model that predicts defect-prone modules. The machine learning approaches used by researchers need to be clearly reported in order to improve the quality of defect prediction studies and allow a larger corpus of reliable results to be gathered.
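The sensitivity to measurement protocol can be made concrete with a toy example (illustrative only, not from the dissertation): two plausible "lines of code" protocols disagree on the same source file, so a metric value is meaningless unless the protocol is stated.

```python
SOURCE = """\
// update the counter
int counter = 0;

void tick() {
    counter++;  // increment
}
"""

def loc_physical(src):
    # Protocol A: every non-blank line counts.
    return sum(1 for line in src.splitlines() if line.strip())

def loc_logical(src):
    # Protocol B: comment-only lines are excluded as well.
    return sum(1 for line in src.splitlines()
               if line.strip() and not line.strip().startswith("//"))

print(loc_physical(SOURCE), loc_logical(SOURCE))  # two different values
```

Any size-normalised metric (defect density, complexity per line) inherits this disagreement, which is exactly why like-with-like comparison requires the protocol to be documented.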
14

Fuzzy-Granular Based Data Mining for Effective Decision Support in Biomedical Applications

He, Yuanchen 04 December 2006 (has links)
Due to the complexity of biomedical problems, adaptive and intelligent knowledge discovery and data mining systems are needed to help humans understand the inherent mechanisms of diseases. For biomedical classification problems, it is typically impossible to build a perfect classifier with 100% prediction accuracy. Hence, a more realistic target is to build an effective Decision Support System (DSS). In this dissertation, a novel adaptive Fuzzy Association Rules (FARs) mining algorithm, named FARM-DS, is proposed to build such a DSS for binary classification problems in the biomedical domain. Empirical studies show that FARM-DS is competitive with state-of-the-art classifiers in terms of prediction accuracy. More importantly, FARs can provide strong decision support for disease diagnosis due to their easy interpretability. This dissertation also proposes a fuzzy-granular method to select informative and discriminative genes from huge microarray gene expression datasets. With fuzzy granulation, information loss in the process of gene selection is decreased. As a result, more informative genes for cancer classification are selected and more accurate classifiers can be modeled. Empirical studies show that the proposed method is more accurate than traditional algorithms for cancer classification, and hence we expect the selected genes to be more helpful for further biological studies.
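The fuzzy granulation idea can be sketched as follows; this is a generic illustration under assumed breakpoints, not the FARM-DS algorithm itself. A numeric expression value is mapped into overlapping granules ("low", "medium", "high") with triangular membership functions, so borderline values keep partial membership in neighbouring granules instead of being cut off at a crisp threshold.

```python
def triangular(x, a, b, c):
    # Classic triangular membership: peaks at b, zero outside (a, c).
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify(x):
    # Breakpoints are illustrative, not from the dissertation.
    return {
        "low": triangular(x, -0.5, 0.0, 0.5),
        "medium": triangular(x, 0.0, 0.5, 1.0),
        "high": triangular(x, 0.5, 1.0, 1.5),
    }

memberships = fuzzify(0.25)
print(memberships)  # belongs partly to "low" and partly to "medium"
```

Fuzzy association rules are then mined over these granule labels rather than raw values, which is what makes the resulting rules readable ("IF gene_17 is high THEN tumor").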
15

Uma abordagem baseada em Perceptrons balanceados para geração de ensembles e redução do espaço de versões

Enes, Karen Braga 08 January 2016 (has links)
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Recently, ensemble learning has received much attention in the machine learning community, since it has proven a strong alternative for building more accurate predictors with higher generalization ability. The improvement in an ensemble's generalization performance is directly related to the diversity and accuracy of its individual classifiers. In this work, we present two main contributions: an ensemble method that combines balanced Perceptrons, and a method for generating a single hypothesis equivalent to the majority vote of an ensemble. For the ensemble method, we select the components using diversity strategies, including a proposed dissimilarity measure, and apply two strategies to combine the individual classifiers' decisions: unweighted majority vote and the average of all components. Under majority voting, every unseen sample must be evaluated against all generated hypotheses; the equivalent-hypothesis method is applied to reduce the cost of this test phase. The hypothesis is obtained through an iterative strategy that reduces the version space. We conducted an experimental study to evaluate the proposed methods. The reported results show that our methods outperform, in most cases, other tested algorithms such as SVM and AdaBoost, and that the generated hypothesis is indeed equivalent to the majority vote of an ensemble of balanced Perceptrons.
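A minimal sketch of the ensemble-of-Perceptrons idea with unweighted majority voting (illustrative only; the thesis's balanced Perceptron and diversity-based selection are not reproduced here, and diversity is faked by varying the random initialization seed):

```python
import random

def train_perceptron(data, epochs=20, seed=0):
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1) for _ in range(len(data[0][0]))]
    for _ in range(epochs):
        for x, y in data:  # y in {+1, -1}
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
            if pred != y:  # classic perceptron update on mistakes
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w

def majority_vote(ensemble, x):
    # Each component casts a +1/-1 vote; the sign of the sum decides.
    votes = sum(1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
                for w in ensemble)
    return 1 if votes > 0 else -1

random.seed(1)
# Toy separable data: label is the sign of x0 + x1.
data = [([random.uniform(-1, 1), random.uniform(-1, 1)], 0) for _ in range(30)]
data = [(x, 1 if x[0] + x[1] > 0 else -1) for x, _ in data]
ensemble = [train_perceptron(data, seed=s) for s in range(5)]
accuracy = sum(majority_vote(ensemble, x) == y for x, y in data) / len(data)
print(round(accuracy, 2))
```

Note the test-phase cost the thesis targets: `majority_vote` evaluates all five hypotheses per sample, whereas a single equivalent hypothesis would need only one dot product.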
16

Detecting gastrointestinal abnormalities with binary classification of the Kvasir-Capsule dataset : A TensorFlow deep learning study / Detektering av gastrointestinala abnormaliteter med binär klassificering av datasetet Kvasir-Capsule : En TensorFlow-djupinlärningsstudie

Hollstensson, Mathias January 2022 (has links)
The early discovery of gastrointestinal (GI) disorders can significantly decrease the fatality rate of severe afflictions. Video capsule endoscopy (VCE) is a technique that produces an eight-hour-long recording of the GI tract that needs to be manually reviewed. This has led to demand for AI-based solutions, but unfortunately, the lack of labeled data has been a major obstacle. In 2020 the Kvasir-Capsule dataset was produced, which is the largest labeled dataset of GI abnormalities to date, but challenges still exist. The dataset suffers from unbalanced and very similar data created from labeled video frames. To avoid specialization to the specific data, the creators of the set constructed an official split, which they encourage using for testing. This study evaluates the use of transfer learning, data augmentation, and binary classification to detect GI abnormalities. The performance of machine learning (ML) classification is explored with and without official split-based testing. For the performance evaluation, a specific focus is placed on achieving a low rate of false negatives, the proposition being that the most important aspect of an automated detection system for GI abnormalities is a low miss rate for possibly lethal abnormalities. The results from the controlled experiments conducted in this study clearly show the importance of using official split-based testing: the difference in performance between a model trained and tested on the same set and a model that uses official split-based testing is significant. This confirms that without official split-based testing the model will not produce reliable and generalizable results. When using official split-based testing, the performance improves on the initial baseline presented with the Kvasir-Capsule set. Some experiments in the study produced results with as low as a 1.56% rate of false negatives, but at the cost of lowered performance on the normal class.
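The evaluation focus described above reduces to one number: with abnormalities as the positive class, the false negative rate is the share of actual abnormalities the model classifies as normal. The confusion-matrix counts below are made up for illustration; they are chosen only so the rate lands at 1.56%, matching the figure quoted in the abstract.

```python
def false_negative_rate(tp, fn):
    # FNR = FN / (FN + TP): fraction of actual positives that were missed.
    return fn / (fn + tp)

# Hypothetical counts for the "abnormal" class on a held-out test split:
tp, fn = 630, 10
print(f"{false_negative_rate(tp, fn):.2%}")  # 1.56%
```

Optimizing for a low FNR usually shifts the decision threshold toward the abnormal class, which is exactly the trade-off the abstract reports: fewer missed abnormalities, more false alarms on normal frames.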
17

Search for Stop using Machine Learning : A Bachelor's Project in Physics

Gautam, Daniel January 2021 (has links)
In this thesis, the application of machine learning algorithms as a tool in the search for the top squark (stop) is studied. Two neural network models are trained with simulated stop events as signal against dileptonic and semi-leptonic top pair production events as background. There is a substantial class imbalance between the number of signal and background samples used. The performance of the neural network models is compared to the performance of a cut-and-count method. Neither model outperforms the standard cut-and-count method.
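One common remedy for the class imbalance mentioned above is to weight each class inversely to its frequency, so the rare signal class contributes as much to the training loss as the abundant background. The sketch below uses the scikit-learn-style "balanced" heuristic as an assumed approach; the thesis does not specify this particular scheme, and the counts are invented.

```python
from collections import Counter

def balanced_class_weights(labels):
    # weight_c = total / (n_classes * count_c): rare classes get large weights.
    counts = Counter(labels)
    total = len(labels)
    return {c: total / (len(counts) * n) for c, n in counts.items()}

# Hypothetical 9:1 background-to-signal imbalance:
labels = ["background"] * 900 + ["signal"] * 100
print(balanced_class_weights(labels))
```

The resulting per-class weights would be passed to the loss function during neural network training, so a misclassified signal event costs nine times more than a misclassified background event here.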
18

Winner Prediction of Blood Bowl 2 Matches with Binary Classification

Gustafsson, Andreas January 2019 (has links)
Being able to predict the outcome of a game is useful in many respects: it can aid designers in understanding how the game is played, and in balancing the elements within the game. If one could predict the outcome of games with certainty, the design process could evolve into a more experiment-based approach where cause and effect can be observed to some degree. It has previously been shown that it is possible to predict outcomes of games with varying degrees of success. However, there is a lack of research that compares and evaluates several different models on the same domain with common aims. To narrow this identified gap, an experiment is conducted to compare and analyze seven different classifiers within the same domain. The classifiers are ranked on accuracy against each other with the help of appropriate statistical methods. The classifiers compete on the task of predicting which team will win or lose a match of the game Blood Bowl 2. For nuance, three different datasets are made for the models to be trained on. While the results vary between the models across the various datasets, the pattern of rejections is identifiable throughout. The results also indicate strong accuracy for Support Vector Machine and Logistic Regression across all the datasets.
19

Stronger Together? An Ensemble of CNNs for Deepfakes Detection / Starkare Tillsammans? En Ensemble av CNNs för att Identifiera Deepfakes

Gardner, Angelica January 2020 (has links)
Deepfakes technology is a face-swap technique that enables anyone to replace faces in a video with highly realistic results. Despite its usefulness, if used maliciously this technique can have a significant impact on society, for instance through the spreading of fake news or cyberbullying. This makes deepfakes detection a problem of utmost importance. In this paper, I tackle the problem of deepfakes detection by identifying deepfake forgeries in video sequences. Inspired by the state of the art, I study the ensembling of different machine learning solutions built on convolutional neural networks (CNNs) and use these models as objects for comparison between ensemble and single-model performance. Existing work in the research field of deepfakes detection suggests that the escalated challenges posed by modern deepfake videos make detection increasingly difficult. I evaluate that claim by testing the detection performance of four single CNN models as well as six stacked ensembles on three modern deepfakes datasets. I compare various approaches to combining single models and the ways their predictions can be incorporated into the ensemble output. The results I found were that the best approach for deepfakes detection is to create an ensemble, though the ensembling approach plays a crucial role in the detection performance. The final proposed solution is an ensemble of all available single models that uses soft (weighted) voting to combine its base learners' predictions. Results show that this proposed solution significantly improved deepfakes detection performance and substantially outperformed all single models.
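Soft (weighted) voting, the combination rule of the proposed solution, can be sketched in a few lines. The probabilities and weights below are made-up values, not the paper's: each base learner outputs a probability that a clip is a deepfake, and the ensemble output is the weighted average of those probabilities.

```python
def soft_vote(probabilities, weights):
    # Weighted average of base-learner probabilities, normalized by the
    # total weight so the result stays in [0, 1].
    total = sum(weights)
    return sum(p * w for p, w in zip(probabilities, weights)) / total

# Four hypothetical CNN base learners' deepfake probabilities for one clip,
# weighted here by (invented) validation accuracies:
probs = [0.9, 0.7, 0.4, 0.8]
weights = [0.85, 0.80, 0.70, 0.90]
score = soft_vote(probs, weights)
print("deepfake" if score >= 0.5 else "real", round(score, 3))
```

Unlike hard majority voting, this keeps each model's confidence: the dissenting 0.4 pulls the score down but cannot flip the decision on its own, which is one reason soft voting often edges out hard voting when base learners are well calibrated.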
20

Network Interconnectivity Prediction from SCADA System Data : A Case Study in the Wastewater Industry / Prediktion av Nätverkssammankoppling från Data Genererat av SCADA System : En fallstudie inom avloppsindustrin

Isacson, Jonas January 2019 (has links)
Increased strain on incumbent wastewater distribution networks, originating from population increases as well as climate change, calls for enhanced resource utilization. Accurately being able to predict network interconnectivity is vital within the wastewater industry to enable operational management strategies that optimize the performance of the wastewater system. In this thesis, an evaluation of the network interconnectivity prediction performance of two machine learning models, the multilayer perceptron (MLP) and the support vector machine (SVM), utilizing supervisory control and data acquisition (SCADA) system data for a wastewater system is presented. Results of the thesis imply that the MLP achieves the best predictions of the network interconnectivity. The thesis concludes that the MLP is the superior model and that the highest achievable network interconnectivity accuracy is 56%, which is attained by the MLP model. / Den ökade påfrestningen på nuvarande avloppsnät till följd av befolkningstillväxt och klimatförändringar medför att det finns behov av optimerad resursförbrukning. Att korrekt kunna predicera ett avloppsnät är önskvärt då det möjliggör effektivitetshöjande operativ förvaltning av avloppssystemet. I denna avhandling utvärderas hur väl två maskininlärningsmodeller kan predicera nätverkets sammankoppling med data från ett system för övervakning och datainsamling (SCADA) genererat av ett avloppsnätverk. De två modellerna som testas är en multilagersperceptron (MLP) och en stödvektormaskin (SVM). Resultaten av avhandlingen visar att MLP-modellen uppnår den bästa prediktionen av nätverkets sammankoppling. Avhandlingen konkluderar att MLP-modellen är den bästa modellen för att predicera nätverkets sammankoppling samt att den högsta nåbara korrektheten var 56 %, vilket uppnåddes av MLP-modellen.
