11

Support Vector Machines for Classification and Imputation

Rogers, Spencer David 16 May 2012 (has links) (PDF)
Support vector machines (SVMs) are a powerful tool for classification problems. SVMs have been developed only in the last 20 years, with the availability of cheap and abundant computing power. SVMs are a non-statistical approach and make no assumptions about the distribution of the data. Here, support vector machines are applied to a classic data set from the machine learning literature, and the out-of-sample misclassification rates are compared to those of other classification methods. Finally, an algorithm for using support vector machines to address the difficulty of imputing missing categorical data is proposed, and its performance is demonstrated under three different scenarios using data from the 1997 National Labor Survey.
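As a hedged illustration of the core idea behind the classifier this abstract describes (not the thesis's actual implementation), a minimal linear SVM can be trained by subgradient descent on the hinge loss; the toy data, learning rate, and regularisation strength below are purely illustrative choices.

```python
# Minimal linear SVM via subgradient descent on the regularised hinge loss.
# Labels must be in {-1, +1}; data and hyperparameters are illustrative.

def svm_train(X, y, lam=0.01, lr=0.1, epochs=200):
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:   # hinge loss active: move toward this point
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:            # outside the margin: only the regulariser acts
                w = [wj * (1 - lr * lam) for wj in w]
    return w, b

def svm_predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Two linearly separable clusters as a toy training set.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [2.0, 3.0], [3.0, 2.0]]
y = [-1, -1, -1, 1, 1, 1]
w, b = svm_train(X, y)
```

The same fitted classifier is what the imputation algorithm in the abstract would reuse: train on rows where the categorical value is observed, then predict the missing category for the remaining rows.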
12

Genetic Algorithm Representation Selection Impact on Binary Classification Problems

Maldonado, Stephen V 01 January 2022 (has links)
In this thesis, we explore the impact of problem representation on the ability of a genetic algorithm (GA) to evolve a binary prediction model that predicts whether a physical therapist is paid above or below the median amount by Medicare. We explore three problem representations: the vector GA (VGA), the binary GA (BGA), and the proportional GA (PGA). We find that all three representations can produce models with high accuracy and low loss that outperform Scikit-Learn’s logistic regression model, and that all three select the same features; however, the PGA representation tends to create lower weights than the VGA and BGA. We also find that, at higher mutation rates, there is a larger gap in accuracy between the individual with the best fitness (lowest binary cross-entropy loss) and the most accurate solution. We then explore potential biases in the PGA mapping functions that may encourage the lower values. We find that the PGA is biased in the values it can encode depending on the mapping function; however, since not all tested mapping functions are biased towards lower values, it is more likely that the PGA simply has difficulty encoding extreme values, given that crossover tends to have an averaging effect on the PGA chromosome.
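As a rough sketch of the kind of setup this abstract describes (not the thesis's actual VGA/BGA/PGA implementations), a tiny vector-style GA can evolve real-valued weights for a linear binary classifier; the toy data, selection scheme, and rates below are illustrative assumptions.

```python
import random

# Vector-GA sketch: each chromosome is [w0, w1, bias] for a linear model.
# Truncation selection, averaging crossover, Gaussian mutation.
random.seed(0)

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + w[-1] >= 0 else 0

def fitness(w, data):          # classification accuracy as fitness
    return sum(predict(w, x) == t for x, t in data) / len(data)

data = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 0),
        ([2.0, 2.0], 1), ([2.0, 3.0], 1), ([3.0, 2.0], 1)]

pop = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
for gen in range(50):
    pop.sort(key=lambda w: -fitness(w, data))
    elite = pop[:10]                                   # keep the best half
    children = []
    for _ in range(10):
        a, b = random.sample(elite, 2)
        child = [(ai + bi) / 2 for ai, bi in zip(a, b)]  # averaging crossover
        if random.random() < 0.3:                        # mutation
            i = random.randrange(3)
            child[i] += random.gauss(0, 0.5)
        children.append(child)
    pop = elite + children

best = max(pop, key=lambda w: fitness(w, data))
```

Note how the averaging crossover here mirrors the averaging effect the abstract attributes to the PGA: children's genes are pulled toward the middle of their parents' values.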
13

Factors affecting the performance of trainable models for software defect prediction

Bowes, David Hutchinson January 2013 (has links)
Context. Reports suggest that defects in code cost the US in excess of $50 billion per year to put right. Defect prediction is an important part of software engineering: it allows developers to prioritise the code that needs to be inspected when trying to reduce the number of defects in code. A small change in the number of defects found will have a significant impact on the cost of producing software. Aims. The aim of this dissertation is to investigate the factors which affect the performance of defect prediction models. Identifying the causes of variation in the way that variables are computed should help to improve the precision of defect prediction models and hence improve the cost effectiveness of defect prediction. Methods. This dissertation is by published work. The first three papers examine variation in the independent variables (code metrics) and the dependent variable (number/location of defects). The fourth and fifth papers investigate the effect that different learners and datasets have on the predictive performance of defect prediction models. The final paper investigates the reported use of different machine learning approaches in studies published between 2000 and 2010. Results. The first and second papers show that independent variables are sensitive to the measurement protocol used; this suggests that the way data is collected affects the performance of defect prediction. The third paper shows that dependent variable data may be untrustworthy, as there is no reliable method for labelling a unit of code as defective or not. The fourth and fifth papers show that the dataset and learner used when producing defect prediction models have an effect on the performance of the models. The final paper shows that the approaches used by researchers to build defect prediction models are variable, with good practices being ignored in many papers. Conclusions. The measurement protocols for independent and dependent variables used for defect prediction need to be clearly described so that results can be compared like with like. It is possible that the predictive results of one research group have a higher performance value than another's because of the way that they calculated the metrics, rather than the method used to build the model that predicts the defect-prone modules. The machine learning approaches used by researchers need to be clearly reported in order to improve the quality of defect prediction studies and allow a larger corpus of reliable results to be gathered.
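Reported defect prediction performance depends heavily on which summary of the confusion matrix is used; as a hedged sketch (the counts below are invented for illustration), the common measures can be computed directly from the four raw counts.

```python
import math

# Standard confusion-matrix summaries used in defect prediction studies.
# tp/fp/fn/tn counts are illustrative, not from any paper in the thesis.

def confusion_metrics(tp, fp, fn, tn):
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    # Matthews correlation coefficient: balanced even on skewed data.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "f1": f1, "mcc": mcc}

m = confusion_metrics(tp=40, fp=10, fn=20, tn=130)
```

Two groups reporting "performance" with different measures from the same confusion matrix can reach quite different numbers, which is exactly the comparability problem the conclusions highlight.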
14

Fuzzy-Granular Based Data Mining for Effective Decision Support in Biomedical Applications

He, Yuanchen 04 December 2006 (has links)
Due to the complexity of biomedical problems, adaptive and intelligent knowledge discovery and data mining systems are highly needed to help humans understand the inherent mechanisms of diseases. For biomedical classification problems, it is typically impossible to build a perfect classifier with 100% prediction accuracy; a more realistic target is to build an effective Decision Support System (DSS). In this dissertation, a novel adaptive Fuzzy Association Rules (FARs) mining algorithm, named FARM-DS, is proposed to build such a DSS for binary classification problems in the biomedical domain. Empirical studies show that FARM-DS is competitive with state-of-the-art classifiers in terms of prediction accuracy. More importantly, FARs can provide strong decision support on disease diagnosis due to their easy interpretability. This dissertation also proposes a fuzzy-granular method to select informative and discriminative genes from huge microarray gene expression datasets. With fuzzy granulation, information loss in the process of gene selection is decreased; as a result, more informative genes for cancer classification are selected and more accurate classifiers can be modeled. Empirical studies show that the proposed method is more accurate than traditional algorithms for cancer classification, and hence we expect the selected genes to be more helpful for further biological studies.
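Fuzzy granulation, as described in this abstract, maps a continuous measurement into soft membership in linguistic granules rather than a hard bin; here is a minimal sketch using triangular membership functions with invented breakpoints (the thesis's actual granules and parameters are not specified here).

```python
# Triangular fuzzy membership: zero outside [a, c], peaking at b.
# Breakpoints are illustrative, not taken from the dissertation.

def tri(x, a, b, c):
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def granulate(x):
    """Soft membership of an expression value in three granules."""
    return {"low": tri(x, -1.0, 0.0, 0.5),
            "medium": tri(x, 0.0, 0.5, 1.0),
            "high": tri(x, 0.5, 1.0, 2.0)}

g = granulate(0.25)   # a value between the "low" and "medium" peaks
```

A value near a granule boundary keeps partial membership in both neighbouring granules instead of being forced into one bin, which is the mechanism by which granulation reduces information loss during gene selection.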
15

Uma abordagem baseada em Perceptrons balanceados para geração de ensembles e redução do espaço de versões / A balanced-Perceptron-based approach for ensemble generation and version space reduction

Enes, Karen Braga 08 January 2016 (has links)
Recently, ensemble learning has received much attention in the machine learning community, since it has been demonstrated to be a great alternative for generating more accurate predictors with higher generalization ability. The improvement in the generalization performance of an ensemble is directly related to the diversity and accuracy of its individual classifiers. In this work, we present two main contributions: we propose an ensemble method that combines balanced Perceptrons, and we also propose a method for generating a hypothesis equivalent to the majority vote of an ensemble. For the ensemble method, we select the components using diversity strategies, which include a dissimilarity measure, and apply two strategies for combining the individual classifiers' decisions: unweighted majority vote and the average of all components. Under the majority vote strategy, each unseen sample must be evaluated against all generated hypotheses; the method for generating an equivalent hypothesis is applied to reduce the cost of this test phase. The hypothesis is obtained by an iterative strategy of version space reduction. We conducted an experimental study to evaluate the proposed methods. Reported results show that our methods outperform, in most cases, other tested algorithms such as SVM and AdaBoost. From the results on version space reduction, we observe that the generated hypothesis is, in fact, equivalent to the majority vote of an ensemble of balanced Perceptrons.
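As a hedged sketch of the combination scheme this abstract describes (not the thesis's balanced-Perceptron training or version-space method), classic Perceptrons can be combined by unweighted majority vote; diversity here comes simply from shuffled presentation orders, and the toy data is illustrative.

```python
import random

# Ensemble of classic Perceptrons combined by unweighted majority vote.
# Diversity source (shuffled training order) and data are illustrative.
random.seed(1)

def train_perceptron(data, epochs=50):
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, t in data:
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
            if y != t:   # classic rule: update only on mistakes
                w = [wi + t * xi for wi, xi in zip(w, x)]
                b += t
    return w, b

data = [([0.0, 0.0], -1), ([0.0, 1.0], -1), ([1.0, 0.0], -1),
        ([2.0, 2.0], 1), ([2.0, 3.0], 1), ([3.0, 2.0], 1)]

ensemble = []
for _ in range(5):
    d = data[:]
    random.shuffle(d)            # different training order per member
    ensemble.append(train_perceptron(d))

def majority_vote(x):
    votes = sum(1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
                for w, b in ensemble)
    return 1 if votes >= 0 else -1
```

Note that classifying one sample requires evaluating all five members, which is exactly the test-phase cost that the equivalent-hypothesis method in the abstract sets out to remove.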
16

Oversampling Methods for Imbalanced Dataset Classification and their Application to Gynecological Disorder Diagnosis

Nekooeimehr, Iman 29 June 2016 (has links)
In many applications, the dataset for classification may be highly imbalanced: most of the instances in the training set belong to some of the classes (majority classes), while only a few instances are from the other classes (minority classes). Conventional classifiers will strongly favor the majority class and ignore the minority instances. The imbalance problem can occur both in binary classification and in ordinal regression, a supervised approach for learning the ordinal relationship between classes. Extensive research has addressed imbalanced datasets for binary classification; however, current methods do not address within-class imbalance and between-class imbalance at the same time. Similarly, there has been very little research on addressing imbalanced datasets for ordinal regression. Although current standard oversampling methods can be used to improve the dataset class distribution, they do not consider the ordinal relationship between the classes. The class imbalance problem is a major challenge in classification, and most clinical datasets are highly imbalanced, which can significantly weaken the performance of classifiers. In this research, the imbalanced dataset classification problem is also examined in the context of a clinical application, particularly pelvic organ prolapse diagnosis. Pelvic organ prolapse (POP) is a major health problem that affects 30-50% of women in the U.S. Although clinical examination is currently used to diagnose POP, there is still little evidence on specific risk factors that are directly related to particular types of POP and their severity or stages (Stage 0-IV). Data from dynamic MRI related to the movement of pelvic organs has the potential to improve POP prediction, but it is currently analyzed manually, limiting its exploration and use to small datasets.
Moreover, POP is a disorder with multiple stages that are ordinal and whose distribution is highly imbalanced. The main goal of this research is two-fold. The first goal is to design new oversampling methods for imbalanced datasets, for both binary classification and ordinal regression. The second goal is to automatically track, segment, and classify the trajectories of multiple organs on dynamic MRI to quantitatively describe pelvic organ movement. The extracted image-based data, along with the designed oversampling methods, will be used to improve the diagnosis of POP. The proposed research consists of three major objectives: 1) to design a new oversampling technique for binary imbalanced dataset classification; 2) to design a novel oversampling technique for ordinal regression with imbalanced datasets; and 3) to design a two-stage method to automatically track and segment multiple pelvic organs on dynamic MRI for improving the prediction of multi-stage POP with imbalanced datasets. The proposed research aims to provide robust oversampling techniques and image processing models that can (1) effectively handle highly imbalanced datasets for both binary classification and ordinal regression, and (2) automatically track and segment multiple deformable structures for feature extraction from low-contrast and nonhomogeneous images and classify them using the resulting trajectories. This research will set the foundation for a computer-aided decision support system that can automatically extract and analyze image and clinical data to improve the prediction of disorders where the dataset is highly imbalanced, through personalized and evidence-based assessment.
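The standard oversampling baseline this abstract builds on can be sketched in a few lines: synthesize minority points by interpolating between pairs of minority instances, SMOTE-style (this simplified version picks a random minority peer rather than a nearest neighbour, and is not the dissertation's proposed method; the data is synthetic).

```python
import random

# SMOTE-style oversampling sketch: new minority points are interpolated
# between two existing minority instances. Simplification: the partner is
# a random minority peer, not a k-nearest neighbour as in real SMOTE.
random.seed(0)

def oversample(minority, n_new):
    synthetic = []
    for _ in range(n_new):
        a = random.choice(minority)
        b = random.choice(minority)
        t = random.random()               # interpolation factor in [0, 1]
        synthetic.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic

# A 90/10 imbalanced two-class toy dataset.
majority = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(90)]
minority = [[random.gauss(3, 0.5), random.gauss(3, 0.5)] for _ in range(10)]

new_points = oversample(minority, len(majority) - len(minority))
balanced_minority = minority + new_points
```

Because each synthetic point is a convex combination of two minority instances, it always lies within the minority class's convex hull; note that nothing here uses class order, which is the gap the abstract identifies for ordinal regression.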
17

Detecting gastrointestinal abnormalities with binary classification of the Kvasir-Capsule dataset : A TensorFlow deep learning study / Detektering av gastrointestinala abnormaliteter med binär klassificering av datasetet Kvasir-Capsule : En TensorFlow djupinlärningsstudie

Hollstensson, Mathias January 2022 (has links)
The early discovery of gastrointestinal (GI) disorders can significantly decrease the fatality rate of severe afflictions. Video capsule endoscopy (VCE) is a technique that produces an eight-hour-long recording of the GI tract that needs to be manually reviewed. This has led to demand for AI-based solutions, but unfortunately, the lack of labeled data has been a major obstacle. In 2020 the Kvasir-Capsule dataset was produced, which is the largest labeled dataset of GI abnormalities to date, but challenges still exist. The dataset suffers from unbalanced and very similar data created from labeled video frames. To avoid specialization to the specific data, the creators of the set constructed an official split which they encourage using for testing. This study evaluates the use of transfer learning, data augmentation, and binary classification to detect GI abnormalities. The performance of machine learning (ML) classification is explored with and without official split-based testing. For the performance evaluation, a specific focus is on achieving a low rate of false negatives, on the proposition that the most important aspect of an automated detection system for GI abnormalities is a low miss rate for possibly lethal abnormalities. The results from the controlled experiments conducted in this study clearly show the importance of using official split-based testing. The difference in performance between a model trained and tested on the same set and a model that uses official split-based testing is significant; without official split-based testing, the model will not produce reliable and generalizable results. When using official split-based testing, performance improves over the initial baseline presented with the Kvasir-Capsule set. Some experiments in the study produced results with as low as a 1.56% rate of false negatives, but at the cost of lowered performance for the normal class.
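The trade-off this abstract prioritises, a low false-negative rate at the cost of performance on the normal class, can be illustrated by moving the decision threshold on predicted probabilities; the scores and labels below are invented for illustration, not taken from the study.

```python
# False-negative rate as a function of the decision threshold.
# Scores are a model's P(abnormal); labels: 1 = abnormality present.
# All values below are illustrative, not from the Kvasir-Capsule study.

def fn_rate(scores, labels, threshold):
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    positives = sum(labels)
    return fn / positives

scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   1,   1,   0,   0,   0]

high = fn_rate(scores, labels, 0.5)    # default threshold misses one positive
low = fn_rate(scores, labels, 0.35)    # lower threshold catches it
```

Lowering the threshold catches the borderline abnormality (score 0.4) but also starts flagging normal frames (score 0.3 is now above threshold), which mirrors the reported drop in normal-class performance.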
18

Search for Stop using Machine Learning : A Bachelor's Project in Physics

Gautam, Daniel January 2021 (has links)
In this thesis, the application of machine learning algorithms as a tool in the search for the top squark (stop) is studied. Two neural network models are trained with simulated stop events as signal against dileptonic and semi-leptonic top pair production events as background. There is a substantial class imbalance between the number of signal and background samples used. The performance of the neural network models is compared to that of a cut-and-count method. Neither model outperforms the standard cut-and-count method.
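The cut-and-count baseline mentioned above selects a signal-enriched region and compares the surviving signal and background counts, commonly via the approximate significance s/√b; as a hedged sketch (the discriminant values below are invented, and s/√b is a standard simplification of the full significance formula):

```python
import math

# Cut-and-count sketch: count simulated signal and background events that
# pass a cut on a discriminating variable, then compute s / sqrt(b).
# The discriminant values below are illustrative, not from the thesis.

signal = [2.1, 2.5, 3.0, 1.8, 2.9]                    # signal MC events
background = [0.5, 1.0, 1.2, 2.2, 0.8, 0.3, 1.9, 0.7]  # background MC events

def cut_and_count(cut):
    s = sum(1 for v in signal if v > cut)
    b = sum(1 for v in background if v > cut)
    return s / math.sqrt(b) if b else float("inf")

z = cut_and_count(1.5)   # 5 signal and 2 background events pass this cut
```

A neural network replaces the single cut with a learned discriminant, but its selection is scored with the same kind of significance figure, which is how the thesis can compare the two approaches directly.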
19

Winner Prediction of Blood Bowl 2 Matches with Binary Classification

Gustafsson, Andreas January 2019 (has links)
Being able to predict the outcome of a game is useful in many aspects, such as aiding designers in understanding how the game is played by the players, and in balancing the elements within the game. If one could predict the outcome of games with certainty, the design process could evolve into more of an experiment-based approach where one can observe cause and effect to some degree. It has previously been shown that it is possible to predict the outcomes of games with varying degrees of success. However, there is a lack of research which compares and evaluates several different models on the same domain with common aims. To narrow this identified gap, an experiment is conducted to compare and analyze seven different classifiers within the same domain. The classifiers are then ranked on accuracy against each other with the help of appropriate statistical methods. The classifiers compete on the task of predicting which team will win or lose a match of the game Blood Bowl 2. For nuance, three different datasets are made for the models to be trained on. While the results vary between the models across the various datasets, the general consensus has an identifiable pattern of rejections. The results also indicate strong accuracy for Support Vector Machine and Logistic Regression across all the datasets.
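Ranking several classifiers on accuracy across multiple datasets, as this abstract describes, is usually done by averaging per-dataset ranks (the first step of a Friedman-style comparison); a minimal sketch with invented accuracies and classifier names follows.

```python
# Mean-rank comparison sketch (Friedman-style first step).
# Classifier names and accuracies are illustrative, not from the thesis.

accs = {                       # accuracy per dataset for each classifier
    "svm":    [0.81, 0.78, 0.84],
    "logreg": [0.80, 0.79, 0.83],
    "dtree":  [0.72, 0.70, 0.75],
}

def mean_rank(accs):
    names = list(accs)
    n_datasets = len(next(iter(accs.values())))
    totals = {n: 0.0 for n in names}
    for d in range(n_datasets):
        ordered = sorted(names, key=lambda n: -accs[n][d])  # best first
        for r, n in enumerate(ordered, start=1):
            totals[n] += r
    return {n: totals[n] / n_datasets for n in names}

r = mean_rank(accs)   # lower mean rank = stronger classifier overall
```

The mean ranks then feed a statistical test (e.g. Friedman with a post-hoc procedure) to decide which pairwise differences, the "pattern of rejections" in the abstract, are significant.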
20

Stronger Together? An Ensemble of CNNs for Deepfakes Detection / Starkare Tillsammans? En Ensemble av CNNs för att Identifiera Deepfakes

Gardner, Angelica January 2020 (has links)
Deepfakes technology is a face swap technique that enables anyone to replace faces in a video with highly realistic results. Despite its usefulness, if used maliciously this technique can have a significant impact on society, for instance through the spreading of fake news or cyberbullying. This makes deepfakes detection a problem of utmost importance. In this paper, I tackle the problem of deepfakes detection by identifying deepfake forgeries in video sequences. Inspired by the state of the art, I study the ensembling of different machine learning solutions built on convolutional neural networks (CNNs) and use these models as objects for comparison between ensemble and single-model performance. Existing work in the research field of deepfakes detection suggests that the escalated challenges posed by modern deepfake videos make detection increasingly difficult. I evaluate that claim by testing the detection performance of four single CNN models as well as six stacked ensembles on three modern deepfakes datasets. I compare various approaches for combining single models into an ensemble and for how their predictions should be incorporated into the ensemble output. What I found was that the best approach for deepfakes detection is to create an ensemble, though the choice of ensemble approach plays a crucial role in the detection performance. The final proposed solution is an ensemble of all available single models, which uses soft (weighted) voting to combine its base-learners' predictions. Results show that this proposed solution significantly improved deepfakes detection performance and substantially outperformed all single models.
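The soft (weighted) voting combiner this abstract proposes averages the base-learners' predicted probabilities, weighted by some per-model quality score; a minimal sketch of the combiner alone (the CNN base-learners are out of scope here, and the probabilities and weights below are invented).

```python
# Soft (weighted) voting: a weighted average of base-learner probabilities.
# Probabilities and weights below are illustrative, not from the paper.

def soft_vote(probas, weights):
    """probas: each model's P(fake); returns the weighted-average P(fake)."""
    total = sum(weights)
    return sum(p * w for p, w in zip(probas, weights)) / total

probas = [0.9, 0.6, 0.4]        # three base-learners' P(fake) for one video
weights = [0.95, 0.80, 0.70]    # e.g. each model's validation accuracy

p = soft_vote(probas, weights)
label = "fake" if p >= 0.5 else "real"
```

Unlike hard majority voting, a confident model (0.9) can outvote two lukewarm ones here, which is one reason weighted soft voting often edges out unweighted schemes.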
