About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Optimizing ERP Recommendations Using Machine Learning Techniques

Jeremiah, Ante January 2023 (has links)
This study explores the application of a recommendation engine in collaboration with Fortnox. Its primary focus is to identify potential improvements to their recommendation engine so that it produces more accurate recommendations for users. The study evaluates the performance of various algorithms on imbalanced data under four settings: no resampling, EasyEnsemble undersampling, SMOTE oversampling, and weighted-class approaches. The results indicate that LinearSVC is the best algorithm without resampling. Decision Tree performs well when combined with EasyEnsemble, outperforming the other algorithms. When using SMOTE, Decision Tree performs the best with the default sampling strategy, while LinearSVC and MultinomialNB show similar results. Varying the threshold for SMOTE produces mixed results: LinearSVC and MultinomialNB are sensitive to changes in the threshold value, while Decision Tree maintains consistent performance. Finally, when using weighted classes, Decision Tree outperforms LinearSVC in terms of accuracy and F1-score. Overall, the findings provide insights into the performance of different algorithms on imbalanced data and highlight the effectiveness of certain techniques in addressing the class imbalance problem, as well as the algorithms' sensitivity to changes with resampled data.
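A rough sketch of this kind of comparison, using scikit-learn and imbalanced-learn on a synthetic stand-in for the confidential Fortnox data, might look as follows; the dataset and parameter choices are illustrative assumptions, not the thesis's actual setup.

```python
# Illustrative comparison of classifiers on imbalanced data with and without
# resampling; a sketch only, not the thesis's implementation.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score
from imblearn.over_sampling import SMOTE
from imblearn.ensemble import EasyEnsembleClassifier

# Synthetic stand-in for the imbalanced ERP data.
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

results = {}

# 1) No resampling.
for name, clf in [("LinearSVC", LinearSVC(dual=False)),
                  ("DecisionTree", DecisionTreeClassifier(random_state=0))]:
    pred = clf.fit(X_tr, y_tr).predict(X_te)
    results[f"{name}, no resampling"] = f1_score(y_te, pred, average="macro")

# 2) SMOTE oversampling with the default sampling strategy.
X_sm, y_sm = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
pred = DecisionTreeClassifier(random_state=0).fit(X_sm, y_sm).predict(X_te)
results["DecisionTree + SMOTE"] = f1_score(y_te, pred, average="macro")

# 3) EasyEnsemble undersampling (an ensemble trained on balanced subsets).
pred = EasyEnsembleClassifier(random_state=0).fit(X_tr, y_tr).predict(X_te)
results["EasyEnsemble"] = f1_score(y_te, pred, average="macro")

# 4) Weighted classes instead of resampling.
pred = (DecisionTreeClassifier(class_weight="balanced", random_state=0)
        .fit(X_tr, y_tr).predict(X_te))
results["DecisionTree, class_weight"] = f1_score(y_te, pred, average="macro")

for setting, score in results.items():
    print(f"{setting}: macro F1 = {score:.3f}")
```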
2

A Combined Approach to Handle Multi-class Imbalanced Data and to Adapt Concept Drifts using Machine Learning

Tumati, Saini 05 October 2021 (has links)
No description available.
3

SCUT-DS: Methodologies for Learning in Imbalanced Data Streams

Olaitan, Olubukola January 2018 (has links)
The automation of most of our activities has led to the continuous production of data in the form of fast-arriving streams. In a supervised learning setting, instances in these streams are labeled as belonging to a particular class. When the number of classes in the data stream is more than two, such a data stream is referred to as a multi-class data stream. A multi-class imbalanced data stream describes the situation where the instance distribution of the classes is skewed, such that instances of some classes occur more frequently than others. Classes whose instances occur frequently are referred to as the majority classes, while classes whose instances occur less frequently are denoted as the minority classes. Classification algorithms, or supervised learning techniques, use historic instances to build models, which are then used to predict the classes of unseen instances. Multi-class imbalanced data stream classification poses a great challenge to classical classification algorithms. This is due to the fact that traditional algorithms are usually biased towards the majority classes, since they have more examples of the majority classes when building the model. These traditional algorithms yield low predictive accuracy rates for the minority instances and need to be augmented, often with some form of sampling, in order to improve their overall performance. In the literature, in both static and streaming environments, most studies focus on the binary class imbalance problem. Furthermore, research on multi-class imbalance in the data stream environment is limited. A number of researchers have proceeded by transforming a multi-class imbalanced setting into multiple binary class problems. However, such a transformation does not allow the stream to be studied in its original form and may introduce bias. The research conducted in this thesis aims to address this research gap by proposing a novel online learning methodology that combines oversampling of the minority classes with cluster-based majority class under-sampling, without decomposing the data stream into multiple binary sets. Rather, sampling involves continuously selecting a balanced number of instances across all classes for model building. Our focus is on improving the rate of correctly predicting instances of the minority classes in multi-class imbalanced data streams, through the introduction of the Synthetic Minority Over-sampling Technique (SMOTE) and Cluster-based Under-sampling - Data Streams (SCUT-DS) methodologies. In this work, we dynamically balance the classes by utilizing a windowing mechanism during the incremental sampling process. Our SCUT-DS algorithms are evaluated using six different types of classification techniques, and their results are then compared against a state-of-the-art algorithm. Our contributions are tested using both synthetic and real data sets. The experimental results show that the approaches developed in this thesis yield high prediction rates for minority instances across the multiple minority classes within a non-evolving stream.
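As a simplified illustration of the windowed balancing idea (not the SCUT-DS algorithms themselves), the sketch below rebalances each window of a synthetic multi-class stream by oversampling smaller classes with SMOTE and shrinking larger classes with cluster-based undersampling before incrementally updating a linear model; the window size, target counts, and classifier are assumptions made for the example.

```python
# Simplified sketch of window-based class balancing for a multi-class stream.
# This illustrates the general idea only; it is not the thesis's SCUT-DS method.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import ClusterCentroids

# Synthetic imbalanced multi-class "stream", consumed in fixed-size windows.
X, y = make_classification(n_samples=20000, n_features=15, n_informative=10,
                           n_classes=4, weights=[0.7, 0.15, 0.1, 0.05],
                           random_state=0)
classes = np.unique(y)
window = 2000
clf = SGDClassifier(random_state=0)

for start in range(0, len(X), window):
    X_w, y_w = X[start:start + window], y[start:start + window]
    counts = {c: int((y_w == c).sum()) for c in np.unique(y_w)}
    target = int(np.median(list(counts.values())))
    # Raise smaller classes up to the target count, shrink larger ones down to it.
    over = {c: target for c, n in counts.items() if n < target}
    under = {c: target for c, n in counts.items() if n > target}
    if over:
        X_w, y_w = SMOTE(sampling_strategy=over, k_neighbors=3,
                         random_state=0).fit_resample(X_w, y_w)
    if under:
        X_w, y_w = ClusterCentroids(sampling_strategy=under,
                                    random_state=0).fit_resample(X_w, y_w)
    # Incrementally update the model on the balanced window.
    clf.partial_fit(X_w, y_w, classes=classes)

print("classes learned incrementally:", clf.classes_)
```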
4

Induction in Hierarchical Multi-label Domains with Focus on Text Categorization

Dendamrongvit, Sareewan 02 May 2011 (has links)
Induction of classifiers from sets of preclassified training examples is one of the most popular machine learning tasks. This dissertation focuses on the techniques needed in the field of automated text categorization. Here, each document can be labeled with more than one class, sometimes with many classes. Moreover, the classes are hierarchically organized, the mutual relations being typically expressed in terms of a generalization tree. Both aspects (multi-label classification and hierarchically organized classes) have so far received inadequate attention. The existing literature largely assumes that it is enough to induce a separate binary classifier for each class, and the question of class hierarchy is rarely addressed. This, however, ignores some serious problems. For one thing, induction of thousands of classifiers from hundreds of thousands of examples described by tens of thousands of features (a common case in automated text categorization) incurs prohibitive computational costs: even a single binary classifier in domains of this kind often takes hours, even days, to induce. For another, the circumstance that the classes are hierarchically organized affects the way we view the classification performance of the induced classifiers. The presented work proposes a technique referred to by the acronym "H-kNN-plus." The technique combines support vector machines and nearest neighbor classifiers with the intention of capitalizing on the strengths of both. As for performance evaluation, a variety of measures have been used to evaluate hierarchical classifiers, including the standard non-hierarchical criteria that assign the same weight to different types of error. The author proposes a performance measure that overcomes some of their weaknesses. The dissertation begins with a study of (non-hierarchical) multi-label classification. One of the reasons for the poor performance of earlier techniques is the class-imbalance problem: a small number of positive examples being outnumbered by a great many negative examples. Another difficulty is that each class tends to be characterized by a different set of features. This means that most of the binary classifiers are induced from examples described by predominantly irrelevant features. Addressing these weaknesses by majority-class undersampling and feature selection, the proposed technique significantly improves the overall classification performance. Even more challenging is the issue of hierarchical classification. Here, the dissertation introduces a new induction mechanism, H-kNN-plus, and subjects it to extensive experiments with two real-world datasets. The results indicate its superiority, in these domains, over earlier work in terms of prediction performance as well as computational costs.
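A minimal sketch of the per-label baseline the dissertation starts from, i.e. one binary classifier per label with majority-class undersampling and per-label feature selection, is shown below; it is not the proposed H-kNN-plus method, and the synthetic corpus and parameter values are assumptions for illustration.

```python
# Sketch of per-label binary classification for multi-label data with
# majority-class undersampling and chi-squared feature selection.
# Illustrative only; this is not the dissertation's H-kNN-plus classifier.
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import LinearSVC

# Synthetic bag-of-words-like corpus with several labels per document.
X, Y = make_multilabel_classification(n_samples=2000, n_features=500,
                                      n_classes=10, n_labels=2, random_state=0)

rng = np.random.default_rng(0)
models = {}
for label in range(Y.shape[1]):
    y_bin = Y[:, label]
    pos = np.flatnonzero(y_bin == 1)   # documents carrying this label
    neg = np.flatnonzero(y_bin == 0)   # the (usually much larger) rest
    # Undersample the negative (majority) class down to the positive count.
    neg = rng.choice(neg, size=min(len(pos), len(neg)), replace=False)
    idx = np.concatenate([pos, neg])
    # Per-label feature selection keeps only features relevant to this label.
    selector = SelectKBest(chi2, k=100).fit(X[idx], y_bin[idx])
    clf = LinearSVC(dual=False).fit(selector.transform(X[idx]), y_bin[idx])
    models[label] = (selector, clf)

print(f"trained {len(models)} per-label binary classifiers")
```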
5

Chronic Pain: A study on patients with chronic pain. What characteristics/variables lie behind the fact that a patient does not respond well to treatment?

Lindvall, Agnes, Chilaika, Ana January 2015 (has links)
The primary purpose of this study was to find out which variables distinguish patients who respond well to treatment for chronic pain from those who do not. We used logistic regression to predict group membership based on self-reported health surveys, i.e., whether different answers in the surveys can predict whether a patient is "responsive" or "unresponsive". By bootstrapping 176 samples and aggregating the results of 176 logistic regressions fitted to the sub-samples, we calculated an averaged model. The variables anxiety and physical health were significant in 76% and 70% of the models respectively, while depression was significant in 30% of the models. Gender was significant in 15% of the models and health status in 0.006%. The averaged model correctly classified the most unresponsive patients at a cut-off value of 0.5. As the cut-off value was increased, the number of correctly classified unresponsive patients decreased while the number of correctly classified responsive patients increased, as did the number of unresponsive patients classified as responsive. We concluded that the model did not discriminate sufficiently between the two groups. We were also interested in how the variables anxiety, depression, health status, willingness to participate in activities, engagement in activities, and mental and physical health relate to one another. The results of a confirmatory factor analysis showed that a patient's health status is highly related to their physical health and activity engagement, while pain willingness and engagement in activity were least related. Furthermore, the analysis showed that mental health is highly related to anxiety and health status, indicating that mental health is indeed important to consider when assessing the health status of a patient.
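The bootstrap-and-aggregate procedure can be sketched roughly as follows; the variable names and simulated data are hypothetical stand-ins for the self-reported survey measures, which are not public, and statsmodels is used here simply as one way to obtain coefficients and p-values.

```python
# Sketch of the bootstrap-and-aggregate logistic regression described above.
# Data and variable names are hypothetical stand-ins for the survey measures.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 176  # assumed cohort size for the illustration
# Hypothetical survey scores; "responsive" is the outcome of interest.
df = pd.DataFrame({
    "anxiety": rng.normal(size=n),
    "depression": rng.normal(size=n),
    "physical_health": rng.normal(size=n),
    "gender": rng.integers(0, 2, size=n),
})
logits = -0.8 * df["anxiety"] + 0.6 * df["physical_health"]
df["responsive"] = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

predictors = ["anxiety", "depression", "physical_health", "gender"]
n_boot = 176  # 176 bootstrap samples, as in the study
coefs, sig_counts = [], np.zeros(len(predictors))

for _ in range(n_boot):
    sample = df.sample(n=len(df), replace=True)
    X = sm.add_constant(sample[predictors])
    fit = sm.Logit(sample["responsive"], X).fit(disp=0)
    coefs.append(fit.params[predictors].to_numpy())
    sig_counts += (fit.pvalues[predictors] < 0.05).to_numpy()

avg_coef = np.mean(coefs, axis=0)
for name, beta, sig in zip(predictors, avg_coef, sig_counts / n_boot):
    print(f"{name}: mean coefficient {beta:+.3f}, significant in {sig:.0%} of models")
```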
6

Machine Learning Methods for High-Dimensional Imbalanced Biomedical Data

January 2013 (has links)
Learning from high-dimensional biomedical data has attracted a lot of attention recently. High-dimensional biomedical data often suffer from the curse of dimensionality and have imbalanced class distributions. Both of these characteristics, high dimensionality and imbalanced class distributions, are challenging for traditional machine learning methods and may affect model performance. In this thesis, I focus on developing learning methods for high-dimensional imbalanced biomedical data. In the first part, a sparse canonical correlation analysis (CCA) method is presented. Penalty terms are used to control the sparsity of the projection matrices of CCA. The sparse CCA method is then applied to find patterns between biomedical data sets and labels, or among different data sources. In the second part, I discuss several learning problems for imbalanced biomedical data. Traditional learning systems are often biased when the biomedical data are imbalanced, so traditional evaluation measures such as accuracy may be inappropriate for such cases. I therefore discuss several alternative evaluation criteria for assessing learning performance. For imbalanced binary classification problems, I use the undersampling-based classifiers ensemble (UEM) strategy to obtain accurate models for both classes of samples. A small sphere and large margin (SSLM) approach is also presented to detect rare abnormal samples among a large number of subjects. In addition, I apply multiple feature selection and clustering methods to deal with high-dimensional data and data with highly correlated features. Experiments on high-dimensional imbalanced biomedical data are presented which illustrate the effectiveness and efficiency of my methods. (Dissertation/Thesis: M.S. Computer Science, 2013)
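As a generic illustration of the undersampling-based ensemble idea (a simplified sketch, not the UEM or SSLM methods from the thesis), each ensemble member below is trained on all minority samples plus an equally sized random subset of majority samples, and their predicted probabilities are averaged.

```python
# Generic sketch of an undersampling-based classifier ensemble for imbalanced
# binary data; not the thesis's UEM/SSLM implementations.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=50, weights=[0.95, 0.05],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

rng = np.random.default_rng(0)
minority = np.flatnonzero(y_tr == 1)
majority = np.flatnonzero(y_tr == 0)

probas = []
for _ in range(25):  # 25 ensemble members
    # Each member sees all minority samples and an equally sized random
    # subset of majority samples.
    subset = np.concatenate([minority,
                             rng.choice(majority, size=len(minority),
                                        replace=False)])
    clf = LogisticRegression(max_iter=1000).fit(X_tr[subset], y_tr[subset])
    probas.append(clf.predict_proba(X_te)[:, 1])

score = roc_auc_score(y_te, np.mean(probas, axis=0))
print(f"ensemble AUC: {score:.3f}")
```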
7

Fermion Pairing and BEC-BCS Crossover in Novel Systems

Liao, Renyuan 10 September 2008 (has links)
No description available.
8

Fuzzy Classifiers for Imbalanced Data Sets

Visa, Sofia 08 October 2007 (has links)
No description available.
9

A Segmentation and Re-balancing Approach for Classification of Imbalanced Data

Gong, Rongsheng 19 April 2011 (has links)
No description available.
10

Advanced Text Analytics and Machine Learning Approach for Document Classification

Anne, Chaitanya 19 May 2017 (has links)
Text classification is used to extract and retrieve information from text, and it has come to be considered an important step in managing the vast and growing number of records held in digital form. This thesis addresses the problem of classifying patent documents into fifteen different categories or classes, where some classes overlap with others for practical reasons. For the development of the classification model using machine learning techniques, useful features have been extracted from the given documents. The features are used both to classify patent documents and to generate useful tag-words. The overall objective of this work is to systematize NASA's patent management by developing a set of automated tools that can assist NASA in managing and marketing its portfolio of intellectual properties (IP), and that enable easier discovery of relevant IP by users. We have identified an array of applicable methods, including k-Nearest Neighbors (kNN), two variations of the Support Vector Machine (SVM) algorithm, and two tree-based classification algorithms: Random Forest and J48. The major research steps in this work consist of filtering techniques for variable selection, information gain and feature correlation analysis, and training and testing potential models using effective classifiers. Further, the obstacles associated with the imbalanced data were mitigated by adding synthetic data wherever appropriate, which resulted in a superior SVM-based classification model.
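The overall workflow described above (feature extraction, synthetic oversampling of under-represented classes, and comparison of several classifiers) might be sketched as below; the 20 Newsgroups corpus stands in for the patent documents, SMOTE stands in for whatever synthetic-data step the thesis used, and all parameter choices are illustrative assumptions.

```python
# Sketch of a text-classification workflow with synthetic oversampling of
# under-represented classes; placeholders only, not NASA's patent data.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from imblearn.over_sampling import SMOTE

data = fetch_20newsgroups(subset="train",
                          categories=["sci.space", "rec.autos", "misc.forsale"])
X = TfidfVectorizer(sublinear_tf=True, stop_words="english").fit_transform(data.data)
X_tr, X_te, y_tr, y_te = train_test_split(X, data.target, stratify=data.target,
                                          random_state=0)

# Oversample the smaller categories with synthetic examples. SMOTE is applied
# to the TF-IDF vectors here; real patent data would be far more imbalanced.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)

for name, clf in [("LinearSVC", LinearSVC(dual=False)),
                  ("RandomForest", RandomForestClassifier(n_estimators=200,
                                                          random_state=0))]:
    pred = clf.fit(X_bal, y_bal).predict(X_te)
    print(name)
    print(classification_report(y_te, pred, target_names=data.target_names))
```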
