About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Optimizing ERP Recommendations Using Machine Learning Techniques

Jeremiah, Ante January 2023 (has links)
This study explores the application of a recommendation engine in collaboration with Fortnox. Its primary focus is to identify potential improvements to their recommendation engine so that it produces more accurate recommendations for users. The study evaluates the performance of various algorithms on imbalanced data under four settings: no resampling, EasyEnsemble undersampling, SMOTE oversampling, and weighted-class approaches. The results indicate that LinearSVC is the best algorithm without resampling. Decision Tree performs well when combined with EasyEnsemble, outperforming the other algorithms. When using SMOTE, Decision Tree performs the best with the default sampling strategy, while LinearSVC and MultinomialNB show similar results. Varying the threshold for SMOTE produces mixed results: LinearSVC and MultinomialNB are sensitive to changes in the threshold value, while Decision Tree maintains consistent performance. Finally, when using weighted classes, Decision Tree outperforms LinearSVC in terms of accuracy and F1-score. Overall, the findings provide insights into the performance of different algorithms on imbalanced data and highlight the effectiveness of certain techniques in addressing the class imbalance problem, as well as the algorithms' sensitivity to changes with resampled data.
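A rough sketch of this kind of comparison, using scikit-learn and imbalanced-learn on a synthetic stand-in for the confidential Fortnox data, might look as follows; the dataset and parameter choices are illustrative assumptions, not the thesis's actual setup.

```python
# Illustrative comparison of classifiers on imbalanced data with and without
# resampling; a sketch only, not the thesis's implementation.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score
from imblearn.over_sampling import SMOTE
from imblearn.ensemble import EasyEnsembleClassifier

# Synthetic stand-in for the imbalanced ERP data.
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

results = {}

# 1) No resampling.
for name, clf in [("LinearSVC", LinearSVC(dual=False)),
                  ("DecisionTree", DecisionTreeClassifier(random_state=0))]:
    pred = clf.fit(X_tr, y_tr).predict(X_te)
    results[f"{name}, no resampling"] = f1_score(y_te, pred, average="macro")

# 2) SMOTE oversampling with the default sampling strategy.
X_sm, y_sm = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
pred = DecisionTreeClassifier(random_state=0).fit(X_sm, y_sm).predict(X_te)
results["DecisionTree + SMOTE"] = f1_score(y_te, pred, average="macro")

# 3) EasyEnsemble undersampling (an ensemble trained on balanced subsets).
pred = EasyEnsembleClassifier(random_state=0).fit(X_tr, y_tr).predict(X_te)
results["EasyEnsemble"] = f1_score(y_te, pred, average="macro")

# 4) Weighted classes instead of resampling.
pred = (DecisionTreeClassifier(class_weight="balanced", random_state=0)
        .fit(X_tr, y_tr).predict(X_te))
results["DecisionTree, class_weight"] = f1_score(y_te, pred, average="macro")

for setting, score in results.items():
    print(f"{setting}: macro F1 = {score:.3f}")
```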
2

A Combined Approach to Handle Multi-class Imbalanced Data and to Adapt Concept Drifts using Machine Learning

Tumati, Saini 05 October 2021 (has links)
No description available.
3

SCUT-DS: Methodologies for Learning in Imbalanced Data Streams

Olaitan, Olubukola January 2018 (has links)
The automation of most of our activities has led to the continuous production of data in the form of fast-arriving streams. In a supervised learning setting, instances in these streams are labeled as belonging to a particular class. When the number of classes in the data stream is more than two, such a data stream is referred to as a multi-class data stream. A multi-class imbalanced data stream describes the situation where the instance distribution of the classes is skewed, such that instances of some classes occur more frequently than others. Classes whose instances occur frequently are referred to as the majority classes, while classes whose instances occur less frequently are denoted as the minority classes. Classification algorithms, or supervised learning techniques, use historic instances to build models, which are then used to predict the classes of unseen instances. Multi-class imbalanced data stream classification poses a great challenge to classical classification algorithms. This is due to the fact that traditional algorithms are usually biased towards the majority classes, since they have more examples of the majority classes when building the model. These traditional algorithms yield low predictive accuracy rates for the minority instances and need to be augmented, often with some form of sampling, in order to improve their overall performance. In the literature, in both static and streaming environments, most studies focus on the binary class imbalance problem. Furthermore, research on multi-class imbalance in the data stream environment is limited. A number of researchers have proceeded by transforming a multi-class imbalanced setting into multiple binary class problems. However, such a transformation does not allow the stream to be studied in its original form and may introduce bias. The research conducted in this thesis aims to address this research gap by proposing a novel online learning methodology that combines oversampling of the minority classes with cluster-based majority class under-sampling, without decomposing the data stream into multiple binary sets. Rather, sampling involves continuously selecting a balanced number of instances across all classes for model building. Our focus is on improving the rate of correctly predicting instances of the minority classes in multi-class imbalanced data streams, through the introduction of the Synthetic Minority Over-sampling Technique (SMOTE) and Cluster-based Under-sampling - Data Streams (SCUT-DS) methodologies. In this work, we dynamically balance the classes by utilizing a windowing mechanism during the incremental sampling process. Our SCUT-DS algorithms are evaluated using six different types of classification techniques, and their results are then compared against a state-of-the-art algorithm. Our contributions are tested using both synthetic and real data sets. The experimental results show that the approaches developed in this thesis yield high prediction rates for minority instances across the multiple minority classes within a non-evolving stream.
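As a simplified illustration of the windowed balancing idea (not the SCUT-DS algorithms themselves), the sketch below rebalances each window of a synthetic multi-class stream by oversampling smaller classes with SMOTE and shrinking larger classes with cluster-based undersampling before incrementally updating a linear model; the window size, target counts, and classifier are assumptions made for the example.

```python
# Simplified sketch of window-based class balancing for a multi-class stream.
# This illustrates the general idea only; it is not the thesis's SCUT-DS method.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import ClusterCentroids

# Synthetic imbalanced multi-class "stream", consumed in fixed-size windows.
X, y = make_classification(n_samples=20000, n_features=15, n_informative=10,
                           n_classes=4, weights=[0.7, 0.15, 0.1, 0.05],
                           random_state=0)
classes = np.unique(y)
window = 2000
clf = SGDClassifier(random_state=0)

for start in range(0, len(X), window):
    X_w, y_w = X[start:start + window], y[start:start + window]
    counts = {c: int((y_w == c).sum()) for c in np.unique(y_w)}
    target = int(np.median(list(counts.values())))
    # Raise smaller classes up to the target count, shrink larger ones down to it.
    over = {c: target for c, n in counts.items() if n < target}
    under = {c: target for c, n in counts.items() if n > target}
    if over:
        X_w, y_w = SMOTE(sampling_strategy=over, k_neighbors=3,
                         random_state=0).fit_resample(X_w, y_w)
    if under:
        X_w, y_w = ClusterCentroids(sampling_strategy=under,
                                    random_state=0).fit_resample(X_w, y_w)
    # Incrementally update the model on the balanced window.
    clf.partial_fit(X_w, y_w, classes=classes)

print("classes learned incrementally:", clf.classes_)
```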
4

Induction in Hierarchical Multi-label Domains with Focus on Text Categorization

Dendamrongvit, Sareewan 02 May 2011 (has links)
Induction of classifiers from sets of preclassified training examples is one of the most popular machine learning tasks. This dissertation focuses on the techniques needed in the field of automated text categorization. Here, each document can be labeled with more than one class, sometimes with many classes. Moreover, the classes are hierarchically organized, the mutual relations being typically expressed in terms of a generalization tree. Both aspects (multi-label classification and hierarchically organized classes) have so far received inadequate attention. The existing literature largely assumes that it is enough to induce a separate binary classifier for each class, and the question of class hierarchy is rarely addressed. This, however, ignores some serious problems. For one thing, induction of thousands of classifiers from hundreds of thousands of examples described by tens of thousands of features (a common case in automated text categorization) incurs prohibitive computational costs: even a single binary classifier in domains of this kind often takes hours, even days, to induce. For another, the circumstance that the classes are hierarchically organized affects the way we view the classification performance of the induced classifiers. The presented work proposes a technique referred to by the acronym "H-kNN-plus." The technique combines support vector machines and nearest neighbor classifiers with the intention of capitalizing on the strengths of both. As for performance evaluation, a variety of measures have been used to evaluate hierarchical classifiers, including the standard non-hierarchical criteria that assign the same weight to different types of error. The author proposes a performance measure that overcomes some of their weaknesses. The dissertation begins with a study of (non-hierarchical) multi-label classification. One of the reasons for the poor performance of earlier techniques is the class-imbalance problem: a small number of positive examples being outnumbered by a great many negative examples. Another difficulty is that each class tends to be characterized by a different set of features. This means that most of the binary classifiers are induced from examples described by predominantly irrelevant features. Addressing these weaknesses by majority-class undersampling and feature selection, the proposed technique significantly improves the overall classification performance. Even more challenging is the issue of hierarchical classification. Here, the dissertation introduces a new induction mechanism, H-kNN-plus, and subjects it to extensive experiments with two real-world datasets. The results indicate its superiority, in these domains, over earlier work in terms of prediction performance as well as computational costs.
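A minimal sketch of the per-label baseline the dissertation starts from, i.e. one binary classifier per label with majority-class undersampling and per-label feature selection, is shown below; it is not the proposed H-kNN-plus method, and the synthetic corpus and parameter values are assumptions for illustration.

```python
# Sketch of per-label binary classification for multi-label data with
# majority-class undersampling and chi-squared feature selection.
# Illustrative only; this is not the dissertation's H-kNN-plus classifier.
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import LinearSVC

# Synthetic bag-of-words-like corpus with several labels per document.
X, Y = make_multilabel_classification(n_samples=2000, n_features=500,
                                      n_classes=10, n_labels=2, random_state=0)

rng = np.random.default_rng(0)
models = {}
for label in range(Y.shape[1]):
    y_bin = Y[:, label]
    pos = np.flatnonzero(y_bin == 1)   # documents carrying this label
    neg = np.flatnonzero(y_bin == 0)   # the (usually much larger) rest
    # Undersample the negative (majority) class down to the positive count.
    neg = rng.choice(neg, size=min(len(pos), len(neg)), replace=False)
    idx = np.concatenate([pos, neg])
    # Per-label feature selection keeps only features relevant to this label.
    selector = SelectKBest(chi2, k=100).fit(X[idx], y_bin[idx])
    clf = LinearSVC(dual=False).fit(selector.transform(X[idx]), y_bin[idx])
    models[label] = (selector, clf)

print(f"trained {len(models)} per-label binary classifiers")
```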
5

Chronic Pain: A study on patients with chronic pain. What characteristics/variables lie behind the fact that a patient does not respond well to treatment?

Lindvall, Agnes, Chilaika, Ana January 2015 (has links)
The primary purpose of this study was to find out which variables distinguish patients who respond well to treatment for chronic pain from those who do not. We used logistic regression to predict group membership based on self-reported health surveys, i.e., whether different answers in the surveys can predict whether a patient is "responsive" or "unresponsive". By bootstrapping 176 samples and aggregating the results of 176 logistic regressions fitted to the sub-samples, we calculated an averaged model. The variables anxiety and physical health were significant in 76% and 70% of the models respectively, while depression was significant in 30% of the models. Gender was significant in 15% of the models and health status in 0.006%. The averaged model correctly classified the most unresponsive patients at a cut-off value of 0.5. As the cut-off value was increased, the number of correctly classified unresponsive patients decreased while the number of correctly classified responsive patients increased, as did the number of unresponsive patients classified as responsive. We concluded that the model did not discriminate sufficiently between the two groups. We were also interested in how the variables anxiety, depression, health status, willingness to participate in activities, engagement in activities, and mental and physical health relate to one another. The results of a confirmatory factor analysis showed that a patient's health status is highly related to their physical health and activity engagement, while pain willingness and engagement in activity were least related. Furthermore, the analysis showed that mental health is highly related to anxiety and health status, indicating that mental health is indeed important to consider when assessing the health status of a patient.
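The bootstrap-and-aggregate procedure can be sketched roughly as follows; the variable names and simulated data are hypothetical stand-ins for the self-reported survey measures, which are not public, and statsmodels is used here simply as one way to obtain coefficients and p-values.

```python
# Sketch of the bootstrap-and-aggregate logistic regression described above.
# Data and variable names are hypothetical stand-ins for the survey measures.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 176  # assumed cohort size for the illustration
# Hypothetical survey scores; "responsive" is the outcome of interest.
df = pd.DataFrame({
    "anxiety": rng.normal(size=n),
    "depression": rng.normal(size=n),
    "physical_health": rng.normal(size=n),
    "gender": rng.integers(0, 2, size=n),
})
logits = -0.8 * df["anxiety"] + 0.6 * df["physical_health"]
df["responsive"] = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

predictors = ["anxiety", "depression", "physical_health", "gender"]
n_boot = 176  # 176 bootstrap samples, as in the study
coefs, sig_counts = [], np.zeros(len(predictors))

for _ in range(n_boot):
    sample = df.sample(n=len(df), replace=True)
    X = sm.add_constant(sample[predictors])
    fit = sm.Logit(sample["responsive"], X).fit(disp=0)
    coefs.append(fit.params[predictors].to_numpy())
    sig_counts += (fit.pvalues[predictors] < 0.05).to_numpy()

avg_coef = np.mean(coefs, axis=0)
for name, beta, sig in zip(predictors, avg_coef, sig_counts / n_boot):
    print(f"{name}: mean coefficient {beta:+.3f}, significant in {sig:.0%} of models")
```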
6

Machine Learning Methods for High-Dimensional Imbalanced Biomedical Data

January 2013 (has links)
Learning from high-dimensional biomedical data has attracted a lot of attention recently. High-dimensional biomedical data often suffer from the curse of dimensionality and have imbalanced class distributions. Both of these characteristics, high dimensionality and imbalanced class distributions, are challenging for traditional machine learning methods and may affect model performance. In this thesis, I focus on developing learning methods for high-dimensional imbalanced biomedical data. In the first part, a sparse canonical correlation analysis (CCA) method is presented. Penalty terms are used to control the sparsity of the projection matrices of CCA. The sparse CCA method is then applied to find patterns between biomedical data sets and labels, or among different data sources. In the second part, I discuss several learning problems for imbalanced biomedical data. Traditional learning systems are often biased when the biomedical data are imbalanced, so traditional evaluation measures such as accuracy may be inappropriate for such cases. I therefore discuss several alternative evaluation criteria for assessing learning performance. For imbalanced binary classification problems, I use the undersampling-based classifiers ensemble (UEM) strategy to obtain accurate models for both classes of samples. A small sphere and large margin (SSLM) approach is also presented to detect rare abnormal samples among a large number of subjects. In addition, I apply multiple feature selection and clustering methods to deal with high-dimensional data and data with highly correlated features. Experiments on high-dimensional imbalanced biomedical data are presented which illustrate the effectiveness and efficiency of my methods. (Dissertation/Thesis: M.S. Computer Science, 2013)
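As a generic illustration of the undersampling-based ensemble idea (a simplified sketch, not the UEM or SSLM methods from the thesis), each ensemble member below is trained on all minority samples plus an equally sized random subset of majority samples, and their predicted probabilities are averaged.

```python
# Generic sketch of an undersampling-based classifier ensemble for imbalanced
# binary data; not the thesis's UEM/SSLM implementations.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=50, weights=[0.95, 0.05],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

rng = np.random.default_rng(0)
minority = np.flatnonzero(y_tr == 1)
majority = np.flatnonzero(y_tr == 0)

probas = []
for _ in range(25):  # 25 ensemble members
    # Each member sees all minority samples and an equally sized random
    # subset of majority samples.
    subset = np.concatenate([minority,
                             rng.choice(majority, size=len(minority),
                                        replace=False)])
    clf = LogisticRegression(max_iter=1000).fit(X_tr[subset], y_tr[subset])
    probas.append(clf.predict_proba(X_te)[:, 1])

score = roc_auc_score(y_te, np.mean(probas, axis=0))
print(f"ensemble AUC: {score:.3f}")
```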
7

Fermion Pairing and BEC-BCS Crossover in Novel Systems

Liao, Renyuan 10 September 2008 (has links)
No description available.
8

Fuzzy Classifiers for Imbalanced Data Sets

Visa, Sofia 08 October 2007 (has links)
No description available.
9

A Segmentation and Re-balancing Approach for Classification of Imbalanced Data

Gong, Rongsheng 19 April 2011 (has links)
No description available.
10

Advanced Text Analytics and Machine Learning Approach for Document Classification

Anne, Chaitanya 19 May 2017 (has links)
Text classification is used to extract and retrieve information from text, and it has come to be considered an important step in managing the vast and growing number of records held in digital form. This thesis addresses the problem of classifying patent documents into fifteen different categories or classes, where some classes overlap with others for practical reasons. For the development of the classification model using machine learning techniques, useful features have been extracted from the given documents. The features are used both to classify patent documents and to generate useful tag-words. The overall objective of this work is to systematize NASA's patent management by developing a set of automated tools that can assist NASA in managing and marketing its portfolio of intellectual properties (IP), and that enable easier discovery of relevant IP by users. We have identified an array of applicable methods, including k-Nearest Neighbors (kNN), two variations of the Support Vector Machine (SVM) algorithm, and two tree-based classification algorithms: Random Forest and J48. The major research steps in this work consist of filtering techniques for variable selection, information gain and feature correlation analysis, and training and testing potential models using effective classifiers. Further, the obstacles associated with the imbalanced data were mitigated by adding synthetic data wherever appropriate, which resulted in a superior SVM-based classification model.
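The overall workflow described above (feature extraction, synthetic oversampling of under-represented classes, and comparison of several classifiers) might be sketched as below; the 20 Newsgroups corpus stands in for the patent documents, SMOTE stands in for whatever synthetic-data step the thesis used, and all parameter choices are illustrative assumptions.

```python
# Sketch of a text-classification workflow with synthetic oversampling of
# under-represented classes; placeholders only, not NASA's patent data.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from imblearn.over_sampling import SMOTE

data = fetch_20newsgroups(subset="train",
                          categories=["sci.space", "rec.autos", "misc.forsale"])
X = TfidfVectorizer(sublinear_tf=True, stop_words="english").fit_transform(data.data)
X_tr, X_te, y_tr, y_te = train_test_split(X, data.target, stratify=data.target,
                                          random_state=0)

# Oversample the smaller categories with synthetic examples. SMOTE is applied
# to the TF-IDF vectors here; real patent data would be far more imbalanced.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)

for name, clf in [("LinearSVC", LinearSVC(dual=False)),
                  ("RandomForest", RandomForestClassifier(n_estimators=200,
                                                          random_state=0))]:
    pred = clf.fit(X_bal, y_bal).predict(X_te)
    print(name)
    print(classification_report(y_te, pred, target_names=data.target_names))
```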
