  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
81

Fuzzy Classification Models Based On Tanaka

Ozer, Gizem 01 July 2009 (has links) (PDF)
In some classification problems where human judgments and qualitative, imprecise data exist, uncertainty arises from fuzziness rather than randomness. Only a limited number of fuzzy classification approaches are available to capture the effect of the fuzzy uncertainty embedded in such data. The scope of this study comprises two main parts: new fuzzy classification approaches based on Tanaka's Fuzzy Linear Regression (FLR) approach, and an improvement of an existing one, Improved Fuzzy Classifier Functions (IFCF). Tanaka's FLR approach is a well-known fuzzy regression technique used for prediction problems involving fuzzy uncertainty. In the first part of the study, three alternative approaches are presented that utilize the FLR approach for a particular customer satisfaction classification problem; their performances and their applicability to other cases are compared and discussed. In the second part of the study, an improved IFCF method, Nonparametric Improved Fuzzy Classifier Functions (NIFCF), is presented, which uses a nonparametric method, Multivariate Adaptive Regression Splines (MARS), in the clustering phase of the IFCF method. The NIFCF method is applied to three data sets and compared with the Fuzzy Classifier Function (FCF) and Logistic Regression (LR) methods.
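The abstract does not reproduce Tanaka's formulation, but the standard version fits an affine model with symmetric triangular fuzzy coefficients by linear programming: minimize the total spread while every observation lies inside the h-level set of the fuzzy output. A minimal sketch under those assumptions (the function name and the use of `scipy.optimize.linprog` are my own choices, not from the thesis):

```python
import numpy as np
from scipy.optimize import linprog

def tanaka_flr(X, y, h=0.0):
    """Fit a Tanaka-style fuzzy linear regression with symmetric
    triangular coefficients (center c_j, spread s_j >= 0) via an LP:
    minimize total spread subject to each y_i lying inside the
    h-level interval c.x_i +/- (1-h) * s.|x_i|."""
    X = np.column_stack([np.ones(len(X)), np.asarray(X, float)])  # intercept
    y = np.asarray(y, float)
    n, p = X.shape
    absX = np.abs(X)
    # Variables: [c_1..c_p, s_1..s_p]; objective = sum_i s . |x_i|
    obj = np.concatenate([np.zeros(p), absX.sum(axis=0)])
    # Inclusion constraints rewritten as A_ub @ z <= b_ub:
    #   c.x_i + (1-h) s.|x_i| >= y_i   and   c.x_i - (1-h) s.|x_i| <= y_i
    A_ub = np.vstack([np.hstack([-X, -(1 - h) * absX]),
                      np.hstack([ X, -(1 - h) * absX])])
    b_ub = np.concatenate([-y, y])
    bounds = [(None, None)] * p + [(0, None)] * p  # centers free, spreads >= 0
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:p], res.x[p:]  # (centers, spreads)
```

On exactly linear data the optimal spreads collapse to zero and the centers recover the crisp regression line; noise forces positive spreads instead of a residual term.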
82

Classification Analysis Techniques for Skewed Class

Chyi, Yu-Meei 12 February 2003 (has links)
Existing classification analysis techniques (e.g., decision tree induction, backpropagation neural networks, k-nearest neighbor classification) generally exhibit satisfactory classification effectiveness when dealing with data with a non-skewed class distribution. However, real-world applications (e.g., churn prediction and fraud detection) often involve highly skewed decision outcomes (e.g., 2% churners and 98% non-churners). Such a highly skewed class distribution, if not properly addressed, imperils learning effectiveness and may result in a "null" prediction system that simply assigns every instance the majority class of the training instances (e.g., predicting all customers as non-churners). In this study, we extended the multi-classifier class-combiner approach and proposed a clustering-based multi-classifier class-combiner technique to address the highly skewed class distribution problem in classification analysis. In addition, we proposed four distance-based methods for selecting a subset of majority-class instances in order to lower the degree of skewness in a data set. Using two real-world datasets (mortality prediction for burn patients and customer loyalty prediction), empirical results suggested that the proposed clustering-based multi-classifier class-combiner technique generally outperformed both the traditional multi-classifier class-combiner approach and the four distance-based methods.
Keywords: Data Mining, Classification Analysis, Skewed Class Distribution Problem, Decision Tree Induction, Multi-classifier Class-combiner Approach, Clustering-based Multi-classifier Class-combiner Approach
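The abstract does not spell out the four distance-based selection methods, so the sketch below shows just one plausible scheme in that spirit: keep the majority-class instances closest to the minority-class centroid until the classes reach the requested ratio. All names here are illustrative, not the thesis's:

```python
import math

def undersample_majority(X, y, majority_label, ratio=1.0):
    """Distance-based undersampling sketch: retain the majority-class
    instances nearest the minority-class centroid (i.e., near the
    likely decision boundary) so the kept majority count is
    ratio * (number of minority instances)."""
    minority = [(x, lab) for x, lab in zip(X, y) if lab != majority_label]
    majority = [(x, lab) for x, lab in zip(X, y) if lab == majority_label]
    dim = len(minority[0][0])
    centroid = [sum(x[d] for x, _ in minority) / len(minority)
                for d in range(dim)]
    majority.sort(key=lambda item: math.dist(item[0], centroid))
    keep = int(ratio * len(minority))
    balanced = minority + majority[:keep]
    Xb = [x for x, _ in balanced]
    yb = [lab for _, lab in balanced]
    return Xb, yb
```

Other natural variants (farthest-from-centroid, per-instance nearest-neighbor distance) differ only in the sort key.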
83

A Semantic Triplet Based Story Classifier

January 2013 (has links)
Text classification, in the artificial intelligence domain, is an activity in which text documents are automatically classified into predefined categories using machine learning techniques. An example is classifying uncategorized news articles into predefined categories such as "Business", "Politics", "Education", or "Technology". In this thesis, a supervised machine learning approach is followed, in which a module is first trained with pre-classified training data and the class of test data is then predicted. Good feature extraction is an important step in machine learning, and hence the main component of this text classifier is semantic-triplet-based features, in addition to traditional features such as standard keyword-based features and statistical features based on shallow parsing (such as the density of POS tags and named entities). A triplet {Subject, Verb, Object} in a sentence is defined as a relation between the subject and the object, the relation being the predicate (verb). Triplet extraction is a five-step process whose input corpus consists of web text documents, each containing one or more paragraphs, drawn from sources ranging from RSS feeds to lists of extremist websites. The input corpus feeds into the "Pronoun Resolution" step, which uses a heuristic approach to identify the noun phrases referenced by pronouns. The next step, "SRL Parser", is a shallow semantic parser that converts the pronoun-resolved paragraphs into an annotated predicate-argument format. The output of the SRL parser is processed by the "Triplet Extractor" algorithm, which forms triplets of the form {Subject, Verb, Object}. Generalization and reduction of triplet features is the next step: a reduced feature representation lowers computing time, yields better discriminatory behavior, and mitigates the curse of dimensionality. For training and testing, a ten-fold cross-validation approach is followed: in each round an SVM classifier is trained with 90% of the labeled (training) data, and in the testing phase the classes of the remaining 10% unlabeled (testing) data are predicted. In conclusion, this thesis proposes a model with semantic-triplet-based features for story classification; the effectiveness of the model is demonstrated against other traditional features used in the literature for text classification tasks. / Dissertation/Thesis / M.S. Computer Science 2013
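As a toy illustration of the target {Subject, Verb, Object} representation only (the thesis uses pronoun resolution plus a proper SRL parser, not this heuristic), a naive extractor over already POS-tagged tokens might look like:

```python
def extract_triplet(tagged):
    """Naive triplet sketch: take the first verb as the predicate,
    the first noun before it as Subject, and the first noun after it
    as Object. tagged = [(word, penn_tag), ...]."""
    verb_idx = next(i for i, (_, tag) in enumerate(tagged)
                    if tag.startswith("VB"))
    subject = next(w for w, tag in tagged[:verb_idx]
                   if tag.startswith("NN"))
    obj = next(w for w, tag in tagged[verb_idx + 1:]
               if tag.startswith("NN"))
    return {"subject": subject,
            "verb": tagged[verb_idx][0],
            "object": obj}
```

A real SRL-based pipeline would instead read the predicate-argument annotations (ARG0, predicate, ARG1) directly rather than relying on word order.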
84

Combined decision making with multiple agents

Simpson, Edwin Daniel January 2014 (has links)
In a wide range of applications, decisions must be made by combining information from multiple agents with varying levels of trust and expertise. For example, citizen science involves large numbers of human volunteers with differing skills, while disaster management requires aggregating information from multiple people and devices to make timely decisions. This thesis introduces efficient and scalable Bayesian inference for decision combination, allowing us to fuse the responses of multiple agents in large, real-world problems and account for the agents’ unreliability in a principled manner. As the behaviour of individual agents can change significantly, for example if agents move in a physical space or learn to perform an analysis task, this work proposes a novel combination method that accounts for these time variations in a fully Bayesian manner using a dynamic generalised linear model. This approach can also be used to augment agents’ responses with continuous feature data, thus permitting decision-making when agents’ responses are in limited supply. Working with information inferred using the proposed Bayesian techniques, an information-theoretic approach is developed for choosing optimal pairs of tasks and agents. This approach is demonstrated by an algorithm that maintains a trustworthy pool of workers and enables efficient learning by selecting informative tasks. The novel methods developed here are compared theoretically and empirically to a range of existing decision combination methods, using both simulated and real data. The results show that the methodology proposed in this thesis improves accuracy and computational efficiency over alternative approaches, and yields insights into the behavioural groupings of agents.
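The thesis's full model is dynamic and infers agent reliability jointly; the following sketch shows only the core idea of Bayesian decision combination with fixed, known per-agent confusion matrices (all names are illustrative assumptions):

```python
def combine_decisions(responses, confusion, prior):
    """Bayesian combination sketch: confusion[k][j][l] is
    P(agent k responds l | true class j). The posterior over the true
    class multiplies the prior by each agent's likelihood, treating
    agents as conditionally independent given the true class."""
    posterior = list(prior)
    for k, response in enumerate(responses):
        posterior = [p * confusion[k][j][response]
                     for j, p in enumerate(posterior)]
    total = sum(posterior)
    return [p / total for p in posterior]
```

Reliable agents (confusion matrices near the identity) dominate the combined decision, while uninformative agents contribute a flat likelihood and leave the posterior unchanged.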
85

Pattern Synthesis Techniques And Compact Data Representation Schemes For Efficient Nearest Neighbor Classification

Pulabaigari, Viswanath 01 1900 (has links) (PDF)
No description available.
86

Unární klasifikátor obrazových dat / Unary Classification of Image Data

Beneš, Jiří January 2021 (has links)
This work opens with an introduction to classification algorithms. It then divides classifiers into unary, binary, and multi-class categories and describes the different types of classifiers, comparing the individual classifiers and their areas of use. For unary classifiers, practical examples and a list of commonly used architectures are given. One chapter focuses on comparing the effects of hyperparameters on the quality of unary classification for the individual architectures. The submission also includes a practical reimplementation of a unary classifier.
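A unary (one-class) classifier is trained on samples of a single class and then accepts or rejects new queries. A minimal distance-based sketch of the idea (not one of the architectures surveyed in the work):

```python
import math

class CentroidOneClass:
    """One-class classifier sketch: fit on positive samples only,
    then accept a query if its distance to the training centroid is
    within the chosen quantile of the training distances."""
    def fit(self, X, quantile=0.95):
        dim = len(X[0])
        self.centroid = [sum(x[d] for x in X) / len(X) for d in range(dim)]
        dists = sorted(math.dist(x, self.centroid) for x in X)
        self.radius = dists[min(len(dists) - 1, int(quantile * len(dists)))]
        return self

    def predict(self, x):
        return math.dist(x, self.centroid) <= self.radius
```

One-class SVMs and autoencoder-based detectors follow the same fit-on-one-class contract, differing only in how the "normal" region is modeled.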
87

Rozpoznávání vzorů v obraze pomocí AdaBoost / Pattern Recognition Using AdaBoost

Wrhel, Vladimír January 2010 (has links)
This thesis deals with the AdaBoost algorithm, which builds a strong classification function from a number of weak classifiers. We familiarize ourselves with modifications of AdaBoost, namely Real AdaBoost, WaldBoost, FloatBoost and TCAcu, which improve some properties of the AdaBoost algorithm. We discuss properties of features and weak classifiers, and show a class of tasks for which the AdaBoost algorithm is applicable. We describe the implementation of a library containing these methods and present tests performed on it.
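AdaBoost's reweighting scheme can be sketched with one-feature threshold stumps as the weak classifiers. This is a minimal discrete AdaBoost for labels in {-1, +1} (an illustration, not the thesis's library):

```python
import math

def stump_predict(stump, x):
    feat, thresh, polarity = stump
    return polarity if x[feat] <= thresh else -polarity

def train_adaboost(X, y, rounds=10):
    """Discrete AdaBoost: each round picks the stump with lowest
    weighted error, weights it by alpha = 0.5*ln((1-err)/err), and
    reweights the data toward still-misclassified examples."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        best, best_err = None, float("inf")
        for feat in range(len(X[0])):
            for thresh in sorted({x[feat] for x in X}):
                for polarity in (1, -1):
                    stump = (feat, thresh, polarity)
                    err = sum(wi for wi, xi, yi in zip(w, X, y)
                              if stump_predict(stump, xi) != yi)
                    if err < best_err:
                        best, best_err = stump, err
        if best_err >= 0.5:
            break  # no weak learner better than chance
        best_err = max(best_err, 1e-10)  # avoid log(0) on perfect stumps
        alpha = 0.5 * math.log((1 - best_err) / best_err)
        ensemble.append((alpha, best))
        w = [wi * math.exp(-alpha * yi * stump_predict(best, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

Real AdaBoost replaces the hard {-1, +1} stump outputs with confidence-rated real values, and WaldBoost adds sequential early rejection on top of the same ensemble.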
88

Employee Turnover Prediction - A Comparative Study of Supervised Machine Learning Models

Kovvuri, Suvoj Reddy, Dommeti, Lydia Sri Divya January 2022 (has links)
Background: In every organization, employees are an essential resource. When organizations neglect their employees, the result is employee turnover, which causes considerable losses to the organization. Using machine learning algorithms and the data at hand, a prediction of an employee's future in an organization can be made. Objectives: The aim of this thesis is to conduct a comparison study utilizing supervised machine learning algorithms, namely Logistic Regression, Naive Bayes Classifier, Random Forest Classifier, and XGBoost, to predict an employee's future in a company. Using evaluation metrics, the models are assessed in order to discover the most efficient model for the data at hand. Methods: A quantitative research approach is used in this thesis, and the data is analyzed using statistical analysis. The labeled data set comes from Kaggle and includes information on employees at a company; it is used to train the algorithms. The resulting models are evaluated on the test set using evaluation measures including Accuracy, Precision, Recall, F1 Score, and the ROC curve to determine which model performs best at predicting employee turnover. Results: Among the studied features in the data set, no single feature has a significant impact on turnover. Upon analyzing the results, the XGBoost classifier has the best mean accuracy at 85.3%, followed by the Random Forest classifier at 83%, outperforming the other two algorithms. The XGBoost classifier has better precision at 0.88, followed by the Random Forest Classifier at 0.82; both showed a Recall score of 0.69. The XGBoost classifier had the highest F1 Score at 0.77, followed by the Random Forest classifier at 0.75, and in the ROC analysis the XGBoost classifier had the higher area under the curve (AUC) at 0.88.
Conclusions: Among the four studied machine learning algorithms, Logistic Regression, Naive Bayes Classifier, Random Forest Classifier, and XGBoost, the XGBoost classifier is the most optimal, with good scores across the tested performance metrics. No single feature was found to have a major effect on employee turnover.
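The evaluation measures named in the entry above have simple closed forms for binary labels; a small helper (illustrative, not the thesis's code) computes them from predictions:

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels, with
    `positive` marking the class of interest (e.g. 'employee leaves')."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Because turnover data is typically imbalanced (few leavers), precision, recall, and F1 on the positive class are more informative here than accuracy alone, which matches the study's use of multiple metrics.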
89

Comparison of Machine Learning Algorithms on Identifying Autism Spectrum Disorder

Aravapalli, Naga Sai Gayathri, Palegar, Manoj Kumar January 2023 (has links)
Background: Autism Spectrum Disorder (ASD) is a complex neurodevelopmental disorder that affects social communication, behavior, and cognitive development. Patients with autism face a variety of difficulties, such as sensory impairments, attention issues, learning disabilities, and mental health issues like anxiety and depression, as well as motor and learning issues. The World Health Organization (WHO) estimates that one in 100 children has ASD. Although ASD cannot be completely treated, early identification of its symptoms can lessen its impact and significantly improve the outcome of interventions and therapies, so it is important to identify the disorder early. Machine learning algorithms can help in predicting ASD. In this thesis, Support Vector Machine (SVM) and Random Forest (RF) are the algorithms used to predict ASD. Objectives: The main objective of this thesis is to build and train models using machine learning (ML) algorithms, both with default parameters and with hyperparameter tuning, and, based on a comparison of the two experiments, to find the most accurate model for predicting whether a person has ASD. Methods: Experimentation is the method chosen to answer the research questions, as it helps identify the most accurate model for predicting ASD. Experimentation was preceded by data preparation, including splitting the data and applying feature selection to the dataset. Two experiments followed: the models were trained and their performance metrics measured, first with default parameters and then with hyperparameter tuning. Based on the comparison, the most accurate model was used to predict ASD. Results: Two algorithms, SVM and RF, were chosen to train the models. After experimentation and training with hyperparameter tuning, SVM obtained the highest accuracy and F1 scores on the test data, 96% and 97% respectively, outperforming the RF model. Conclusions: The models were trained using the two ML algorithms, SVM and RF, in two experiments: in experiment 1 the models were trained with default parameters, and in experiment 2 with hyperparameter tuning, obtaining accuracy and F1 scores on the test data in each case. Comparing these performance metrics, we conclude that SVM is the most accurate algorithm for predicting ASD.
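The hyperparameter tuning in experiment 2 can be sketched as a generic grid search with k-fold cross-validation, where the model-specific training (an SVM or random forest here) is supplied by the caller. All names below are assumptions, not from the thesis:

```python
from itertools import product

def grid_search(train_eval, param_grid, folds):
    """Grid-search sketch: evaluate every combination in param_grid
    with cross-validation over `folds` (pairs of train/validation
    index sets) and return the combination with the best mean score.
    train_eval(params, train_idx, val_idx) -> validation score."""
    best_params, best_score = None, float("-inf")
    names = list(param_grid)
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = sum(train_eval(params, tr, va) for tr, va in folds) / len(folds)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

With an SVM, `param_grid` would typically cover `C`, the kernel, and `gamma`; with a random forest, the number of trees and maximum depth.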
90

Comparative Study of Methods for Linguistic Modeling of Numerical Data

Visa, Sofia January 2002 (has links)
No description available.
