Global ETD Search

41	An Efficient Ranking and Classification Method for Linear Functions, Kernel Functions, Decision Trees, and Ensemble Methods Glass, Jesse Miller January 2020 Structural algorithms incorporate the interdependence of outputs into the prediction, the loss, or both. Frank-Wolfe optimizations of pairwise losses and Gaussian conditional random fields for multivariate output regression are two such structural algorithms. Pairwise losses are standard 0-1 classification surrogate losses applied to pairs of features and outputs, resulting in improved ranking performance (area under the ROC curve, average precision, and F1 score) at the cost of increased learning complexity. In this dissertation, it is proven that the balanced loss 0-1 SVM and the pairwise SVM have the same dual loss and the pairwise dual coefficient domain is a subdomain of the balanced loss 0-1 SVM with bias dual coefficient domain. This provides a theoretical advancement in the understanding of pairwise loss, which we exploit for the development of a novel ranking algorithm that is fast and memory efficient, with state-of-the-art ranking metric performance across eight benchmark data sets. Various practical advancements are also made in multivariate output regression. The learning time for Gaussian conditional random fields is greatly reduced and the parameter domain is expanded to enable repulsion between outputs. Last, a novel multivariate regression is presented that keeps the desirable elements of GCRF and infuses them into a local regression model that improves mean squared error and reduces learning complexity. / Computer and Information Science Artificial Intelligence Bipartite Ranking Frank-Wolfe Algorithm Gaussian Conditional Random Fields Multivariate Output Regression Pairwise Support Vector Machine Structural Support Vector Machine
42	Solving support vector machine classification problems and their applications to supplier selection Kim, Gitae January 1900 Doctor of Philosophy / Department of Industrial & Manufacturing Systems Engineering / Chih-Hang Wu / Recently, interdisciplinary (management, engineering, science, and economics) collaborative research has been growing, both to achieve synergy and to compensate for the weaknesses of each discipline. Following this trend, this research combines three topics: mathematical programming, data mining, and supply chain management. A new pegging algorithm is developed for solving the continuous nonlinear knapsack problem. An efficient solution approach is proposed for the ν-support vector machine classification problem in the field of data mining. The new pegging algorithm is used to solve the subproblem of the support vector machine problem. For supply chain management, this research proposes an efficient integrated solution approach for the supplier selection problem. The support vector machine is applied to select potential suppliers within the integrated solution approach. In the first part of this research, a new pegging algorithm solves the continuous nonlinear knapsack problem with box constraints. The problem is to minimize a convex and differentiable nonlinear function with one equality constraint and box constraints. The pegging algorithm needs to calculate primal variables to check bounds on variables at each iteration, which frequently is a time-consuming task. The newly proposed dual bound algorithm checks the bounds of Lagrange multipliers without calculating primal variables explicitly at each iteration. In addition, the calculation of the dual solution at each iteration can be reduced by a proposed new method for updating the solution. In the second part, this research proposes several streamlined solution procedures for ν-support vector machine classification.
The main solution procedure is the matrix splitting method. The proposed method in this research is a specified matrix splitting method combined with the gradient projection method, line search technique, and the incomplete Cholesky decomposition method. The proposed method can use a variety of methods for line search and parameter updating. Moreover, large-scale problems are solved with the incomplete Cholesky decomposition and some efficient implementation techniques. To apply the research findings to real-world problems, this research developed an efficient integrated approach for supplier selection problems using the support vector machine and mixed integer programming. Supplier selection is an essential step in the procurement process. For companies seeking to maximize their profits and reduce costs, supplier selection requires finding satisfactory suppliers and allocating proper orders to the selected suppliers. In the early stage of supplier selection, a company can use support vector machine classification to choose potential qualified suppliers using specific criteria. However, the company may not need to purchase from all qualified suppliers. Once the company determines the amount of raw materials and components to purchase, it selects the final suppliers and the optimal order quantities at the final stage of the process. A mixed integer programming model is used at this stage to determine the final suppliers and allocate optimal orders. Support Vector Machine Nonlinear Knapsack Problem Supplier Selection Convex Optimization Supply Chain Management Classification Engineering (0537)
43	Learning via Query Synthesis Alabdulmohsin, Ibrahim Mansour 07 May 2017 Active learning is a subfield of machine learning that has been successfully used in many applications. One of the main branches of active learning is query synthesis, where the learning agent constructs artificial queries from scratch in order to reveal sensitive information about the underlying decision boundary. It has found applications in areas such as adversarial reverse engineering, automated science, and computational chemistry. Nevertheless, the existing literature on membership query synthesis has generally focused on finite concept classes or toy problems, with limited extension to real-world applications. In this thesis, I develop two spectral algorithms for learning halfspaces via query synthesis. The first algorithm is a maximum-determinant convex optimization method, while the second algorithm is a Markovian method that relies on Khachiyan’s classical update formulas for solving linear programs. The general theme of these methods is to construct an ellipsoidal approximation of the version space and then to synthesize queries via spectral decomposition. Moreover, I describe how these algorithms can be extended to other settings, such as pool-based active learning. Having demonstrated that halfspaces can be learned quite efficiently via query synthesis, the second part of this thesis proposes strategies for mitigating the risk of reverse engineering in adversarial environments. One approach that can be used to render query synthesis algorithms ineffective is to implement a randomized response. In this thesis, I propose a semidefinite program (SDP) for learning a distribution of classifiers, subject to the constraint that any individual classifier picked at random from this distribution provides reliable predictions with high probability. This algorithm is then justified both theoretically and empirically.
A second approach is to use a non-parametric classification method, such as similarity-based classification. In this thesis, I argue that learning via the empirical kernel maps, also commonly referred to as 1-norm Support Vector Machine (SVM) or Linear Programming (LP) SVM, is the best method for handling indefinite similarities. The advantages of this method are established both theoretically and empirically. active learning query synthesis reverse engineering support vector machine indefinite kernels linear classification
44	A machine learning approach to fundraising success in higher education Ye, Liang 01 May 2017 New donor acquisition and current donor promotion are the two major programs in fundraising for higher education, and developing proper targeting strategies plays an important role in both programs. This thesis presents machine learning solutions as targeting strategies for both programs, based on alumni data that is readily available in almost any institution. The targeting strategy for new donor acquisition is modeled as a donor identification problem. The Gaussian naïve Bayes, random forest, and support vector machine algorithms are used and evaluated. The test results show that, having been trained with enough samples, all three algorithms can distinguish donors from rejectors well, and big donors are identified more often than others. While there is a trade-off between the cost of soliciting candidates and the success of donor acquisition, the results show that in a practical scenario where the models are properly used as the targeting strategy, more than 85% of new donors and more than 90% of new big donors can be acquired when only 40% of the candidates are solicited. The targeting strategy for donor promotion is modeled as a promising donor (i.e., those who will upgrade their pledge) prediction problem in machine learning. The Gaussian naïve Bayes, random forest, and support vector machine algorithms are tested. The test results show that all three algorithms can distinguish promising donors from non-promising donors (i.e., those who will not upgrade their pledge). When the age information is known, the best model produces an overall accuracy of 97% on the test set. The results show that in a practical scenario where the models are properly used as the targeting strategy, more than 85% of promising donors can be acquired when only 26% of the candidates are solicited.
/ Graduate / liangye714@gmail.com machine learning fundraising support vector machine random forest naïve bayes predictive analysis prospect research
45	Direct L2 Support Vector Machine Zigic, Ljiljana 01 January 2016 This dissertation introduces a novel model for solving the L2 support vector machine, dubbed Direct L2 Support Vector Machine (DL2 SVM). DL2 SVM represents a new classification model that transforms the SVM's underlying quadratic programming problem into a system of linear equations with nonnegativity constraints. The devised system of linear equations has a symmetric positive definite matrix, and its solution vector has to be nonnegative. Furthermore, this dissertation introduces a novel algorithm, dubbed Non-Negative Iterative Single Data Algorithm (NN ISDA), which solves the underlying DL2 SVM's constrained system of equations. This solver shows significant speedup compared to several other state-of-the-art algorithms. The training time improvement is achieved at no cost; in other words, the accuracy is kept at the same level. All the experiments that support this claim were conducted on various datasets within a strict double cross-validation scheme. DL2 SVM solved with NN ISDA has faster training time on both medium and large datasets. In addition to a comprehensive DL2 SVM model, we introduce and derive its three variants. Three different solvers for the DL2's system of linear equations with nonnegativity constraints were implemented, presented, and compared in this dissertation. machine learning large-scale classification support vector machine non-negative least squares Computer Engineering
46	Mining Aspects through Cluster Analysis Using Support Vector Machines and Genetic Algorithms Hacoupian, Yourik 01 January 2013 The main purpose of object-oriented programming is to use encapsulation to reduce the amount of coupling within each object. However, object-oriented programming has some weaknesses in this area. To address this shortcoming, researchers have proposed an approach known as aspect-oriented programming (AOP). AOP is intended to reduce the amount of tangled code within an application by grouping similar functions into an aspect. To demonstrate the powerful aspects of AOP, it is necessary to extract aspect candidates from current object-oriented applications. Many different approaches have been proposed to accomplish this task. One such approach utilizes vector-based clustering to identify the possible aspect candidates. In this study, two different types of vectors are applied to two different vector-based clustering techniques. In this approach, each method in a software system S is represented by a d-dimensional vector. These vectors take into account the Fan-in values of the methods as well as the number of calls made to individual methods within the classes in software system S. Then a semi-supervised clustering approach known as Support Vector Clustering is applied to the vectors. In addition, an improved K-means clustering approach based on Genetic Algorithms is also applied to these vectors. The results obtained from these two approaches are then evaluated using standard metrics for aspect mining. In addition to introducing two new clustering-based approaches to aspect mining, this research investigates the effectiveness of the currently known metrics used in aspect mining to evaluate a given vector-based approach. Many of the metrics currently used for aspect mining evaluations are singleton metrics. Such metrics evaluate a given approach by taking into account only one aspect of a clustering technique.
This study introduces two different sets of metrics by combining these singleton measures. The iDIV metric combines the Diversity of a partition (DIV), Intra-cluster distance of a partition (IntraD), and the percentage of the number of methods analyzed (PAM) values to measure the overall effectiveness of the diversity of the partitions. The iDISP metric combines the Dispersion of crosscutting concerns (DISP) with the Inter-cluster distance of a partition (InterD) and the PAM values to measure the quality of the clusters formed by a given method. Lastly, the oDIV and oDISP metrics introduced here take into account the complexity of the algorithms in relation to the DIV and DISP values. By comparing the obtained values for each of the approaches, this study is able to identify the best performing method as it pertains to these metrics. Aspect Mining Clustering Genetic Algorithms Software Engineering Support Vector Machine Computer Sciences
47	Distributed Support Vector Machine Learning Armond, Kenneth C., Jr. 07 August 2008 Support Vector Machines (SVMs) are used for a growing number of applications. A fundamental constraint on SVM learning is the management of the training set, because the order of computations goes as the square of the size of the training set. Typically, training sets of 1000 (500 positives and 500 negatives, for example) can be managed on a PC without hard-drive thrashing. Training sets of 10,000, however, simply cannot be managed with PC-based resources. For this reason most SVM implementations must contend with some kind of chunking process to train parts of the data at a time (10 chunks of 1000, for example, to learn the 10,000). Sequential and multi-threaded chunking methods provide a way to run the SVM on large datasets while retaining accuracy. The multi-threaded distributed SVM described in this thesis is implemented using Java RMI, and has been developed to run on a network of multi-core/multi-processor computers. Distributed Parallel SVM Support Vector Machine Machine Learning SMO Sequential Minimization Optimization
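The scaling argument in entry 47 can be illustrated with a short sketch. It assumes that training cost is exactly proportional to the square of the training set size (the abstract only says the order of computations "goes as the square"), and it ignores any cost of combining results across chunks. The function names are hypothetical and are not taken from the thesis.

```python
def quadratic_cost(n: int) -> int:
    """Relative training cost for n examples, assuming cost grows as n**2."""
    return n * n


def chunked_cost(n: int, chunk_size: int) -> int:
    """Relative cost of training on consecutive chunks of at most chunk_size examples.

    Assumption: chunks are trained independently; merging results is not counted.
    """
    full_chunks, remainder = divmod(n, chunk_size)
    return full_chunks * quadratic_cost(chunk_size) + quadratic_cost(remainder)


if __name__ == "__main__":
    full = quadratic_cost(10_000)           # one training run over 10,000 examples
    chunked = chunked_cost(10_000, 1_000)   # 10 chunks of 1000, as in the abstract
    print(f"full: {full}, chunked: {chunked}, ratio: {full // chunked}")
```

Under these assumptions, ten chunks of 1000 need about one tenth of the pairwise computations of a single run over 10,000 examples, which is the motivation for chunking given in the abstract.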
48	A Semi-Supervised Information Extraction Framework for Large Redundant Corpora Normand, Eric 19 December 2008 The vast majority of text freely available on the Internet is not available in a form that computers can understand. There have been numerous approaches to automatically extract information from human-readable sources. The most successful attempts rely on vast training sets of data. Others have succeeded in extracting restricted subsets of the available information. These approaches have limited use and require domain knowledge to be coded into the application. The current thesis proposes a novel framework for Information Extraction. From large sets of documents, the system develops statistical models of the data the user wishes to query, which generally avoid the limitations and complexity of most Information Extraction systems. The framework uses a semi-supervised approach to minimize human input. It also eliminates the need for external Named Entity Recognition systems by relying on freely available databases. The final result is a query-answering system which extracts information from large corpora with a high degree of accuracy. Information Extraction Natural Language Processing Support Vector Machine Machine Learning Information Retrieval unstructured text
49	A Balanced Secondary Structure Predictor Islam, Md Nasrul 15 May 2015 Secondary structure (SS) refers to the local spatial organization of the polypeptide backbone atoms of a protein. Accurate prediction of SS is a vital clue for resolving the 3D structure of a protein. SS has three different components: helix (H), beta (E), and coil (C). Most SS predictors are imbalanced, as their accuracy in predicting helix and coil is high but significantly lower for beta. The objective of this thesis is to develop a balanced SS predictor which achieves good accuracies in all three SS components. We proposed a novel approach to solve this problem by combining a genetic algorithm (GA) with a support vector machine. We prepared two test datasets (CB471 and N295) to compare the performance of our predictors with SPINE X. The overall accuracy of our predictor was 76.4% and 77.2% on the CB471 and N295 datasets, respectively, while SPINE X gave 76.5% overall accuracy on both test datasets. Protein Secondary structure MetaSSPred Support vector machine Genetic algorithm Balanced prediction Other Computer Engineering
50	Detecting exoplanets with machine learning : A comparative study between convolutional neural networks and support vector machines Tiensuu, Jacob, Linderholm, Maja, Dreborg, Sofia, Örn, Fredrik January 2019 In this project two machine learning methods, Support Vector Machine (SVM) and Convolutional Neural Network (CNN), are studied to determine which method performs best on a labeled data set containing time series of light intensity from extrasolar stars. The main difficulty is that the data set contains far more non-exoplanet stars than stars with orbiting exoplanets. This results in a so-called imbalanced data set, which in this case is mitigated by mirroring the curves of stars with an orbiting exoplanet and adding them to the set. To improve the results further, some preprocessing is done before applying the methods to the data set. For the SVM, feature extraction and the Fourier transform of the time series are important measures, but further preprocessing alternatives are investigated. For the CNN method the time series are both detrended and smoothed, giving two inputs for the same light curve. All code is implemented in Python. Of all the validation parameters, recall is considered the main priority, since it is more important to find all exoplanets than to find all non-exoplanets. CNN turned out to be the best performing method for the chosen configurations, with a recall of 1.000, which exceeds the SVM's recall of 0.800. On the second validation parameter, precision, CNN is also the best performing method, with a precision of 0.769 versus the SVM's 0.571. Machine learning Exoplanet Support vector machine Convolution neural network Computer and Information Sciences
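Entry 50 compares the two classifiers by recall and precision. As a reminder of how these two figures are derived from classification counts, here is a minimal sketch. The counts below are hypothetical illustrations and are not taken from the thesis, whose confusion matrices are not given in the abstract; the function name is also an assumption.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) from true positive, false positive,
    and false negative counts; a ratio with an empty denominator is 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical classifier A: finds every positive but raises false alarms.
    print(precision_recall(tp=5, fp=5, fn=0))
    # Hypothetical classifier B: fewer false alarms, but misses one positive.
    print(precision_recall(tp=4, fp=1, fn=1))
```

Recall depends only on how many true exoplanet stars are missed, so a classifier can reach a recall of 1.0 while its precision stays low. This is why the abstract reports both figures and treats recall as the priority.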
