Global ETD Search

191	Spamerkennung mit Support Vector Machines [Spam Detection with Support Vector Machines] Möller, Manuel 22 June 2005 Starting from a presentation of the theoretical foundations of automatic text classification, this thesis shows that support vector machines, which originate in statistical learning theory, are suited to contributing to more precise detection of unsolicited e-mail advertising. In a test environment with a corpus of 20,000 e-mails, test runs covering various parameters of the preprocessing and of the support vector machine were automatically evaluated and visualized graphically. Building on this, an extension for the open-source software SpamAssassin is described that augments the existing classification mechanisms with classification by support vector machine. Automatic visualization Plugin SpamAssassin Support Vector Machine ddc:004 Automatic classification E-mail Classification Artificial intelligence Machine learning Spam mail
192	System for Identifying Plankton from the SIPPER Instrument Platform Kramer, Kurt A. 29 October 2010 Plankton imaging systems such as SIPPER produce a large quantity of data in the form of plankton images from a variety of classes. A system known as PICES was developed to quickly extract, classify and manage the millions of images produced from a single one-week research cruise. A new fast technique for parameter tuning and feature selection for support vector machines using wrappers was created. This technique allows for faster feature selection, while at the same time maintaining and sometimes improving classification accuracy. It also gives the user greater flexibility in the management of class contents in existing training libraries. Support vector machines are binary classifiers that can implement multi-class classifiers by creating a classifier for each possible pair of classes or for each class using a one-versus-all strategy. Feature selection conventionally searches for a single set of features to be used by all of the binary classifiers. This ignores the fact that features that may be good discriminators for two particular classes might not do well for other class combinations. As a result, the feature selection process may not include these features in the common set to be used by all support vector machines. It is shown through experimentation that by selecting features for each binary class combination, overall classification accuracy can be improved and the time required for training a multi-class support vector machine can be reduced. Another benefit of this approach is that significantly less time is required for feature selection when additional classes are added to the training data. This is because the features selected for the existing class combinations are still valid, so feature selection only needs to be run for the newly added combinations.
This work resulted in a system called PICES, a GUI-based, user-friendly system that aids in the classification and management of over 55 million images of plankton split amongst 180 classes. PICES embodies an improved means of performing wrapper-based feature selection that creates classifiers that train faster and are just as accurate, and sometimes more accurate, while reducing the feature selection time. Marine Science PICES Machine Learning Feature Selection Support Vector Machine SVM Multi-Class Pair-Wise American Studies Arts and Humanities
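Illustrative note on entry 192: the claim that adding a class only requires feature selection for the new class combinations can be checked with simple counting. The sketch below is not taken from PICES; the function names and the class labels are hypothetical and chosen only for illustration. It enumerates the class pairs of a one-versus-one scheme and lists the pairs that appear when one class is added.

```python
from itertools import combinations


def pairwise_tasks(classes):
    """Return every unordered class pair; a one-versus-one scheme trains one binary classifier per pair."""
    return list(combinations(sorted(classes), 2))


def new_pairs_after_adding(classes, new_class):
    """Return only the pairs that involve the newly added class.

    Under per-pair feature selection, these are the only pairs that
    would need a fresh feature selection run.
    """
    return [tuple(sorted((c, new_class))) for c in sorted(classes)]


# Hypothetical class labels, used only for illustration.
existing = ["copepod", "diatom", "larvacean"]
print(len(pairwise_tasks(existing)))
print(new_pairs_after_adding(existing, "protist"))
```

With the 180 classes mentioned in the abstract, a one-versus-one scheme has 180 * 179 / 2 = 16,110 class pairs, while adding one further class introduces only 180 new pairs.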
193	Automatic Red Tide Detection using MODIS Satellite Images Cheng, Wijian 08 June 2009 Red tides pose a significant economic and environmental threat in the Gulf of Mexico. Detecting red tide is important for understanding this phenomenon. In this thesis, machine learning approaches based on Random Forests, Support Vector Machines and K-Nearest Neighbors have been evaluated for red tide detection from MODIS satellite images. Detection results using machine learning algorithms were compared to ship-collected ground truth red tide data. This work has three major contributions. First, machine learning approaches outperformed two of the latest thresholding red tide detection algorithms based on bio-optical characterization by more than 10% in terms of F-measure and more than 4% in terms of area under the ROC curve. Machine learning approaches are effective in more locations on the West Florida Shelf. Second, the thresholds developed in recent thresholding methods were introduced as input attributes to the machine learning approaches, and this strategy improved the F-measures of the Random Forests and K-Nearest Neighbors approaches. Third, voting among the machine learning and thresholding methods achieved better performance than using machine learning alone, which implies that a combination of machine learning models and biocharacterization thresholding methods can be used to obtain effective red tide detection results. karenia brevis West Florida Shelf machine learning random forest support vector machine American Studies Arts and Humanities Computer Sciences
194	Developing Predictive Models for Lung Tumor Analysis Basu, Satrajit 01 January 2012 A CT-scan of lungs has become ubiquitous as a thoracic diagnostic tool. Thus, using CT-scan images in developing predictive models for tumor types and survival time of patients afflicted with Non-Small Cell Lung Cancer (NSCLC) would provide a novel approach to non-invasive tumor analysis. It can provide an alternative to histopathological techniques such as needle biopsy. Two major tumor analysis problems were addressed in the course of this study: tumor type classification and survival time prediction. CT-scan images of 109 patients with NSCLC were used in this study. The first problem involved classifying tumor types into two major classes of non-small cell lung tumors, Adenocarcinoma and Squamous-cell Carcinoma, each constituting 30% of all lung tumors. In a first-of-its-kind investigation, a large group of 2D and 3D image features, which were hypothesized to be useful, are evaluated for effectiveness in classifying the tumors. Classifiers including decision trees and support vector machines (SVM) were used along with feature selection techniques (wrappers and relief-F) to build models for tumor classification. Results show that over the large feature space for both 2D and 3D features it is possible to predict tumor classes with over 63% accuracy, showing new features may be of help. The accuracy achieved using 2D and 3D features is similar, with 3D easier to use. The tumor classification study was then extended by introducing the Bronchioalveolar Carcinoma (BAC) tumor type. Following up on the hypothesis that Bronchioalveolar Carcinoma is substantially different from other NSCLC tumor types, a two-class problem was created, where an attempt was made to differentiate BAC from the other two tumor types. To reduce the three-class problem to a two-class problem, misclassifications between Adenocarcinoma and Squamous-cell Carcinoma were ignored.
Using the same prediction models as the previous study and just 3D image features, tumor classes were predicted with around 77% accuracy. The final study involved predicting two-year survival time in patients suffering from NSCLC. Using a subset of the image features and a handful of clinical features, predictive models were developed to predict two-year survival time in 95 NSCLC patients. A support vector machine classifier, a naive Bayes classifier and a decision tree classifier were used to develop the predictive models. Using the Area Under the Curve (AUC) as a performance metric, different models were developed and analyzed for their effectiveness in predicting survival time. A novel feature selection method to group features based on a correlation measure has been proposed in this work, along with feature space reduction using principal component analysis. The parameters for the support vector machine were tuned using grid search. A model based on a combination of image and clinical features achieved the best performance, with an AUC of 0.69, using dimensionality reduction by means of principal component analysis along with grid search to tune the parameters of the SVM classifier. The study showed the effectiveness of a predominantly image feature space in predicting survival time. A comparison of the performance of the models from different classifiers also indicates that SVMs consistently outperformed or matched the other two classifiers for this data. Classifiers CT-scan Feature Selection Image Features Radiomics Support Vector Machine American Studies Arts and Humanities Computer Sciences
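Illustrative note on entry 194: the abstract reports the Area Under the Curve (AUC) as its performance metric. As a minimal sketch, and not the thesis's own code, the AUC of a binary classifier can be computed as the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are assumptions made for this example.

```python
def roc_auc(labels, scores):
    """Compute the area under the ROC curve by pairwise comparison.

    labels: iterable of 0/1 ground-truth values (1 = positive class).
    scores: iterable of classifier scores, higher meaning "more positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the 0.69 reported in the abstract. The quadratic loop is fine for a sketch; a rank-based formulation would be preferable for large datasets.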
195	Prediction of antimicrobial peptides using hyperparameter optimized support vector machines Gabere, Musa Nur January 2011 Antimicrobial peptides (AMPs) play a key role in the innate immune response. They can be ubiquitously found in a wide range of eukaryotes including mammals, amphibians, insects, plants, and protozoa. In lower organisms, AMPs function merely as antibiotics by permeabilizing cell membranes and lysing invading microbes. Prediction of antimicrobial peptides is important because the experimental methods used in characterizing AMPs are costly, time-consuming and resource-intensive, and because identification of AMPs in insects can serve as a template for the design of novel antibiotics. To this end, data on antimicrobial peptides are first extracted from UniProt, manually curated and stored in a centralized database called the Dragon Antimicrobial Peptide Database (DAMPD). Secondly, based on the curated data, models to predict antimicrobial peptides are created using support vector machines with optimized hyperparameters. In particular, global optimization methods such as grid search, pattern search and derivative-free methods are utilised to optimize the SVM hyperparameters. These models are useful in characterizing unknown antimicrobial peptides. Finally, a webserver is created that will be used to predict antimicrobial peptides in haematophagous insects such as Glossina morsitans and Anopheles gambiae.
196	Continuous detection and prediction of grasp states and kinematics from primate motor, premotor, and parietal cortex Menz, Veera Katharina 29 April 2015 No description available. 570 decoding primate motor cortex premotor cortex parietal cortex neuroprosthetics spiking activity microelectrodes Kalman filter Support Vector Machine Biologie (PPN619462639)
197	A one-class object-based system for sparse geographic feature identification Fourie, Christoff 03 1900 Thesis (MSc (Geography and Environmental Studies))--University of Stellenbosch, 2011. / ENGLISH ABSTRACT: The automation of information extraction from earth observation imagery has become a field of active research. This is mainly due to the high volumes of remotely sensed data that remain unused and the possible benefits that the extracted information can provide to a wide range of interest groups. In this work an earth observation image processing system is presented and profiled that attempts to streamline the information extraction process, without degradation of the quality of the extracted information, for geographic object anomaly detection. The proposed system, implemented as a software application, combines recent research in automating image segment generation and automatically finding statistical classifier parameters and attribute subsets using evolutionary-inspired search algorithms. Exploratory research was conducted on the use of an edge metric as a fitness function to an evolutionary search heuristic to automate the generation of image segments for a region-merging segmentation algorithm having six control parameters. The edge metric for such an application is compared with an area-based metric. The use of attribute subset selection in conjunction with a free parameter tuner for a one-class support vector machine (SVM) classifier, operating on high-dimensional object-based data, was also investigated. For common earth observation anomaly detection problems using typical segment attributes, such a combined free parameter tuning and attribute subset selection system provided statistically significantly better results than a process using free parameter tuning only. In some extreme cases, due to the stochastic nature of the search algorithm employed, the free-parameter-only strategy provided slightly better results.
The developed system was used in a case study to map a single class of interest on a 22.5 x 22.5 km subset of a SPOT 5 image and is compared with a multiclass classification strategy. The developed system generated slightly better classification accuracies than the multiclass classifier and only required samples from the class of interest. / AFRIKAANS SUMMARY: The automation of information extraction from earth observation imagery has become a research field in its own right because of the large volumes of data that go unused, and because of the possible contribution that information obtained from these images can offer to various interest groups. In this thesis an earth observation image processing system is introduced and evaluated. This system aims to ease the extraction of information from earth observation imagery by minimizing user interaction, without affecting the quality of the results. The system is designed for geographic object anomaly detection and is implemented as a software program. The program combines recent research on the use of evolutionary search algorithms to automatically obtain good image segments and find parameters, as well as to select attributes for a statistical classification of image segments. Exploratory research was conducted on the use of an edge metric as a fitness function in an evolutionary search heuristic to automatically find good parameters for a region-merging image segmentation algorithm with six control parameters. This edge metric is compared with an area metric for such an application. The usefulness of attribute subset selection in conjunction with a free parameter tuner for a one-class support vector machine (SVM) classifier was investigated on high-dimensional object-oriented data.
For common earth observation anomaly detection problems with a typical set of segment attributes, such a system delivered significantly better results than a system that only tunes the free parameters. In some extreme cases, owing to the stochastic nature of the search algorithm, the strategy that only tunes the free parameters delivered slightly better results. The system was tested in a case study in which a single class was identified on a 22.5 x 22.5 km subset of a SPOT 5 image. The proposed system, which used only samples of the chosen class, generated better classification accuracies than the multiclass classifier. Segmentation evaluation Image classification Support vector machine Evolutionary search algorithms
198	A Model Fusion Based Framework For Imbalanced Classification Problem with Noisy Dataset January 2014 abstract: Data imbalance and data noise often coexist in real-world datasets. Data imbalance affects the learning classifier by degrading the recognition power of the classifier on the minority class, while data noise affects the learning classifier by providing inaccurate information and thus misleading the classifier. Because of these differences, data imbalance and data noise have been treated separately in the data mining field. Yet such an approach ignores the mutual effects and as a result may lead to new problems. A desirable solution is to tackle these two issues jointly. Noting the complementary nature of generative and discriminative models, this research proposes a unified model fusion based framework to handle imbalanced classification with noisy datasets. The phase I study focuses on the imbalanced classification problem. A generative classifier, the Gaussian Mixture Model (GMM), is studied, which can learn the distribution of the imbalanced data to improve the discrimination power on imbalanced classes. By fusing this knowledge into cost SVM (cSVM), a CSG method is proposed. Experimental results show the effectiveness of CSG in dealing with imbalanced classification problems. The phase II study expands the research scope to include noisy datasets in the imbalanced classification problem. A model fusion based framework, K Nearest Gaussian (KNG), is proposed. KNG employs a generative modeling method, GMM, to model the training data as Gaussian mixtures and form adjustable confidence regions which are less sensitive to data imbalance and noise. Motivated by the K-nearest neighbor algorithm, the neighboring Gaussians are used to classify the testing instances. Experimental results show that the KNG method greatly outperforms traditional classification methods in dealing with imbalanced classification problems with noisy datasets.
The phase III study addresses the issues of feature selection and parameter tuning of the KNG algorithm. To further improve the performance of the KNG algorithm, a Particle Swarm Optimization based method (PSO-KNG) is proposed. PSO-KNG formulates model parameters and data features into the same particle vector and thus can search for the best feature and parameter combination jointly. The experimental results show that PSO can greatly improve the performance of KNG with better accuracy and much lower computational cost. / Dissertation/Thesis / Doctoral Dissertation Industrial Engineering 2014 Industrial engineering Information science Gaussian mixture model Imbalanced classification K nearest Gaussian Particle swarm optimization Support vector machine
199	Predicting Demographic and Financial Attributes in a Bank Marketing Dataset January 2016 abstract: Banking institutions employ several marketing strategies to maximize new customer acquisition as well as current customer retention. Telemarketing is one such approach, in which individual customers are contacted by bank representatives with offers. These telemarketing strategies can be improved in combination with data mining techniques that allow prediction of customer information and interests. In this thesis, bank telemarketing data from a Portuguese banking institution were analyzed to determine the predictability of several client demographic and financial attributes and to find the most contributing factors in each. Data were preprocessed to ensure quality, and then data mining models were generated for the attributes with logistic regression, support vector machine (SVM) and random forest, using Orange as the data mining tool. Results were analyzed using precision, recall and F1 score. / Dissertation/Thesis / Masters Thesis Computer Science 2016 Computer science Mathematics Industrial engineering classification data mining logistic regression random forest sensitivity analysis support vector machine
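Illustrative note on entry 199: the abstract evaluates its models with precision, recall and F1 score. The following is a minimal sketch of how these three metrics are commonly computed for a binary problem; it is not drawn from the thesis or from the Orange tool, and the function name and toy labels are assumptions made for this example.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the chosen positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    # Each ratio falls back to 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
print(precision_recall_f1(y_true, y_pred))
```

F1 is the harmonic mean of precision and recall, so it is high only when both are high. That property makes it a common choice when the classes of interest are unevenly represented.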
200	Automated classification of bibliographic data using SVM and Naive Bayes Nordström, Jesper January 2018 Classification of scientific bibliographic data is an important and increasingly time-consuming task in a “publish or perish” paradigm where the number of scientific publications is steadily growing. Apart from being a resource-intensive endeavor, manual classification has also been shown to be often performed with a quite high degree of inconsistency. Since many bibliographic databases contain a large number of already classified records, supervised machine learning for automated classification might be a solution for handling the increasing volumes of published scientific articles. In this study, automated classification of bibliographic data based on two different machine learning methods, Naive Bayes and Support Vector Machine (SVM), was evaluated. The data used in the study were collected from the Swedish research database SwePub, and the features used for training the classifiers were based on abstracts and titles in the bibliographic records. The accuracy achieved ranged between a lowest score of 0.54 and a highest score of 0.84. The classifiers based on Support Vector Machine consistently received higher scores than the classifiers based on Naive Bayes. Classification performed at the second level in the hierarchical classification system used clearly resulted in lower scores than classification performed at the first level. Using abstracts as the basis for feature extraction yielded overall better results than using titles; the differences were, however, very small. automated classification machine learning Naive Bayes Support Vector Machine SVM bibliographic data SwePub Computer and Information Sciences