About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

71

Classification of Hate Tweets and Their Reasons using SVM

Tarasova, Natalya January 2016
This study focuses on classifying hate messages directed at the mobile operators Verizon, AT&T and Sprint. The main aim is to use the machine learning algorithm Support Vector Machines (SVM) to classify messages into four categories - Hate, Reason, Explanatory and Other - in order to identify a hate message and its reason. The study resulted in two methods: a "naive" method (the Naive Method, NM) and a more "advanced" method (the Partial Timeline Method, PTM). NM is binary in the sense that it asks the question "Does this tweet belong to the class Hate?". PTM asks the same question, but only of a restricted set of tweets, namely those posted within ±30 minutes of the hate tweet. Overall, the results of the study indicate that PTM is more accurate than NM. However, it does not consider all the tweets on a user's timeline. The choice of method therefore involves a trade-off: PTM offers a more accurate classification, while NM offers a more exhaustive one. / This study focused on finding the hate tweets posted by the customers of three mobile operators, Verizon, AT&T and Sprint, and identifying the reasons for their dissatisfaction. The timelines containing a hate tweet were collected and studied for the presence of an explanation. A machine learning approach was employed using four categories: Hate, Reason, Explanatory and Other. The classification was conducted with a one-versus-all approach using the Support Vector Machines algorithm implemented in the LIBSVM tool. The study resulted in two methodologies: the Naive Method (NM) and the Partial Timeline Method (PTM). The Naive Method relied only on a feature space consisting of the most representative words chosen with the Akaike Information Criterion. PTM utilized the fact that the majority of the explanations were posted within a one-hour time window of the posting of a hate tweet. We found that the accuracy of PTM is higher than that of NM. In addition, PTM saves time and memory by analysing fewer tweets. At the same time this implies a trade-off between relevance and completeness. / Opponent: Kristina Wettainen
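The one-versus-all SVM setup described above can be illustrated with a short sketch. The snippet below uses scikit-learn's LinearSVC with a plain bag-of-words feature space as a stand-in for the thesis's LIBSVM pipeline; the example tweets and labels are illustrative, and the AIC-based word selection is not reproduced (all terms are kept), so treat this as a minimal sketch rather than the actual method.

```python
# Sketch of one-vs-all SVM tweet classification (illustrative data, not the thesis corpus).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

tweets = [
    "I hate @operator, worst service ever",       # Hate
    "my calls keep dropping all day",             # Reason
    "switched because the data plan was a scam",  # Explanatory
    "just got a new phone case",                  # Other
]
labels = ["Hate", "Reason", "Explanatory", "Other"]

# Bag-of-words features; the thesis selects words with AIC, here all terms are kept.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(tweets)

clf = OneVsRestClassifier(LinearSVC())  # one binary SVM per category
clf.fit(X, labels)
print(clf.predict(vectorizer.transform(["the network is down again, I hate this"])))
```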
72

A Dynamic Behavioral Biometric Approach to Authenticate Users Employing Their Fingers to Interact with Touchscreen Devices

Ponce, Arturo 01 May 2015
The use of mobile devices has extended to all areas of human life and has changed the way people work and socialize. Mobile devices are susceptible to being lost, stolen, or compromised. Several approaches have been adopted to protect the information stored on these devices. One of these approaches is user authentication. The two most popular methods of user authentication are knowledge-based and token-based methods, but they present different kinds of problems. Biometric authentication methods have emerged in recent years as a way to deal with these problems. They use an individual's unique characteristics for identification and have proven to be somewhat effective in authenticating users. Biometric authentication methods also present several problems. For example, they are not 100% effective in identifying users, some of them are not well perceived by users, others require too much computational effort, and others require special equipment or special postures by the user. Ultimately, these shortcomings can result in unauthorized use of the devices or in users being annoyed by the implementation. New ways of interacting with mobile devices have emerged in recent years. This makes it necessary for authentication methods to adapt to these changes and take advantage of them. For example, the use of touchscreens has become prevalent in mobile devices, which means that biometric authentication methods need to adapt to it. One important aspect to consider when adopting these new methods is their acceptance by users. The Technology Acceptance Model (TAM) states that system use is a response that can be predicted by user motivation. This work presents an authentication method that can constantly verify the user's identity, which can help prevent unauthorized use of a device or access to sensitive information. The goal was to authenticate people while they used their fingers to interact with their touchscreen mobile devices doing ordinary tasks like vertical and horizontal scrolling. The approach used six biometric traits to do the authentication. The combination of those traits allowed for authentication at the beginning and at the end of a finger stroke. Support Vector Machines were employed and the best results obtained show Equal Error Rate values around 35%. Those results demonstrate the potential of the approach to verify a person's identity. Additionally, this work tested the acceptance of the approach among participants, which can influence its eventual adoption. An acceptance level of 80% was obtained, which compares favorably with other behavioral biometric approaches.
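The figure of merit reported above is the Equal Error Rate (EER), the operating point at which the false-accept and false-reject rates coincide. A hedged sketch of how an EER can be estimated from classifier decision scores follows; the genuine and impostor scores are synthetic, not the study's touchscreen data.

```python
# Estimate the Equal Error Rate (EER) from classifier scores (synthetic example data).
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
genuine_scores  = rng.normal(1.0, 1.0, 500)   # scores for the legitimate user
impostor_scores = rng.normal(-1.0, 1.0, 500)  # scores for other users

y_true = np.concatenate([np.ones(500), np.zeros(500)])
scores = np.concatenate([genuine_scores, impostor_scores])

fpr, tpr, _ = roc_curve(y_true, scores)
fnr = 1 - tpr
# The EER is where the false-positive and false-negative rates cross.
idx = np.argmin(np.abs(fpr - fnr))
print(f"EER ~ {(fpr[idx] + fnr[idx]) / 2:.3f}")
```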
73

Variable selection for kernel methods with application to binary classification

Oosthuizen, Surette 2008
Thesis (PhD (Statistics and Actuarial Science))—University of Stellenbosch, 2008. / The problem of variable selection in binary kernel classification is addressed in this thesis. Kernel methods are fairly recent additions to the statistical toolbox, having originated approximately two decades ago in machine learning and artificial intelligence. These methods are growing in popularity and are already frequently applied in regression and classification problems. Variable selection is an important step in many statistical applications. It leads to a better understanding of the problem being investigated, and subsequent analyses of the data frequently yield more accurate results if irrelevant variables have been eliminated. It is therefore important to investigate aspects of variable selection for kernel methods. Chapter 2 of the thesis is an introduction to the main part presented in Chapters 3 to 6. In Chapter 2 some general background material on kernel methods is first provided, along with an introduction to variable selection. Empirical evidence is presented substantiating the claim that variable selection is a worthwhile enterprise in kernel classification problems. Several aspects which complicate variable selection in kernel methods are discussed. An important property of kernel methods is that the original data are effectively transformed before a classification algorithm is applied to them. The space in which the original data reside is called input space, while the transformed data occupy part of a feature space. In Chapter 3 we investigate whether variable selection should be performed in input space or rather in feature space. A new approach to selection, so-called feature-to-input space selection, is also proposed. This approach has the attractive property of combining information generated in feature space with easy interpretation in input space. An empirical study reveals that effective variable selection requires utilisation of at least some information from feature space. Having confirmed in Chapter 3 that variable selection should preferably be done in feature space, the focus in Chapter 4 is on two classes of selection criteria operating in feature space: criteria which are independent of the specific kernel classification algorithm and criteria which depend on this algorithm. In this regard we concentrate on two kernel classifiers, viz. support vector machines and kernel Fisher discriminant analysis, both of which are described in some detail in Chapter 4. The chapter closes with a simulation study showing that two of the algorithm-independent criteria are very competitive with the more sophisticated algorithm-dependent ones. In Chapter 5 we incorporate a specific strategy for searching through the space of variable subsets into our investigation. Evidence in the literature strongly suggests that backward elimination is preferable to forward selection in this regard, and we therefore focus on recursive feature elimination. Zero- and first-order forms of the new selection criteria proposed earlier in the thesis are presented for use in recursive feature elimination and their properties are investigated in a numerical study. It is found that some of the simpler zero-order criteria perform better than the more complicated first-order ones. Up to the end of Chapter 5 it is assumed that the number of variables to select is known. 
We do away with this restriction in Chapter 6 and propose a simple criterion which uses the data to identify this number when a support vector machine is used. The proposed criterion is investigated in a simulation study and compared to cross-validation, which can also be used for this purpose. We find that the proposed criterion performs well. The thesis concludes in Chapter 7 with a summary and several suggestions for further research.
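Chapter 5's backward-elimination strategy corresponds to recursive feature elimination with an SVM. A minimal sketch using scikit-learn's RFE wrapper around a linear-kernel SVM is given below; the data are synthetic and the thesis's own zero- and first-order selection criteria are not reproduced.

```python
# Recursive feature elimination with a linear SVM (a sketch on synthetic data).
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, n_redundant=2, random_state=0)

# RFE ranks features by the magnitude of the linear SVM weights and
# drops the weakest one per iteration until 5 remain.
selector = RFE(estimator=SVC(kernel="linear"), n_features_to_select=5, step=1)
selector.fit(X, y)
print("selected feature indices:", [i for i, keep in enumerate(selector.support_) if keep])
```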
74

Detection of breast cancer microcalcifications in digitized mammograms: developing segmentation and classification techniques for the processing of MIAS database mammograms based on the wavelet decomposition transform and support vector machines

Al-Osta, Husam E. I. January 2010
Mammography is used to aid early detection and diagnosis systems. It takes an x-ray image of the breast and can provide a second opinion for radiologists. The earlier a detection is made, the better treatment works. Digital mammograms are dealt with by Computer Aided Diagnosis (CAD) systems that can detect and analyze abnormalities in a mammogram. The purpose of this study is to investigate how to categorize cropped regions of interest (ROI) from digital mammogram images into two classes: normal regions and abnormal regions (which contain microcalcifications). The work proposed in this thesis is divided into three stages to provide a concept system for classification between normal and abnormal cases. The first stage is the segmentation process, which applies thresholding filters to separate the abnormal objects (foreground) from the breast tissue (background). The study has been carried out on mammogram images, and mainly on cropped ROI images of different sizes that represent either an individual microcalcification or a cluster of microcalcifications. The second stage is feature extraction. This stage makes use of the segmented ROI images to extract characteristic features that help in identifying regions of interest. The wavelet transform has been utilized for this process as it provides a variety of features that could be examined in future studies. The third and final stage is classification, where machine learning is applied to distinguish between normal ROI images and ROI images that may contain microcalcifications. The results indicated that by combining the wavelet transform and SVM we can distinguish between regions with normal breast tissue and regions that include microcalcifications.
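A minimal sketch of the wavelet-plus-SVM pipeline outlined above is shown below, using PyWavelets for a 2-D decomposition and simple subband statistics as features. The random image patches stand in for the MIAS ROIs, and the choice of wavelet (db4) and feature set are assumptions for illustration only.

```python
# Wavelet-decomposition features fed to an SVM (random patches stand in for mammogram ROIs).
import numpy as np
import pywt
from sklearn.svm import SVC

def wavelet_features(patch, wavelet="db4", level=2):
    """Mean and standard deviation of each subband of a 2-D wavelet decomposition."""
    coeffs = pywt.wavedec2(patch, wavelet=wavelet, level=level)
    feats = [coeffs[0].mean(), coeffs[0].std()]          # approximation subband
    for detail_level in coeffs[1:]:
        for band in detail_level:                        # horizontal, vertical, diagonal details
            feats.extend([band.mean(), band.std()])
    return np.array(feats)

rng = np.random.default_rng(0)
normal   = [rng.normal(0.5, 0.05, (64, 64)) for _ in range(20)]
# "Abnormal" patches get a few bright speckles, a crude stand-in for microcalcifications.
abnormal = [rng.normal(0.5, 0.05, (64, 64)) + (rng.random((64, 64)) > 0.99) for _ in range(20)]

X = np.array([wavelet_features(p) for p in normal + abnormal])
y = np.array([0] * 20 + [1] * 20)
clf = SVC(kernel="rbf").fit(X, y)
print("training accuracy:", clf.score(X, y))
```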
75

CONTRIBUTIONS TO K-MEANS CLUSTERING AND REGRESSION VIA CLASSIFICATION ALGORITHMS

Salman, Raied 27 April 2012
The dissertation deals with clustering algorithms and with transforming regression problems into classification problems. The main contributions of the dissertation are twofold: first, to improve (speed up) clustering algorithms and, second, to develop a strict learning environment for solving regression problems as classification tasks by using support vector machines (SVMs). An extension to the most popular unsupervised clustering method, the k-means algorithm, is proposed, dubbed the k-means2 (k-means squared) algorithm, applicable to ultra large datasets. The main idea is based on using a small portion of the dataset in the first stage of the clustering. Thus, the centers of such a smaller dataset are computed much faster than if the centers were computed on the whole dataset. These final centers of the first stage are naturally much closer to the locations of the final centers, rendering a great reduction in the total computational cost. For large datasets the speed-up in computation is high and rises with the size of the dataset. The total transient time for the fast stage was found to depend largely on the portion of the dataset selected for that stage. For medium-size datasets it has been shown that an 8-10% portion of the data used in the fast stage is a reasonable choice. The centers of the 8-10% sample computed during the fast stage may oscillate towards the final centers' positions of the fast stage along the centers' movement path. The slow stage starts with the final centers of the fast stage, and the paths of the centers in the second stage are much shorter than those of a classic k-means algorithm. Additionally, the oscillations of the slow-stage centers' trajectories along the path to the final centers' positions are greatly reduced. In the second part of the dissertation, a novel approach of posing regression problems as multiclass classification tasks within the common framework of kernel machines is proposed. Based on this approach, both nonlinear (NL) regression problems and NL multiclass classification tasks are solved as multiclass classification problems using SVMs. The accuracy of the approximating classification (hyper)surface created by a nonlinear multiclass classifier (averaged over the several benchmarking data sets used in this study) is slightly superior to that of the solution obtained by a regression (hyper)surface. In terms of the CPU time needed for training (i.e. for tuning the hyperparameters of the models), the nonlinear SVM classifier also shows significant advantages. Comparisons were performed between the solutions obtained by an SVM solving a given regression problem as a classic SVM regressor and as an SVM classifier. In order to transform a regression problem into a classification task, four possible discretizations of a continuous output (target) vector y are introduced and compared. A very strict double (nested) cross-validation technique has been used for measuring the performances of the regression and multiclass classification SVMs. In order to carry out fair comparisons, SVMs are used for solving both tasks - regression and multiclass classification. The readily available and most popular benchmarking SVM tool, LibSVM, was used in all experiments. 
The results of solving twelve benchmarking regression tasks present SVM regression and classification algorithms as strongly competing models, where each approach shows merits for a specific class of high-dimensional function approximation problems.
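The two-stage clustering idea, a fast k-means pass on roughly 8-10% of the data whose centers seed a second pass over the full dataset, can be sketched as follows. This is an illustrative reconstruction on synthetic blobs, not the dissertation's implementation.

```python
# Two-stage k-means: a fast pass on ~10% of the data seeds the full pass (illustrative sketch).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100_000, centers=8, random_state=0)
rng = np.random.default_rng(0)

# Fast stage: cluster a small random subset.
subset = X[rng.choice(len(X), size=len(X) // 10, replace=False)]
fast = KMeans(n_clusters=8, n_init=10, random_state=0).fit(subset)

# Slow stage: start the full run from the fast-stage centers.
slow = KMeans(n_clusters=8, init=fast.cluster_centers_, n_init=1, random_state=0).fit(X)
print("iterations in the slow stage:", slow.n_iter_)
```

Because the subset centers already lie close to the final positions, the full-data run typically needs only a few Lloyd iterations, which is where the reported speed-up comes from.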
76

Analysis of Nanopore Detector Measurements using Machine Learning Methods, with Application to Single-Molecule Kinetics

Landry, Matthew 18 May 2007
At its core, a nanopore detector has a nanometer-scale biological membrane across which a voltage is applied. The voltage draws a DNA molecule into an α-hemolysin channel in the membrane. Consequently, a distinctive channel current blockade signal is created as the molecule flexes and interacts with the channel. This flexing of the molecule is characterized by different blockade levels in the channel current signal. Previous experiments have shown that a nanopore detector is sufficiently sensitive that nearly identical DNA molecules can be classified successfully using machine learning techniques such as Hidden Markov Models and Support Vector Machines in a channel-current-based signal analysis platform [4-9]. In this paper, methods for improving feature extraction are presented, both to improve classification and to provide biologists and chemists with a better understanding of the physical properties of a given molecule.
77

Clustering Via Supervised Support Vector Machines

Merat, Sepehr 07 August 2008
An SVM-based clustering algorithm is introduced that clusters data with no a priori knowledge of input classes. The algorithm initializes by first running a binary SVM classifier against a data set with each vector in the set randomly labeled. Once this initialization step is complete, the SVM confidence parameters for classification on each of the training instances can be accessed. The lowest confidence data (i.e., the worst of the mislabeled data) then have their labels switched to the other class label. The SVM is then re-run on the data set (with partly re-labeled data). The repetition of the above process improves the separability until there is no misclassification. Variations on this type of clustering approach are shown.
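The relabeling loop the abstract outlines can be sketched directly: assign random binary labels, train an SVM, flip the labels of the most deeply misclassified (lowest-confidence) points, and repeat until the labeling is separable. The code below is an illustrative reading of that procedure on synthetic blobs; the flip fraction and stopping rule are assumptions.

```python
# SVM-based clustering by iterative relabeling (illustrative sketch of the described loop).
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, _ = make_blobs(n_samples=200, centers=2, random_state=0)
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=len(X))          # start from random labels

for _ in range(50):
    if len(np.unique(labels)) < 2:                # degenerate labeling: stop
        break
    clf = SVC(kernel="linear").fit(X, labels)
    # Signed margin: negative values are misclassified under the current labeling.
    margins = clf.decision_function(X) * np.where(labels == 1, 1, -1)
    misclassified = np.where(margins < 0)[0]
    if len(misclassified) == 0:                   # perfectly separable: stop
        break
    # Flip the labels of the worst (lowest-confidence) misclassified points.
    worst = misclassified[np.argsort(margins[misclassified])[: max(1, len(misclassified) // 10)]]
    labels[worst] = 1 - labels[worst]

print("cluster sizes:", np.bincount(labels))
```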
78

Reconstructing Textual File Fragments Using Unsupervised Machine Learning Techniques

Roux, Brian 19 December 2008
This work is an investigation into reconstructing fragmented ASCII files based on content analysis, motivated by a desire to demonstrate machine learning's applicability to Digital Forensics. Using a categorized corpus of Usenet, Bulletin Board Systems, and other assorted documents, a series of experiments is conducted using machine learning techniques to train classifiers which are able to identify fragments belonging to the same original file. The primary machine learning method used is the Support Vector Machine, with a variety of feature extractions to train from. Additional work is done on training committees of SVMs to boost the classification power over the individual SVMs, as well as on the development of a method to tune SVM kernel parameters using a genetic algorithm. Attention is given to the applicability of Information Retrieval techniques to file fragments, as well as to an analysis of textual artifacts which are not present in standard dictionaries.
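The committee-of-SVMs idea can be sketched as a simple voting ensemble of SVMs with different kernels; the genetic-algorithm kernel tuning and the fragment feature extraction are omitted, and the data are synthetic, so this is only an assumed arrangement of the components named above.

```python
# A small committee of SVMs combined by majority vote (illustrative, synthetic data).
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=40, n_informative=10, random_state=0)

committee = VotingClassifier(
    estimators=[
        ("linear", SVC(kernel="linear")),
        ("rbf",    SVC(kernel="rbf", gamma="scale")),
        ("poly",   SVC(kernel="poly", degree=3)),
    ],
    voting="hard",   # each committee member casts one vote per sample
)
print("committee CV accuracy:", cross_val_score(committee, X, y, cv=5).mean())
```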
79

Application of Support Vector Machines for Damage Detection in Structures

Sharma, Siddharth 05 January 2009
Support vector machines (SVMs) are a set of supervised learning methods that have recently been applied to structural damage detection due to their ability to form an accurate boundary from a small amount of training data. During training, they require data from both the undamaged and the damaged structure. The unavailability of data from the damaged structure is a major challenge in such methods due to the irreversibility of damage. Recent methods create data for the damaged structure from finite element models. In this thesis we propose a new method to derive the dataset representing the damaged structure from the dataset measured on the undamaged structure, without using a detailed structural finite element model. The basic idea is to reduce the values of a copy of the data from the undamaged structure to create the data representing the damaged structure. The performance of the method in the presence of measurement noise, ambient base excitation, and wind loading is investigated. We find that SVMs can be used to detect small amounts of damage in the structure in the presence of noise. The ability of the method to detect damage at different locations in a structure and the effect of measurement location on the sensitivity of the method have been investigated. An online structural health monitoring method has also been proposed that uses the SVM boundary, trained on data measured from the damaged structure, as an indicator of the structural health condition.
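The core trick, deriving a surrogate "damaged" class by reducing the values of a copy of the undamaged measurements and training an SVM boundary between the two, can be sketched as below. The feature layout and the 3% reduction factor are assumptions for illustration, not values from the thesis.

```python
# Train an SVM on undamaged data plus a scaled-down copy acting as the "damaged" class (sketch).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
undamaged = rng.normal(1.0, 0.02, size=(200, 6))   # e.g. normalized response features per sensor

# Surrogate damaged data: reduce the undamaged features by a few percent (assumed 3% here).
damaged = undamaged * 0.97

X = np.vstack([undamaged, damaged])
y = np.array([0] * len(undamaged) + [1] * len(damaged))
clf = SVC(kernel="rbf", gamma="scale").fit(X, y)

# A new, noisy measurement with a genuine 5% drop should fall on the damaged side of the boundary.
new_measurement = rng.normal(0.95, 0.02, size=(1, 6))
print("predicted class (1 = damaged):", clf.predict(new_measurement)[0])
```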
80

Acquisition and processing of surface electromyography and electroencephalography biosignals for the characterization of verbal commands or speech intention through their mathematical processing in patients with dysarthria

Sánchez Galego, Juliet January 2016
Systems for assisting people with the aftereffects of a Cerebrovascular Accident (CVA), such as dysarthria, are attracting growing interest due to the increasing share of the population with these disorders. This work proposes the acquisition and processing of surface electromyography (sEMG) biosignals from the facial muscles involved in speech and of electroencephalography (EEG) signals, synchronized in time by means of an audio file. Recordings were made with healthy volunteers at the IEE Laboratory and with volunteers with dysarthria, previously diagnosed with CVA, at the physiotherapy department of the Hospital de Clínicas de Porto Alegre. The main objective is to classify these biosignals against a set of established verbal commands, using the computational method Support Vector Machine (SVM) for the sEMG signal and Naive Bayes (NB) for the EEG signal, with a view to the future study and classification of a patient's degree of dysarthria. These methods were compared with Linear Discriminant Analysis (LDA), which was implemented for both the sEMG and EEG signals. The features extracted from the sEMG signal were standard deviation, arithmetic mean, skewness, kurtosis and RMS; for the EEG signal the features extracted in the frequency domain were minimum, maximum, mean and standard deviation, with skewness and kurtosis extracted in the time domain. As part of the pre-processing, the Common Spatial Pattern (CSP) spatial filter was also employed in order to increase the discriminative activity between the movement classes in the EEG signal. A factorial design of experiments was used to evaluate the nature of the recordings, the subject, the computational method, the subject's state and, for EEG, the filtered frequency band. The defined verbal commands, "Direita" (right), "Esquerda" (left), "Para Frente" (forward) and "Para Trás" (backward), enabled the identification of mental tasks in healthy subjects and subjects with dysarthria, reaching accuracies of 77.6% - 80.8%. / Assistive technology for people with Cerebrovascular Accident (CVA) aftereffects, such as dysarthria, is gaining interest due to the increasing proportion of the population with these disorders. This work proposes the acquisition and processing of surface electromyography (sEMG) signals from the face muscles involved in the speech process and of electroencephalography (EEG) signals, synchronized in time by an audio file. For that reason, assays were carried out with healthy volunteers at the IEE Laboratory and with dysarthric volunteers, previously diagnosed with CVA, at the physiotherapy department of the Porto Alegre University Hospital. The main objective is to classify these biosignals against established verbal commands, using Support Vector Machines (SVM) for sEMG and Naive Bayes (NB) for EEG, with regard to the future study and classification of the patient's degree of dysarthria. These methods were compared with Linear Discriminant Analysis (LDA), which was implemented for both sEMG and EEG. The features extracted from the sEMG signal were standard deviation, arithmetic mean, skewness, kurtosis and RMS; for the EEG signal the features extracted in the frequency domain were minimum, maximum, average and standard deviation, while skewness and kurtosis were used in the time domain. As part of the pre-processing, a Common Spatial Pattern (CSP) filter was also employed in order to increase the discriminating activity between motion classes in the EEG signal. 
Data were evaluated in a factorial design of experiments covering the nature of the assays, the subject, the computational method, the subject's health state and, specifically for EEG, the filtered frequency band. The defined verbal commands, "Right", "Left", "Forward" and "Back", allowed the identification of mental tasks in healthy and dysarthric subjects, reaching accuracies of 77.6% - 80.8%.
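A hedged sketch of the SVM-versus-LDA comparison on the time-domain features named above (mean, standard deviation, skewness, kurtosis, RMS) is given below. Synthetic signal windows stand in for the sEMG recordings, and the CSP filtering and Naive Bayes branch are omitted.

```python
# Compare SVM and LDA on simple time-domain features from signal windows (synthetic signals).
import numpy as np
from scipy.stats import kurtosis, skew
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def window_features(window):
    """Mean, standard deviation, skewness, kurtosis and RMS of one signal window."""
    return [window.mean(), window.std(), skew(window), kurtosis(window),
            np.sqrt(np.mean(window ** 2))]

rng = np.random.default_rng(0)
# Two "verbal command" classes simulated as noise bursts with different amplitudes.
windows = [rng.normal(0, 1.0, 512) for _ in range(60)] + \
          [rng.normal(0, 1.6, 512) for _ in range(60)]
X = np.array([window_features(w) for w in windows])
y = np.array([0] * 60 + [1] * 60)

for name, clf in [("SVM", SVC(kernel="rbf")), ("LDA", LinearDiscriminantAnalysis())]:
    print(name, "accuracy:", cross_val_score(clf, X, y, cv=5).mean().round(3))
```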
