181 |
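Uma investigação empírica e comparativa da aplicação de RNAs ao problema de mineração de opiniões e análise de sentimentos [An empirical and comparative investigation of the application of ANNs to the problem of opinion mining and sentiment analysis]
Moraes, Rodrigo de 26 March 2013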
The area of Opinion Mining and Sentiment Analysis emerged from the need for automated processing of textual information about opinions posted on the web. Its main motivation is the constant growth in the volume of such information, driven by Web 2.0 technologies, which makes it infeasible to monitor and analyze these opinions, even though they are useful both to users who intend to buy new products and to companies that want to identify market demand. Currently, most studies in Opinion Mining and Sentiment Analysis that use data mining focus on developing techniques for better knowledge representation and end up relying on commonly applied classification techniques, without exploring others that perform well on other problems. This work therefore presents an empirical and comparative investigation of the application of the classical Artificial Neural Network (ANN) model, the multilayer perceptron, to the problem of Opinion Mining and Sentiment Analysis. To this end, opinion datasets are defined and textual knowledge representation techniques are applied to them so that every classifier receives the same unigram-based representation of the texts. On this representation, the Support Vector Machine (SVM), Naïve Bayes (NB), and ANN classifiers are applied in three dataset contexts: (i) balanced datasets, (ii) datasets with different levels of imbalance, and (iii) datasets to which random undersampling is applied to handle the imbalance. Investigating the unbalanced context and those derived from it is relevant because opinion datasets available on the web usually contain more positive than negative opinions.
The classifiers are evaluated with metrics for both classification performance and run time. The results in the balanced context indicate that ANNs significantly outperform the other classifiers and that, despite a high computational cost for training, they provide classification times significantly lower than those of the classifier whose classification results came closest to theirs. In the unbalanced context, ANNs prove sensitive to increasing noise in the data representation and to increasing imbalance, and the NB classifier stands out in these experiments. With undersampling, ANNs are equivalent to the other classifiers and attain competitive results. However, they may not be the most suitable classifier in this context once training and classification times, together with their small advantage in classification accuracy, are taken into account.
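Random undersampling, the imbalance treatment named in context (iii), can be illustrated with a minimal sketch. The abstract does not describe the thesis implementation, so the function name `random_undersample`, the fixed seed, and the toy reviews below are assumptions made only for illustration: every class is randomly reduced to the size of the smallest one.

```python
import random
from collections import Counter


def random_undersample(texts, labels, seed=0):
    """Randomly drop examples until every class has as many as the smallest one.

    Hypothetical helper for illustration; not the thesis implementation.
    """
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)

    # Size of the minority class: every class is cut down to this size.
    n_min = min(len(items) for items in by_class.values())

    sampled = []
    for label, items in by_class.items():
        for text in rng.sample(items, n_min):
            sampled.append((text, label))
    rng.shuffle(sampled)
    return [t for t, _ in sampled], [y for _, y in sampled]


# Toy unbalanced set: six positive reviews, two negative ones.
reviews = ["good", "great", "fine", "nice", "love it", "superb", "bad", "awful"]
polarity = ["pos"] * 6 + ["neg"] * 2
balanced_reviews, balanced_polarity = random_undersample(reviews, polarity)
class_counts = Counter(balanced_polarity)
```

After the call, `class_counts` holds two examples per class; the discarded positive reviews are the price paid for the balanced training set.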
|
182 |
Pairwise Classification and Pairwise Support Vector Machines
Brunner, Carl 04 June 2012
Several modifications have been suggested to extend binary classifiers to multiclass classification, for instance the One Against All technique, the One Against One technique, or Directed Acyclic Graphs. A recent approach to multiclass classification is pairwise classification, which relies on two input examples instead of one and predicts whether the two input examples belong to the same class or to different classes. A Support Vector Machine (SVM) that is able to handle pairwise classification tasks is called a pairwise SVM. A common pairwise classification task is face recognition. In this area, one set of images is given for training and another set of images is given for testing. Often, one is interested in the interclass setting, which means that any person who is represented by an image in the training set is not represented by any image in the test set. Of the multiclass classification techniques mentioned, only the pairwise classification technique provides meaningful results in the interclass setting.
For a pairwise classifier, the order of the two examples should not influence the classification result. A common approach to enforcing this symmetry is the use of selected kernels. Relations between such kernels and certain projections are provided, and it is shown that those projections can lead to a loss of information. For pairwise SVMs, another approach to enforcing symmetry is the symmetrization of the training sets: if the pair (a,b) of examples is a training pair, then (b,a) is a training pair, too. It is proven that both approaches lead to the same decision function for selected parameters. Empirical tests show that the approach using selected kernels is three to four times faster. For good interclass generalization of pairwise SVMs, training sets with several million training pairs are needed. A technique is presented which further speeds up the training of pairwise SVMs by a factor of up to 130 and thus enables learning from training sets with several million pairs. Another element affecting time is the need to select several parameters. Even with the applied speed-up techniques, a grid search over the set of parameters would be very expensive. Therefore, a model selection technique is introduced that is much less computationally expensive.
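The two ways of enforcing symmetry can be sketched in a few lines. The abstract does not say which kernels the thesis selects, so the construction below, k(a,c)k(b,d) + k(a,d)k(b,c) over an RBF base kernel, is an assumption chosen as one well-known symmetric pairwise kernel; the names `rbf`, `pairwise_kernel`, and `symmetrize` are hypothetical.

```python
import math


def rbf(x, y, gamma=0.5):
    """Standard RBF base kernel on single examples (gamma is an arbitrary choice)."""
    return math.exp(-gamma * sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def pairwise_kernel(p, q, base=rbf):
    """Symmetric kernel on pairs: swapping the members of p (or q) leaves it unchanged.

    Assumed form for illustration: k(a,c)k(b,d) + k(a,d)k(b,c).
    """
    (a, b), (c, d) = p, q
    return base(a, c) * base(b, d) + base(a, d) * base(b, c)


def symmetrize(pairs, labels):
    """Alternative approach: add (b, a) with the same label for every training pair (a, b)."""
    out_pairs, out_labels = [], []
    for (a, b), y in zip(pairs, labels):
        out_pairs += [(a, b), (b, a)]
        out_labels += [y, y]
    return out_pairs, out_labels


a, b = (0.0, 1.0), (1.0, 0.5)
c, d = (0.2, 0.3), (0.9, 0.1)
k_ab = pairwise_kernel((a, b), (c, d))
k_ba = pairwise_kernel((b, a), (c, d))  # same value: the order of a and b is irrelevant
sym_pairs, sym_labels = symmetrize([(a, b)], [1])
```

The kernel route keeps the training set at its original size, whereas symmetrization doubles the number of training pairs, which is in line with the reported speed advantage of the kernel approach.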
In machine learning, the training set and the test set are created by some data generating process. Several pairwise data generating processes are derived from a given non-pairwise data generating process. Advantages and disadvantages of the different pairwise data generating processes are evaluated.
Pairwise Bayes' Classifiers are introduced and their properties are discussed. It is shown that pairwise Bayes' Classifiers for interclass generalization tasks can differ from pairwise Bayes' Classifiers for interexample generalization tasks. In face recognition, the interexample task implies that each person who is represented by an image in the test set is also represented by at least one image in the training set. Moreover, the set of images of the training set and the set of images of the test set are disjoint.
Pairwise SVMs are applied to four synthetic and two real-world datasets. One of the real-world datasets is the Labeled Faces in the Wild (LFW) database, while the other is provided by Cognitec Systems GmbH. The synthetic databases provide empirical evidence for the presented model selection heuristic, the discussion of the loss of information, and the speed-up techniques, and they show that pairwise SVM classifiers reach a quality similar to that of pairwise Bayes' classifiers. Additionally, a pairwise classifier is identified for the LFW database which leads to an average equal error rate (EER) of 0.0947 with a standard error of the mean (SEM) of 0.0057. This result is better than that of the current state-of-the-art classifier, namely the combined probabilistic linear discriminant analysis classifier, which leads to an average EER of 0.0993 and an SEM of 0.0051.
|
183 |
[en] SEMANTIC ROLE-LABELING FOR PORTUGUESE / [pt] ANOTADOR DE PAPEIS SEMÂNTICOS PARA PORTUGUÊS
ARTHUR BELTRAO CASTILHO NETO 23 June 2017
[en] Semantic role labeling (SRL) is an important natural language processing (NLP) task that establishes meaningful relationships between the events described in a sentence and their participants. Because it reduces ambiguity in the input text, it can improve the performance of many other NLP systems, such as machine translation, spelling correction, information extraction and retrieval, and question answering. The vast majority of SRL systems reported so far perform the task with supervised learning and, for better results, rely on large, manually reviewed corpora. The Brazilian lexical resource with semantic role annotations (PropBank.Br) is much smaller. Hence, in recent years, attempts have been made to improve performance using semi-supervised and unsupervised learning. Although those studies made several direct and indirect contributions to NLP, they were not able to outperform purely supervised systems. This work presents an approach to the SRL task for Portuguese that uses supervised learning over a set of 114 categorical features, to which we apply a combination of two domain regularization methods that reduces the number of binary features by 96 percent. The resulting model uses a support vector machine with the L2-loss dual support vector classification solver and is tested on the PropBank.Br dataset, achieving results slightly better than the state of the art. We evaluate the system empirically with the official CoNLL 2005 Shared Task script, obtaining 82.17 percent precision, 82.88 percent recall, and 82.52 percent F1, whereas the previous state-of-the-art Portuguese SRL system scores 83.0 percent precision, 81.7 percent recall, and 82.3 percent F1.
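The reported F1 scores follow directly from the reported precision and recall, since F1 is their harmonic mean. The short check below is only an arithmetic illustration (the function name `f1_score` is an assumption, and this is not the CoNLL 2005 evaluation script); it uses the percentages quoted in the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (inputs and output in percent)."""
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract, in percent.
proposed_f1 = f1_score(82.17, 82.88)   # rounds to 82.52
previous_f1 = f1_score(83.0, 81.7)     # rounds to 82.3
```

The comparison shows why the gain is described as slight: the proposed system trades a lower precision for a higher recall, and the resulting F1 values differ by roughly 0.2 points.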
|
184 |
Models for quantifying risk and reliability metrics via metaheuristics and support vector machines
Lins, Isis Didier 27 February 2013
This work develops models for quantifying risk and reliability-related metrics of systems in different phases of their life cycle. For systems in the design phase, a Multi-Objective Genetic Algorithm (MOGA) is coupled with Discrete Event Simulation (DES) to provide non-dominated configurations with respect to availability and cost. The proposed MOGA + DES incorporates a Generalized Renewal Process to account for imperfect repairs and also indicates the optimal number of maintenance teams. For the operational phase, a hybrid of MOGA and Risk-Based Inspection is proposed for devising inspection plans that are non-dominated in terms of risk and cost and that comply with local regulations. Regression via Support Vector Machines (SVR) is applied when the reliability-related metric (response variable) of an operational system is a function of a number of environmental and operational variables whose analytical relationship is unknown. Particle Swarm Optimization is combined with SVR to select the most relevant explanatory variables and, simultaneously, to tune the hyperparameters that appear in the SVR training problem. To assess the uncertainty related to the response variable, bootstrap methods are coupled with SVR to construct confidence and prediction intervals. Numerical experiments and application examples in the context of the oil industry are provided. The results indicate that the proposed models give valuable information for cost planning and for the implementation of proper actions to avoid undesired events.
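The idea of wrapping a regressor in a bootstrap to obtain an interval can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis procedure: a least-squares line stands in for the SVR so that the example needs only the standard library, the pairs bootstrap with a percentile interval is one of several bootstrap variants, and the names `fit_line` and `bootstrap_ci` and the toy data are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Least-squares line; a stand-in for the SVR used in the thesis."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample: all x equal, fall back to the mean
        return lambda x: my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def bootstrap_ci(xs, ys, x0, fit=fit_line, n_boot=500, alpha=0.05, seed=0):
    """Percentile confidence interval for the fitted response at x0 (pairs bootstrap)."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_boot):
        # Resample (x, y) pairs with replacement and refit the model.
        idx = [rng.randrange(n) for _ in range(n)]
        model = fit([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(model(x0))
    preds.sort()
    lower = preds[int((alpha / 2) * n_boot)]
    upper = preds[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data: y = 2x + 1 with a small alternating perturbation.
xs = [float(i) for i in range(20)]
ys = [2 * x + 1 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]
lower, upper = bootstrap_ci(xs, ys, x0=10.0)
```

Any regressor exposing the same fit-then-predict shape, including an SVR, could be passed as `fit`. A prediction interval would additionally have to account for the residual noise around the fitted response, which this sketch omits.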
|
185 |
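Detecção e classificação de arritmias em eletrocardiogramas usando transformadas wavelets, máquinas de vetores de suporte e rede Bayesiana [Detection and classification of arrhythmias in electrocardiograms using wavelet transforms, support vector machines, and a Bayesian network]
Rodrigues, Luiz Carlos Ferreira 02 March 2012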
According to the Brazilian Ministry of Health (Ministério da Saúde), heart diseases are currently the second leading cause of mortality among Brazilians, behind only cerebrovascular diseases. The motivation for this work is the identification and classification of heart conditions recorded in electrocardiogram (ECG) exams, such as premature contractions, bundle branch blocks, tachycardias, and other rhythm disturbances. Because it is easy to apply and inexpensive, the ECG is one of the resources most widely used by researchers and health professionals to assess cardiac conditions. The computational application developed in this study relies on Wavelet Transforms for the digital signal processing of the ECG, on the extraction of the morphological, dynamic, and spectral characteristics of the signal cycles, and on the submission of these characteristics to two Support Vector Machines (SVMs). The outputs of the two SVMs are combined as input to a Bayesian Network for the identification and classification of the heart conditions.
The morphological and spectral characteristics of each cycle have their dimensionality reduced by Principal Component Analysis (PCA). The spectral characteristics are extracted from the Wavelet Transform coefficients of the signal, while the dynamic characteristics are defined by the intervals between the global maxima of the cycles. For development, testing, and validation of the application we used the MIT-BIH Arrhythmia Database, made available by the Massachusetts Institute of Technology (MIT). We demonstrate that the application is able to recognize and classify 8 types of heartbeats in ECG records, with a mean overall classification accuracy above 95.0%.
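The dynamic characteristics described above, the intervals between the global maxima of the cycles, can be illustrated with a small sketch. The abstract does not explain how the cycles are segmented, so the cycle boundaries, the function name `peak_intervals`, the toy signal, and the sampling rate used here are all assumptions for illustration.

```python
def peak_intervals(signal, cycle_bounds, fs):
    """Intervals in seconds between the global maxima of consecutive cycles.

    signal: list of samples; cycle_bounds: (start, end) sample indices per cycle,
    assumed to be known already; fs: sampling frequency in Hz.
    """
    peaks = []
    for start, end in cycle_bounds:
        segment = signal[start:end]
        # Index of the largest sample inside this cycle, relative to the whole signal.
        peaks.append(start + max(range(len(segment)), key=segment.__getitem__))
    return [(later - earlier) / fs for earlier, later in zip(peaks, peaks[1:])]


# Toy signal with two cycles of five samples each, sampled at 10 Hz.
toy_signal = [0, 1, 5, 1, 0, 0, 2, 1, 7, 0]
intervals = peak_intervals(toy_signal, [(0, 5), (5, 10)], fs=10)
```

In the toy signal the maxima sit at samples 2 and 8, so the single interval is 0.6 seconds. On real ECG records the global maximum of a cycle is normally the R peak, which makes these values the familiar R-R intervals.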
|
186 |
[en] REMOTE SENSING IMAGE CLASSIFICATION USING SVM / [pt] CLASSIFICAÇÃO DE IMAGENS DE SENSORIAMENTO REMOTO USANDO SVM
RAPHAEL BELO DA SILVA MELONI 14 September 2017
[en] Image classification is the process of extracting information from digital images to recognize patterns and homogeneous objects. In remote sensing, it aims to find patterns among the pixels of a digital image covering an area of the Earth's surface, for subsequent analysis by a specialist. In this dissertation, we apply Support Vector Machines, a machine learning methodology, to the image classification problem because they can work with large numbers of features. We built classifiers for the problem using different image information for a region: the RGB and HSB color spaces, altimetric values, and the infrared channel. The altimetric (relief) values contributed to excellent results, since they are fundamental characteristics of a region and had not previously been considered in remote sensing image classification. We highlight the final result for the problem of identifying swimming pools with a neighborhood of two: 99 percent accuracy, 100 percent precision, 93.75 percent recall, 96.77 percent F-Score, and 96.18 percent Kappa index.
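The five reported figures are mutually consistent, which can be checked from a single binary confusion matrix. The abstract does not give the underlying counts, so the matrix below (15 true positives, 0 false positives, 1 false negative, 84 true negatives) is a hypothetical one chosen only because it reproduces the reported percentages; any proportional scaling of it would do the same. The function name `binary_metrics` is likewise an assumption.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, F-score and Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    # Agreement expected by chance, from the row and column totals.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "kappa": kappa,
    }


# Hypothetical counts, not taken from the dissertation.
metrics = binary_metrics(tp=15, fp=0, fn=1, tn=84)
percentages = {name: round(100 * value, 2) for name, value in metrics.items()}
```

With these counts the chance agreement is 0.738, so kappa is (0.99 - 0.738) / (1 - 0.738), about 0.9618. The gap between 99 percent accuracy and 96.18 percent kappa reflects how strongly the negative class dominates.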
|
187 |
Pairwise Classification and Pairwise Support Vector Machines
Brunner, Carl 16 May 2012
Several modifications have been suggested to extend binary classifiers to multiclass classification, for instance the One Against All technique, the One Against One technique, or Directed Acyclic Graphs. A recent approach to multiclass classification is pairwise classification, which relies on two input examples instead of one and predicts whether the two input examples belong to the same class or to different classes. A Support Vector Machine (SVM) that is able to handle pairwise classification tasks is called a pairwise SVM. A common pairwise classification task is face recognition. In this area, one set of images is given for training and another set of images is given for testing. Often, one is interested in the interclass setting, which means that any person who is represented by an image in the training set is not represented by any image in the test set. Of the multiclass classification techniques mentioned, only the pairwise classification technique provides meaningful results in the interclass setting.
For a pairwise classifier, the order of the two examples should not influence the classification result. A common approach to enforcing this symmetry is the use of selected kernels. Relations between such kernels and certain projections are provided, and it is shown that those projections can lead to a loss of information. For pairwise SVMs, another approach to enforcing symmetry is the symmetrization of the training sets: if the pair (a,b) of examples is a training pair, then (b,a) is a training pair, too. It is proven that both approaches lead to the same decision function for selected parameters. Empirical tests show that the approach using selected kernels is three to four times faster. For good interclass generalization of pairwise SVMs, training sets with several million training pairs are needed. A technique is presented which further speeds up the training of pairwise SVMs by a factor of up to 130 and thus enables learning from training sets with several million pairs. Another element affecting time is the need to select several parameters. Even with the applied speed-up techniques, a grid search over the set of parameters would be very expensive. Therefore, a model selection technique is introduced that is much less computationally expensive.
In machine learning, the training set and the test set are created by some data generating process. Several pairwise data generating processes are derived from a given non-pairwise data generating process. Advantages and disadvantages of the different pairwise data generating processes are evaluated.
Pairwise Bayes' Classifiers are introduced and their properties are discussed. It is shown that pairwise Bayes' Classifiers for interclass generalization tasks can differ from pairwise Bayes' Classifiers for interexample generalization tasks. In face recognition, the interexample task implies that each person who is represented by an image in the test set is also represented by at least one image in the training set. Moreover, the set of images of the training set and the set of images of the test set are disjoint.
Pairwise SVMs are applied to four synthetic and two real-world datasets. One of the real-world datasets is the Labeled Faces in the Wild (LFW) database, while the other is provided by Cognitec Systems GmbH. The synthetic databases provide empirical evidence for the presented model selection heuristic, the discussion of the loss of information, and the provided speed-up techniques, and they show that classifiers of pairwise SVMs achieve a quality similar to that of pairwise Bayes' classifiers. Additionally, a pairwise classifier is identified for the LFW database which achieves an average equal error rate (EER) of 0.0947 with a standard error of the mean (SEM) of 0.0057. This result is better than that of the current state-of-the-art classifier, namely the combined probabilistic linear discriminant analysis classifier, which achieves an average EER of 0.0993 and an SEM of 0.0051. / There are various approaches to using binary classifiers for multiclass classification, for example the one-against-all technique, the one-against-one technique, or directed acyclic graphs. Pairwise classification is a more recent approach to multiclass classification. It is based on the use of two input examples instead of one and determines whether the two examples belong to the same class or to different classes. A support vector machine (SVM) used for pairwise classification tasks is called a pairwise SVM. Face recognition problems, for example, are posed as pairwise classification tasks. For this purpose, one set of images is used for training and another set of images for testing. Often, the interest lies in interclass generalization. This means that no person who is depicted in at least one image of the training set appears in any image of the test set.
Of all the multiclass classification techniques mentioned, only the pairwise classification technique yields meaningful results for interclass generalization.
The decision of a pairwise classifier should not depend on the order of the two input examples. This symmetry is often ensured by using special kernels. Relations between such kernels and certain projections are derived. It is also shown that these projections can lead to a loss of information. For pairwise SVMs, symmetrizing the training sets is a further approach to ensuring symmetry. That is, if the pair (a,b) of input examples belongs to the training set, then the pair (b,a) must belong to the training set as well. It is proven that, for certain parameters, both approaches lead to the same decision function. Empirical measurements show that the approach using special kernels is three to four times faster. To achieve good interclass generalization, pairwise SVMs require training sets with several million pairs. A technique is introduced that speeds up the training of pairwise SVMs by a factor of up to 130 and thus makes it possible to use training sets with several million pairs. Selecting good parameters for pairwise SVMs is also generally very time-consuming. Even with the described speed-ups, a grid search over the set of parameters is very expensive. Therefore, a model selection technique is introduced that requires considerably less effort.
In machine learning, the training set and the test set are generated by a data generating process. Starting from a non-pairwise data generating process, different pairwise data generating processes are derived and their advantages and disadvantages are assessed.
Pairwise Bayes classifiers are introduced and their properties are discussed. It is shown that these Bayes classifiers for interclass generalization tasks and for interexample generalization tasks differ in general. In face recognition, interexample generalization means that every person depicted in an image of the test set also appears in at least one image of the training set. Moreover, the intersection of the set of images of the training set and the set of images of the test set is empty.
Pairwise SVMs are tested on four synthetic and two real-world databases. One of the real-world databases used is the Labeled Faces in the Wild (LFW) database. The other was provided by Cognitec Systems GmbH. The assumptions of the model selection technique, the discussion of the loss of information, and the presented speed-up techniques are supported by empirical measurements on the synthetic databases. These databases are also used to show that classifiers of pairwise SVMs achieve results similar in quality to those of pairwise Bayes classifiers. For the LFW database, a pairwise classifier is identified that achieves an average equal error rate (EER) of 0.0947 with a standard error of the mean (SEM) of 0.0057. This result is better than that of the current state-of-the-art classifier, the combined probabilistic linear discriminant analysis classifier, which achieves an average EER of 0.0993 and an SEM of 0.0051.
|
188 |
Voice Activity Detection. Ent, Petr. January 2009
This thesis deals with the use of support vector machines in voice activity detection. In the first part, various types of features, their extraction and their processing are examined, and the optimal combination that gives the best results is found. The second part presents the voice activity detection system itself and the tuning of its parameters. Finally, the results are compared with two other systems based on different principles. The ERT broadcast news database was used for testing and tuning. The comparison between the systems was then carried out on a database from the NIST06 Rich Test Evaluations.
|
189 |
With or without context: Automatic text categorization using semantic kernels. Eklund, Johan. January 2016
In this thesis, text categorization is investigated in four dimensions of analysis: theoretically as well as empirically, and as a manual as well as a machine-based process. In the first four chapters we look at the theoretical foundation of subject classification of text documents, with a certain focus on classification as a procedure for organizing documents in libraries. A working hypothesis used in the theoretical analysis is that classification of documents is a process that involves translations between statements in different languages, both natural and artificial. We further investigate the close relationships between structures in classification languages and the order relations and topological structures that arise from classification. A classification algorithm that receives special focus in the subsequent chapters is the support vector machine (SVM), which in its original formulation is a binary classifier in linear vector spaces, but has been extended to handle classification problems for which the categories are not linearly separable. To this end the algorithm utilizes a category of functions called kernels, which induce feature spaces by means of high-dimensional and often non-linear maps. For the empirical part of this study we investigate the classification performance of semantic kernels generated by different measures of semantic similarity. One category of such measures is based on the latent semantic analysis and random indexing methods, which generate term vectors by using co-occurrence data from text collections. Another semantic measure used in this study is pointwise mutual information. In addition to the empirical study of semantic kernels we also investigate the performance of a term weighting scheme called divergence from randomness, which has hitherto received little attention within the area of automatic text categorization.
The results of the empirical part of this study show that the semantic kernels generally outperform the “standard” (non-semantic) linear kernel. A conclusion that can be drawn with respect to the investigated datasets is therefore that semantic information in the kernel in general improves its classification performance, and that the difference between the standard kernel and the semantic kernels is particularly large for small training sets. Another clear trend in the results is that the divergence from randomness weighting scheme yields a classification performance surpassing that of the common tf-idf weighting scheme.
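To make the idea of a semantic kernel concrete, here is a minimal sketch that derives a term similarity from pointwise mutual information (PMI) and plugs it into a kernel of the form k(d1, d2) = Σ_s Σ_t d1[s] · sim(s, t) · d2[t]. The abstract does not give the thesis's exact construction, so everything here is an assumption for illustration: document-level co-occurrence counts, clipping negative PMI values to zero, and the function names. A similarity built this way is not guaranteed to be positive semi-definite, which a real SVM kernel would require.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(docs):
    """PMI(s, t) = log( p(s, t) / (p(s) p(t)) ), with probabilities estimated
    as the fraction of documents containing the term or the term pair."""
    n = len(docs)
    term_df, pair_df = Counter(), Counter()
    for doc in docs:
        terms = sorted(set(doc))
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))  # keys are alphabetically ordered pairs
    return {
        (s, t): math.log((count / n) / ((term_df[s] / n) * (term_df[t] / n)))
        for (s, t), count in pair_df.items()
    }


def make_similarity(pmi):
    """Term similarity: 1 on the diagonal, positive PMI elsewhere (assumed choice)."""
    def sim(s, t):
        if s == t:
            return 1.0
        key = (s, t) if s < t else (t, s)
        return max(0.0, pmi.get(key, 0.0))
    return sim


def semantic_kernel(d1, d2, sim):
    """k(d1, d2) over term-weight dicts; with sim = identity this is the linear kernel."""
    return sum(w1 * sim(s, t) * w2
               for s, w1 in d1.items() for t, w2 in d2.items())


docs = [["svm", "kernel"], ["svm", "kernel"], ["library", "catalog"], ["svm", "library"]]
sim = make_similarity(pmi_table(docs))
# Two documents with no shared term: the linear kernel would give 0,
# while the semantic kernel gives a positive value via the svm/kernel association.
print(semantic_kernel({"svm": 1.0}, {"kernel": 1.0}, sim))
```

The term weights in the dictionaries could come from any weighting scheme, such as tf-idf or divergence from randomness; the sketch uses plain weights of 1.0.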
|
190 |
On-line monitoring of hydrocyclones by use of image analysis. Janse van Vuuren, Magrieta Jeanette. 03 1900
Thesis (MScEng (Process Engineering))--University of Stellenbosch, 2011. / ENGLISH ABSTRACT: Hydrocyclones are separation devices that are widely used throughout the chemical engineering and mineral processing industries. Although simple in design, the intricate flow structure of the device complicates control. As an alternative to conventional empirical and theoretical modelling, process state monitoring methods have recently been employed as a means to control hydrocyclones. The purpose of process state monitoring methods is to distinguish between the desired operating state with favourable separation, the transition state, and the troublesome operating state of dense flow separation.
In comparison to previously employed monitoring techniques, image analysis of the underflow is regarded as a promising approach. Preliminary studies have indicated that the technique complies with hydrocyclone monitoring requirements: sensitivity, non-invasiveness, sampling times less than one second, robustness and low cost. The primary objective of this study was therefore defined as investigating the feasibility of image analysis of hydrocyclone underflow as a monitoring technique.
Data collection entailed the recording of hydrocyclone underflow for different operating states. Six case studies were performed in total: Gold, Ilmenite, Platreef, Merensky 1, Merensky 2 and Merensky 3 (with the case study names indicating the different ore types used). An image analysis technique, consisting of feature extraction through motion detection, as well as various noise reduction methods, was subsequently developed and applied to the video data. Classification of the various operating states was attempted by modelling with one-class support vector machines (SVMs).
Results indicated that the developed image analysis technique effectively addresses background noise, random noise and system vibration through image enhancement and a motion threshold. Extremely low contrast differences and foreground noise did, however, prove problematic in the Ilmenite and Merensky 1 case studies respectively. For the remaining case studies, it was found that the various operating states were identified with high accuracy through one-class SVM classification. This is particularly true for the identification of the troublesome dense flow separation, for which extremely low missing alarm rates were obtained (0 % in most cases). In terms of practicality, the technique proved to be sensitive, non-intrusive and economical. The sampling rate of 30 frames per second and the estimated processing-to-video time ratio of 1:1 are furthermore satisfactory. Ultimately, the results indicate that image analysis of hydrocyclone underflow is a viable monitoring technique.
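A motion threshold of the kind mentioned above can be sketched with simple frame differencing. The abstract does not describe the thesis's actual features, so this is only one plausible reading: a pixel counts as "moving" when its grey-level change between consecutive frames exceeds a threshold, and the per-frame feature is the fraction of moving pixels. The function names and the threshold value are hypothetical.

```python
def motion_fraction(prev_frame, frame, threshold):
    """Fraction of pixels whose absolute grey-level change exceeds the threshold.

    Frames are equally sized 2-D lists of grey levels. Small changes (for
    example sensor noise or slight vibration) fall below the threshold.
    """
    changed = 0
    total = 0
    for row_prev, row_cur in zip(prev_frame, frame):
        for p, c in zip(row_prev, row_cur):
            total += 1
            if abs(c - p) > threshold:
                changed += 1
    return changed / total if total else 0.0


def motion_features(frames, threshold=20):
    """One feature per consecutive frame pair; threshold=20 is an assumed value."""
    return [motion_fraction(a, b, threshold) for a, b in zip(frames, frames[1:])]


frames = [
    [[10, 10], [10, 10]],
    [[10, 12], [60, 10]],  # one pixel changes strongly, one only slightly
    [[10, 12], [60, 10]],  # identical to the previous frame
]
print(motion_features(frames))
```

Feature sequences of this kind, recorded during one operating state, could then serve as training data for a one-class classifier; that step is not shown here.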
The robustness of the technique might be further improved by the use of backlighting and an air-knife. It is also recommended that future work focus on testing the monitoring technique on an industrial hydrocyclone setup. / AFRIKAANS ABSTRACT: Hydrocyclones are separation devices that are commonly used in the chemical engineering and mineral processing industries. Although the device has a simple design, its complex internal flow structure complicates its control. Process state monitoring methods have been applied to hydrocyclone control as an alternative to conventional empirical and theoretical modelling. The purpose of process state monitoring methods is to distinguish between the desired operating state with favourable separation, the transition state, and the troublesome operating state of dense flow separation.
In comparison with previously applied monitoring techniques, image analysis of the underflow is regarded as a promising technique. Preliminary studies have indicated that the technique meets the hydrocyclone monitoring requirements: sensitivity, non-invasiveness, sampling periods of less than one second, robustness and low cost. The primary objective of this study was therefore defined as investigating the feasibility of image analysis of hydrocyclone underflow as a monitoring technique.
Data collection involved recording the hydrocyclone underflow for different operating states. Six case studies were carried out in total: Gold, Ilmenite, Platreef, Merensky 1, Merensky 2 and Merensky 3 (the case study names indicate the different ore types used). An image analysis technique, consisting of feature extraction through motion detection as well as various noise reduction methods, was subsequently developed and applied to the video data. Classification of the various operating states was attempted by modelling with one-class support vector machines.
Results indicated that the developed image analysis technique successfully addresses background noise, random noise and system vibration through image enhancement and a motion threshold. Significantly low contrast differences and foreground noise did, however, prove problematic in the Ilmenite and Merensky 1 case studies respectively. For the remaining case studies, it was found that the various operating states were identified with high accuracy by one-class support vector machine classification. This is particularly true for the identification of the troublesome dense flow separation, for which significantly low missing alarm rates were achieved (0 % in most cases). Regarding the practical aspects, the technique proved sensitive, non-intrusive and economical. The sampling period of 30 frames per second and the estimated processing-to-video time ratio of 1:1 are also adequate. In conclusion, the results indicate that image analysis of hydrocyclone underflow is a viable monitoring technique.
The robustness of the technique could be further improved by making use of backlighting and an air-knife. It is also recommended that future work focus on testing the monitoring technique on an industrial hydrocyclone setup.
|