61 |
Monitoramento da cobertura do solo no entorno de hidrelétricas utilizando o classificador SVM (Support Vector Machines) / Land cover monitoring in hydroelectric domain area using Support Vector Machines (SVM) classifier. Albuquerque, Rafael Walter de. 07 December 2011
Satellite image classification is widely used to build land cover maps. The main objective of this study was the automatic mapping of land cover around the Lajeado hydroelectric plant, in the state of Tocantins, using the SVM classifier. The study evaluated the extent of anthropized areas around the reservoir and the accuracy of the SVM classification, which was compared with the accuracy of the traditional ML (Maximum Likelihood, MAXVER) classifier. The dissertation presents calibration suggestions for optimizing the results of the SVM algorithm. The SVM classification achieved high accuracy and showed the surroundings of the reservoir to be in an environmentally favorable condition. The SVM and ML results were similar, but ML placed the land cover classes in their spatial context with slightly lower accuracy. Despite the good state of environmental preservation, the surroundings of the reservoir should be properly monitored, because a large number of fires caused by the local population was detected; the tools discussed in this dissertation support that monitoring.
|
62 |
Effets masqués en analyse prédictive / Masked effects in predictive analysis. Bascoul, Ganaël. 27 June 2013
The objective of this thesis is to develop two methodologies that reveal effects previously masked in decision modeling. In the first part, we implement a method for the local analysis of choice criteria in the context of binary choices. In the second part, we highlight generation effects in the study of choice behavior. In both parts, our approach combines newer predictive analysis tools (Support Vector Machines, FANOVA, PLS) with traditional tools of inferential statistics, in order to enrich the usual results with complementary information on the masked effects: local effects in binary choice functions, and generation effects in the temporal analysis of choice behavior. The proposed methodologies, named AEL and APC-PLS respectively, are applied to real cases to illustrate how they work and why they are relevant.
|
63 |
Wavelets, predição linear e LS-SVM aplicados na análise e classificação de sinais de vozes patológicas / Wavelets, LPC and LS-SVM applied for analysis and identification of pathological voice signals. Fonseca, Everthon Silva. 24 April 2008
This work used the time-frequency analysis tool known as the discrete wavelet transform (DWT), together with linear prediction coefficients (LPC) and the Least Squares Support Vector Machines (LS-SVM) algorithm, for voice signal analysis and the classification of pathological voices. Numerous works in the literature have shown great interest in auxiliary tools for the diagnosis of laryngeal pathologies. The DWT components provided measurement parameters for the analysis and classification of pathological voices, mainly those from patients with Reinke's edema and nodules in the vocal folds. The database of pathological voices was obtained from the Department of Otorhinolaryngology and Head and Neck Surgery of the Hospital das Clínicas of the Faculty of Medicine of Ribeirão Preto, University of São Paulo (FMRP-USP), Brazil. Using the LS-SVM pattern recognition algorithm, the combination of Daubechies DWT components with the inverse LP filter led to a classifier with good performance, reaching more than 90% accuracy in the classification of pathological voices.
|
64 |
Reconnaissance des sons de l’environnement dans un contexte domotique / Environmental sounds recognition in a domotic context. Sehili, Mohamed el Amine. 05 July 2013
In many countries around the world, the number of elderly people living alone is increasing significantly. In recent years, a significant number of research projects on assistance for elderly people have been launched. Most of them use several modalities (video, sound, fall detection, etc.) to monitor the person's activity, to let them communicate naturally with their "smart" home, and to come to their aid as quickly as possible in case of danger. This work was carried out within the ANR VERSO industrial research project Sweet-Home. The goals of the project are to propose a domotic system that enables natural interaction (through voice and touch commands) with the home, and that gives the inhabitant greater safety by detecting distress situations. Within this framework, the goal of this work is to propose solutions for recognizing everyday sounds in a realistic context. Sound recognition runs upstream of an Automatic Speech Recognition system, whose performance therefore depends on how reliably speech is separated from other sounds. Moreover, good recognition of certain sounds, complemented by other sources of information (presence detection, fall detection, etc.), would make it possible to follow the person's activities closely and thus to detect dangerous situations. We first looked at methods from the field of Speaker Recognition and Verification, and tested methods based on GMM and SVM. In particular, we tested the SVM-GSL (SVM GMM Supervector Linear Kernel) kernel, which is used for sequence classification. SVM-GSL combines GMM and SVM: it transforms a sequence of vectors of arbitrary length into a single very high-dimensional vector, called a Super Vector, which is used as the input of an SVM. Experiments were carried out first on a locally created database (18 sound classes, more than 1000 recordings), then on the Sweet-Home project corpus, by integrating our system into a more complete system that includes multi-channel sound detection and speech recognition. These first experiments all used a single type of acoustic coefficients, the MFCC. We then studied other families of coefficients to assess their usability for environmental sound recognition. Our motivation was to find representations that are simpler and/or more effective than MFCC. Using 15 different families of coefficients, we also experimented with two approaches for transforming a sequence of vectors into a single vector to be used with a linear SVM. The first approach computes a fixed number of statistical coefficients that replace the whole sequence of vectors. The second approach, one of the contributions of this work, uses a discretization method to find, for each feature of an acoustic vector, the best cut points that associate a given class with one or more intervals of values. The likelihood of the sequence is estimated with respect to each interval, and the resulting values are used to build a single vector that replaces the sequence of acoustic vectors. The results show that some families of coefficients are indeed better suited to recognizing certain sound classes: for most classes, the best recognition rates were observed with one or more families of coefficients other than MFCC. Some of these families are, moreover, less complex, with a single feature per analysis window against 16 features for MFCC.
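The first sequence-to-vector approach mentioned in this abstract (replacing a variable-length sequence of acoustic vectors with a fixed number of statistical coefficients) can be illustrated with a minimal Python sketch. The abstract does not say which statistics were used, so the choice of mean, standard deviation, minimum and maximum per feature is an assumption, and the function name `summarize_sequence` is hypothetical.

```python
from statistics import fmean, pstdev


def summarize_sequence(frames):
    """Map a sequence of equal-length feature vectors to one fixed-length vector.

    Assumption: four statistics per feature (mean, population standard
    deviation, min, max). The thesis may have used a different set.
    """
    if not frames:
        raise ValueError("need at least one frame")
    summary = []
    # zip(*frames) iterates over features, yielding each feature's values over time.
    for feature_values in zip(*frames):
        summary.extend([
            fmean(feature_values),
            pstdev(feature_values),
            min(feature_values),
            max(feature_values),
        ])
    return summary


# Two recordings of different lengths yield vectors of the same size,
# which is what a linear SVM needs as input.
short_clip = [[1.0, 10.0], [3.0, 10.0]]
long_clip = [[0.5, 2.0], [1.5, 4.0], [2.5, 6.0], [3.5, 8.0], [4.5, 10.0]]
print(len(summarize_sequence(short_clip)), len(summarize_sequence(long_clip)))
```

The output length depends only on the number of features per frame, not on the number of frames, so recordings of any duration become comparable.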
|
65 |
[en] MACHINE LEARNING FOR SENTIMENT CLASSIFICATION / [pt] APRENDIZADO DE MÁQUINA PARA O PROBLEMA DE SENTIMENT CLASSIFICATION. PEDRO OGURI. 18 May 2007
Sentiment Analysis is a text categorization problem in which the goal is to identify favorable and unfavorable opinions towards a given topic. Examples of such topics are organizations and their products. In this problem, documents are classified according to their sentiment, connotation, attitudes and opinions rather than only the facts they describe. The main challenge in Sentiment Classification is identifying how sentiments are expressed in texts and whether they indicate a positive (favorable) or negative (unfavorable) opinion towards a topic. Because of the growing volume of information available on the Web, where everyone tends to generate content and express opinions on a wide variety of subjects, Machine Learning techniques have become increasingly attractive. In this dissertation, we investigate Machine Learning methods applied to Sentiment Analysis. We present document representation models such as bag-of-words and N-grams. We test the SVM (Support Vector Machine) and Naive Bayes classifiers with different text representation models and compare their performance.
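As a rough illustration of the two document representations named in this abstract, the Python sketch below builds bag-of-words and word N-gram counts from a short text. Whitespace tokenization and lowercasing are simplifying assumptions, and the function names `word_ngrams` and `bag_of_ngrams` are hypothetical; the dissertation's actual preprocessing is not described here.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams of a text, in order of appearance.

    Assumption: tokens are lowercase, whitespace-separated words.
    """
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, n=1):
    """Count n-gram occurrences; with n=1 this is a plain bag of words."""
    return Counter(word_ngrams(text, n))


review = "not good not bad"
print(bag_of_ngrams(review, 1))  # word counts, word order discarded
print(bag_of_ngrams(review, 2))  # bigram counts keep local context such as "not good"
```

Bigrams preserve short negations like "not good" that a plain bag of words splits apart, which is one reason such representations are often compared in sentiment tasks.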
|
68 |
"Classificação de páginas na internet" / "Internet pages classification". José Martins Júnior. 11 April 2003
The rapid growth of the Internet began in the 1990s with the arrival of commercial service providers, and results mainly from the good acceptance and wide dissemination of the Web. The main problem affecting the scalability and use of the Web is the organization and classification of its content. Current search engines locate Web pages by lexically comparing sets of words against hypertext contents. This mechanism is ineffective when the goal is to locate content that expresses concepts or objects, such as products for sale on e-commerce sites. The Semantic Web was announced in 2000 for this purpose, aiming to establish new standards for the formal representation of content in Web pages. Once it is in place, within an initially estimated ten years, hypertext content will be able to express concepts representing objects classified by an ontology, enabling knowledge-based systems implemented by intelligent software agents. The DEEPSIA project was conceived as a purchaser-centered solution, unlike current Market Places, to the problem of locating Web pages that describe products for sale. It uses text classification methods, supported by the k-NN and C4.5 algorithms, in the decision process carried out by one of the agents in its architecture, the Crawler Agent. Tests of the system on Brazilian sites showed that it needed to be adapted in several respects, including the decision process involved, which is the focus of the present work. The solution to the problem involved applying and evaluating the Support Vector Machines method, and is described in detail.
|
69 |
"Investigação de estratégias para a geração de máquinas de vetores de suporte multiclasses" / Investigation of strategies for the generation of multiclass support vector machines. Ana Carolina Lorena. 16 February 2006
Several problems involve the classification of data into categories, also called classes. Given a dataset whose classes are known, Machine Learning (ML) algorithms can be used to induce a classifier able to predict the class of new data from the same domain, thus performing the desired discrimination. Among the ML techniques applied to classification problems, Support Vector Machines (SVMs) stand out for their good generalization ability. They were originally conceived for problems with only two classes, also called binary problems. However, many problems require discriminating the data into more than two categories or classes. This thesis investigates and proposes strategies for generalizing SVMs to problems with more than two classes, known as multiclass problems. The focus is on strategies that decompose the original multiclass problem into multiple binary subproblems, whose outputs are then combined to obtain the final classification. The proposed strategies investigate how to adapt the decompositions to each application considered, using information about the performance obtained in its solution or extracted from its data. The implemented algorithms were evaluated on general datasets and on real applications from Bioinformatics. The results open up several possibilities for future research. Among the benefits observed are simpler decompositions, which require fewer binary classifiers in the multiclass solution.
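The idea of decomposing a multiclass problem into binary subproblems and combining their outputs can be sketched with the common one-vs-one scheme, which trains one binary classifier per pair of classes and predicts by majority vote. This is only one standard decomposition and not necessarily one of the strategies proposed in the thesis. To keep the sketch self-contained, the binary learner is a nearest-centroid stand-in rather than an SVM, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def train_centroid_binary(points, labels):
    """Stand-in binary learner (NOT an SVM): predicts the nearest class centroid."""
    centroids = {}
    for c in sorted(set(labels)):
        members = [p for p, l in zip(points, labels) if l == c]
        centroids[c] = [sum(col) / len(members) for col in zip(*members)]

    def predict(x):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return predict


def train_one_vs_one(points, labels, train_binary=train_centroid_binary):
    """Train one binary model per unordered pair of classes."""
    models = []
    for a, b in combinations(sorted(set(labels)), 2):
        pairs = [(p, l) for p, l in zip(points, labels) if l in (a, b)]
        models.append(train_binary([p for p, _ in pairs], [l for _, l in pairs]))
    return models


def predict_one_vs_one(models, x):
    """Combine the binary outputs by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
labels = ["a", "a", "b", "b", "c", "c"]
models = train_one_vs_one(points, labels)
print(len(models), predict_one_vs_one(models, (9, 1)))
```

With k classes, one-vs-one needs k(k-1)/2 binary classifiers, which gives a sense of why decompositions that require fewer binary classifiers are of practical interest.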
|
70 |
Product Similarity Matching for Food Retail using Machine Learning / Produktliknande matchning för livsmedel med maskininlärning. Kerek, Hanna. January 2020
This thesis studies product similarity matching for food retail. The goal is to find products that are similar, but not necessarily of the same brand, and that can replace a product that is out of stock or not sold in a specific store. The aim is to examine which machine learning model is best suited to product similarity matching. The product data used to train the models were name, description, nutrients, weight and filters (labels, for example organic). Matching was performed pairwise: the similarity between two products was measured by Jaccard distance for text attributes and by relative difference for numeric values. Random Forest, Logistic Regression and Support Vector Machines were tested and compared with a baseline, which computed the Jaccard distance between product names only and classified pairs using a threshold on that distance. The results were measured by accuracy, F-measure and AUC score. Random Forest performed best on all evaluation metrics, and Logistic Regression, Random Forest and Support Vector Machines all performed better than the baseline.
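The pairwise measures described in this abstract can be sketched in Python as follows. The Jaccard distance over word sets is standard, but the exact tokenization, the definition of relative difference (here the absolute difference divided by the larger magnitude) and the threshold value of 0.5 are assumptions, since the abstract does not specify them. The function names are hypothetical.

```python
def jaccard_distance(a, b):
    """Jaccard distance between the lowercase word sets of two strings."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    if not union:
        return 0.0  # two empty texts are treated as identical
    return 1.0 - len(set_a & set_b) / len(union)


def relative_difference(x, y):
    """Assumed definition: |x - y| divided by the larger magnitude."""
    largest = max(abs(x), abs(y))
    if largest == 0:
        return 0.0
    return abs(x - y) / largest


def baseline_is_match(name_a, name_b, threshold=0.5):
    """Baseline classifier: a pair matches when the name distance is small enough.

    The threshold of 0.5 is illustrative, not the value used in the thesis.
    """
    return jaccard_distance(name_a, name_b) <= threshold


print(jaccard_distance("organic whole milk", "whole milk"))
print(relative_difference(500, 400))
print(baseline_is_match("whole milk", "orange juice"))
```

In a setup like the one described, values such as these would form the feature vector for each product pair that the Random Forest, Logistic Regression or SVM model then classifies.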
|