111 |
"Classificação de páginas na internet" / "Internet pages classification" Martins Júnior, José 11 April 2003 (has links)
O grande crescimento da Internet ocorreu a partir da década de 1990 com o surgimento dos provedores comerciais de serviços, e resulta principalmente da boa aceitação e vasta disseminação do uso da Web. O grande problema que afeta a escalabilidade e o uso de tal serviço refere-se à organização e à classificação de seu conteúdo. Os engenhos de busca atuais possibilitam a localização de páginas na Web pela comparação léxica de conjuntos de palavras perante os conteúdos dos hipertextos. Tal mecanismo mostra-se ineficaz quando da necessidade pela localização de conteúdos que expressem conceitos ou objetos, a exemplo de produtos à venda oferecidos em sites de comércio eletrônico. A criação da Web Semântica foi anunciada no ano de 2000 para esse propósito, visando o estabelecimento de novos padrões para a representação formal de conteúdos nas páginas Web. Com sua implantação, cujo prazo inicialmente previsto foi de dez anos, será possível a expressão de conceitos nos conteúdos dos hipertextos, que representarão objetos classificados por uma ontologia, viabilizando assim o uso de sistemas, baseados em conhecimento, implementados por agentes inteligentes de software. O projeto DEEPSIA foi concebido como uma solução centrada no comprador, ao contrário dos atuais Market Places, para resolver o problema da localização de páginas Web com a descrição de produtos à venda, fazendo uso de métodos de classificação de textos, apoiados pelos algoritmos k-NN e C4.5, no suporte ao processo decisório realizado por um agente previsto em sua arquitetura, o Crawler Agent. Os testes com o sistema em sites brasileiros denotaram a necessidade pela sua adaptação em diversos aspectos, incluindo-se o processo decisório envolvido, que foi abordado pelo presente trabalho. A solução para o problema envolveu a aplicação e a avaliação do método Support Vector Machines, e é descrita em detalhes. 
/ The huge growth of the Internet has been occurring since the 1990s with the arrival of commercial Internet service providers. One important reason is the good acceptance and wide dissemination of the Web. The main problem affecting its scalability and usage is the organization and classification of its content. Current search engines locate pages on the Web through a lexical comparison between sets of words and the contents of hypertexts. Such mechanisms are ineffective when one needs to find content that expresses concepts or objects, such as products for sale on electronic commerce sites. The Semantic Web was announced in 2000 for this purpose, envisioning the establishment of new standards for the formal representation of content in Web pages. With its implementation, whose deadline was initially stated as ten years, it will be possible to express concepts in hypertext contents, representing objects classified by an ontology and thus enabling the use of knowledge-based systems implemented by intelligent software agents. The DEEPSIA project was conceived as a buyer-centered solution, in contrast to current marketplaces, to the problem of finding Web pages that describe products for sale. It uses text classification methods, supported by the k-NN and C4.5 algorithms, in the decision process carried out by a specific agent in its architecture, the Crawler Agent. Tests of the system on Brazilian sites showed the need to adapt it in several respects, including the decision process involved, which is the focus of the present work. The solution to the problem involved applying and evaluating the Support Vector Machines method, and it is described in detail.
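As a hedged sketch (not the DEEPSIA implementation), the SVM-based decision step for spotting product-for-sale pages can be illustrated with a TF-IDF representation and a linear SVM; the pages and labels below are invented for illustration:

```python
# Toy product-page detector: TF-IDF features + linear SVM.
# The corpus and labels are fabricated, not DEEPSIA data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

pages = [
    "notebook 15 inch on sale add to cart price 2999",
    "buy smartphone free shipping discount checkout",
    "official store tv 50 inch promo price",
    "history of the internet and the rise of the web",
    "blog post about semantic web standards and ontologies",
    "news article on search engines and lexical matching",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = product-for-sale page, 0 = other

vec = TfidfVectorizer()
X = vec.fit_transform(pages)
clf = LinearSVC().fit(X, labels)

# classify an unseen page description
print(clf.predict(vec.transform(["digital camera promo buy now price"])))
```

In the thesis's setting this classifier would replace the k-NN/C4.5 decision step inside the Crawler Agent; here it only shows the shape of the pipeline.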
|
112 |
Monitoramento da cobertura do solo no entorno de hidrelétricas utilizando o classificador SVM (Support Vector Machines). / Land cover monitoring in hydroelectric domain area using Support Vector Machines (SVM) classifier. Albuquerque, Rafael Walter de 07 December 2011 (has links)
A classificação de imagens de satélite é muito utilizada para elaborar mapas de cobertura do solo. O objetivo principal deste trabalho consistiu no mapeamento automático da cobertura do solo no entorno da Usina de Lajeado (TO) utilizando-se o classificador SVM. Buscou-se avaliar a dimensão de áreas antropizadas presentes na represa e a acurácia da classificação gerada pelo algoritmo, que foi comparada com a acurácia da classificação obtida pelo tradicional classificador MAXVER. Esta dissertação apresentou sugestões de calibração do algoritmo SVM para a otimização do seu resultado. Verificou-se uma alta acurácia na classificação SVM, que mostrou o entorno da represa hidrelétrica em uma situação ambientalmente favorável. Os resultados obtidos pela classificação SVM foram similares aos obtidos pelo MAXVER, porém este último contextualizou espacialmente as classes de cobertura do solo com uma acurácia considerada um pouco menor. Apesar do bom estado de preservação ambiental apresentado, a represa deve ter seu entorno devidamente monitorado, pois foi diagnosticada uma grande quantidade de incêndios gerados pela população local, sendo que as ferramentas discutidas nesta dissertação auxiliam esta atividade de monitoramento. / Satellite image classification is widely used to build land cover maps. The main goal of this study was the automatic mapping of land cover around the Lajeado dam, in Tocantins state, using the SVM classifier. The work evaluated the extent of anthropic areas near the dam and the classification accuracy of the algorithm, which was compared with that of the standard Maximum Likelihood (ML) classifier. This dissertation presents calibration suggestions for the SVM algorithm to optimize its results. The SVM classification showed high accuracy, indicating an environmentally favorable situation around the hydroelectric dam.
The classification results of SVM and ML were quite similar, but SVM contextualized the land cover classes spatially with slightly better accuracy. Although the environmental condition of the study area was considered good, the surroundings of the dam must be properly monitored, since a significant number of fires set by local communities was detected; the tools discussed in this work support that monitoring activity.
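The kind of SVM "calibration" the dissertation recommends is, in common practice, a grid search over the C and gamma hyperparameters of an RBF SVM. The sketch below illustrates this on synthetic spectral-band values (three mock land cover classes); it is not the Landsat data or the parameter ranges used in the work:

```python
# Grid-searching C and gamma for an RBF SVM on mock pixel spectra.
# Band values, class centers and noise level are all invented.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
centers = np.array([[0.1, 0.2, 0.4, 0.5],   # "water"
                    [0.3, 0.4, 0.2, 0.6],   # "pasture"
                    [0.2, 0.5, 0.1, 0.8]])  # "forest"
X = np.vstack([c + 0.03 * rng.standard_normal((40, 4)) for c in centers])
y = np.repeat([0, 1, 2], 40)

grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [1, 10, 100], "gamma": [0.1, 1, 10]},
                    cv=5).fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

The cross-validated score for each (C, gamma) pair is what guides the calibration; a confusion matrix against ground-truth pixels would then give the accuracy figures compared with ML.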
|
113 |
Effets masqués en analyse prédictive / Masked effects in predictive analysis Bascoul, Ganaël 27 June 2013 (has links)
L’objectif de cette thèse consiste en l’élaboration de deux méthodologies visant à révéler des effets jusqu’alors masqués en modélisation décisionnelle. Dans la première partie, nous cherchons à mettre en œuvre une méthode d’analyse locale des critères de choix dans un contexte de choix binaires. Dans une seconde partie, nous mettons en avant les effets de génération dans l’étude des comportements de choix. Dans les deux parties, notre démarche de recherche combine de nouveaux outils d’analyse prédictive (Support Vector Machines, FANOVA, PLS) aux outils traditionnels de statistique inférentielle, afin d’enrichir les résultats habituels par des informations complémentaires sur les effets masqués que constituent les effets locaux dans les fonctions de choix binaires, et les effets de génération dans l’analyse temporelle des comportement de choix. Les méthodologies proposées, respectivement nommées AEL et APC-PLS, sont appliquées sur des cas réels, afin d’en illustrer le fonctionnement et la pertinence. / The objective of this thesis is to develop two methodologies that reveal previously hidden effects in decision modeling. In the first part, we implement a method for the local analysis of choice criteria in a binary-choice context. In the second part, we highlight generational effects in the study of choice behavior. In both parts, our research approach combines new predictive analysis tools (Support Vector Machines, FANOVA, PLS) with traditional tools of inferential statistics, in order to enrich the usual results with additional information on masked effects: local effects in binary choice functions, and generational effects in the temporal analysis of choice behavior. The proposed methodologies, respectively named AEL and APC-PLS, are applied to real cases to illustrate their operation and relevance.
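The AEL methodology itself is not detailed in this abstract, but the idea of a "local effect" in a binary choice function can be loosely illustrated: the influence of one choice criterion on an SVM's decision score is probed by sweeping that feature while holding another fixed at different anchor values. Everything below (data, choice rule, anchors) is an invented illustration, not the thesis's method:

```python
# Crude local-effect probe on an SVM decision function (illustration only).
# The non-additive choice rule makes x0's effect depend on where x1 sits.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(300, 2))
y = ((X[:, 0] > 0) & (X[:, 1] > 0)).astype(int)  # x0 matters only when x1 > 0
clf = SVC(kernel="rbf", gamma=2.0).fit(X, y)

grid = np.linspace(-1, 1, 21)
ranges = {}
for anchor in (-0.5, 0.5):
    scores = clf.decision_function(np.column_stack([grid, np.full(21, anchor)]))
    ranges[anchor] = float(scores.max() - scores.min())
    print(f"x1 fixed at {anchor:+.1f}: local effect range of x0 = {ranges[anchor]:.2f}")
```

A global (average) analysis would blur these two regimes together; the local sweep exposes that x0's effect is strong in one region of the feature space and weak in another, which is the kind of masked effect the thesis targets.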
|
114 |
Uma comparação da aplicação de métodos computacionais de classificação de dados aplicados ao consumo de cinema no Brasil / A comparison of the application of data classification computational methods to the consumption of film at theaters in Brazil Nieuwenhoff, Nathalia 13 April 2017 (has links)
As técnicas computacionais de aprendizagem de máquina para classificação ou categorização de dados estão sendo cada vez mais utilizadas no contexto de extração de informações ou padrões em bases de dados volumosas em variadas áreas de aplicação. Em paralelo, a aplicação destes métodos computacionais para identificação de padrões, bem como a classificação de dados relacionados ao consumo dos bens de informação é considerada uma tarefa complexa, visto que tais padrões de decisão do consumo estão relacionados com as preferências dos indivíduos e dependem de uma composição de características individuais, variáveis culturais, econômicas e sociais segregadas e agrupadas, além de ser um tópico pouco explorado no mercado brasileiro. Neste contexto, este trabalho realizou o estudo experimental a partir da aplicação do processo de Descoberta do conhecimento (KDD), o que inclui as etapas de seleção e Mineração de Dados, para um problema de classificação binária, indivíduos brasileiros que consomem e não consomem um bem de informação, filmes em salas de cinema, a partir dos dados obtidos na Pesquisa de Orçamento Familiar (POF) 2008-2009, pelo Instituto Brasileiro de Geografia e Estatística (IBGE). O estudo experimental resultou em uma análise comparativa da aplicação de duas técnicas de aprendizagem de máquina para classificação de dados, baseadas em aprendizado supervisionado, sendo estas Naïve Bayes (NB) e Support Vector Machine (SVM). Inicialmente, a revisão sistemática realizada com o objetivo de identificar estudos relacionados a aplicação de técnicas computacionais de aprendizado de máquina para classificação e identificação de padrões de consumo indica que a utilização destas técnicas neste contexto não é um tópico de pesquisa maduro e desenvolvido, visto que não foi abordado em nenhum dos trabalhos estudados. 
Os resultados obtidos a partir da análise comparativa realizada entre os algoritmos sugerem que a escolha dos algoritmos de aprendizagem de máquina para Classificação de Dados está diretamente relacionada a fatores como: (i) importância das classes para o problema a ser estudado; (ii) balanceamento entre as classes; (iii) universo de atributos a serem considerados em relação a quantidade e grau de importância destes para o classificador. Adicionalmente, os atributos selecionados pelo algoritmo de seleção de variáveis Information Gain sugerem que a decisão de consumo de cultura, mais especificamente do bem de informação, filmes em cinema, está fortemente relacionada a aspectos dos indivíduos relacionados a renda, nível de educação, bem como suas preferências por bens culturais. / Machine learning techniques for data classification or categorization are increasingly used to extract information or patterns from voluminous databases in various application areas. At the same time, applying these computational methods to identify patterns and to classify data related to the consumption of information goods is considered a complex task, since such consumption decisions are related to the preferences of individuals and depend on a composition of individual characteristics and of cultural, economic and social variables, segregated and grouped; it is also a little-explored topic in the Brazilian market. In this context, this work performed an experimental study applying the Knowledge Discovery in Databases (KDD) process, which includes the data selection and data mining steps, to a binary classification problem: Brazilian individuals who do and do not consume an information good, films at movie theaters, based on the microdata of the 2008-2009 Brazilian Household Budget Survey (POF) conducted by the Brazilian Institute of Geography and Statistics (IBGE).
The experimental study resulted in a comparative analysis of two supervised machine learning techniques for data classification, namely Naïve Bayes (NB) and Support Vector Machine (SVM). Initially, a systematic review aimed at identifying studies on the application of machine learning techniques to the classification and identification of consumption patterns indicated that their use in this context is not a mature, well-developed research topic, since it was not addressed in any of the papers analyzed. The results of the comparative analysis between the algorithms suggest that the choice of machine learning algorithm for data classification is directly related to factors such as: (i) the importance of the classes for the problem under study; (ii) the balance between classes; (iii) the universe of attributes to be considered, in terms of their number and degree of importance for the classifier. In addition, the attributes selected by the Information Gain feature selection algorithm suggest that the decision to consume culture, more specifically the information good films at movie theaters, is strongly related to individuals' income, educational level, and preferences for cultural goods.
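The comparison pipeline described above can be sketched as follows: an Information Gain-style feature ranking (approximated here by mutual information) followed by cross-validated NB and SVM classifiers. The survey-like features and the consumption rule are simulated stand-ins, not the POF/IBGE microdata:

```python
# Mock NB-vs-SVM comparison with mutual-information feature selection.
# Income, education, age and the decision rule are all invented.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

rng = np.random.default_rng(42)
n = 400
income = rng.lognormal(8, 1, n)
education = rng.integers(0, 16, n)
age = rng.integers(18, 80, n)
noise = rng.standard_normal(n)
X = np.column_stack([income, education, age, noise])
# consumption decision driven mostly by income and education
y = (np.log(income) + 0.2 * education + rng.standard_normal(n) > 9.5).astype(int)

sel = SelectKBest(mutual_info_classif, k=2).fit(X, y)
print("selected feature indices:", sel.get_support(indices=True))

accs = {}
for name, model in [("NB", GaussianNB()), ("SVM", SVC())]:
    accs[name] = cross_val_score(model, sel.transform(X), y, cv=5).mean()
    print(f"{name}: {accs[name]:.3f}")
```

On this synthetic data the income feature (index 0) dominates the ranking, mirroring the abstract's finding that income and education drive the consumption decision.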
|
115 |
Wavelets, predição linear e LS-SVM aplicados na análise e classificação de sinais de vozes patológicas / Wavelets, LPC and LS-SVM applied for analysis and identification of pathological voice signals Fonseca, Everthon Silva 24 April 2008 (has links)
Neste trabalho, foram utilizadas as vantagens da ferramenta matemática de análise temporal e espectral, a transformada wavelet discreta (DWT), além dos coeficientes de predição linear (LPC) e do algoritmo de inteligência artificial, Least Squares Support Vector Machines (LS-SVM), para aplicações em análise de sinais de voz e classificação de vozes patológicas. Inúmeros trabalhos na literatura têm demonstrado o grande interesse existente por ferramentas auxiliares ao diagnóstico de patologias da laringe. Os componentes da DWT forneceram parâmetros de medida para a análise e classificação das vozes patológicas, principalmente aquelas provenientes de pacientes com edema de Reinke e nódulo nas pregas vocais. O banco de dados com as vozes patológicas foi obtido do Departamento de Otorrinolaringologia e Cirurgia de Cabeça e Pescoço do Hospital das Clínicas da Faculdade de Medicina de Ribeirão Preto (FMRP-USP). Utilizando-se o algoritmo de reconhecimento de padrões, LS-SVM, mostrou-se que a combinação dos componentes da DWT de Daubechies com o filtro LP inverso levou a um classificador de bom desempenho alcançando mais de 90% de acerto na classificação das vozes patológicas. / The main objective of this work was to exploit the advantages of a mathematical tool for time-frequency analysis, the discrete wavelet transform (DWT), together with linear prediction coefficients (LPC) and an artificial intelligence algorithm, Least Squares Support Vector Machines (LS-SVM), in voice signal analysis and the classification of pathological voices. A large number of works in the literature have shown great interest in auxiliary tools for the diagnosis of laryngeal pathologies. The DWT components provided measurement parameters for the analysis and classification of pathological voices, mainly those from patients with Reinke's edema and vocal fold nodules.
A database of pathological voices was obtained from the Otolaryngology and Head and Neck Surgery Department of the Clinical Hospital of the Faculty of Medicine at Ribeirão Preto, University of São Paulo (FMRP-USP), Brazil. Using LS-SVM, an automatic learning algorithm applied to pattern recognition problems, the results showed that the combination of Daubechies DWT components and the inverse LP filter leads to a classifier with good performance, reaching more than 90% accuracy in the classification of pathological voices.
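A distinctive point of the LS-SVM family used here is that training reduces to solving one linear system instead of a quadratic program. A minimal sketch follows; the toy two-class Gaussian data stands in for the DWT/LPC voice features, which are not reproduced, and the kernel/regularization choices are illustrative:

```python
# Minimal LS-SVM binary classifier: training = one linear solve.
# Toy data replaces the thesis's DWT/LPC voice features.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def lssvm_fit(X, y, C=10.0, gamma=1.0):
    n = len(y)
    K = rbf_kernel(X, X, gamma)
    # KKT system of the LS-SVM problem: [[0, 1^T], [1, K + I/C]] [b; a] = [0; y]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / C
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[0], sol[1:]          # bias b, coefficients alpha

def lssvm_predict(Xnew, Xtr, b, alpha, gamma=1.0):
    return np.sign(rbf_kernel(Xnew, Xtr, gamma) @ alpha + b)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.3, (30, 2)), rng.normal(1, 0.3, (30, 2))])
y = np.concatenate([-np.ones(30), np.ones(30)])   # -1 = "normal", +1 = "pathological"
b, alpha = lssvm_fit(X, y)
print("training accuracy:", (lssvm_predict(X, X, b, alpha) == y).mean())
```

The trade-off is that, unlike a standard SVM, every training point gets a nonzero coefficient, so sparsity is lost in exchange for the simple solve.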
|
116 |
Utilização do algoritmo de aprendizado de máquinas para monitoramento de falhas em estruturas inteligentes / Use of the learning algorithm of machines for the monitoring of faults in intelligent structures Guimarães, Ana Paula Alves [UNESP] 20 December 2016 (has links)
O monitoramento da condição estrutural é uma área que vem sendo bastante estudada por permitir a construção de sistemas que possuem a capacidade de identificar um determinado dano em seu estágio inicial, podendo assim evitar sérios prejuízos futuros. O ideal seria que estes sistemas tivessem o mínimo de interferência humana. Sistemas que abordam o conceito de aprendizagem têm a capacidade de serem autômatos. Acredita-se que por possuírem estas propriedades, os algoritmos de aprendizagem de máquina sejam uma excelente opção para realizar as etapas de identificação, localização e avaliação de um dano, com capacidade de obter resultados extremamente precisos e com taxas mínimas de erros. Este trabalho tem como foco principal utilizar o algoritmo support vector machine no auxílio do monitoramento da condição de estruturas e, com isto, obter melhor exatidão na identificação da presença ou ausência do dano, diminuindo as taxas de erros através das abordagens da aprendizagem de máquina, possibilitando, assim, um monitoramento inteligente e eficiente. Foi utilizada a biblioteca LibSVM para análise e validação da proposta. Desta forma, foi possível realizar o treinamento e classificação dos dados promovendo a identificação dos danos e posteriormente, empregando as predições efetuadas pelo algoritmo, foi possível determinar a localização dos danos na estrutura. Os resultados de identificação e localização dos danos foram bastante satisfatórios. / Structural health monitoring (SHM) has been extensively studied because it allows the construction of systems able to identify damage at an early stage, thereby avoiding serious future losses. Ideally, such systems require minimal human intervention. Systems built around the concept of learning can operate autonomously.
It is believed that, because they have these properties, machine learning algorithms are an excellent choice for the steps of identifying, locating and assessing damage, with the ability to obtain highly accurate results at minimal error rates. This work focuses on using the support vector machine algorithm to support structural condition monitoring and thus achieve better accuracy in identifying the presence or absence of damage, reducing error rates through machine learning approaches and enabling intelligent and efficient monitoring. The LibSVM library was used to analyze and validate the proposed approach. It was thus feasible to train and classify the data, allowing the identification of damage; subsequently, using the predictions made by the algorithm, it was possible to locate the damage in the structure. The damage identification and localization results were quite satisfactory.
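The two-stage workflow described above (flag damage, then attribute it to a region) can be sketched with an SVM on simulated natural-frequency features. The modal frequencies, damage-induced shifts and noise level are invented; the thesis used LibSVM directly, while this sketch uses scikit-learn's SVC, which wraps the same library:

```python
# Mock SHM pipeline: SVM damage detection, then multiclass localization.
# Modal frequencies and per-zone shifts are fabricated illustration data.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(7)
base = np.array([12.0, 34.0, 61.0])          # mock modal frequencies (Hz)
shifts = {0: (0.0, 0.0, 0.0),                # healthy
          1: (-0.8, -0.2, 0.0),             # damage in zone 1
          2: (0.0, -0.6, -0.9)}             # damage in zone 2
X = np.vstack([base + shifts[s] + 0.1 * rng.standard_normal(3)
               for s in (0, 1, 2) for _ in range(50)])
y = np.repeat([0, 1, 2], 50)

model = make_pipeline(StandardScaler(), SVC())
detect = cross_val_score(model, X, (y > 0).astype(int), cv=5).mean()  # damaged?
locate = cross_val_score(model, X, y, cv=5).mean()                    # which zone?
print(f"damage detection acc: {detect:.2f}, localization acc: {locate:.2f}")
```

Feature standardization matters here because the raw frequencies sit on very different scales; without it the RBF kernel is dominated by the largest mode.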
|
118 |
Reconnaissance des sons de l’environnement dans un contexte domotique / Environmental sounds recognition in a domotic context Sehili, Mohamed el Amine 05 July 2013 (has links)
Dans beaucoup de pays du monde, on observe une importante augmentation du nombre de personnes âgées vivant seules. Depuis quelques années, un nombre significatif de projets de recherche sur l’assistance aux personnes âgées ont vu le jour. La plupart de ces projets utilisent plusieurs modalités (vidéo, son, détection de chute, etc.) pour surveiller l'activité de la personne et lui permettre de communiquer naturellement avec sa maison "intelligente", et, en cas de danger, lui venir en aide au plus vite. Ce travail a été réalisé dans le cadre du projet ANR VERSO de recherche industrielle, Sweet-Home. Les objectifs du projet sont de proposer un système domotique permettant une interaction naturelle (par commande vocale et tactile) avec la maison, et procurant plus de sécurité à l'habitant par la détection des situations de détresse. Dans ce cadre, l'objectif de ce travail est de proposer des solutions pour la reconnaissance des sons de la vie courante dans un contexte réaliste. La reconnaissance du son fonctionnera en amont d'un système de Reconnaissance Automatique de la Parole. Les performances de celui-ci dépendent donc de la fiabilité de la séparation entre la parole et les autres sons. Par ailleurs, une bonne reconnaissance de certains sons, complétée par d'autres sources informations (détection de présence, détection de chute, etc.) permettrait de bien suivre les activités de la personne et de détecter ainsi les situations de danger. Dans un premier temps, nous nous sommes intéressés aux méthodes en provenance de la Reconnaissance et Vérification du Locuteur. Dans cet esprit, nous avons testé des méthodes basées sur GMM et SVM. Nous avons, en particulier, testé le noyau SVM-GSL (SVM GMM Supervector Linear Kernel) utilisé pour la classification de séquences. 
SVM-GSL est une combinaison de SVM et GMM et consiste à transformer une séquence de vecteurs de longueur arbitraire en un seul vecteur de très grande taille, appelé Super Vecteur, et utilisé en entrée d'un SVM. Les expérimentations ont été menées en utilisant une base de données créée localement (18 classes de sons, plus de 1000 enregistrements), puis le corpus du projet Sweet-Home, en intégrant notre système dans un système plus complet incluant la détection multi-canaux du son et la reconnaissance de la parole. Ces premières expérimentations ont toutes été réalisées en utilisant un seul type de coefficients acoustiques, les MFCC. Par la suite, nous nous sommes penchés sur l'étude d'autres familles de coefficients en vue d'en évaluer l'utilisabilité en reconnaissance des sons de l'environnement. Notre motivation fut de trouver des représentations plus simples et/ou plus efficaces que les MFCC. En utilisant 15 familles différentes de coefficients, nous avons également expérimenté deux approches pour transformer une séquence de vecteurs en un seul vecteur, à utiliser avec un SVM linéaire. Dans le première approche, on calcule un nombre fixe de coefficients statistiques qui remplaceront toute la séquence de vecteurs. La seconde approche (une des contributions de ce travail) utilise une méthode de discrétisation pour trouver, pour chaque caractéristique d'un vecteur acoustique, les meilleurs points de découpage permettant d'associer une classe donnée à un ou plusieurs intervalles de valeurs. La probabilité de la séquence est estimée par rapport à chaque intervalle. Les probabilités obtenues ainsi sont utilisées pour construire un seul vecteur qui remplacera la séquence de vecteurs acoustiques. Les résultats obtenus montrent que certaines familles de coefficients sont effectivement plus adaptées pour reconnaître certaines classes de sons. 
En effet, pour la plupart des classes, les meilleurs taux de reconnaissance ont été observés avec une ou plusieurs familles de coefficients différentes des MFCC. Certaines familles sont, de surcroît, moins complexes et comptent une seule caractéristique par fenêtre d'analyse contre 16 caractéristiques pour les MFCC. / In many countries around the world, the number of elderly people living alone has been increasing. In the last few years, a significant number of research projects on elderly people monitoring have been launched. Most of them make use of several modalities, such as video streams, sound and fall detection, in order to monitor the activities of an elderly person, to supply them with a natural way to communicate with their "smart home", and to render assistance in case of an emergency. This work is part of the industrial research ANR VERSO project Sweet-Home. The goals of the project are to propose a domotic system that enables natural interaction (using touch and voice commands) between an elderly person and their house, and to provide them with a higher safety level through the detection of distress situations. The goal of this work is therefore to come up with solutions for the recognition of daily life sounds in a realistic context. Sound recognition runs prior to an Automatic Speech Recognition system, so the speech recognizer's performance relies on the reliability of the speech/non-speech separation. Furthermore, good recognition of a few kinds of sounds, complemented by other sources of information (presence detection, fall detection, etc.), would allow better monitoring of the person's activities and hence better detection of dangerous situations. We first investigated methods from the Speaker Recognition and Verification field, experimenting with methods based on GMM and SVM. In particular, we tested a sequence-discriminant SVM kernel called SVM-GSL (SVM GMM Supervector Linear Kernel).
SVM-GSL is a combination of GMM and SVM whose basic idea is to map a sequence of vectors of arbitrary length into one high-dimensional vector, called a supervector, used as the input of an SVM. Experiments were carried out using a locally created sound database (18 sound classes, over 1,000 recordings), then using the Sweet-Home project's corpus; our daily-sound recognition system was integrated into a more complete system that also performs multi-channel sound detection and speech recognition. These first experiments were all performed using a single kind of acoustical coefficients, MFCC. Thereafter, we studied other families of acoustical coefficients to assess their usability for environmental sound recognition; our motivation was to find representations that are simpler and/or more effective than MFCC. Using 15 different families of acoustical coefficients, we also experimented with two approaches to map a sequence of vectors into one vector usable with a linear SVM. The first approach computes a fixed number of statistical coefficients and uses them instead of the whole sequence. The second, one of the novel contributions of this work, uses a discretization method to find, for each feature within an acoustical vector, the best cut points that associate a given class with one or more intervals of values. The likelihood of the sequence is estimated for each interval, and the obtained likelihood values are used to build a single vector that replaces the sequence of acoustical vectors. The obtained results show that a few families of coefficients are indeed more appropriate for the recognition of some sound classes.
Moreover, a number of these families are less complex than MFCC: they are one-feature-per-frame representations, whereas the MFCC contain 16 features per frame.
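The supervector idea behind SVM-GSL can be sketched as follows: a variable-length sequence of acoustic frames is mapped to one fixed-size vector (the stacked component means of a GMM adapted to that sequence), which a linear SVM then classifies. This is a minimal illustration of the concept, not the thesis implementation: the class sizes, GMM dimensions, and the crude mean re-estimation standing in for MAP adaptation are all assumptions.

```python
# Sketch of the SVM-GSL supervector mapping: sequence -> fixed vector -> linear SVM.
# Toy data and parameters throughout; a real system would use MAP adaptation
# of a large universal background model (UBM).
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import LinearSVC

N_COMPONENTS = 4   # GMM size (illustrative; real systems use far more)
N_FEATURES = 16    # e.g. 16 MFCC coefficients per analysis window

rng = np.random.default_rng(0)

def supervector(frames, ubm):
    """Map a (n_frames, n_features) sequence to one fixed vector by
    re-estimating the component means on this sequence (a crude stand-in
    for MAP adaptation) and stacking them."""
    resp = ubm.predict_proba(frames)              # (n_frames, K) posteriors
    counts = resp.sum(axis=0)[:, None] + 1e-8     # soft frame counts per component
    means = resp.T @ frames / counts              # per-component adapted means
    return means.ravel()                          # (K * n_features,) supervector

# "UBM" trained on pooled frames from all sounds.
pooled = rng.normal(size=(500, N_FEATURES))
ubm = GaussianMixture(n_components=N_COMPONENTS, random_state=0).fit(pooled)

# Two toy sound classes with different frame statistics and variable lengths.
def make_sequence(shift):
    n = rng.integers(20, 60)
    return rng.normal(loc=shift, size=(n, N_FEATURES))

X = np.array([supervector(make_sequence(s), ubm)
              for s in [0.0] * 20 + [1.5] * 20])
y = np.array([0] * 20 + [1] * 20)

clf = LinearSVC(C=1.0).fit(X, y)   # linear SVM on the fixed-size supervectors
print(clf.score(X, y))
```

The key property is that sequences of any length become vectors of identical dimension, so a standard linear SVM applies unchanged.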
|
119 |
Efficient Kernel Methods For Large Scale Classification Asharaf, S 07 1900 (has links)
Classification algorithms have been widely used in many application domains. Most of these domains deal with massive collections of data and hence demand classification algorithms that scale well with the size of the data sets involved. A classification algorithm is said to be scalable if there is no significant increase in its time and space requirements (without compromising generalization performance) when the training set size increases. The Support Vector Machine (SVM) is one of the most celebrated kernel-based classification methods in Machine Learning. An SVM capable of handling large scale classification problems would be an ideal candidate in many real-world applications. The training process of an SVM classifier is usually formulated as a Quadratic Programming (QP) problem. Existing solution strategies for this problem have an associated time and space complexity that is (at least) quadratic in the number of training points. This makes SVM training very expensive even on classification problems with only a few thousand training examples.
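The quadratic space cost mentioned above is easy to make concrete: the Gram (kernel) matrix alone needs O(n^2) memory, independent of the number of features. The sizes below are illustrative arithmetic, not measurements from the thesis.

```python
# Why naive kernel SVM training does not scale: storing the n x n kernel
# matrix in double precision alone grows quadratically with n.
import numpy as np

def gram_matrix_bytes(n, dtype=np.float64):
    """Memory needed to store a dense n x n kernel matrix."""
    return n * n * np.dtype(dtype).itemsize

for n in [1_000, 10_000, 100_000]:
    print(f"n = {n:>7,}: {gram_matrix_bytes(n) / 1e9:.2f} GB")
```

Doubling the training set quadruples the kernel matrix, which is why scalable schemes avoid materializing it in full.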
This thesis addresses the scalability of the training algorithms for both two-class and multiclass Support Vector Machines. Efficient training schemes that reduce the space and time requirements of SVM training are proposed as possible solutions. The classification schemes discussed in the thesis for handling large scale two-class problems are: a) two selective-sampling-based training schemes for scaling non-linear SVMs, and b) clustering-based approaches for handling unbalanced data sets with the Core Vector Machine. To handle large scale multiclass problems, the thesis proposes the Multiclass Core Vector Machine (MCVM), a scalable SVM-based multiclass classifier. In MCVM, the multiclass SVM problem is shown to be equivalent to a Minimum Enclosing Ball (MEB) problem and is then solved using a fast approximate MEB-finding algorithm. Experimental studies were conducted with several large real-world data sets, such as the IJCNN1 and Acoustic data sets from the LIBSVM page, the Extended USPS data set from the CVM page, and the DARPA (US Defense) network intrusion detection data sets used in the KDD'99 contest. The empirical results show that the proposed classification schemes achieve good generalization performance at low time and space cost. Further, scalability experiments with large training sets demonstrate that the proposed schemes scale well. A novel soft clustering scheme called Rough Support Vector Clustering (RSVC), employing the idea of the Soft Minimum Enclosing Ball (SMEB) problem, is another contribution of this thesis. Experiments with a synthetic data set and the real-world IRIS data set show that RSVC finds meaningful soft cluster abstractions.
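The fast approximate MEB solver at the heart of Core Vector Machines can be sketched with the classic Badoiu-Clarkson core-set iteration (a standard algorithm from the literature, not the thesis code): repeatedly move the center toward the current farthest point, which after O(1/eps^2) steps yields a (1+eps)-approximate enclosing ball.

```python
# Badoiu-Clarkson style (1+eps)-approximate Minimum Enclosing Ball:
# the farthest point at each step plays the role of a core-set addition.
import numpy as np

def approx_meb(points, eps=0.01):
    """Approximate MEB of (n, d) points; returns (center, radius)."""
    c = points[0].astype(float).copy()
    n_iter = int(np.ceil(1.0 / eps**2))           # O(1/eps^2) iterations suffice
    for t in range(1, n_iter + 1):
        dists = np.linalg.norm(points - c, axis=1)
        far = points[np.argmax(dists)]            # farthest point from current center
        c += (far - c) / (t + 1)                  # step toward it with shrinking weight
    radius = np.linalg.norm(points - c, axis=1).max()
    return c, radius

rng = np.random.default_rng(1)
pts = rng.normal(size=(2000, 3))
center, radius = approx_meb(pts, eps=0.05)
print(center, radius)
```

Each iteration costs O(n d), so the whole run is linear in the number of points for fixed eps, which is exactly the property that makes MEB reductions attractive for large scale training.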
|
120 |
Development of an Innovative System for the Reconstruction of New Generation Satellite Images LORENZI, Luca 29 November 2012 (has links) (PDF)
Remote sensing satellites have become indispensable to civil society. Satellite images have been successfully exploited in many applications, notably environmental monitoring and natural disaster prevention. In recent years, the increasing availability of very high spatial resolution (VHR) remote sensing images has opened potentially significant new applications in land-use monitoring and environmental management. However, because optical sensors directly acquire the light reflected by the sun, they can suffer from the presence of clouds in the sky and/or shadows on the ground. This is the missing-data problem, which is especially critical in the case of VHR images, where the increased geometric detail implies a large loss of information. In this thesis, new methodologies for detecting and reconstructing regions of missing data in VHR images are proposed and applied to areas contaminated by clouds and/or shadows.
In particular, the proposed methodological contributions include: i) a multiresolution inpainting strategy for reconstructing cloud-contaminated images; ii) a novel combination of radiometric information and spatial position information in two dedicated kernels for better reconstruction of cloud-contaminated regions using support vector regression (SVR); iii) the exploitation of compressed sensing theory with three different strategies (orthogonal matching pursuit, basis pursuit, and a compressed sensing solution based on a genetic algorithm) for the reconstruction of cloud-contaminated images; iv) a complete processing chain that uses a support vector machine (SVM) to classify and detect shadow areas and then a linear regression to reconstruct them; and finally v) several evaluation criteria for assessing shadow reconstruction performance. All these methods were designed specifically to operate on very high resolution images. Experimental results on real data are presented to demonstrate and confirm the validity of all the proposed methods. They suggest that, despite the complexity of the problems, missing areas masked by clouds or corrupted by shadows can be recovered acceptably well.
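The SVR-based reconstruction idea in point ii) above can be sketched as follows: on pixels outside the cloud mask, learn a support vector regression from auxiliary inputs (here, pixel coordinates plus a co-registered cloud-free reference image) to the target band, then predict the masked pixels. The synthetic images, single RBF kernel, and parameters below are illustrative assumptions, not the thesis's dedicated two-kernel formulation.

```python
# Toy SVR reconstruction of a cloud-masked region from spatial position
# plus the radiometry of a cloud-free reference acquisition.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
H = W = 32
yy, xx = np.mgrid[0:H, 0:W]

reference = np.sin(xx / 5.0) + np.cos(yy / 7.0)             # cloud-free acquisition
target = 0.8 * reference + 0.1 + 0.02 * rng.normal(size=(H, W))

mask = (xx - 16) ** 2 + (yy - 16) ** 2 < 36                 # simulated cloud disk

# Features: normalized spatial position + radiometry of the reference image.
feats = np.column_stack([xx.ravel() / W, yy.ravel() / H, reference.ravel()])
clean, cloudy = ~mask.ravel(), mask.ravel()

# Fit on uncontaminated pixels only, then reconstruct the masked ones.
svr = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(feats[clean], target.ravel()[clean])
pred = svr.predict(feats[cloudy])

rmse = np.sqrt(np.mean((pred - target.ravel()[cloudy]) ** 2))
print(f"RMSE on reconstructed pixels: {rmse:.3f}")
```

The same train-on-clean / predict-on-masked pattern carries over to the shadow chain in point iv), with a classifier supplying the mask and a linear regression doing the reconstruction.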
|