351

Reconnaissance des émotions par traitement d’images / Emotions recognition based on image processing

Gharsalli, Sonia 12 July 2016 (has links)
Emotion recognition is one of the most complex scientific domains. In recent years, numerous emotion recognition systems have been developed. These innovative applications span several domains, such as assistance for autistic children, video games, and human-machine interaction. Emotions are conveyed through several channels; we focus on facial emotion recognition, specifically the six basic emotions: happiness, anger, fear, disgust, sadness and surprise. A comparative study between a geometric-feature method and an appearance-feature method is performed on the CK+ database of posed emotions and the FEEDTUM database of spontaneous emotions. We consider several constraints in this study, such as varying image resolutions, the low number of labelled images available for training, and generalization to new subjects not included in the training set. We then evaluate various fusion schemes on such new subjects. Good recognition rates are obtained for posed emotions (above 86%), but they remain low for spontaneous emotions. Based on a study of local facial regions, we develop per-region hybrid methods, which increase recognition rates for spontaneous emotions. Finally, we develop an appearance-feature selection method based on feature importance scores; compared with two selection methods from the literature, it improves the recognition rate.
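The comparison described in this abstract can be illustrated with a hedged Python sketch: an SVM evaluated by cross-validation on a low-dimensional "geometric" descriptor versus a high-dimensional "appearance" descriptor. The data below is synthetic; the real landmark and appearance extractors used in the thesis are not shown.

```python
# Sketch (not the thesis implementation): SVM accuracy for two descriptor types.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, n_classes = 120, 6  # six basic emotions
y = rng.integers(0, n_classes, size=n)

# Synthetic stand-ins: geometric descriptors are low-dimensional, appearance
# descriptors high-dimensional; class signal is injected artificially.
X_geometric = rng.normal(size=(n, 20)) + y[:, None] * 0.5
X_appearance = rng.normal(size=(n, 200)) + y[:, None] * 0.2

for name, X in [("geometric", X_geometric), ("appearance", X_appearance)]:
    scores = cross_val_score(SVC(kernel="rbf", C=1.0), X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```

On real data the two descriptor families would be extracted from the same face images, so the comparison isolates the representation rather than the classifier.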
352

Classificação da marcha em parkinsonianos: análise dos algoritmos de aprendizagem supervisionada / Classification of the parkinsonian gait: analysis of supervised learning algorithms

Souza, Hugo Araújo 12 April 2017 (has links)
Parkinson's disease (PD) is the second most prevalent neurodegenerative disease in the elderly, although its prevalence and incidence vary with age, sex and race/ethnicity. Studies indicate that prevalence increases with age, with an estimated 5 to 26 cases per 100,000 people per year: approximately 1% among individuals aged 65-69, rising from 3% to 14.3% among those over 85. The most common clinical signs include resting tremor, muscle rigidity, bradykinesia and postural instability. Diagnosing the disease is not a simple task: although there are known stages of disease progression in the human organism, many patients do not follow this progression because of the heterogeneity of possible manifestations. Gait analysis has become an attractive, non-invasive quantitative tool that can aid in the detection and monitoring of PD patients. Feature extraction is crucial to the quality of the data fed to the learning algorithms; its main objective is to reduce the dimensionality of the data in a classification process. Dimensionality reduction makes it possible to identify which attributes are important and to ease data visualization. For human gait data, the goal is to detect relevant attributes that help identify gait-cycle characteristics, such as the stance and swing phases, cadence, stride length and velocity. This requires identifying and selecting the most relevant attributes, as well as the classification method. This work evaluates the performance of supervised learning algorithms in classifying human gait characteristics on an open database, and identifies which attributes contribute most to classifier performance in identifying gait characteristics of PD patients.
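The kind of classifier comparison this work performs can be sketched as follows. The gait attributes and labels below are synthetic stand-ins (the open database and its real attributes are not reproduced here); the structure — several supervised learners scored on the same feature table — is what the abstract describes.

```python
# Sketch: comparing supervised learners on a shared gait-feature table.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n = 200
y = rng.integers(0, 2, size=n)       # hypothetical labels: 0 = control, 1 = parkinsonian
X = rng.normal(size=(n, 8)) + y[:, None] * 0.8  # 8 synthetic gait attributes

results = {}
for name, clf in [("kNN", KNeighborsClassifier()),
                  ("tree", DecisionTreeClassifier(random_state=0)),
                  ("SVM", SVC())]:
    results[name] = cross_val_score(clf, X, y, cv=5).mean()
print(results)
```

Holding the feature table fixed while swapping estimators is what lets the study attribute performance differences to the algorithms rather than the data.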
353

  • Seleção de atributos para classificação de textos usando técnicas baseadas em agrupamento, PoS tagging e algoritmos evolutivos / Feature selection for text classification using techniques based on clustering, PoS tagging and evolutionary algorithms

Ferreira, Charles Henrique Porto January 2016 (has links)
Advisor: Profa. Dra. Debora Maria Rossi de Medeiros / Master's dissertation - Universidade Federal do ABC, Graduate Program in Computer Science, 2016. / This work investigates feature selection techniques for the text classification task. Three techniques are proposed and compared with traditional text preprocessing techniques. The first proposes that not all grammatical classes of a given language are relevant when a text is submitted to classification. The second employs feature clustering and genetic algorithms to select groups of features. The third raises two hypotheses: first, that words occurring more frequently in the text collection than in the language as a whole may be the most important words to use as features; second, that the relationship of each data instance to each class can form a new feature set. The results suggest that the proposed approaches are promising and that the hypotheses may be valid. Experiments with the first approach show that there is a set of grammatical classes whose words can be removed from the feature set across different text collections while maintaining or even improving classification accuracy. The second approach achieves a strong reduction in the original number of features while still improving accuracy. The third approach yields the sharpest reduction in the number of features since, by construction, the final number of features equals the number of classes in the dataset, with zero or even positive impact on accuracy.
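The first hypothesis of the third technique — score words by how much more frequent they are in the corpus than in the language at large — can be sketched in a few lines. The toy corpus and the background frequencies below are invented for illustration; a real system would use a reference corpus for the background.

```python
# Sketch: rank words by (in-corpus frequency) / (background language frequency).
from collections import Counter

corpus = ["the network improves the classification accuracy",
          "the classification network uses genetic selection"]
background = {"the": 0.05, "network": 0.001, "improves": 0.0005,
              "classification": 0.0002, "accuracy": 0.0004,
              "uses": 0.002, "genetic": 0.0001, "selection": 0.0003}

tokens = [w for doc in corpus for w in doc.split()]
counts = Counter(tokens)
total = sum(counts.values())

# High ratios flag corpus-specific, potentially discriminative words;
# common function words like "the" score low despite high raw counts.
ratio = {w: (counts[w] / total) / background[w] for w in counts}
selected = sorted(ratio, key=ratio.get, reverse=True)[:4]
print(selected)
```

Note how "the" is the most frequent token yet ranks last, which is exactly the behavior the hypothesis relies on.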
354

Image-based detection and classification of allergenic pollen / Détection et classification des pollens allergisants basée sur l'image

Lozano Vega, Gildardo 18 June 2015 (has links)
The correct classification of airborne pollen is relevant for the medical treatment of allergies, and the usual manual counting process is costly and time-consuming. Automatic processing would considerably increase the potential of pollen counting. Modern computer vision techniques enable the detection of discriminant pollen characteristics. In this thesis, a set of relevant image-based features for the recognition of the top allergenic pollen taxa is proposed and analyzed. The foundation of our proposal is the evaluation of groups of features that properly describe pollen in terms of shape, texture, size and apertures. The features are extracted from typical brightfield microscope images, which makes the method easy to reproduce. A feature selection process is applied to each group to determine its relevance. Regarding apertures, a flexible method for the detection, localization and counting of apertures across pollen taxa with varying appearances is proposed. Aperture description follows a Bag-of-Words strategy applied to primitives extracted from the images. A confidence map is built from the classification confidence of sampled image regions; from this map, aperture features are extracted, including the aperture count. The method is designed to extend modularly to new aperture types, using the same algorithm but an individual classifier per type. The feature groups are tested individually and jointly on the most allergenic pollen taxa in Germany, and were shown to overcome intra-class variance and inter-class similarity in an SVM classification scheme. The joint test over all feature groups led to an accuracy of 98.2%, comparable to state-of-the-art procedures.
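The Bag-of-Words step mentioned in the abstract can be sketched as follows: local descriptors are quantized against a learned codebook, and each image is then represented by its histogram of visual-word assignments. The descriptors below are synthetic; in the thesis they come from sampled microscope-image regions.

```python
# Sketch: Bag-of-Words image representation via a KMeans codebook.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# 10 "images", each with 50 local descriptors of dimension 16 (synthetic).
images = [rng.normal(size=(50, 16)) for _ in range(10)]

# Build the codebook from all descriptors pooled together.
codebook = KMeans(n_clusters=8, n_init=10, random_state=0).fit(np.vstack(images))

def bow_histogram(descriptors):
    """Normalized histogram of visual-word assignments for one image."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=8).astype(float)
    return hist / hist.sum()

features = np.array([bow_histogram(d) for d in images])
print(features.shape)  # one 8-bin histogram per image
```

In the thesis, such histograms feed the classifier whose region-level confidences form the confidence map from which aperture features are read off.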
355

Optimisation du test de production de circuits analogiques et RF par des techniques de modélisation statistique / Optimisation of the production test of analog and RF circuit using statistical modeling techniques

Akkouche, Nourredine 09 September 2011 (has links)
The share of test in the cost of designing and manufacturing integrated circuits keeps growing, hence the need to optimize this now-unavoidable step. In this thesis, new methods for scheduling tests and reducing their number are proposed. The solution is an ordering of the tests that identifies faulty circuits as early as possible, and that can also be used to eliminate redundant tests. These test methods are based on statistical modeling of the circuit under test; the modeling includes several parametric and non-parametric models so as to adapt to all types of circuits. Once the model is validated, the proposed methods generate a large sample containing defective circuits, which allows a better estimation of the test metrics, in particular the defect level. Based on this estimate, a test schedule is constructed that maximizes the early detection of faulty circuits. For small numbers of tests, a branch-and-bound method is used to obtain the optimal test order; for circuits with many tests, heuristics such as the decomposition method, genetic algorithms or floating search methods are used to approach the optimal solution.
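The scheduling objective — catch faulty circuits as early as possible — can be illustrated with a simple greedy stand-in (the thesis uses branch-and-bound and other heuristics, not this greedy rule). Each test is summarized here by the invented set of simulated defective circuits it detects.

```python
# Sketch: greedily order tests to maximize early detection of faulty circuits.
detects = {
    "t1": {1, 2},
    "t2": {2, 3, 4},
    "t3": {5},
    "t4": {1, 3, 4, 5},
}

order, covered = [], set()
remaining = dict(detects)
while remaining:
    # Pick the test catching the most not-yet-detected defective circuits.
    best = max(remaining, key=lambda t: len(remaining[t] - covered))
    order.append(best)
    covered |= remaining.pop(best)

print(order)  # "t4" comes first: it detects the most circuits on its own
```

Once all defective circuits are covered early in the order, the trailing tests detect nothing new, which is how the same ordering exposes redundant tests.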
356

Word Confidence Estimation and Its Applications in Statistical Machine Translation / Les mesures de confiance au niveau des mots et leurs applications pour la traduction automatique statistique

Luong, Ngoc Quang 12 November 2014 (has links)
Machine Translation (MT) systems, which automatically generate a target-language translation for each source sentence, have achieved impressive gains over recent decades and are becoming effective language aids for the entire community in a globalized world. Nonetheless, due to various factors, MT quality is in general still far from perfect, and end users therefore want to know how much they can trust a specific translation. A method capable of pointing out the correct parts, detecting translation errors and assessing the overall quality of each MT hypothesis would clearly benefit not only end users, but also translators, post-editors and the MT systems themselves. Such methods are widely known as Confidence Estimation (CE) or Quality Estimation (QE). The motivation for building such automatic estimators stems from the drawbacks of assessing MT quality manually: the task is time-consuming, costly and sometimes impossible when readers have little or no knowledge of the source language. This thesis focuses mostly on CE at the word level (WCE). A WCE classifier tags each word in the MT output with a quality label. Its working mechanism is straightforward: a classifier, trained beforehand on a number of features using machine learning (ML) methods, computes a confidence score for each label of each output word, then assigns each word the highest-scoring label. Nowadays, WCE plays an increasingly important role in many aspects of MT. First, it helps post-editors to quickly identify translation errors and hence improve their productivity. Second, it informs readers which portions of a sentence are unreliable, avoiding misunderstandings of its content. Third, it can select the best translation among the outputs of multiple MT systems. Last but not least, WCE scores can help improve MT quality itself through scenarios such as N-best list re-ranking and search graph re-decoding. In this thesis, we aim to build and optimize a baseline WCE system, then exploit it to improve MT as well as Sentence Confidence Estimation (SCE). Compared to previous approaches, our contributions cover the following main points. First, we integrate various types of prediction indicators — system-based features extracted from the MT system, together with lexical, syntactic and semantic features — to build the baseline WCE systems; applying multiple ML models to the full feature set and comparing their performance allows us to select the one best suited to our data (Conditional Random Fields). Second, the usefulness of all features is investigated more deeply using a greedy feature selection algorithm. Third, we exploit a Boosting algorithm as the learning method in order to strengthen the contribution of the dominant feature subsets and thus improve the system's prediction capability. Finally, we explore the contribution of WCE to MT quality in several scenarios. In N-best list re-ranking, we synthesize scores from the WCE outputs and combine them with the decoder scores to recompute the objective function and re-order the N-best list, obtaining a better candidate. In search graph re-decoding, WCE scores are applied directly to the nodes containing each word to update their costs according to word quality; once the update is done, searching for the best path in the new graph yields a new MT hypothesis. Furthermore, WCE scores are used to build features that enhance the performance of Sentence Confidence Estimation systems. In total, our work provides an insightful, multidimensional picture of word quality prediction and its positive impact on various areas of Machine Translation. The promising results open up a wide avenue where WCE can play its role, such as WCE for Automatic Speech Recognition (ASR) systems, WCE for selecting among multiple MT systems, and WCE for re-trainable, self-learning MT systems.
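The N-best re-ranking scenario can be sketched in miniature: each hypothesis receives a combined score mixing its decoder score with a word-confidence summary (here simply the mean of per-word confidences, with an assumed mixing weight). All scores below are invented for illustration; the thesis recomputes a full objective function rather than this two-term sum.

```python
# Sketch: re-rank an N-best list with decoder score + mean word confidence.
nbest = [
    {"text": "he go to school",   "decoder": -2.1, "confidences": [0.9, 0.3, 0.9, 0.8]},
    {"text": "he goes to school", "decoder": -2.3, "confidences": [0.9, 0.8, 0.9, 0.8]},
    {"text": "him goes to school","decoder": -2.6, "confidences": [0.5, 0.8, 0.9, 0.8]},
]

def combined(hyp, weight=2.0):
    # weight is a hypothetical tuning parameter, not a value from the thesis.
    wce = sum(hyp["confidences"]) / len(hyp["confidences"])
    return hyp["decoder"] + weight * wce

best = max(nbest, key=combined)
print(best["text"])
```

With these toy numbers the decoder alone prefers the first (erroneous) hypothesis, while the combined score promotes the second — the flip that re-ranking is meant to produce.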
357

Improving armed conflict prediction using machine learning : ViEWS+

Helle, Valeria, Negus, Andra-Stefania, Nyberg, Jakob January 2018 (has links)
Our project, ViEWS+, expands the software functionality of the Violence Early-Warning System (ViEWS), which aims to predict the probability of armed conflict over the next 36 months using machine learning. Governments and policy-makers may use conflict predictions to decide where to deliver aid and resources, potentially saving lives. The predictions use conflict data gathered by ViEWS, which includes variables such as past conflicts, child mortality and urban density. The large number of variables raises the need for a selection tool to remove those that are irrelevant to conflict prediction. Before our work, the stakeholders used their experience and some guesswork to pick the variables, the predictive function and its parameters. Our goals were to improve the efficiency, in terms of speed, and the correctness of the ViEWS predictions. Three steps were taken. First, we built an automatic variable selection tool, which helps researchers use fewer, more relevant variables, saving time and resources. Second, we compared prediction functions and identified the best for the purpose of predicting conflict. Last, we tested how parameter values affect the performance of the chosen functions, so as to produce good predictions while also reducing execution time. The new tools improved both the execution time and the predictive correctness of the system compared with the results obtained before our project: it is now nine times faster, and its correctness has improved by a factor of three. We believe our work leads to more accurate conflict predictions, and as ViEWS has strong connections to the European Union, we hope that decision-makers can benefit from it when trying to prevent conflicts.
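One plausible shape for an automatic variable-selection step like the one described is ranking variables by model-derived importance (the abstract does not name the exact method, so the random-forest importances below are an assumption). The data is synthetic: only the first three variables actually drive the outcome.

```python
# Sketch: keep the top-k variables ranked by random-forest importance.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
n, n_vars = 300, 12
X = rng.normal(size=(n, n_vars))
# Synthetic outcome depending only on the first three variables.
y = (X[:, 0] + 0.8 * X[:, 1] - 0.5 * X[:, 2] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
keep = np.argsort(forest.feature_importances_)[::-1][:3]
print(sorted(keep.tolist()))
```

Dropping the nine uninformative columns is what buys the speed gain: downstream training touches a quarter of the original feature matrix with no signal lost.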
358

Técnica de aprendizagem automática aplicada a um codificador HEVC em tempo real / Machine learning technique applied to a real-time HEVC encoder

OLIVEIRA, Jean Felipe Fonseca de. 07 May 2018 (has links)
The High Efficiency Video Coding (HEVC) standard is the most recent video coding standard and is considerably more computationally complex than its predecessor, H.264/AVC: its superior coding efficiency comes at a high computational cost. This thesis reviews the recent literature and proposes an algorithm that reduces this complexity. A fast Coding Unit (CU) splitting algorithm is proposed for the HEVC encoder, which terminates the CU partitioning search early based on an adaptive classification model built at run time. The model is generated by an online learning method based on Pegasos (Primal Estimated sub-GrAdient SOlver for SVM), a stochastic sub-gradient solver for the Support Vector Machine (SVM) objective. The proposed method was implemented and integrated into the HM 16.7 reference encoder. Experimental results show that the modified encoder reduces the computational cost of encoding by up to 50% in some cases, and by roughly 30% on average, with quality losses that are negligible to viewers. Coding efficiency losses are generally small, and some results even showed BD-Rate (Bjontegaard Delta) gains of about 1% in the Low Delay B configuration, without any offline training phase.
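The online learning scheme the abstract describes can be illustrated with a minimal, self-contained sketch of the Pegasos update: stochastic sub-gradient descent on the regularised hinge loss. The CU features and labels below are purely hypothetical stand-ins (the thesis derives its features from the encoder's intermediate data), and a constant bias feature is appended because plain Pegasos learns a separator through the origin.

```python
import random

def pegasos_update(w, x, y, lam, t):
    """One Pegasos step: SGD on the L2-regularised hinge loss.

    w : weight vector, x : feature vector, y : label in {-1, +1},
    lam : regularisation constant, t : 1-based iteration counter.
    """
    eta = 1.0 / (lam * t)  # decaying learning rate
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    # Shrink weights (gradient of the L2 term), then correct on a margin violation.
    w = [(1 - eta * lam) * wi for wi in w]
    if margin < 1:
        w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def train_online(samples, lam=0.01, epochs=200, seed=0):
    """Online training loop: the model is refined as labelled samples arrive."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            t += 1
            w = pegasos_update(w, x, y, lam, t)
    return w

# Hypothetical CU features: (block variance, RD cost of the unsplit mode, bias);
# label +1 = "split the CU further", -1 = "terminate the partitioning early".
data = [([5.0, 8.0, 1.0], 1), ([4.5, 7.0, 1.0], 1),
        ([0.5, 1.0, 1.0], -1), ([0.8, 1.5, 1.0], -1)]
w = train_online(list(data))
predict = lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
```

In the encoder, such a predictor would be consulted before recursing into the four sub-CUs, skipping the recursion whenever it predicts -1.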
359

Modèles graphiques pour la classification et les séries temporelles / Graphical models for classification and time series

Jebreen, Kamel 28 September 2017 (has links)
First, this dissertation shows that Bayesian network classifiers are very accurate models compared with other classical machine learning methods; their major advantage is that they can account for interactions between explanatory variables. Discretising input variables often improves the performance of Bayesian network classifiers, as does a feature selection procedure, and different types of Bayesian networks may be used for supervised classification. We combine these approaches with feature selection and discretisation to show that the combination yields powerful classifiers. A large selection of data sets from the UCI Machine Learning Repository is used in our experiments, and an application to epilepsy-type prediction from patient features extracted from Positron Emission Tomography (PET) scans confirms the efficiency of our approach compared with common supervised classification methods. Second, the dissertation considers modelling interactions between a set of variables in the context of high-dimensional time series. We propose two approaches: the first is similar to the neighbourhood lasso, with the lasso replaced by Support Vector Machines (SVMs); the second is a restricted Bayesian network for time series, in which the variables observed at each time point and at the previous one enter a network whose structure is restricted. We demonstrate the efficiency of these approaches through simulations on data generated from linear models, nonlinear models, and a mixture of both.
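As a rough illustration of the first part, the sketch below combines equal-width discretisation with the simplest Bayesian network classifier, naive Bayes. The binning scheme, toy data, and Laplace smoothing are assumptions of this example, not the dissertation's exact procedure.

```python
from collections import defaultdict

def discretise(values, n_bins=3):
    """Equal-width binning of one continuous feature — a common pre-processing
    step that often improves Bayesian network classifiers."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

class NaiveBayes:
    """Naive Bayes is the simplest Bayesian network classifier: every feature
    depends only on the class. Laplace smoothing avoids zero probabilities."""

    def fit(self, X, y, n_bins=3):
        self.classes = sorted(set(y))
        self.n_bins = n_bins
        self.prior = {c: y.count(c) / len(y) for c in self.classes}
        self.class_n = {c: y.count(c) for c in self.classes}
        self.counts = defaultdict(int)  # (class, feature index, bin) -> count
        for row, c in zip(X, y):
            for j, b in enumerate(row):
                self.counts[(c, j, b)] += 1
        return self

    def predict(self, row):
        def score(c):
            p = self.prior[c]
            for j, b in enumerate(row):
                p *= (self.counts[(c, j, b)] + 1) / (self.class_n[c] + self.n_bins)
            return p
        return max(self.classes, key=score)

# Toy data: two continuous features, two classes (labels are illustrative).
f1 = [1.0, 1.2, 0.9, 5.0, 5.5, 4.8]
f2 = [0.2, 0.1, 0.3, 3.0, 3.3, 2.9]
y  = ["A", "A", "A", "B", "B", "B"]
X = list(zip(discretise(f1), discretise(f2)))
model = NaiveBayes().fit(X, y)
```

A richer Bayesian network classifier would additionally learn edges between features; the pipeline of discretise-then-fit stays the same.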
360

Seleção e construção de features relevantes para o aprendizado de máquina. / Relevant feature selection and construction for machine learning.

Huei Diana Lee 27 April 2000 (has links)
In supervised Machine Learning (ML), an induction algorithm is presented with a set of training instances, where each instance is described by a vector of feature values and a class label. The task of the induction algorithm (inducer) is to induce a classifier that will be useful in classifying new cases. Conventional inductive learning algorithms rely on the data provided by the user to build their concept descriptions; an inadequate representation space or description language, as well as errors in the training examples, can make learning problems difficult. One of the central problems in ML is Feature Subset Selection (FSS): the learning algorithm must select some subset of features on which to focus its attention while ignoring the rest. There are several reasons for performing FSS. First, most computationally feasible ML algorithms do not work well in the presence of a very large number of features, so FSS can improve the accuracy of the classifiers they generate. Second, with fewer features, comprehensibility improves, i.e. the human ability to understand the data and the rules generated by symbolic ML algorithms. Third, collecting and processing large quantities of data is costly in some domains. There are basically three approaches to FSS: embedded, filter, and wrapper. On the other hand, if the features used to describe the training examples are inadequate, learning algorithms are likely to create excessively complex and inaccurate descriptions. Such individually inadequate features can sometimes be combined conveniently, generating new features that turn out to be highly representative of the concept description; this process is called Feature Construction or Constructive Induction (CI). This work focuses on the filter and wrapper approaches to FSS, as well as on knowledge-driven CI. A series of FSS and CI experiments is described, performed on four natural datasets using several symbolic ML algorithms. For each dataset and inducer, various measures are taken, such as accuracy, the inducer's running time, and the number of features it selects. Several experiments on three real-world datasets are also described; their focus is not only on comparing the performance of the induction algorithms but also on evaluating the extracted knowledge. During knowledge extraction, the results were presented to domain specialists, who made suggestions for further experiments, and some of the knowledge extracted from these three case studies was considered very interesting by the specialists. This shows that the interaction between different areas — here, medicine and computer science — can produce valuable results. Thus, for Machine Learning applications to bear fruit, two groups of researchers must come together: those acquainted with existing ML methods, and those with expertise in the application domain who can provide data and evaluate the acquired knowledge.
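The wrapper approach to FSS described above can be sketched as greedy forward selection driven by the induced classifier's estimated accuracy. Here a leave-one-out 1-nearest-neighbour evaluator and toy data stand in for the symbolic inducers and natural datasets used in the work; both are assumptions of this example.

```python
def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier restricted
    to the candidate feature subset (the wrapper's evaluation function)."""
    if not feats:
        return 0.0
    correct = 0
    for i in range(len(X)):
        dists = [
            (sum((X[i][f] - X[j][f]) ** 2 for f in feats), j)
            for j in range(len(X)) if j != i
        ]
        _, nearest = min(dists)  # closest other instance
        correct += y[nearest] == y[i]
    return correct / len(X)

def forward_selection(X, y):
    """Greedy wrapper FSS: repeatedly add the feature whose inclusion most
    improves the induced classifier, stopping when no feature helps."""
    selected, remaining = [], list(range(len(X[0])))
    best = loo_accuracy(X, y, selected)
    while remaining:
        score, f = max((loo_accuracy(X, y, selected + [f]), f) for f in remaining)
        if score <= best:
            break
        best = score
        selected.append(f)
        remaining.remove(f)
    return selected, best

# Toy data: feature 0 predicts the class, feature 1 is pure noise.
X = [(0.0, 7.1), (0.2, 3.5), (0.1, 9.9), (1.0, 4.2), (1.1, 8.8), (0.9, 0.3)]
y = [0, 0, 0, 1, 1, 1]
subset, acc = forward_selection(X, y)
```

A filter approach would instead rank features by a statistic computed independently of any inducer, trading the wrapper's accuracy for much lower cost.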
