351
Software defect prediction using maximal information coefficient and fast correlation-based filter feature selection / Mpofu, Bongeka, 12 1900
Software quality aims to ensure that developed applications are free of failures. Some modern systems are intricate due to the complexity of their information processes. Software fault prediction is an important quality assurance activity: a mechanism that correctly predicts the defect proneness of modules, and classifies them accordingly, saves resources, time and developers' effort. In this study, a model that selects relevant features for defect prediction was proposed. The literature review revealed that process metrics, which are derived from the historic source code of versioned systems over time, are better predictors of defects. These metrics are extracted from the source-code module and include, for example, the number of additions to and deletions from the source code, the number of distinct committers, and the number of modified lines. In this research, defect prediction was conducted on open source software (OSS) of software product lines (SPL), hence process metrics were chosen. Data sets used in defect prediction may contain non-significant and redundant attributes that affect the accuracy of machine-learning algorithms. To improve the prediction accuracy of classification models, only features that are significant to the defect prediction process are utilised. In machine learning, feature selection techniques are applied to identify the relevant data: feature selection is a pre-processing step that reduces the dimensionality of the data, and its techniques include information-theoretic methods based on the entropy concept. This study experimentally evaluated the efficiency of these feature selection techniques, and found that software defect prediction using significant attributes improves prediction accuracy.
A novel MICFastCR model was developed, which uses the Maximal Information Coefficient (MIC) to select significant attributes and the Fast Correlation-Based Filter (FCBF) to eliminate redundant attributes. Machine learning algorithms were then run to predict software defects. The MICFastCR model achieved the highest prediction accuracy as reported by various performance measures. / School of Computing / Ph. D. (Computer Science)
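The FCBF stage of such a pipeline follows a standard relevance-then-redundancy scheme. The sketch below is a minimal illustration, not the thesis's implementation: it uses symmetrical uncertainty as the relevance score (where MICFastCR would use MIC), and assumes discretized attribute values; all names are ours.

```python
import math
from collections import Counter

def entropy(xs):
    """Shannon entropy of a discrete sequence, in bits."""
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def symmetrical_uncertainty(xs, ys):
    """SU in [0, 1]: normalized mutual information between two sequences."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0
    return 2.0 * mutual_information(xs, ys) / (hx + hy)

def fcbf(features, labels, delta=0.0):
    """Fast Correlation-Based Filter.
    features: dict name -> list of discrete values; labels: list of classes.
    Keeps features with SU(feature, class) > delta, then drops any feature
    that is more correlated with an already-selected feature than with the
    class (the redundancy criterion)."""
    relevance = {f: symmetrical_uncertainty(v, labels)
                 for f, v in features.items()}
    ranked = sorted((f for f in features if relevance[f] > delta),
                    key=lambda f: -relevance[f])
    selected = []
    for fj in ranked:
        if all(symmetrical_uncertainty(features[fi], features[fj]) < relevance[fj]
               for fi in selected):
            selected.append(fj)
    return selected
```

On process-metric data, `delta` would be tuned so that only attributes with non-trivial relevance to the defect label survive the first pass.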
352
Umělé neuronové sítě a jejich využití při extrakci znalostí / Artificial Neural Networks and Their Usage For Knowledge Extraction / Petříčková, Zuzana, January 2015
Title: Artificial Neural Networks and Their Usage For Knowledge Extraction Author: RNDr. Zuzana Petříčková Department: Department of Theoretical Computer Science and Mathematical Logic Supervisor: doc. RNDr. Iveta Mrázová, CSc., Department of Theoretical Computer Science and Mathematical Logic Abstract: The model of multi-layered feed-forward neural networks is well known for its ability to generalize well and to find complex non-linear dependencies in the data. On the other hand, it tends to create complex internal structures, especially for large data sets. Efficient solutions to currently demanding tasks require fast training, adequate generalization and a transparent and simple network structure. In this thesis, we propose a general framework for the training of BP-networks. It is based on the fast and robust scaled conjugate gradient technique. This classical training algorithm is enhanced with analytical or approximate sensitivity inhibition during training and enforcement of a transparent internal knowledge representation. Redundant hidden and input neurons are pruned based on internal representation and sensitivity analysis. The performance of the developed framework has been tested on various types of data with promising results. The framework provides a fast training algorithm,...
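The pruning idea, removing inputs the trained network is least sensitive to, can be illustrated independently of the SCG training itself. A hedged sketch follows: it approximates sensitivities by finite differences on a trained predictor, and the function and parameter names are ours, not the thesis's.

```python
import numpy as np

def input_sensitivities(predict, X, eps=1e-4):
    """Mean absolute finite-difference sensitivity of the model output
    with respect to each input feature, averaged over the data set X."""
    base = predict(X)
    sens = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] += eps
        sens.append(np.mean(np.abs(predict(Xp) - base)) / eps)
    return np.array(sens)

def prune_inputs(sens, keep_ratio=0.5):
    """Indices of the most sensitive inputs to keep; the rest are
    candidates for pruning as redundant."""
    k = max(1, int(round(keep_ratio * len(sens))))
    return sorted(np.argsort(sens)[::-1][:k].tolist())
```

In the framework described above, an analogous analysis is applied to hidden neurons as well, combined with the internal-representation criterion.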
353
Reconnaissance des émotions par traitement d'images / Emotion recognition based on image processing / Gharsalli, Sonia, 12 July 2016
Emotion recognition is one of the most complex scientific domains. In recent years, more and more applications have attempted to automate it, in innovative areas such as support for autistic children, video games and human-machine interaction. Emotions are conveyed through several channels; this research addresses facial emotional expressions, focusing on the six basic emotions: happiness, anger, fear, disgust, sadness and surprise. A comparative study of two emotion recognition methods, one based on geometric descriptors and the other on appearance descriptors, was carried out on the CK+ database of posed emotions and the FEEDTUM database of spontaneous emotions. Several constraints were taken into account, such as changes in image resolution, the limited number of labelled images in emotion databases, and the recognition of new subjects not included in the training set. Various fusion schemes were then evaluated on new subjects not included in the training set. The results are promising for posed emotions (recognition rates above 86%) but remain insufficient for spontaneous emotions. A study of local facial regions was also conducted, leading to hybrid per-region methods that improve the recognition rates for spontaneous emotions. Finally, an appearance-feature selection method based on importance scores was developed and compared with other selection methods; it improves the recognition rate over the results obtained with two methods taken from the literature.
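A feature selection method based on importance scores can take many forms; one common stand-in for ranking appearance descriptors is the Fisher score, which favors features whose between-class variance is large relative to their within-class variance. This is a minimal sketch for illustration only; the actual importance score used in the thesis may differ.

```python
import numpy as np

def fisher_scores(X, y):
    """Fisher score per feature column: between-class variance divided
    by within-class variance (small epsilon avoids division by zero)."""
    classes = np.unique(y)
    overall = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - overall) ** 2
        den += len(Xc) * Xc.var(axis=0)
    return num / (den + 1e-12)

def select_top(X, y, k):
    """Indices of the k highest-scoring features."""
    return sorted(np.argsort(fisher_scores(X, y))[::-1][:k].tolist())
```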
354
Classificação da marcha em parkinsonianos: análise dos algoritmos de aprendizagem supervisionada / Classification of the parkinsonian gait: analysis of supervised learning algorithms / Souza, Hugo Araújo, 12 April 2017
Parkinson's disease (PD) is the second most prevalent neurodegenerative disease in the elderly, although its prevalence and incidence vary with age, gender and race/ethnicity. Studies indicate that prevalence increases with age, with an estimated 5 to 26 cases per 100,000 people per year: approximately 1% among individuals aged 65 to 69, rising from 3% to 14.3% among those over 85. The most common clinical signs include resting tremor, muscle rigidity, bradykinesia and postural instability. Diagnosing the disease is not a simple task: although known stage patterns describe its progression in the human organism, many patients do not follow this progression because of the heterogeneity of the manifestations that may arise. Gait analysis has become an attractive, non-invasive quantitative mechanism that can aid the detection and monitoring of PD patients. Feature extraction is crucial to the quality of the data used by machine-learning algorithms, its main objective being to reduce the dimensionality of the data in a classification process. Dimensionality reduction makes it possible to identify which attributes are important and to facilitate the visualization of the data. For human gait data, the purpose is to detect relevant attributes that can help identify gait-cycle characteristics such as the stance and swing phases, cadence, stride length and velocity. To do so, it is necessary to identify and select the most relevant attributes, as well as the classification method. This work evaluates the performance of supervised learning algorithms in classifying human gait characteristics on an open database, and identifies which attributes contribute most to classifier performance in identifying gait characteristics of PD patients.
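Comparing supervised classifiers on gait features typically relies on k-fold cross-validation. The sketch below shows only the evaluation harness, with a nearest-centroid classifier as a stand-in; the thesis's actual algorithms and gait database are not reproduced, and all names are ours.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    """Train: one centroid per class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest_centroid_predict(model, X):
    """Predict the class of the nearest centroid for each row of X."""
    classes = list(model)
    cents = np.stack([model[c] for c in classes])
    d = ((X[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
    return np.array([classes[i] for i in d.argmin(axis=1)])

def kfold_accuracy(X, y, fit, predict, k=4):
    """Mean accuracy over k contiguous folds (shuffle beforehand in practice)."""
    idx = np.arange(len(X))
    accs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        model = fit(X[train], y[train])
        accs.append(float(np.mean(predict(model, X[fold]) == y[fold])))
    return float(np.mean(accs))
```

The same harness can score several classifiers on identical folds, which is the basis for the kind of comparison the abstract describes.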
355
Seleção de atributos para classificação de textos usando técnicas baseadas em agrupamento, PoS tagging e algoritmos evolutivos / Feature selection for text classification using techniques based on clustering, PoS tagging and evolutionary algorithms / Ferreira, Charles Henrique Porto, January 2016
Advisor: Profa. Dra. Debora Maria Rossi de Medeiros / Master's dissertation - Universidade Federal do ABC, Graduate Program in Computer Science, 2016. / This work investigates feature selection techniques to be applied to the text classification task. Three different techniques are proposed and compared with traditional text preprocessing techniques. The first technique proposes that not all grammatical classes of a given language are relevant in a text when it is submitted to the classification task. The second technique employs feature clustering and genetic algorithms for group selection. The third technique raises two hypotheses: the first assumes that words occurring more frequently in a text collection than in the language as a whole may be the most important words to compose the features; the second assumes that the relations of each data instance with each class can compose a new set of attributes.
The results suggest that the proposed approaches are promising and that the hypotheses may be valid. The experiments with the first approach show that there is a set of grammatical classes whose words can be disregarded from the feature set in different text collections while maintaining or even improving classification accuracy. The second approach achieves a strong reduction in the original number of features and still improves classification accuracy. The third approach yields the sharpest reduction in the number of features, since by the nature of the proposal the final number of features equals the number of classes in the collection, and the impact on accuracy was zero or even positive.
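The third technique's second hypothesis, one attribute per class derived from the relation of each instance to that class, can be sketched with class-centroid cosine similarities. This is our interpretation for illustration; the dissertation's exact relation measure may differ.

```python
import numpy as np

def class_relation_features(X, y, X_new=None):
    """Replace each bag-of-words vector by one cosine-similarity feature
    per class (similarity to that class's centroid), so the new feature
    count equals the number of classes."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    target = X if X_new is None else X_new
    norm_t = np.linalg.norm(target, axis=1, keepdims=True) + 1e-12
    norm_c = np.linalg.norm(centroids, axis=1, keepdims=True).T + 1e-12
    return target @ centroids.T / (norm_t * norm_c)
```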
356
Image-based detection and classification of allergenic pollen / Détection et classification des pollens allergisants basée sur l'image / Lozano Vega, Gildardo, 18 June 2015
The correct classification of airborne pollen is relevant to the medical treatment of allergies, and the usual manual counting process is costly and time-consuming; an automatic approach would considerably increase the potential applications of pollen counting. Modern computer vision techniques enable the detection of discriminant pollen characteristics. In this thesis, a set of relevant image-based features for the recognition of the top allergenic pollen taxa is proposed and analyzed. The core of the study is the evaluation of groups of features that can properly describe pollen in terms of shape, texture, size and apertures. The features are extracted from typical brightfield microscope images, which makes the method easily reproducible. A feature selection step is applied to each group to assess its relevance. Regarding the apertures present on some pollen grains, a flexible method for the detection, localization and counting of apertures across pollen taxa with varying appearances is proposed. Aperture description is based on image primitives following a Bag-of-Words strategy. A confidence map is built from the classification confidence of sampled image regions, and aperture features, including the aperture count, are extracted from this map. The method is designed to be extended modularly to new aperture types, using the same algorithm with a specific classifier per type. The feature groups were tested individually and jointly on the most allergenic pollen taxa in Germany, and proved able to overcome intra-class variance and inter-class similarity in an SVM classification scheme. Using all feature groups jointly led to an accuracy of 98.2%, comparable to the state of the art.
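The Bag-of-Words step underlying the aperture description, quantizing local descriptors against a visual codebook into a word histogram, can be sketched as follows. Descriptor extraction and the confidence map are omitted, and all names are ours.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest visual word and
    return the normalized word-count histogram for the image region."""
    d = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

In the scheme described above, such histograms from sampled regions would feed a per-aperture-type classifier whose confidences form the map from which aperture counts are read.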
357
Optimisation du test de production de circuits analogiques et RF par des techniques de modélisation statistique / Optimisation of the production test of analog and RF circuits using statistical modeling techniques / Akkouche, Nourredine, 09 September 2011
The share of test in the cost of designing and manufacturing integrated circuits keeps growing, hence the need to optimize this now unavoidable step. In this thesis, new methods for scheduling tests and reducing the number of tests to perform are proposed. The solution is an ordering of the tests that allows faulty circuits to be detected as early as possible, and that can also be used to eliminate redundant tests. These test methods are based on statistical modeling of the circuit under test. The modeling includes several parametric and non-parametric models, so as to adapt to all types of circuits. Once the model is validated, the proposed test methods generate a large sample containing faulty circuits, which allows a better estimation of the test metrics, in particular the defect level. Based on this estimate, a test schedule is constructed by maximizing the early detection of faulty circuits. With few tests, the Branch and Bound method is used to obtain the optimal test order. However, for circuits with a large number of tests, heuristics such as the decomposition method, genetic algorithms or floating search methods are used to approach the optimal solution.
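Ordering tests so that faulty circuits are detected as early as possible is, in its simplest form, a greedy maximum-coverage problem, and tests that add no new detections become candidates for elimination as redundant. The sketch below shows only this greedy baseline, not the Branch and Bound or heuristic variants the thesis uses; names and data shapes are ours.

```python
def order_tests(detects):
    """Greedy test ordering. detects maps test name -> set of faulty-circuit
    ids that the test catches. Each step picks the test detecting the most
    circuits not yet caught (ties broken alphabetically)."""
    remaining = set(detects)
    caught, order = set(), []
    while remaining:
        best = max(sorted(remaining), key=lambda t: len(detects[t] - caught))
        order.append(best)
        caught |= detects[best]
        remaining.remove(best)
    return order

def redundant_tests(detects, order):
    """Tests that contribute no new detections under the given order."""
    caught, redundant = set(), []
    for t in order:
        if detects[t] <= caught:
            redundant.append(t)
        caught |= detects[t]
    return redundant
```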
358
Word Confidence Estimation and Its Applications in Statistical Machine Translation / Les mesures de confiance au niveau des mots et leurs applications pour la traduction automatique statistique / Luong, Ngoc Quang, 12 November 2014
Machine Translation (MT) systems, which automatically generate a target-language sentence for each source-language input, have achieved impressive gains during the recent decades and are becoming effective language assistants for the entire community in a globalized world. Nonetheless, due to various factors, MT quality is in general still far from perfect, and end users therefore want to know how much they can trust a specific translation. Building a method capable of pointing out the correct parts, detecting the translation errors and assessing the overall quality of each MT hypothesis is definitely beneficial not only for the end users, but also for translators, post-editors, and MT systems themselves. Such a method is widely known under the name Confidence Estimation (CE) or Quality Estimation (QE). The motivation for building such automatic estimation methods originates from the drawbacks of assessing MT quality manually: the task is time-consuming, costly, and sometimes impossible when the readers have little or no knowledge of the source language. This thesis mostly focuses on CE methods at the word level (WCE). A WCE classifier tags each word in the MT output with a quality label. The working mechanism is straightforward: a classifier, trained beforehand on a number of features using machine learning (ML) methods, computes the confidence score of each label for each MT output word, and tags the word with the highest-scoring label. Nowadays, WCE shows an increasing importance in many aspects of MT. Firstly, it assists post-editors in quickly identifying translation errors, hence improving their productivity. Secondly, it informs readers of the portions of a sentence that are not reliable, avoiding misunderstandings about its content. Thirdly, it selects the best translation among the outputs of multiple MT systems. Last but not least, WCE scores can help to improve MT quality via scenarios such as N-best list re-ranking and search graph re-decoding.
In this thesis, we aim at building and optimizing our baseline WCE system, then exploiting it to improve MT as well as Sentence Confidence Estimation (SCE). Compared to previous approaches, our novel contributions cover the following main points. Firstly, we integrate various types of prediction indicators (system-based features extracted from the MT system, together with lexical, syntactic and semantic features) to build the baseline WCE systems; applying multiple ML models to the entire feature set and comparing their performance allowed us to identify the best-suited one, Conditional Random Fields, to optimize. Secondly, the usefulness of all features is investigated more deeply using a greedy feature selection algorithm. Thirdly, we exploit a Boosting algorithm as the learning method in order to strengthen the contribution of the dominant feature subsets to the system, and thus improve its prediction capability. Lastly, we explore the contributions of WCE to improving MT quality via several scenarios. In N-best list re-ranking, we synthesize scores from the WCE outputs and integrate them with the decoder scores to recompute the objective function value and re-order the N-best list, so as to choose a better candidate. In the decoder's search graph re-decoding, WCE scores are applied directly to the nodes containing each word, updating their costs according to word quality; once the update is complete, the search for the best path over the new graph yields a new MT hypothesis. Furthermore, WCE scores are used to build features that enhance the performance of the Sentence Confidence Estimation system.
In total, our work brings an insightful and multidimensional picture of word-level quality prediction and its positive impact on various sectors of Machine Translation. The promising results open up a broad avenue where WCE can play its role, such as WCE for Automatic Speech Recognition (ASR) systems, WCE for selection among multiple MT systems, and WCE for re-trainable and self-learning MT systems.
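The N-best re-ranking scenario combines decoder scores with a sentence-level summary of the word confidence scores. The sketch below uses a simple linear interpolation with the mean word confidence; the actual score synthesis in the thesis is richer, and the interface shown here is an assumption of ours.

```python
def rerank_nbest(hypotheses, weight=0.5):
    """hypotheses: list of (decoder_score, word_confidences) pairs for the
    N-best candidates of one source sentence. Returns candidate indices
    sorted best-first by an interpolated objective: the decoder score
    blended with the mean WCE word confidence."""
    def combined(h):
        score, confs = h
        wce = sum(confs) / len(confs)
        return (1 - weight) * score + weight * wce
    return sorted(range(len(hypotheses)),
                  key=lambda i: -combined(hypotheses[i]))
```

With `weight=0`, the original decoder ranking is recovered; increasing `weight` lets word-confidence evidence promote a lower-ranked but more trustworthy candidate.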
359
Improving armed conflict prediction using machine learning: ViEWS+ / Helle, Valeria; Negus, Andra-Stefania; Nyberg, Jakob, January 2018
Our project, ViEWS+, expands the software functionality of the Violence Early-Warning System (ViEWS). ViEWS aims to predict the probabilities of armed conflicts in the next 36 months using machine learning. Governments and policy-makers may use conflict predictions to decide where to deliver aid and resources, potentially saving lives. The predictions use conflict data gathered by ViEWS, which includes variables such as past conflicts, child mortality and urban density. The large number of variables raises the need for a selection tool to remove those that are irrelevant for conflict prediction. Before our work, the stakeholders relied on experience and some guesswork to pick the variables, the predictive function and its parameters. Our goals were to improve the efficiency, in terms of speed, and the correctness of the ViEWS predictions. Three steps were taken. Firstly, we built an automatic variable selection tool, which helps researchers use fewer, more relevant variables, saving time and resources. Secondly, we compared prediction functions and identified the best for the purpose of predicting conflict. Lastly, we tested how parameter values affect the performance of the chosen functions, so as to produce good predictions while also reducing the execution time. The new tools improved both the execution time and the predictive correctness of the system compared to the results obtained prior to our project: it is now nine times faster than before, and its correctness has improved by a factor of three. We believe our work leads to more accurate conflict predictions, and as ViEWS has strong connections to the European Union, we hope that decision makers can benefit from it when trying to prevent conflicts. / In this project, which we have chosen to call ViEWS+, we improved various aspects of ViEWS (Violence Early-Warning System), a system that uses machine learning to predict where in the world armed conflicts will arise. The goal of ViEWS is to predict the probability of conflict as far as 36 months into the future, so that politicians and decision makers can use this knowledge to prevent them. The input to the system is conflict data with a large number of features, such as past conflicts, child mortality and urbanisation. These are of varying usefulness, which creates a need to filter out those that are not useful for predicting future conflicts. Before our project, the researchers using ViEWS selected features by hand, which becomes increasingly difficult as more are introduced. The research group also had no formal methodology for choosing parameter values for the machine-learning functions they use; they chose parameters based on experience and intuition, which can lead to unnecessarily long execution times and possibly worse results depending on the function used. Our goals for the project were to improve the system's productivity, in terms of execution time, and the reliability of its predictions. To achieve this we developed analysis tools that address the existing problems. We built a tool that selects fewer, more useful features from the data set, so that features contributing no important information can be discarded, saving execution time. We also compared the performance of different machine-learning functions, to identify those best suited to conflict prediction. Finally, we implemented a tool that analyses how the results of the functions vary with the choice of parameters, so that parameter values can be chosen systematically to guarantee good results while keeping execution time down. Our results show that, with our improvements, the execution time fell by a factor of about nine and the predictive performance rose by a factor of three. We hope that our work leads to more reliable predictions, which in turn may lead to a more peaceful world.
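The automatic variable-selection step described above can be sketched as a simple filter method: score each input variable by its absolute correlation with the conflict label and keep only the top k. This is an illustrative sketch under stated assumptions, not the actual ViEWS+ tool; the function names, the Pearson-correlation criterion and the data layout are all assumptions introduced here.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 when either sequence has zero variance.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx and vy else 0.0

def select_top_features(X, y, k=2):
    """Return indices of the k columns of X most correlated (in
    absolute value) with the target y, best first.

    X: list of rows (lists of floats); y: list of numeric labels.
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        col = [row[j] for row in X]
        scores.append((abs(pearson(col, y)), j))
    scores.sort(reverse=True)           # strongest association first
    return [j for _, j in scores[:k]]
```

A filter of this kind is cheap (one pass per variable) and model-agnostic, which is what makes it usable as a pre-processing step before the expensive prediction functions are trained.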
|
360 |
Machine-learning technique applied to a real-time HEVC encoder.OLIVEIRA, Jean Felipe Fonseca de. 07 May 2018 (has links)
The HEVC (High Efficiency Video Coding) standard is the most recent video coding standard and has a much higher computational complexity than its predecessor, H.264. The high coding efficiency achieved by the HEVC encoder comes at a very high computational cost. This thesis addresses opportunities to reduce that computational load. An early coding-unit split-decision algorithm is therefore proposed for the HEVC encoder, terminating the search for the best partitioning early, based on an adaptive classification model built at run time. This model is generated by an online learning process based on the Pegasos algorithm, an implementation that applies stochastic sub-gradient descent to the SVM (Support Vector Machine) algorithm. The proposed method was implemented and integrated into the HM 16.7 reference encoder. Experimental results showed that the modified encoder reduced the computational cost of the encoding process by up to 50% in some cases, and by approximately 30% on average, with quality losses negligible to users. In general, the process results in small quality losses; however, some results showed small gains in compression efficiency compared with the HM 16.7 encoder. / The most recent video coding standard, High Efficiency Video Coding (HEVC), has a higher encoding complexity than H.264/AVC, which means a higher computational cost. This thesis presents a review of the recent literature and proposes an algorithm that reduces this complexity. A fast CU (Coding Unit) splitting algorithm is proposed for the HEVC encoder, which terminates the CU partitioning process at an early stage, based on an adaptive classification model. This model is generated by an online learning method based on the Primal Estimated sub-GrAdient SOlver for SVM (Pegasos) algorithm. The proposed method was implemented and integrated into version 16.7 of the HEVC reference software. Experimental results show that it reduces the computational complexity of the HEVC encoder by up to 50% in some cases, and by 30% on average, with negligible quality losses. The process generally results in small coding-efficiency losses; however, some results showed BD-rate (Bjøntegaard delta rate) gains of nearly 1% in the Low Delay B configuration, without using an offline training phase.
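The Pegasos learner underlying the split-decision model above admits a compact sketch: at every step the weight vector is shrunk by the regulariser, and whenever the hinge margin is violated it is also nudged toward the current example. The code below is a minimal pure-Python illustration of the Pegasos update rule, not the thesis implementation; the function name, data layout and parameter defaults are assumptions.

```python
import random

def pegasos_train(X, y, lam=0.01, epochs=10, seed=0):
    """Train a linear SVM with the Pegasos stochastic sub-gradient method.

    X: list of feature vectors (lists of floats); y: labels in {-1, +1}.
    Returns the learned weight vector w.
    """
    rng = random.Random(seed)
    d = len(X[0])
    w = [0.0] * d
    t = 0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)                         # visit examples in random order
        for i in idx:
            t += 1
            eta = 1.0 / (lam * t)                # step size 1/(lambda * t)
            shrink = 1.0 - eta * lam             # regulariser shrinks w every step
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            if margin < 1:                       # hinge loss active: move toward example
                w = [shrink * wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
            else:                                # only the regularisation term applies
                w = [shrink * wj for wj in w]
    return w
```

Because each update touches a single example, a model like this can be refreshed online while the encoder runs, which is what makes it attractive for an adaptive, run-time split/no-split classifier.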
|