About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
911

ML4JIT - a framework for research on machine learning in JIT compilers

Alexandre dos Santos Mignon 27 June 2017
Determining the best set of optimizations to apply to a program has been the focus of research on compiler optimization for decades. In general, the set of optimizations is defined manually by compiler developers and applied to all programs. Supervised machine learning techniques have been used to develop code optimization heuristics, aiming to determine the best set of optimizations with minimal human intervention. This work presents ML4JIT, a framework for research on machine learning in JIT compilers for the Java language. The framework supports research into tuning the optimizations applied to each individual method of a program. Experiments were performed to validate the framework and to verify whether its use reduced both the compilation time of the methods and the execution time of the program.
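The framework itself is not shown here, but the general idea of predicting a per-method optimization set with a supervised model can be sketched as follows; the method features, candidate optimization sets and labels below are hypothetical stand-ins, not ML4JIT's actual data or algorithm.

```python
# Illustrative sketch (not the ML4JIT implementation): predicting a per-method
# optimization set from simple method features with a supervised classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical per-method features: [bytecode size, #loops, #calls, #branches]
X = rng.integers(1, 200, size=(500, 4)).astype(float)

# Hypothetical labels: index of the best-performing optimization set for each
# method, as it might be determined offline by timing each candidate set.
OPT_SETS = ["O1-minimal", "O2-default", "O2+inlining", "O3-aggressive"]
y = rng.integers(0, len(OPT_SETS), size=500)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

new_method = np.array([[120.0, 3.0, 17.0, 9.0]])  # features of an unseen method
print("suggested optimization set:", OPT_SETS[model.predict(new_method)[0]])
```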
912

Data-driven modelling for demand response from large consumer energy assets

Krishnadas, Gautham January 2018
Demand response (DR) is one of the integral mechanisms of today's smart grids. It enables consumer energy assets such as flexible loads, standby generators and storage systems to add value to the grid by providing cost-effective flexibility. With increasing renewable generation and impending electric vehicle deployment, there is a critical need for large volumes of reliable and responsive flexibility through DR. This poses a new challenge for the electricity sector. Smart grid development has resulted in the availability of large amounts of data from different physical segments of the grid such as generation, transmission, distribution and consumption. For instance, smart meter data carrying valuable information is increasingly available from consumers. Parallel to this, the domain of data analytics and machine learning (ML) is making immense progress. Data-driven modelling based on ML algorithms offers new opportunities to utilise the smart grid data and address the DR challenge. The thesis demonstrates the use of data-driven models for enhancing DR from large consumers such as commercial and industrial (C&I) buildings. A reliable, computationally efficient, cost-effective and deployable data-driven model is developed for large consumer building load estimation. The selection of data pre-processing and model development methods is guided by these design criteria. Based on this model, DR operational tasks such as capacity scheduling, performance evaluation and reliable operation are demonstrated for consumer energy assets such as flexible loads, standby generators and storage systems. Case studies are designed based on the frameworks of ongoing DR programs in different electricity markets. In these contexts, data-driven modelling shows substantial improvement over conventional models and promises more automation in DR operations. The thesis also conceptualises an emissions-based DR program based on emissions intensity data and consumer load flexibility to demonstrate the use of smart grid data in encouraging renewable energy consumption. Going forward, the thesis advocates data-informed thinking for utilising smart grid data towards solving problems faced by the electricity sector.
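A minimal sketch of the kind of data-driven building-load estimator described above, assuming synthetic data and simple calendar/weather features; it is an illustration of the approach, not the thesis's deployed model.

```python
# Sketch: regression-based load estimation for a large consumer building.
# The features (hour, weekday, outdoor temperature) and the synthetic load
# profile are assumptions, not the author's dataset or model.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(1)
n = 2000
hour = rng.integers(0, 24, n)
weekday = rng.integers(0, 7, n)
temperature = rng.normal(12, 6, n)

# Synthetic load (kW): base load + working-hours peak + heating demand + noise.
load = (200 + 150 * ((hour >= 8) & (hour <= 18) & (weekday < 5))
        + 5 * np.clip(16 - temperature, 0, None) + rng.normal(0, 10, n))

X = np.column_stack([hour, weekday, temperature])
X_tr, X_te, y_tr, y_te = train_test_split(X, load, random_state=0)

model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
print("MAE (kW):", round(mean_absolute_error(y_te, model.predict(X_te)), 1))
```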
913

Combining genome-wide association studies, polygenic risk scores and SNP-SNP interactions to investigate the genomic architecture of human complex diseases : more than the sum of its parts

Meijsen, Joeri Jeroen January 2018
Major Depressive Disorder is a devastating psychiatric illness with a complex genetic and environmental component that affects 10% of the UK population. Previous studies have shown that individuals with depression show poorer performance on measures of cognitive domains such as memory, attention, language and executive functioning. A major risk factor for depression is a higher level of neuroticism, which has been shown to be associated with depression throughout life. Understanding cognitive performance in depression and neuroticism could lead to a better understanding of the aetiology of depression. The first aim of this thesis focused on assessing phenotypic and genetic differences in cognitive performance between healthy controls and depressed individuals, and also between single-episode and recurrent depression. A second aim was to determine the capability of two decision-tree-based methods to detect simulated gene-gene interactions. The third aim was to develop a novel statistical methodology for simultaneously analysing single-SNP, additive and interacting genetic components associated with neuroticism using machine learning. To assess the phenotypic and genetic differences in depression, 7,012 unrelated Generation Scotland participants (of which 1,042 were clinically diagnosed with depression) were analysed. Significant differences in cognitive performance were observed in two domains: processing speed and vocabulary. Individuals with recurrent depression showed lower processing-speed scores compared to both controls and individuals with single-episode depression. Higher vocabulary scores were observed in depressed individuals compared to controls and in individuals with recurrent depression compared to controls. These significant differences could not be tied to significant single-locus associations. Polygenic scores derived from the large CHARGE processing-speed GWAS explained up to 1% of the variation in processing-speed performance among individuals with single-episode and recurrent depression. Two greedy non-parametric decision-tree-based methods, C5.0 and logic regression, were applied to simulated gene-gene interaction data from Generation Scotland. Several gene-gene interactions were simulated under multiple scenarios (e.g. size, strength of association and the presence of a polygenic component) to assess power and type I error. C5.0 was found to have increased power with a conservative type I error on simulated data. C5.0 was applied to years of education as a proxy of educational attainment in 6,765 Generation Scotland participants. Multiple interacting loci were detected that were associated with years of education, most notably located in genes known to be associated with reading and spelling (RCAN3) and neurodevelopmental traits (NPAS3). C5.0 was incorporated in a novel methodology called Machine-learning for Additive and Interaction Combined Analysis (MAICA). MAICA allows for a simultaneous analysis of single-locus, polygenic, and gene-gene interaction risk factors by means of a machine learning implementation. MAICA was applied to neuroticism scores in both Generation Scotland and UK Biobank. The MAICA model in Generation Scotland included 151 single loci and 11 gene-gene interaction sets, and explained ~6.5% of the variation in neuroticism scores. Applying the same model to UK Biobank did not lead to a statistically significant prediction of neuroticism scores.
The results presented in this thesis showed that individuals with depression performed significantly worse on the processing-speed tests but better on the vocabulary test, and that up to 1% of the variation in processing speed can be explained using a large processing-speed GWAS. Evidence was provided that C5.0 had increased power and acceptable type I error rates versus logic regression when epistatic effects exist, even with a strong underlying polygenic component, and that MAICA is an efficient tool to assess single-locus, polygenic and epistatic components simultaneously. MAICA is open-source and will provide a useful tool for other researchers of complex human traits who are interested in exploring the relative contributions of these different genomic architectures.
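As a hedged illustration of the idea behind modelling single-locus, additive (polygenic score) and SNP-SNP interaction components together, the sketch below uses simulated genotypes and a plain linear model; it is not the C5.0-based MAICA implementation described in the thesis.

```python
# Illustration: a trait regressed simultaneously on a single SNP, a polygenic
# score and an explicit SNP-SNP interaction term. All genotypes, weights and
# effect sizes are simulated assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n_people, n_snps = 1000, 50
G = rng.integers(0, 3, size=(n_people, n_snps)).astype(float)  # 0/1/2 dosages

prs_weights = rng.normal(0, 0.1, n_snps)           # assumed GWAS effect sizes
prs = G @ prs_weights                               # additive polygenic score
interaction = G[:, 0] * G[:, 1]                     # one assumed SNP-SNP pair

# Simulated trait combining all three components plus noise.
trait = 0.5 * G[:, 2] + 1.0 * prs + 0.8 * interaction + rng.normal(0, 1, n_people)

X = np.column_stack([G[:, 2], prs, interaction])
fit = LinearRegression().fit(X, trait)
print("estimated effects (single locus, PRS, interaction):", fit.coef_.round(2))
print("variance explained (R^2):", round(fit.score(X, trait), 3))
```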
914

Accelerating process development of complex chemical reactions

Amar, Yehia January 2019
Process development of new complex reactions in the pharmaceutical and fine chemicals industries is challenging and expensive. The field is beginning to see a bridging between fundamental first-principles investigations and utilisation of data-driven statistical methods, such as machine learning. Nonetheless, process development and optimisation in these industries is mostly driven by trial-and-error and experience. Approaches that move beyond these are limited to the well-developed optimisation of continuous variables, and often do not yield physical insights. This thesis describes several new methods developed to address research questions related to this challenge. First, we investigated whether utilising physical knowledge could aid statistics-guided self-optimisation of a C-H activation reaction, in which the optimisation variables were continuous. We then considered algorithmic treatment of the more challenging discrete variables, focussing on solvents. We parametrised a library of 459 solvents with physically meaningful molecular descriptors. Our case study was a homogeneous Rh-catalysed asymmetric hydrogenation to produce a chiral γ-lactam, with conversion and diastereoselectivity as objectives. We adapted a state-of-the-art multi-objective machine learning algorithm, based on Gaussian processes, to utilise the descriptors as inputs and to create a surrogate model for each objective. The aim of the algorithm was to determine a set of Pareto solutions with a minimum experimental budget, whilst simultaneously addressing model uncertainty. We found that descriptors are a valuable tool for Design of Experiments and can produce predictive and interpretable surrogate models. Subsequently, a physical investigation of this reaction led to the discovery of an efficient catalyst-ligand system, which we studied by operando NMR, and identified a parametrised kinetic model. Turning the focus then to ligands for asymmetric hydrogenation, we calculated versatile empirical descriptors based on the similarity of atomic environments for 102 chiral ligands, to predict diastereoselectivity. Whilst the model fit was good, it failed to accurately predict the performance of an unseen ligand family, due to analogue bias. Physical knowledge then guided the selection of symmetrised physico-chemical descriptors, which produced more accurate predictive models for diastereoselectivity, including for an unseen ligand family. The contribution of this thesis is the development of novel and effective workflows and methodologies for process development. These open the door for process chemists to save time and resources, freeing them up from routine work to focus instead on creatively designing new chemistry for future real-world applications.
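The surrogate-modelling idea can be sketched as follows: one Gaussian-process model per objective, trained on solvent descriptors, with a simple upper-confidence score suggesting the next solvent to test. The descriptors, measured objectives and acquisition rule below are assumptions for illustration, not the multi-objective algorithm adapted in the thesis.

```python
# Sketch: per-objective Gaussian-process surrogates over solvent descriptors,
# with a scalarised upper-confidence acquisition to pick the next experiment.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(3)
descriptors = rng.normal(size=(40, 5))       # hypothetical solvent descriptors
tested = rng.choice(40, size=10, replace=False)

conversion = rng.uniform(20, 95, size=10)    # % conversion for tested solvents
de = rng.uniform(50, 99, size=10)            # % diastereomeric excess

gp_conv = GaussianProcessRegressor(kernel=Matern(), normalize_y=True)
gp_de = GaussianProcessRegressor(kernel=Matern(), normalize_y=True)
gp_conv.fit(descriptors[tested], conversion)
gp_de.fit(descriptors[tested], de)

untested = np.setdiff1d(np.arange(40), tested)
m1, s1 = gp_conv.predict(descriptors[untested], return_std=True)
m2, s2 = gp_de.predict(descriptors[untested], return_std=True)

score = (m1 + 2 * s1) + (m2 + 2 * s2)        # simple scalarised UCB acquisition
print("next solvent index to test:", untested[np.argmax(score)])
```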
915

Analysis of the migratory potential of cancerous cells by image preprocessing, segmentation and classification

Syed, Tahir Qasim 13 December 2011
This thesis is part of a broader research project whose objective is to analyse the migratory potential of cancer cells. Within this doctorate, we are interested in using image processing to count and classify the cells present in an image acquired with a microscope. The biologist partners in this project study the influence of the environment on the migratory behaviour of cancer cells using cultures grown from different cancer cell lines. The processing of biological images has already given rise to a significant number of publications but, in the case addressed here, since the image acquisition protocol was not fixed, the challenge was to propose a chain of adaptive processing steps that does not constrain the biologists in their research. Four steps are detailed in this thesis. The first concerns the definition of pre-processing steps to homogenise the acquisition conditions. The choice to work with the image of local standard deviations rather than the brightness is one of the results of this first part. The second step consists in counting the number of cells present in the image. An original filter, the so-called "halo" filter, which reinforces the centre of the cells in order to facilitate their counting, has been proposed. A statistical validation step applied to the detected centres makes the result more reliable. The image segmentation stage, undoubtedly the most difficult, constitutes the third part of this work. The aim here is to extract thumbnail images each containing a single cell. The chosen segmentation algorithm was the watershed, but it had to be adapted to the context of the images studied here. The proposal to use a probability map as input yielded a segmentation closer to the cell edges. However, this method leads to an over-segmentation, which must be reduced in order to move towards the goal: "one region = one cell". To this end, an algorithm based on a concept of cumulative hierarchy from mathematical morphology was developed. It aggregates neighbouring regions by working on a tree representation of these regions and their associated levels. Comparing the results obtained by this method with those of other approaches for limiting over-segmentation demonstrated the effectiveness of the proposed approach. The final step of this work is the classification of the cells. Three classes were defined: elongated cells (mesenchymal migration), "blebbing" round cells (amoeboid migration) and "smooth" round cells (intermediate stage between the migration modes). For each thumbnail obtained at the end of the segmentation step, intensity, morphological and textural features were computed. An initial analysis of these features led to a classification strategy: first separate the round cells from the elongated cells, then separate the "smooth" round cells from the "blebbing" ones. To do so, the features are divided into two sets used successively in these two classification stages.
Several classification algorithms were tested; in the end, two neural networks were retained, yielding over 80% correct classification between elongated and round cells, and nearly 90% correct classification between "smooth" and "blebbing" round cells.
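A small sketch of watershed segmentation driven by a probability-like map, in the spirit of the pipeline described above; scikit-image is assumed, the "cells" are synthetic blobs, and a distance transform stands in for the learned probability map used in the thesis.

```python
# Sketch: marker-based watershed separating two touching objects.
import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import watershed
from skimage.feature import peak_local_max

# Synthetic binary image with two touching "cells".
image = np.zeros((80, 80), dtype=bool)
yy, xx = np.mgrid[0:80, 0:80]
image |= (yy - 30) ** 2 + (xx - 30) ** 2 < 15 ** 2
image |= (yy - 45) ** 2 + (xx - 50) ** 2 < 15 ** 2

distance = ndi.distance_transform_edt(image)        # stand-in probability map
coords = peak_local_max(distance, labels=image, min_distance=10)
markers = np.zeros_like(distance, dtype=int)
markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)

labels = watershed(-distance, markers, mask=image)  # aim: one region per cell
print("regions found:", labels.max())
```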
916

Intelligent computation applied to the study of hemoglobin variants

Sousa, Thaís Helena Samed e 29 October 2004
In vitro evolution is a laboratory method developed for the evolution of molecules, mainly proteins. By introducing mutations, the method searches for new molecular properties, aiming to create new proteins and thereby advance the study and treatment of diseases through the development of new drugs. The great challenge in in vitro evolution is to create the largest possible number of protein molecules that reach the desired properties, since only an infinitesimal fraction of the diversity generated from DNA sequences is actually useful; obtaining molecules with adequate functionality through this technique requires considerable time and financial support. With the objective of computationally evaluating the functionality of protein variants from their amino acid sequences, and thereby reducing laboratory cost and time, this work proposes the use of intelligent computation techniques (in silico evolution) based on machine learning and evolutionary computation. Machine learning techniques generally require databases with a large amount of information. Hemoglobin variants were therefore chosen as the object of study, since the amount of information available about them in the literature is quite extensive. The results obtained show that it is possible to develop efficient algorithms to determine the functionality of hemoglobin variants. These results are intended to contribute to the development of computationally supported directed evolution techniques.
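As an illustration only, the sketch below classifies hemoglobin variants from simple substitution features (position and change in Kyte-Doolittle hydropathy) with a decision tree. The tiny variant list and its labels are illustrative stand-ins for the large curated database the work relies on, and this is not the algorithm developed in the thesis.

```python
# Sketch: functional-impact classification of hemoglobin variants from
# substitution features. Labels below are illustrative, not curated annotations.
from sklearn.tree import DecisionTreeClassifier

hydropathy = {"A": 1.8, "D": -3.5, "E": -3.5, "G": -0.4, "K": -3.9, "V": 4.2}

# (position, wild-type residue, mutant residue, illustrative label 1=altered).
variants = [
    (6, "E", "V", 1),   # e.g. the well-known beta-6 Glu->Val substitution
    (6, "E", "K", 1),
    (16, "G", "D", 0),
    (22, "E", "A", 0),
    (26, "E", "K", 1),
    (68, "E", "K", 0),
]

X = [[pos, hydropathy[mut] - hydropathy[wt]] for pos, wt, mut, _ in variants]
y = [label for *_, label in variants]

clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print("predicted impact of a hypothetical E->V change at position 26:",
      clf.predict([[26, hydropathy["V"] - hydropathy["E"]]])[0])
```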
917

Combination of symbolic classifiers to improve the predictive and descriptive power of ensembles

Bernardini, Flávia Cristina 17 May 2002
The quality of the hypotheses induced by current machine learning systems depends mainly on the quantity and quality of the features and examples used for training. Experiments on large databases that contain many irrelevant features frequently yield hypotheses of low accuracy, and many well-known machine learning systems are not prepared to handle very large numbers of examples. Thus, one of the most active research areas in machine learning concerns techniques able to extend the capacity of learning algorithms to process large numbers of training examples, features and classes. To learn concepts from large databases with machine learning, two approaches can be used: the first selects the most relevant examples and features; the second is the ensemble approach. An ensemble is a set of classifiers whose individual decisions are combined in some way to classify a new case. Although ensembles classify new examples better than each individual classifier, they behave like black boxes, in the sense that they offer the user no explanation for the classification they provide. The purpose of this work is to propose a way of combining symbolic classifiers, that is, classifiers induced by symbolic machine learning algorithms in which knowledge is described as if-then rules or their equivalents, in order to work with large databases. The proposal is as follows: given a large database, it is randomly divided into small databases so that each of them can be supplied to one or several symbolic machine learning algorithms; the rules that constitute the induced classifiers are then combined into a single classifier. To analyse the viability of this proposal, a system called RuleSystem was implemented in the logic programming language Prolog with the purpose of (a) evaluating rules induced by symbolic machine learning algorithms and (b) evaluating several ways of combining symbolic classifiers, as well as explaining the classification of new examples given by an ensemble of symbolic classifiers. Purpose (a) is implemented by the Rule Analysis Module and purpose (b) by the Combination and Explanation Module; these constitute the main modules of RuleSystem. This work describes the ensemble construction and classifier combination methods found in the literature, the design and documentation of RuleSystem, the methodology developed to document the system, the implementation of the Combination and Explanation Module, and two applications of this module. The first application used an artificial database, which made it possible to identify and implement improvements to the Combination and Explanation Module; the second used a real database.
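The proposal can be illustrated with a toy sketch: partition a dataset, induce a small rule set on each partition, and pool the rules into one combined classifier. The hand-written threshold "rules" below stand in for rules induced by a symbolic learner, and majority voting is only one simple way of resolving conflicts, not necessarily the combination form studied in the work.

```python
# Sketch: pooling if-then rules learned on partitions of a larger dataset.
import random
from collections import Counter

random.seed(0)
points = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(300)]
data = [((x, y), int(x + y > 10)) for (x, y) in points]      # synthetic labels

partitions = [data[i::3] for i in range(3)]                  # split the large base

def learn_rules(partition):
    """Stand-in for a symbolic learner: one pair of threshold rules per feature."""
    xs = sorted(x for (x, _), _ in partition)
    ys = sorted(y for (_, y), _ in partition)
    tx, ty = xs[len(xs) // 2], ys[len(ys) // 2]              # median thresholds
    return [
        (lambda e, t=tx: e[0] > t, 1), (lambda e, t=tx: e[0] <= t, 0),
        (lambda e, t=ty: e[1] > t, 1), (lambda e, t=ty: e[1] <= t, 0),
    ]

combined_rules = [r for p in partitions for r in learn_rules(p)]  # pooled rule set

def classify(example):
    votes = Counter(label for cond, label in combined_rules if cond(example))
    return votes.most_common(1)[0][0]

accuracy = sum(classify(e) == label for e, label in data) / len(data)
print("accuracy of the combined rule set:", round(accuracy, 2))
```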
918

Machine Learning for Inspired, Structured, Lyrical Music Composition

Bodily, Paul Mark 01 July 2018
Computational creativity has been called the "final frontier" of artificial intelligence due to the difficulty inherent in defining and implementing creativity in computational systems. Despite this difficulty, computational creativity is becoming a more significant part of our everyday lives, particularly in music. This is observed in the prevalence of music recommendation systems, co-creational music software packages, smart playlists, and procedurally-generated video games. Significant progress can be seen in industrial applications such as Spotify, Pandora and Apple Music, but several problems persist. Of more general interest, however, is the question of whether or not computers can exhibit autonomous creativity in music composition. One of the primary challenges in this endeavor is enabling computational systems to create music that exhibits global structure, to learn structure from data, and to effectively incorporate autonomy and intention. We seek to address these challenges in the context of a modular machine learning framework called hierarchical Bayesian program learning (HBPL). Breaking the problem of music composition into smaller pieces, we focus primarily on developing machine learning models that solve the problems related to structure. In particular, we present an adaptation of non-homogeneous Markov models that enables binary constraints, and we present a structural learning model, the multiple Smith-Waterman (mSW) alignment method, which extends sequence alignment techniques from bioinformatics. To address the issue of intention, we incorporate our work on structured sequence generation into a full-fledged computational creative system called Pop*, which we show through various evaluative means to possess, to varying extents, the characteristics of creativity and creativity itself.
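A small sketch of generating a word sequence from a Markov chain under a binary constraint (say, "position 3 must be the word 'love'"), the kind of constraint the adapted non-homogeneous Markov models support. The toy transition table and the simple feasibility filtering below are illustrative assumptions, not the thesis's construction.

```python
# Sketch: constrained Markov generation via reachability filtering of options.
import random
random.seed(4)

transitions = {
    "i":    ["feel", "know", "love"],
    "feel": ["the", "your"],
    "know": ["the", "your"],
    "love": ["the", "your"],
    "the":  ["night", "music", "love"],
    "your": ["eyes", "love"],
}

def can_reach(word, steps, target):
    """True if `target` is reachable from `word` in exactly `steps` transitions."""
    frontier = {word}
    for _ in range(steps):
        frontier = {nxt for w in frontier for nxt in transitions.get(w, [])}
    return target in frontier

def generate(length=4, constrained_pos=3, constrained_word="love"):
    seq = ["i"]
    while len(seq) < length:
        pos = len(seq)
        options = [w for w in transitions[seq[-1]]
                   if (w == constrained_word if pos == constrained_pos
                       else can_reach(w, constrained_pos - pos, constrained_word))]
        seq.append(random.choice(options))
    return seq

print(" ".join(generate()))
```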
919

Failure Prediction using Machine Learning in a Virtualized HPC System and application

Mohammed, Bashir, Awan, Irfan U., Ugail, Hassan, Muhammad, Y. January 2019
Failure is an increasingly important issue in high performance computing and cloud systems. As large-scale systems continue to grow in scale and complexity, mitigating the impact of failure and providing accurate predictions with sufficient lead time remains a challenging research problem. Traditional fault-tolerance strategies such as regular checkpointing and replication are not adequate because of the emerging complexities of high performance computing systems. This necessitates an effective as well as proactive failure management approach aimed at minimizing the effect of failure within the system. With the advent of machine learning techniques, the ability to learn from past information to predict future patterns of behaviour makes it possible to predict potential system failure more accurately. Thus, in this paper, we explore the predictive abilities of machine learning by applying a number of algorithms to improve the accuracy of failure prediction. We have developed a failure prediction model using time series and machine learning, and performed comparison-based tests on the prediction accuracy. The primary algorithms we considered are the Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbors (KNN), Classification and Regression Trees (CART) and Linear Discriminant Analysis (LDA). Experimental results show that the average prediction accuracy of our model using SVM when predicting failure is 90%, making it effective compared to the other algorithms. This finding means that our method can effectively predict all possible future system and application failures within the system.
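A hedged sketch of the kind of algorithm comparison reported above, using scikit-learn classifiers and cross-validation; the features and labels are simulated stand-ins, not the authors' HPC traces.

```python
# Sketch: comparing SVM, RF, KNN, CART and LDA on synthetic failure data.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic stand-in for node metrics (CPU load, memory, temperature, ...),
# with y = 1 meaning "failure expected within the lead time".
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.9, 0.1],
                           random_state=0)

models = {
    "SVM": SVC(),
    "RF": RandomForestClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(random_state=0),
    "LDA": LinearDiscriminantAnalysis(),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```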
920

A Hierarchical Multi-Output Nearest Neighbor Model for Multi-Output Dependence Learning

Morris, Richard Glenn 08 March 2013
Multi-Output Dependence (MOD) learning is a generalization of standard classification problems that allows for multiple outputs that are dependent on each other. A primary issue that arises in the context of MOD learning is that for any given input pattern there can be multiple correct output patterns. This changes the learning task from function approximation to relation approximation. Previous algorithms do not consider this problem, and thus cannot be readily applied to MOD problems. To perform MOD learning, we introduce the Hierarchical Multi-Output Nearest Neighbor model (HMONN) that employs a basic learning model for each output and a modified nearest neighbor approach to refine the initial results. This paper focuses on tasks with nominal features, although HMONN has the initial capacity for solving MOD problems with real-valued features. Results obtained using UCI repository, synthetic, and business application data sets show improved accuracy over a baseline that treats each output as independent of all the others, with HMONN showing improvement that is statistically significant in the majority of cases.
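A simplified sketch of the two-stage idea behind HMONN: each output first receives an independent base prediction, then a nearest-neighbour model that also sees the other output's initial prediction refines the result. The data and the exact refinement rule here are illustrative assumptions, not the published model.

```python
# Sketch: base models per output, then a nearest-neighbour refinement that
# conditions one output on the other output's initial prediction.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(5)
X = rng.integers(0, 3, size=(400, 6))            # nominal input features
y1 = (X[:, 0] + X[:, 1] > 2).astype(int)          # output 1
y2 = (y1 + X[:, 2]) % 2                           # output 2 depends on output 1

# Stage 1: independent base models (the baseline HMONN is compared against).
base1 = DecisionTreeClassifier(random_state=0).fit(X, y1)
base2 = DecisionTreeClassifier(random_state=0).fit(X, y2)
p1 = base1.predict(X)

# Stage 2: refine output 2 with a nearest-neighbour model that also sees
# output 1's initial prediction, capturing the dependence between outputs.
refine2 = KNeighborsClassifier(n_neighbors=3).fit(np.column_stack([X, p1]), y2)

X_new = rng.integers(0, 3, size=(5, 6))
p1_new = base1.predict(X_new)
print("refined output 2:", refine2.predict(np.column_stack([X_new, p1_new])))
```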
