Global ETD Search

51	A scalable evolutionary learning classifier system for knowledge discovery in stream data mining Dam, Hai Huong, Information Technology & Electrical Engineering, Australian Defence Force Academy, UNSW January 2008 Data mining (DM) is the process of finding patterns and relationships in databases. The breakthrough in computer technologies triggered a massive growth in data collected and maintained by organisations. In many applications, these data arrive continuously in large volumes as a sequence of instances known as a data stream. Mining these data is known as stream data mining. Due to the large amount of data arriving in a data stream, each record is normally expected to be processed only once. Moreover, this process can be carried out on different sites in the organisation simultaneously, making the problem distributed in nature. Distributed stream data mining poses many challenges to the data mining community, including scalability and coping with changes in the underlying concept over time. In this thesis, the author hypothesizes that learning classifier systems (LCSs) - a class of classification algorithms - have the potential to work efficiently in distributed stream data mining. LCSs are incremental learners and, being evolutionary-based, they are inherently adaptive. However, they suffer from two main drawbacks that hinder their use as fast data mining algorithms. First, they require a large population size, which slows down the processing of arriving instances. Second, they require a large number of parameter settings, some of which are very sensitive to the nature of the learning problem. As a result, it becomes difficult to choose the right setup for totally unknown problems. The aim of this thesis is to attack these two problems in LCS, with a specific focus on UCS - a supervised evolutionary learning classifier system. 
UCS is chosen because it has been tested extensively on classification tasks and is the supervised version of XCS, a state-of-the-art LCS. In this thesis, the architectural design for a distributed stream data mining system is first introduced. The problems that UCS faces in a distributed data stream task are confirmed through a large number of experiments with UCS and the proposed architectural design. To overcome the problem of large population sizes, the idea of using a neural network to represent the action in UCS is proposed. This new system - called NLCS - was validated experimentally using a small fixed population size and has shown a large reduction in the population size needed to learn the underlying concept in the data. An adaptive version of NLCS called ANCS is then introduced. The adaptive version dynamically controls the population size of NLCS. A comprehensive analysis of the behaviour of ANCS revealed interesting patterns in the behaviour of the parameters, which motivated an ensemble version of the algorithm with 9 nodes, each using a different parameter setting. In total they cover all patterns of behaviour noticed in the system. A voting gate is used for the ensemble. The resultant ensemble does not require any parameter setting, and showed better performance on all datasets tested. The thesis concludes with testing the ANCS system in the architectural design for distributed environments proposed earlier. The contributions of the thesis are: (1) reducing the UCS population size by an order of magnitude using a neural representation; (2) introducing a mechanism for adapting the population size; (3) proposing an ensemble method that does not require parameter setting; and primarily (4) showing that the proposed LCS can work efficiently for distributed stream data mining tasks. 
Data mining Action map Classification Data stream Neural network Noisy data Non-stationary environment Reinforcement learning Rule-based system Static environment Stream data mining Supervised learning Distributed data mining Dynamic environment Ensemble learning Evolutionary computation Genetic algorithm Knowledge discovery Learning classifier system Negative correlation learning
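The voting gate mentioned in the abstract of entry 51 can be pictured with a minimal sketch. This is only an illustration under stated assumptions, not the thesis's implementation: it assumes plain majority voting, and the names `vote` and `make_threshold_node` are hypothetical, with simple threshold rules standing in for the nine differently parameterised ANCS nodes.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

# Hypothetical node type: any callable that maps an instance to a class label.
Node = Callable[[Sequence[float]], Hashable]


def vote(nodes: Sequence[Node], instance: Sequence[float]) -> Hashable:
    """Return the label predicted by the largest number of nodes."""
    predictions = [node(instance) for node in nodes]
    return Counter(predictions).most_common(1)[0][0]


def make_threshold_node(threshold: float) -> Node:
    # Stand-in for a trained classifier with its own parameter setting
    # (assumption: the real nodes would be ANCS learners, not thresholds).
    return lambda x: int(x[0] > threshold)


# Nine nodes, each with a different (illustrative) parameter setting.
nodes = [make_threshold_node(t / 10) for t in range(1, 10)]
label = vote(nodes, [0.35])
```

Because the gate only combines the nodes' outputs, no single node's parameter setting has to be tuned for the problem at hand, which is the property the abstract attributes to the ensemble.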
53	Modélisation et implémentation de parallélisme implicite pour les simulations scientifiques basées sur des maillages / Model and implementation of implicit parallelism for mesh-based scientific simulations Coullon, Hélène 29 September 2014 Parallel scientific computing is an expanding domain of computer science that speeds up long computations and makes it possible to handle larger or more accurate problems. It therefore lets scientific computation go further, producing more relevant results, because they are more precise, or studying larger problems than before. In the particular case of scientific numerical simulations, solving partial differential equations (PDEs) is an especially heavy computation and an ideal candidate for parallel computing. On the one hand, access to very powerful parallel machines and clusters is becoming easier; on the other hand, parallel programming is hard to democratize, and most scientists are not able to use these machines. As a result, high-level programming models, frameworks, libraries, and even parallel programming languages have been proposed to hide the technical details of parallel programming. However, in this “implicit parallelism” field, it is difficult to find the right level of abstraction for scientists while keeping the programming effort low. This thesis first proposes a model for writing implicit parallelism solutions for numerical simulations such as mesh-based PDE computations. This model is called “Structured Implicit Parallelism for scientific SIMulations” (SIPSim), and proposes an approach at the crossroads of several types of abstraction, trying to keep the advantages of each one. 
A first implementation of this model is proposed, as a C++ library called SkelGIS, for two-dimensional Cartesian meshes. SkelGIS, and hence the implementation of the model, is then extended to numerical simulations on networks (which allows simulations involving multiple physical phenomena), and this second implementation is studied in detail. The performance of both implementations is evaluated and analysed on real, complex simulation cases, and the results demonstrate that the SIPSim model can be implemented efficiently. 
Implicit parallelism High level programming models Development effort Distributed data structures Hypergraph partitioning Data distribution Numerical simulations Partial differential equations Cartesian meshes Networks 004
54	Métodos e softwares para análise da produção científica e detecção de frentes emergentes de pesquisa / Methods and software for scientific production analysis and detection of emerging research trends REIS JUNIOR, JOSE S.B. 21 December 2016 Progress in earlier projects highlighted the need to address the problem of software for detecting emerging research and development trends from databases of scientific publications. It became evident that efficient software applications dedicated to this purpose are lacking, even though such tools are highly useful for better planning of research and development programs in institutions. A review of the currently available software was therefore carried out in order to clearly outline the opportunity for developing new tools. As a result, an application called Citesnake was implemented, designed specifically to support the detection and study of emerging trends through the analysis of networks of various types extracted from scientific databases. Using this robust and effective tool, analyses of emerging research and development fronts were conducted in the area of Generation IV nuclear energy systems, in order to identify, among the reactor types selected as the most promising by the Generation IV International Forum (GIF), those that have developed the most over the last ten years and that currently appear most capable of fulfilling the promises made about their innovative concepts. 
Dissertation (Master's in Nuclear Technology) / IPEN/D / Instituto de Pesquisas Energéticas e Nucleares - IPEN-CNEN/SP reactors power generation reactor monitoring systems reactor technology site characterization nuclear engineering computer networks tools technology transfer research programs comparative evaluations data covariances data base management data compilation data tagging distributed data processing documentation information centers information dissemination information retrieval information theory knowledge management performance accuracy specificity tolerance
55	Studies on two specific inverse problems from imaging and finance Rückert, Nadja 20 July 2012 This thesis deals with regularization parameter selection methods in the context of Tikhonov-type regularization with Poisson distributed data, in particular the reconstruction of images, as well as with the identification of the volatility surface from observed option prices. In Part I we examine the choice of the regularization parameter when reconstructing an image, which is disturbed by Poisson noise, with Tikhonov-type regularization. This type of regularization is a generalization of the classical Tikhonov regularization in the Banach space setting and often called variational regularization. After a general consideration of Tikhonov-type regularization for data corrupted by Poisson noise, we examine the methods for choosing the regularization parameter numerically on the basis of two test images and real PET data. In Part II we consider the estimation of the volatility function from observed call option prices with the explicit formula which has been derived by Dupire using the Black-Scholes partial differential equation. The option prices are only available as discrete noisy observations so that the main difficulty is the ill-posedness of the numerical differentiation. Finite difference schemes, as regularization by discretization of the inverse and ill-posed problem, do not overcome these difficulties when they are used to evaluate the partial derivatives. Therefore we construct an alternative algorithm based on the weak formulation of the dual Black-Scholes partial differential equation and evaluate the performance of the finite difference schemes and the new algorithm for synthetic and real option prices. 
Tikhonov-Regularisierung Kullback-Leibler Totale Variation Poissonverteilte Daten Bildverarbeitung Wahl des Regularisierungsparameters Diskrepanzprinzip L-Kurve Quasi-Optimalitätskriterium PET Black-Scholes-Modell Volatilität Dupire Call-Option Regularisierung durch Diskretisierung Parameter Identifikationsalgorithmus Tikhonov-Regularization Kullback-Leibler Total Variation Poisson distributed data discrepancy principle L-curve method quasi-optimality criterion PET Black-Scholes model volatility Dupire call option regularization by discretization imaging finance ddc:515 ddc:519 Tichonov-Regularisierung Black-Scholes-Modell Volatilität Regularisierungsverfahren Poisson-Verteilung Bildverarbeitung Call-Option Parameter <Mathematik>
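For orientation, the explicit formula referred to in the abstract of entry 55 is commonly written as below. This is the textbook form, assuming a constant interest rate r and no dividends; the symbols C (call price), K (strike), T (maturity), and σ (local volatility) are notational assumptions here and are not taken from the thesis itself.

```latex
\[
  \sigma^2(K,T) \;=\;
  \frac{\dfrac{\partial C}{\partial T}(K,T) + r\,K\,\dfrac{\partial C}{\partial K}(K,T)}
       {\tfrac{1}{2}\,K^{2}\,\dfrac{\partial^{2} C}{\partial K^{2}}(K,T)}
\]
```

The second derivative in the denominator has to be estimated from discrete, noisy prices, and this is where the ill-posedness of numerical differentiation described in the abstract arises.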
56	Studies on two specific inverse problems from imaging and finance Rückert, Nadja 16 July 2012 This thesis deals with regularization parameter selection methods in the context of Tikhonov-type regularization with Poisson distributed data, in particular the reconstruction of images, as well as with the identification of the volatility surface from observed option prices. In Part I we examine the choice of the regularization parameter when reconstructing an image, which is disturbed by Poisson noise, with Tikhonov-type regularization. This type of regularization is a generalization of the classical Tikhonov regularization in the Banach space setting and often called variational regularization. After a general consideration of Tikhonov-type regularization for data corrupted by Poisson noise, we examine the methods for choosing the regularization parameter numerically on the basis of two test images and real PET data. In Part II we consider the estimation of the volatility function from observed call option prices with the explicit formula which has been derived by Dupire using the Black-Scholes partial differential equation. The option prices are only available as discrete noisy observations so that the main difficulty is the ill-posedness of the numerical differentiation. Finite difference schemes, as regularization by discretization of the inverse and ill-posed problem, do not overcome these difficulties when they are used to evaluate the partial derivatives. Therefore we construct an alternative algorithm based on the weak formulation of the dual Black-Scholes partial differential equation and evaluate the performance of the finite difference schemes and the new algorithm for synthetic and real option prices. info:eu-repo/classification/ddc/515 ddc:515 info:eu-repo/classification/ddc/519 ddc:519
