101 |
Préparation non paramétrique des données pour la fouille de données multi-tables / Non-parametric data preparation for multi-relational data mining
Lahbib, Dhafer 06 December 2012
In multi-relational data mining, data are represented in a relational form in which the individuals of the target table may be related to several records in secondary tables through one-to-many relationships. To take into account the secondary variables (those belonging to secondary tables), most existing approaches operate by propositionalization (flattening), which yields a classical attribute-value representation but loses the naturally compact initial representation and risks introducing statistical bias. In this thesis, we directly assess the relevance of secondary variables with respect to the target variable, in the context of supervised classification. We propose a family of non-parametric models to estimate the conditional probability density of secondary variables. This estimation extends the Naive Bayes classifier to take such variables into account. The approach relies on a supervised pre-processing of the secondary variables: discretization in the numerical case and value grouping in the categorical case. This pre-processing is carried out in two ways. In the first approach, the partitioning is univariate, that is, it considers a single secondary variable at a time. In the second approach, we propose an itemset-based multivariate partitioning of secondary variables in order to take into account any correlations that may exist between these variables. Data grid models are used to define Bayesian criteria for evaluating the candidate pre-processings. Combinatorial algorithms are proposed to optimize these criteria efficiently and find good models. We evaluated our approach on synthetic and real-world multi-relational databases. The experiments show that the evaluation criteria and the optimization algorithms are able to discover relevant secondary variables. In addition, the Naive Bayes classifier exploiting the proposed pre-processing achieves high prediction rates.
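The propositionalization (flattening) step that this abstract contrasts itself with can be illustrated by a minimal Python sketch. The tables, identifiers, and the choice of aggregates below are hypothetical and only show how a one-to-many relation is reduced to fixed columns; this is not the method proposed in the thesis.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: a target table and one secondary table (one-to-many).
customers = {1: "churn", 2: "stay"}                    # individual id -> class label
orders = [(1, 20.0), (1, 35.0), (1, 5.0), (2, 80.0)]   # (individual id, amount)

def flatten(secondary):
    """Aggregate each individual's secondary records into fixed columns."""
    groups = defaultdict(list)
    for key, value in secondary:
        groups[key].append(value)
    return {
        key: {"count": len(vals), "mean": mean(vals), "max": max(vals)}
        for key, vals in groups.items()
    }

flat = flatten(orders)  # one attribute-value row per individual
```

Each individual ends up with one row regardless of how many secondary records it had, which is where the compactness of the relational form is lost.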
|
102 |
Discrétisation automatique de machines à signaux en automates cellulaires / Automatic discretization of signal machines into cellular automata
Besson, Tom 10 April 2018
In the context of abstract geometrical computation, signal machines have been developed as a continuous counterpart of cellular automata, capturing the notions of particles, signals and collisions. An important question is the automatic generation of a cellular automaton that mimics the dynamics of a given signal machine. On the one hand, ad hoc conversions exist. On the other hand, such a conversion is not always possible, since some signal machines exhibit purely continuous behaviors. Automatically discretizing such structures is therefore often complicated and sometimes impossible. This thesis proposes three ways to automatically discretize signal machines into cellular automata, with or without approximation. The first applies to a subcategory of signal machines whose properties guarantee an exact automatic discretization for any machine of this type. The second can be used on any machine but guarantees neither the exactness nor the correctness of the result. The third relies on a new expression of the dynamics of a signal machine, called modularity, which is described and then used to propose a discretization.
|
103 |
Fluxo de potência ótimo multiobjetivo com restrições de segurança e variáveis discretas / Multiobjective security constrained optimal power flow with discrete variables
Ellen Cristina Ferreira 11 May 2018
The goal of the present work is to investigate and develop continuous and discrete optimization strategies for multiobjective Security-Constrained Optimal Power Flow (SCOPF) problems, incorporating control variables associated with the taps of in-phase transformers and the switching of capacitor banks and shunt reactors. A multiobjective optimization model is formulated as a weighted sum whose objectives are the minimization of active power losses in the transmission lines and of an additional term that provides a greater reactive power margin to the system. Controls associated with taps and shunts are modeled either as fixed quantities or as continuous or discrete variables; in the discrete case, auxiliary functions of polynomial and sinusoidal type are applied for discretization purposes. The complete problem is solved with the Evolutionary Particle Swarm Optimization (EPSO) and Differential Evolutionary Particle Swarm Optimization (DEEPSO) metaheuristics. The algorithms were developed in MATLAB R2013a and applied to the IEEE 14-, 30-, 57-, 118- and 300-bus test systems, where the method was validated in terms of the diversity and quality of the solutions generated and of computational complexity. The results demonstrate the potential of the proposed model and solution strategies as support tools for the decision-making process in security analysis of power networks, maximizing the possibilities for action aimed at reducing post-contingency emergencies.
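The weighted-sum formulation and the idea of a sinusoidal auxiliary function can be sketched in Python as follows. The abstract does not give the exact functions used, so the penalty form `sin^2(pi * x / step)` and all names here are assumptions chosen for illustration only.

```python
import math

def sinusoidal_penalty(x, step):
    """Assumed auxiliary function: (near) zero when x is a multiple of step,
    positive otherwise, so minimizing it pushes x toward discrete values."""
    return math.sin(math.pi * x / step) ** 2

def weighted_sum(f1, f2, w):
    """Scalarize two objective values with a weight w in [0, 1]."""
    return w * f1 + (1.0 - w) * f2

on_grid = sinusoidal_penalty(1.5, 0.5)     # 1.5 is a multiple of the step 0.5
off_grid = sinusoidal_penalty(1.25, 0.5)   # halfway between two discrete values
combined = weighted_sum(2.0, 4.0, 0.25)    # hypothetical objective values
```

Sweeping `w` over [0, 1] yields different trade-offs between the two objectives, which is the usual way a weighted sum is used to explore a multiobjective problem.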
|
104 |
DSA Preconditioning for the S_N Equations with Strictly Positive Spatial Discretization
Bruss, Donald 2012 May 1900
Preconditioners based upon sweeps and diffusion-synthetic acceleration (DSA) have been constructed and applied to the zeroth and first spatial moments of the 1-D transport equation using S_N angular discretization and a strictly positive nonlinear spatial closure (the CSZ method). The sweep preconditioner was applied using the linear discontinuous Galerkin (LD) sweep operator and the nonlinear CSZ sweep operator. DSA preconditioning was applied using the linear LD S_2 equations and the nonlinear CSZ S_2 equations. These preconditioners were applied in conjunction with a Jacobian-free Newton-Krylov (JFNK) method utilizing Flexible GMRES.
The action of the Jacobian on the Krylov vector was difficult to evaluate numerically with a finite difference approximation because the angular flux spanned many orders of magnitude. The evaluation of the perturbed residual required constructing the nonlinear CSZ operators based upon the angular flux plus some perturbation. For cases in which the magnitude of the perturbation was comparable to the local angular flux, these nonlinear operators were very sensitive to the perturbation and differed significantly from the unperturbed operators. To resolve this shortcoming in the finite difference approximation, in these cases the residual evaluation was performed using nonlinear operators "frozen" at the unperturbed local angular flux. This was a Newton method with a perturbation fixup. Alternatively, an entirely frozen method always performed the Jacobian evaluation using the unperturbed nonlinear operators. This frozen JFNK method was actually a Picard iteration scheme. The perturbed Newton's method proved to be slightly less expensive than the Picard iteration scheme.
The CSZ sweep preconditioner was significantly more effective than preconditioning with the LD sweep. Furthermore, the LD sweep is always more expensive to apply than the CSZ sweep. The CSZ sweep is superior to the LD sweep as a preconditioner. The DSA preconditioners were applied in conjunction with the CSZ sweep. The nonlinear CSZ DSA preconditioner did not form a more effective preconditioner than the linear DSA preconditioner in this 1-D analysis. As it is very difficult to construct a CSZ diffusion equation in more than one dimension, it will be very beneficial if the results regarding the effectiveness of the LD DSA preconditioner are applicable to multi-dimensional problems.
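The finite-difference evaluation of the Jacobian action used in a Jacobian-free Newton-Krylov method can be sketched as below. The residual function is a hypothetical two-variable example, not the transport residual from the thesis, and the fixed perturbation size `eps` is an assumption; the sketch only shows the basic forward-difference formula whose sensitivity the abstract discusses.

```python
def residual(u):
    # Hypothetical nonlinear residual F(u), used only for illustration.
    return [u[0] ** 2 + u[1] - 3.0, u[0] - u[1] ** 3]

def jacobian_vector_product(F, u, v, eps=1e-7):
    """Approximate J(u) v by the forward difference (F(u + eps*v) - F(u)) / eps,
    so the Jacobian matrix is never formed explicitly."""
    perturbed = [ui + eps * vi for ui, vi in zip(u, v)]
    base, shifted = F(u), F(perturbed)
    return [(s - b) / eps for s, b in zip(shifted, base)]

# At u = (1, 2) the exact Jacobian is [[2, 1], [1, -12]], so J v = (2, 1) for v = (1, 0).
jv = jacobian_vector_product(residual, [1.0, 2.0], [1.0, 0.0])
```

When the components of `u` span many orders of magnitude, a single `eps` can be far too large for the small components, which is the difficulty the abstract describes.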
|
105 |
Modern Mathematical Methods In Modeling And Dynamics Of Regulatory Systems Of Gene-environment Networks
Defterli, Ozlem 01 September 2011
The inference and anticipation of genetic networks based on experimental data and environmental measurements is a challenging research problem in mathematical modeling. In this thesis, we discuss gene-environment network models whose dynamics are represented by a class of time-continuous systems of ordinary differential equations containing unknown parameters to be optimized. Accordingly, a time-discrete version of that model class is studied and improved by using different numerical methods. In this respect, the 3rd-order Heun's method and the classical 4th-order Runge-Kutta method are newly introduced, iteration formulas are derived, and the corresponding matrix algebras are newly obtained.
We use nonlinear mixed-integer programming for the parameter estimation and present the solution of a given constrained and regularized mixed-integer problem. By using this solution and applying the 3rd-order Heun's method and the classical 4th-order Runge-Kutta method in the time-discretized model, we generate the corresponding time series of gene expressions. Two illustrative numerical examples are newly studied, one with an artificial data set and one with a real-world data set that expresses a real phenomenon. All the approximate results obtained are compared to assess the quality of the new schemes. Different step-size analyses and sensitivity tests are also carried out to obtain more accurate and stable predictions of the time-series results, for better service in real-world application areas.
The presented time-continuous and time-discrete dynamical models are identified based on given data, and studied by means of an analytical theory and stability theories of rarefication, regularization and robustification.
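The two time-discretization schemes named in this abstract can be sketched in Python using their standard textbook forms. The test equation y' = -y and the step size are assumptions for illustration; the thesis applies the schemes to gene-environment network models with matrix-valued right-hand sides, which are not reproduced here.

```python
import math

def heun3_step(f, t, y, h):
    """One step of Heun's third-order method (weights 1/4, 0, 3/4)."""
    k1 = f(t, y)
    k2 = f(t + h / 3.0, y + h / 3.0 * k1)
    k3 = f(t + 2.0 * h / 3.0, y + 2.0 * h / 3.0 * k2)
    return y + h / 4.0 * (k1 + 3.0 * k3)

def rk4_step(f, t, y, h):
    """One step of the classical fourth-order Runge-Kutta method."""
    k1 = f(t, y)
    k2 = f(t + h / 2.0, y + h / 2.0 * k1)
    k3 = f(t + h / 2.0, y + h / 2.0 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def integrate(step, f, y0, h, n):
    """Apply a one-step scheme n times starting from t = 0."""
    t, y = 0.0, y0
    for _ in range(n):
        y = step(f, t, y, h)
        t += h
    return y

decay = lambda t, y: -y                       # test problem with solution exp(-t)
err_heun = abs(integrate(heun3_step, decay, 1.0, 0.1, 10) - math.exp(-1.0))
err_rk4 = abs(integrate(rk4_step, decay, 1.0, 0.1, 10) - math.exp(-1.0))
```

Comparing `err_heun` and `err_rk4` for several step sizes is a simple form of the step-size analysis the abstract mentions.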
|
106 |
On the Autoconvolution Equation and Total Variation Constraints
Fleischer, G., Gorenflo, R., Hofmann, B. 30 October 1998
This paper is concerned with the numerical analysis of the autoconvolution equation
$x*x=y$ restricted to the interval [0,1]. We present a discrete constrained least
squares approach and prove its convergence in $L^p(0,1)$, $1<p<\infty$, where
the regularization is based on a prescribed bound for the total variation of admissible
solutions. This approach includes the case of non-smooth solutions possessing jumps.
Moreover, an adaptation to the Sobolev space $H^1(0,1)$ and some remarks on monotone
functions are added. The paper is completed by a numerical case study concerning
the determination of non-monotone smooth and non-smooth functions x from the autoconvolution
equation with noisy data y.
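For orientation, the forward operator of the autoconvolution equation, $[x*x](s)=\int_0^s x(s-t)\,x(t)\,dt$ on [0,1], and the total variation of a discrete solution can be sketched in Python. The piecewise-constant discretization and the function names are assumptions made for this illustration; the constrained least squares solver analysed in the paper is not implemented here.

```python
def autoconvolution(x, h):
    """Discrete autoconvolution of a piecewise-constant function.

    x[j] is the value on cell [j*h, (j+1)*h]; the result y[i] equals
    (x*x) evaluated at s = (i+1)*h.
    """
    n = len(x)
    return [h * sum(x[j] * x[i - j] for j in range(i + 1)) for i in range(n)]

def total_variation(x):
    """Sum of absolute jumps between neighbouring cell values."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))

# For x identically 1 the exact autoconvolution is y(s) = s.
y = autoconvolution([1.0, 1.0, 1.0, 1.0], 0.25)
tv = total_variation([0.0, 1.0, 1.0, 0.5])
```

A total variation bound of the kind described above would restrict the admissible vectors `x` to those with `total_variation(x)` below a prescribed constant, which still permits jumps.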
|
107 |
Mining Associations Using Directed Hypergraphs
Simha, Ramanuja N. 01 January 2011
This thesis proposes a novel directed hypergraph based model for any database. We introduce the notion of association rules for multi-valued attributes, which is an adaptation of the definition of quantitative association rules known in the literature. The association rules for multi-valued attributes are integrated in building the directed hypergraph model. This model captures attribute-level associations and their strength. Based on this model, we provide association-based similarity notions between any two attributes and present a method for finding clusters of similar attributes. We then propose algorithms to identify a subset of attributes, known as a leading indicator, that influences the values of almost all other attributes. Finally, we present an association-based classifier that can be used to predict values of attributes. We demonstrate the effectiveness of our proposed model, notions, algorithms, and classifier through experiments on a financial time-series data set (S&P 500).
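The support and confidence of an association rule over multi-valued attributes can be sketched in Python as follows. The toy rows, attribute names, and values are hypothetical, and the standard support and confidence definitions are used; the thesis's own rule definition and hypergraph construction may differ in detail.

```python
# Hypothetical discretized records: each attribute takes one of several values.
rows = [
    {"A": "up", "B": "up", "C": "down"},
    {"A": "up", "B": "up", "C": "up"},
    {"A": "down", "B": "up", "C": "down"},
    {"A": "up", "B": "down", "C": "down"},
]

def support(rows, items):
    """Fraction of rows in which every attribute in items has the given value."""
    matches = sum(all(r[a] == v for a, v in items.items()) for r in rows)
    return matches / len(rows)

def confidence(rows, antecedent, consequent):
    """Support of antecedent and consequent together, relative to the antecedent."""
    return support(rows, {**antecedent, **consequent}) / support(rows, antecedent)

# A rule with a single antecedent attribute, and one with two.
conf_a_b = confidence(rows, {"A": "up"}, {"B": "up"})
conf_ab_c = confidence(rows, {"A": "up", "B": "up"}, {"C": "down"})
```

A rule whose antecedent involves several attributes, such as the second one, is the kind of many-to-one association that a directed hyperedge can represent, with the confidence as one possible measure of strength.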
|
108 |
A Parallel Newton-Krylov-Schur Algorithm for the Reynolds-Averaged Navier-Stokes Equations
Osusky, Michal 13 January 2014
Aerodynamic shape optimization and multidisciplinary optimization algorithms have the potential not only to improve conventional
aircraft, but also to enable the design of novel configurations. By their very nature, these algorithms generate and analyze a large
number of unique shapes, resulting in high computational costs. In order to improve their efficiency and enable their use in the
early stages of the design process, a fast and robust flow solution algorithm is necessary.
This thesis presents an efficient parallel Newton-Krylov-Schur flow solution algorithm for the three-dimensional
Navier-Stokes equations coupled with the Spalart-Allmaras one-equation turbulence model.
The algorithm employs second-order summation-by-parts (SBP) operators on multi-block structured grids with simultaneous
approximation terms (SATs) to enforce block interface coupling and boundary conditions.
The discrete equations are solved iteratively with an inexact-Newton method, while the linear
system at each Newton iteration is solved using the flexible Krylov
subspace iterative method GMRES with an approximate-Schur parallel preconditioner. The algorithm is thoroughly verified and validated, highlighting the
correspondence of the current algorithm with several established flow solvers.
The solution for a transonic flow over a wing on a mesh of medium density (15 million nodes) shows good agreement with experimental results.
Using 128 processors, deep convergence is obtained in under 90 minutes.
The solution of transonic flow over the Common Research Model wing-body geometry with
grids with up to 150 million nodes exhibits the expected grid
convergence behavior. This case was completed as part of the Fifth AIAA Drag Prediction Workshop,
with the algorithm producing solutions that compare favourably with several widely used flow solvers.
The algorithm is shown to scale well on over 6000 processors. The results demonstrate the effectiveness of the SBP-SAT
spatial discretization, which can be readily extended to high order, in combination with
the Newton-Krylov-Schur iterative method to produce a powerful parallel algorithm for the numerical solution of
the Reynolds-averaged Navier-Stokes equations.
The algorithm can efficiently solve the flow over a range of clean geometries, making it suitable for
use at the core of an optimization algorithm.
|
110 |
Data Mining For Rule Discovery In Relational Databases
Toprak, Serkan 01 September 2004
Most data today is stored in relational databases. However, most data mining algorithms cannot work directly on data stored in relational databases. Instead, they require a preprocessing step that transforms the relational data into an algorithm-specific form. Moreover, several data mining algorithms provide solutions for single relations only. Therefore, valuable hidden knowledge involving multiple relations remains undiscovered. In this thesis, an implementation is developed for discovering multi-relational association rules in relational databases. The implementation is based on a framework that provides a representation of patterns in relational databases, refinement methods for patterns, and primitives for obtaining the record counts needed from the database to calculate measures for patterns. The framework exploits the metadata of relational databases to prune the search space of patterns. The implementation extends the framework by employing the Apriori algorithm to further prune the search space and to discover recursive relational patterns. The Apriori algorithm is used to find the large itemsets of tables, which are used to refine patterns; it is modified by changing the support calculation method for itemsets. A method for determining recursive relations is described, and a solution is provided for handling recursive patterns using aliases. Additionally, continuous attributes of tables are discretized using equal-depth partitioning. The implementation is tested on the gene localization prediction task of KDD Cup 2001, and the results are compared to those of the winning approach.
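Equal-depth (equal-frequency) partitioning of a continuous attribute can be sketched in Python as below. The function name, the sample values, and the rule for distributing leftover values are assumptions for illustration; the abstract does not describe the implementation's details.

```python
def equal_depth_bins(values, k):
    """Split the sorted values into k bins holding (nearly) the same number of items."""
    ordered = sorted(values)
    n = len(ordered)
    bins = []
    for i in range(k):
        start, end = i * n // k, (i + 1) * n // k
        bins.append(ordered[start:end])
    return bins

bins = equal_depth_bins([5, 1, 9, 3, 7, 2], 3)
# Bin boundaries (min, max) can then serve as the discrete intervals.
intervals = [(b[0], b[-1]) for b in bins]
```

Unlike equal-width partitioning, the interval widths vary with the data so that each bin carries a similar number of records, which keeps support counts comparable across intervals.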
|