91

On Numerical Error Estimation for the Finite-Volume Method with an Application to Computational Fluid Dynamics

Tyson, William Conrad 29 November 2018 (has links)
Computational fluid dynamics (CFD) simulations can provide tremendous insight into complex physical processes and are often faster and more cost-effective to execute than experiments. However, each CFD result inherently contains numerical errors that can significantly degrade the accuracy of a simulation. Discretization error is typically the largest contributor to the overall numerical error in a given simulation. Discretization error can be very difficult to estimate since the generation, transport, and diffusion of these errors are highly nonlinear functions of the computational grid and discretization scheme. As CFD is increasingly used in engineering design and analysis, it is imperative that CFD practitioners be able to accurately quantify discretization errors to minimize risk and improve the performance of engineering systems. In this work, improvements are made to the accuracy and efficiency of existing error estimation techniques. Discretization error is estimated by deriving and solving an error transport equation (ETE) for the local discretization error everywhere in the computational domain. Truncation error is shown to act as the local source for discretization error in numerical solutions. An equivalence between adjoint methods and ETE methods for functional error estimation is presented. This adjoint/ETE equivalence is exploited to efficiently obtain error estimates for multiple output functionals and to extend the higher-order properties of adjoint methods to ETE methods. Higher-order discretization error estimates are obtained when truncation error estimates are sufficiently accurate. Truncation error estimates are demonstrated to deteriorate on grids with a non-smooth variation in grid metrics (e.g., unstructured grids) regardless of how smooth the underlying exact solution may be. The loss of accuracy is shown to stem from noise in the discrete solution on the order of discretization error. When using conventional least-squares reconstruction techniques, this noise is exactly captured and introduces a lower-order error into the truncation error estimate. A novel reconstruction method based on polyharmonic smoothing splines is developed to smoothly reconstruct the discrete solution and improve the accuracy of error estimates. Furthermore, a method for iteratively improving discretization error estimates is devised. Efficiency and robustness considerations are discussed. Results are presented for several inviscid and viscous flow problems. To facilitate the study of discretization error estimation, a new, higher-order finite-volume solver is developed. A detailed description of the code base is provided along with a discussion of best practices for CFD code design. / Ph. D. / Computational fluid dynamics (CFD) is a branch of computational physics at the intersection of fluid mechanics and scientific computing in which the governing equations of fluid motion, such as the Euler and Navier-Stokes equations, are solved numerically on a computer. CFD is utilized in numerous fields including biomedical engineering, meteorology, oceanography, and aerospace engineering. CFD simulations can provide tremendous insight into physical processes and are often preferred over experiments because they can be performed more quickly, are typically more cost-effective, and can provide data in regions where it may be difficult to measure. While CFD can be an extremely powerful tool, CFD simulations are inherently subject to numerical errors.
These errors, which are generated when the governing equations of fluid motion are solved on a computer, can have a significant impact on the accuracy of a CFD solution. If numerical errors are not accurately quantified, ill-informed decision-making can lead to poor system performance, increased risk of injury, or even system failure. In this work, research efforts are focused on numerical error estimation for the finite-volume method, arguably the most widely used numerical algorithm for solving CFD problems. The error estimation techniques provided herein target discretization error, the largest contributor to the overall numerical error in a given simulation. Discretization error can be very difficult to estimate since these errors are generated, convected, and diffused by the same physical processes embedded in the governing equations. In this work, improvements are made to the accuracy and efficiency of existing discretization error estimation techniques. Results are presented for several inviscid and viscous flow problems. To facilitate the study of these error estimators, a new, higher-order finite-volume solver is developed. A detailed description of the code base is provided along with a discussion of best practices for CFD code design.
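The central mechanism described above, truncation error acting as the local source for discretization error, can be illustrated on a model problem. The following is a minimal hypothetical sketch for a steady 1D Poisson problem (where the error transport equation reduces to a linear error equation), not the dissertation's finite-volume solver; the manufactured solution and all names are illustrative.

```python
import numpy as np

# Model problem u'' = f with homogeneous Dirichlet BCs. The discrete system
# A u_h = f has discretization error e = u_h - u satisfying A e = -tau,
# where tau = A u - f is the truncation error of the exact solution.
n = 64
x = np.linspace(0.0, 1.0, n + 1)
h = x[1] - x[0]
u_exact = np.sin(np.pi * x)            # manufactured exact solution
f = -np.pi**2 * np.sin(np.pi * x)      # forcing so that u'' = f

# Second-order centered difference operator on the interior nodes
A = (np.diag(-2.0 * np.ones(n - 1)) +
     np.diag(np.ones(n - 2), 1) +
     np.diag(np.ones(n - 2), -1)) / h**2

u_h = np.zeros(n + 1)
u_h[1:-1] = np.linalg.solve(A, f[1:-1])

# Truncation error: discrete operator applied to the exact solution
tau = A @ u_exact[1:-1] - f[1:-1]

# Error equation: truncation error acts as the local source term
e_est = np.linalg.solve(A, -tau)
e_true = u_h[1:-1] - u_exact[1:-1]
print(np.max(np.abs(e_est - e_true)))  # agreement to round-off
```

Because this model problem is linear, the error estimate is exact up to round-off; in the nonlinear finite-volume setting of the thesis, the quality of the estimate hinges on how accurately tau can be approximated from the discrete solution alone.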
92

Discretization Error Estimation and Exact Solution Generation Using the 2D Method of Nearby Problems

Kurzen, Matthew James 17 March 2010 (has links)
This work examines the Method of Nearby Problems (MNP) as a way to generate analytical exact solutions to problems governed by partial differential equations (PDEs). The method involves generating a numerical solution to the original problem of interest, curve fitting the solution, and generating source terms by operating the governing PDEs upon the curve fit. Adding these source terms to the right-hand side of the governing PDEs defines the nearby problem. In addition to its use for generating exact solutions, the MNP can be extended for use as an error estimator. The nearby problem can be solved numerically on the same grid as the original problem. The nearby problem discretization error is calculated as the difference between its numerical solution and exact solution (curve fit). This is an estimate of the discretization error in the original problem of interest. The accuracy of the curve fits is quite important to this work. A curve-fitting method that combines local least-squares fits with weighting functions is used, resulting in a piecewise fit with continuity at interface boundaries. A one-dimensional Burgers' equation case shows this to be a better approach than global curve fits. Six two-dimensional cases are investigated, including solutions to the time-varying Burgers' equation and to the 2D steady Euler equations. The results show that the Method of Nearby Problems can be used to create realistic, analytical exact solutions to problems governed by PDEs. The resulting discretization error estimates are also shown to be reasonable for several cases examined. / Master of Science
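As a concrete illustration of the workflow just described (numerical solution, curve fit, source-term generation), here is a hypothetical 1D sketch for the model problem u'' = g. The thesis combines weighted local least-squares fits on 2D problems; this sketch substitutes a single global cubic spline for brevity, so all details are illustrative.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# 1) Numerical solution of the original problem u'' = g on [0, 1]
n = 40
x = np.linspace(0.0, 1.0, n + 1)
h = x[1] - x[0]
g = np.exp(x)                                  # arbitrary forcing

A = (np.diag(-2.0 * np.ones(n - 1)) +
     np.diag(np.ones(n - 2), 1) +
     np.diag(np.ones(n - 2), -1)) / h**2
u_h = np.zeros(n + 1)
u_h[1:-1] = np.linalg.solve(A, g[1:-1])

# 2) Curve fit of the discrete solution
fit = CubicSpline(x, u_h)

# 3) Source term from operating the governing equation on the fit:
#    the nearby problem v'' = g + s then has the fit as its exact solution
s = fit(x, 2) - g

# Solving the nearby problem on the same grid and subtracting its exact
# solution (the curve fit) yields the discretization error estimate
v_h = np.zeros(n + 1)
v_h[1:-1] = np.linalg.solve(A, (g + s)[1:-1])
error_estimate = v_h - fit(x)
print(np.max(np.abs(error_estimate)))
```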
93

On Viscous Flux Discretization Procedures For Finite Volume And Meshless Solvers

Munikrishna, N 06 1900 (has links)
This work deals with discretizing viscous fluxes in the context of unstructured-data-based finite volume and meshless solvers, two competing methodologies for simulating viscous flows past complex industrial geometries. The two important requirements of a viscous discretization procedure are consistency and positivity. While consistency is a fundamental requirement, positivity is linked to the robustness of the solution methodology. The following advancements are made through this work within the finite volume and meshless frameworks. Finite Volume Method: Several viscous discretization procedures available in the literature are reviewed for: (1) ability to handle general grid elements; (2) efficiency, particularly for 3D computations; (3) consistency; (4) positivity as applied to a model equation; and (5) global error behavior as applied to a model equation. While some of the popular procedures result in inconsistent formulations, the consistent procedures are observed to be computationally expensive and also have problems associated with robustness. From a systematic global error study, we have observed that even a formally inconsistent scheme exhibits consistency in terms of global error, i.e., the global error decreases with grid refinement. This observation is important and also encouraging from the viewpoint of devising a suitable discretization scheme for viscous fluxes. This study suggests that one can relax the consistency requirement in order to gain in terms of robustness and computational cost, two key ingredients for any industrial flow solver. Some of the procedures are analysed for positivity as applied to a Laplacian, and it is found that the two requirements of a viscous discretization procedure, consistency (accuracy) and positivity, are essentially conflicting. Based on the review, four representative schemes are selected and used in HIFUN-2D (High resolution Flow Solver on UNstructured Meshes), an unstructured-data-based cell-center finite volume flow solver, to simulate standard laminar and turbulent flow test cases. From the analysis, we can advocate the use of the Green-Gauss theorem based diamond-path procedure, which can render a high level of robustness to the flow solver for industrial computations. Meshless Method: An Upwind Least-Squares Finite-Difference (LSFD-U) meshless solver is developed for simulating viscous flows. Different viscous discretization procedures are proposed and analysed for positivity, and the procedure found to be more positive is employed. Obtaining a suitable point distribution, particularly for viscous flow computations, is one of the important components for the success of meshless solvers. In principle, meshless solvers can operate on any point distribution obtained using structured, unstructured and Cartesian meshes, but Cartesian meshing is the most natural candidate for obtaining the point distribution. Therefore, the performance of LSFD-U for simulating viscous flows using point distributions obtained from Cartesian-like grids is evaluated. While we have successfully computed laminar viscous flows, there are difficulties in solving turbulent flows. In this context, we have evolved a strategy to generate a suitable point distribution for simulating turbulent flows with the meshless solver. The strategy involves a hybrid Cartesian point distribution wherein the boundary layer region is filled with a high-aspect-ratio body-fitted structured mesh and the potential flow region with a unit-aspect-ratio Cartesian mesh.
The main advantage of our solver is in handling the structured and Cartesian grid interface. The interface algorithm is considerably simplified compared to the hybrid-Cartesian-mesh-based finite volume methodology by exploiting the advantages accruing from the use of a meshless solver. Cheap, simple and robust discretization procedures are evolved for both inviscid and viscous fluxes, exploiting the basic features exhibited by the hybrid point distribution. These procedures are also subjected to positivity analysis and a systematic global error study. It should be remarked that the viscous discretization procedure employed in the structured grid block is positive and, in fact, this feature imparts the required robustness to the solver for computing turbulent flows. We have demonstrated the capability of the meshless solver LSFD-U to solve turbulent flow past complex aerodynamic configurations by solving flow past a multi-element airfoil configuration. In our view, the success shown by this work in computing turbulent flows can be considered a landmark development in the area of meshless solvers and has great potential for industrial applications.
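For readers unfamiliar with the meshless side, the sketch below shows the basic least-squares finite-difference idea that solvers of the LSFD-U type build on: derivatives at a point are recovered from a least-squares fit over a scattered cloud of neighbours. This is a hypothetical minimal version (unweighted, first-order, no upwinding), not the solver developed in the thesis.

```python
import numpy as np

def lsfd_gradient(p0, nbrs, u0, u_nbrs):
    """Estimate (du/dx, du/dy) at p0 from scattered neighbour values."""
    dX = nbrs - p0                  # (m, 2) offsets to the neighbours
    dU = u_nbrs - u0                # (m,) value differences
    # Solve dX @ grad ~ dU in the least-squares sense
    grad, *_ = np.linalg.lstsq(dX, dU, rcond=None)
    return grad

# Example: the linear field u = 2x + 3y is reproduced exactly
pts = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.3], [0.5, -1.0]])
u = 2.0 * pts[:, 0] + 3.0 * pts[:, 1]
print(lsfd_gradient(np.zeros(2), pts, 0.0, u))   # -> [2. 3.]
```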
94

Approaches to accommodate remeshing in shape optimization

Wilke, Daniel Nicolas 20 January 2011 (has links)
This study proposes novel optimization methodologies for the optimization of problems that reveal non-physical step discontinuities. More specifically, it is proposed to use gradient-only techniques that do not use any zeroth-order information at all for step discontinuous problems. A step discontinuous problem of note is the shape optimization problem in the presence of remeshing strategies, since changes in mesh topologies may - and normally do - introduce non-physical step discontinuities. These discontinuities may in turn manifest themselves as non-physical local minima in which optimization algorithms may become trapped. Conventional optimization approaches for step discontinuous problems include evolutionary strategies and design of experiments (DoE) techniques. These conventional approaches typically rely on the exclusive use of zeroth-order information to overcome the discontinuities, but are characterized by two important shortcomings: Firstly, the computational demands of zeroth-order methods may be very high, since many function values are in general required. Secondly, the use of zeroth-order information only does not necessarily guarantee that the algorithms will not terminate in highly unfit local minima. In contrast, the methodologies proposed herein use only first-order information, rather than only zeroth-order information. The motivation for this approach is that the associated gradient information in the presence of remeshing remains accurately and uniquely computable, notwithstanding the presence of discontinuities. From a computational effort point of view, a gradient-only approach is of course comparable to conventional gradient-based techniques. In addition, the step discontinuities do not manifest themselves as local minima. / Thesis (PhD)--University of Pretoria, 2010. / Mechanical and Aeronautical Engineering / unrestricted
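To make the gradient-only idea concrete, the sketch below shows a hypothetical gradient-only bisection line search: instead of comparing function values (which may jump across a remeshing-induced step discontinuity), it locates the step length at which the directional derivative changes sign. The interface and the smooth quadratic test function are illustrative assumptions, not the algorithms of the thesis.

```python
import numpy as np

def gradient_only_step(grad, x, d, a_lo=0.0, a_hi=1.0, iters=40):
    """Bisection on the sign of the directional derivative along d."""
    # Grow the bracket until the directional derivative turns non-negative
    while grad(x + a_hi * d) @ d < 0.0:
        a_hi *= 2.0
    for _ in range(iters):
        a = 0.5 * (a_lo + a_hi)
        if grad(x + a * d) @ d < 0.0:
            a_lo = a                 # still descending along d
        else:
            a_hi = a
    return 0.5 * (a_lo + a_hi)

# Test problem f(x, y) = x^2 + 3y^2: the step along the steepest descent
# direction is found without a single objective-function evaluation.
grad = lambda x: np.array([2.0 * x[0], 6.0 * x[1]])
x = np.array([1.0, 1.0])
d = -grad(x)
print(x + gradient_only_step(grad, x, d) * d)
```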
95

Adaptive modeling of plate structures / Modélisation adaptive des structures

Bohinc, Uroš 05 May 2011 (has links)
The primary goal of the thesis is to provide some answers to the questions related to the key steps in the process of adaptive modeling of plates. Since adaptivity depends on reliable error estimates, a large part of the thesis is devoted to the derivation of computational procedures for discretization error estimates as well as model error estimates. A practical comparison of some of the established discretization error estimates is made. Special attention is paid to the equilibrated residual method, which has the potential to be used both for discretization error and model error estimates. It should be emphasized that model error estimates are quite hard to obtain, in contrast to discretization error estimates. The concept of model adaptivity for plates is in this work implemented on the basis of the equilibrated residual method and a hierarchic family of plate finite element models. The finite elements used in the thesis range from thin plate elements to thick plate elements. The latter are based on a newly derived higher-order plate theory, which includes through-the-thickness stretching. The model error is estimated by local element-wise computations. As all the finite elements representing the chosen plate mathematical models are re-derived to share the same interpolation bases, the difference between the local computations can be attributed mainly to the model error. This choice of finite elements enables effective computation of the model error estimate and improves the robustness of the adaptive modeling. The discretization error can thus be computed by an independent procedure, so the discretization and model errors are estimated independently of one another, which makes the approach robust and easy to use. Many numerical examples are provided as an illustration of the performance of the derived plate elements, the discretization error procedures and the modeling error procedure. Since the basic goal of modeling in engineering is to produce an effective model, which will produce the most accurate results with the minimum input data, the need for adaptive modeling will always be present. In this view, the present work is a contribution to the final goal of the finite element modeling of plate structures: a fully automatic adaptive procedure for the construction of an optimal computational model (an optimal finite element mesh and an optimal choice of a plate model for each element of the mesh) for a given plate structure.
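As a small numerical illustration of model error (as distinct from discretization error), the following hypothetical sketch compares the centre deflection of a simply supported square plate under uniform load, computed from the classical Navier series, for the thin (Kirchhoff) model against a first-order shear-deformable (Mindlin-type) model. The material and geometry values are arbitrary assumptions, and the shear term follows the standard series solution rather than anything specific to the thesis.

```python
import numpy as np

# Navier series for a simply supported square plate under uniform load q0:
# the gap between thin-plate and shear-deformable deflections at the centre
# is a model error, growing with plate thickness.
E, nu, a, q0 = 210e9, 0.3, 1.0, 1.0e4     # arbitrary assumed values
kappa = 5.0 / 6.0                         # shear correction factor

def centre_deflection(h, shear=True, terms=99):
    D = E * h**3 / (12.0 * (1.0 - nu**2))
    G = E / (2.0 * (1.0 + nu))
    w = 0.0
    for m in range(1, terms + 1, 2):      # odd harmonics only
        for n in range(1, terms + 1, 2):
            qmn = 16.0 * q0 / (np.pi**2 * m * n)
            lam = np.pi**2 * (m**2 + n**2) / a**2
            wmn = qmn / (D * lam**2)      # bending (Kirchhoff) part
            if shear:
                wmn += qmn / (kappa * G * h * lam)  # shear part
            w += wmn * np.sin(m * np.pi / 2.0) * np.sin(n * np.pi / 2.0)
    return w

for h in (0.01, 0.05, 0.2):               # thin to thick
    wk = centre_deflection(h, shear=False)
    wm = centre_deflection(h, shear=True)
    print(f"h/a = {h/a:.2f}: relative model difference {(wm - wk)/wm:.1%}")
```

The relative gap between the two models grows with thickness, which is exactly the kind of information a model-adaptive procedure uses to decide, element by element, whether a thin-plate model suffices.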
96

Optimizing the Number of Time-steps Used in Option Pricing / Optimering av Antal Tidssteg inom Optionsprissättning

Lewenhaupt, Hugo January 2019 (has links)
Calculating the price of an option commonly uses numerical methods and can be computationally heavy. In general, longer computations result in a more precise result. As such, improving existing models or creating new models has been the focus in the research field. More recently the focus has instead shifted toward creating neural networks that can predict the price of a given option directly. This thesis instead studied how the number-of-time-steps parameter can be optimized, with regard to the precision of the resulting price, and then predicted the optimal number of time-steps for other options. The number of time-steps determines the computation time of one of the most common models in option pricing, the Cox-Ross-Rubinstein model (CRR). Two different methods for determining the optimal number of time-steps were created and tested. Both methods use neural networks to learn the relationship between the input variables and the output. The first method tried to predict the optimal number of time-steps directly. The other method instead tried to predict the parameters of an envelope around the oscillations of the option pricing method. It was discovered that the second method improved the performance of the neural networks tasked with predicting the optimal number of time-steps. It was further discovered that even though the best neural network found significantly outperformed the benchmark method, there was no significant difference in calculation times, most likely because of the range of log moneyness and prices that was used. It was also noted that the neural network tended to underestimate the parameter, which might not be a desirable property of a system in charge of estimating a price in the financial sector.
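For reference, here is a minimal Cox-Ross-Rubinstein pricer for a European call; plotting its output against N exhibits the damped oscillations around the continuous-time price whose envelope the second method tries to predict. The parameter values are illustrative only.

```python
import numpy as np

def crr_call(S0, K, r, sigma, T, N):
    """European call price from an N-step CRR binomial tree."""
    dt = T / N
    u = np.exp(sigma * np.sqrt(dt))            # up factor
    d = 1.0 / u                                # down factor
    p = (np.exp(r * dt) - d) / (u - d)         # risk-neutral probability
    disc = np.exp(-r * dt)
    # Terminal asset prices and payoffs, then backward induction
    ST = S0 * d ** np.arange(N, -1, -1) * u ** np.arange(N + 1)
    V = np.maximum(ST - K, 0.0)
    for _ in range(N):
        V = disc * (p * V[1:] + (1.0 - p) * V[:-1])
    return V[0]

for N in (10, 50, 100, 500, 1000):             # price oscillates in N
    print(N, crr_call(S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0, N=N))
```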
97

Geração, contração e polarização de bases gaussianas para cálculos quânticos de átomos e moléculas / Generation, contraction and polarization for gaussian basis set for quantum calculations of atoms and molecules

Guimarães, Amanda Ribeiro 10 September 2013 (has links)
Many research groups have worked on the development of basis sets in order to obtain better results at reduced computational time and cost. For this purpose, size and accuracy are the primary factors to be considered, so that the number of functions in the generated set provides a good description of the system under study within a short convergence time. This dissertation presents the basis sets obtained by the Generator Coordinate Method for the atoms Na, Mg, Al, Si, P, S and Cl, and evaluates the quality of these sets by comparing the total electronic energy at the atomic and molecular levels. A search was performed to obtain the best contracted set as well as the best set of polarization functions. The quality of the generated sets was evaluated by DFT-B3LYP calculations, whose results were compared to values obtained with basis sets known in the literature, such as Dunning's cc-pVXZ and Jensen's pc-n. The results show that the basis sets generated in this work, named MCG-3d2f, can represent atomic or molecular systems. Both the energy values and the computational times are equivalent to and, in some cases, better than those obtained here with the basis sets chosen as references (the Dunning and Jensen sets).
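For context, the building block being generated and contracted is sketched below: a contracted s-type Gaussian is a fixed linear combination of normalized primitives. This is a generic, hypothetical illustration (the exponents and coefficients are placeholders in the spirit of a minimal hydrogen basis), not the MCG-3d2f sets of the dissertation.

```python
import numpy as np

def primitive_s(r, alpha):
    """Normalized s-type Gaussian primitive, N * exp(-alpha r^2)."""
    norm = (2.0 * alpha / np.pi) ** 0.75
    return norm * np.exp(-alpha * r**2)

def contracted_s(r, alphas, coeffs):
    """Contracted function: fixed linear combination of primitives."""
    return sum(c * primitive_s(r, a) for a, c in zip(alphas, coeffs))

# Placeholder exponents/coefficients for a 3-primitive contraction
alphas = np.array([3.425, 0.624, 0.169])
coeffs = np.array([0.154, 0.535, 0.445])
r = np.linspace(0.0, 5.0, 6)
print(contracted_s(r, alphas, coeffs))
```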
98

Calor específico do modelo de Anderson de uma impureza por grupo de renormalização numérico / Numerical Renormalization-group Computation of Specific Heats.

Costa, Sandra Cristina 24 March 1995 (has links)
In this work, the specific heat and the entropy of the one-impurity symmetric Anderson Model are calculated using the Numerical Renormalization Group (NRG). The heart of the method is the logarithmic discretization of the conduction band of the host metal to which the impurity is coupled. This discretization, inherent in the method, introduces oscillations in the thermodynamic properties. For the magnetic susceptibility these oscillations can be circumvented, but for the specific heat they are critical and the usual calculation is prohibitive, restricting the reach of the NRG. To overcome this difficulty, the new procedure called interleaved, developed to calculate the susceptibility of two-impurity models, is used. In order to reduce the matrix sizes and the computation time, use is also made of the axial charge operator, recently defined in the context of the two-impurity Kondo Model, which is conserved by the symmetric Anderson Hamiltonian. The curves obtained are compared with exact results from the Bethe ansatz and from the Resonant Level Model.
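The thermodynamic step at the end of an NRG iteration is ordinary statistical mechanics: given the retained many-body spectrum at a given scale, the specific heat follows from energy fluctuations. The sketch below is only this generic final step (with k_B = 1 and a two-level toy spectrum), not the NRG procedure or the interleaved method themselves.

```python
import numpy as np

def specific_heat(E, T):
    """C(T) = (<E^2> - <E>^2) / T^2 over a given many-body spectrum."""
    w = np.exp(-(E - E.min()) / T)      # shifted for numerical stability
    Z = w.sum()
    E1 = (w @ E) / Z
    E2 = (w @ E**2) / Z
    return (E2 - E1**2) / T**2

# Two-level toy spectrum: reproduces the Schottky peak near T ~ 0.42
E = np.array([0.0, 1.0])
for T in (0.1, 0.42, 1.0, 5.0):
    print(T, specific_heat(E, T))
```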
99

Fluxo de potência ótimo multiobjetivo com restrições de segurança e variáveis discretas / Multiobjective security constrained optimal power flow with discrete variables

Ferreira, Ellen Cristina 11 May 2018 (has links)
The goal of the present work is to investigate and develop continuous and discrete optimization strategies for multiobjective Security-Constrained Optimal Power Flow (SCOPF) problems, incorporating control variables associated with in-phase transformer taps and the switching of capacitor banks and shunt reactors. The multiobjective optimization model is formulated under a weighted-sum criterion whose objectives are the minimization of active power losses in the transmission lines and of an additional term that provides a greater reactive power margin to the system. Controls associated with taps and shunts are modeled either as fixed quantities or as continuous and discrete variables, with auxiliary functions of polynomial and sinusoidal types applied for discretization purposes in the discrete case. The complete model is solved via the Evolutionary Particle Swarm Optimization (EPSO) and Differential Evolutionary Particle Swarm Optimization (DEEPSO) metaheuristics. The algorithms were developed in MatLab R2013a and applied to the IEEE 14-, 30-, 57-, 118- and 300-bus test systems, and the methodology was validated in terms of the diversity and quality of the generated solutions and of computational complexity. The results demonstrate the potential of the model and solution strategies as support tools for the decision-making process in power system security analysis, maximizing the possibilities for preventive action in order to reduce post-contingency emergencies.
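The two modeling ingredients described above, a weighted-sum objective and a sinusoidal auxiliary function that drives continuous controls toward discrete steps, can be sketched as follows. The weights, penalty factor and tap step here are hypothetical placeholders, not the tuned values of the thesis, and in practice the loss and reactive-margin terms would come from a power flow solution.

```python
import numpy as np

def sin_discretization_penalty(x, step):
    """Zero when x lies on a multiple of `step`, positive in between."""
    return np.sin(np.pi * x / step) ** 2

def objective(losses, reactive_margin_term, taps,
              w=0.7, mu=10.0, tap_step=0.0125):
    # Weighted sum of the two objectives plus the discretization penalty
    f = w * losses + (1.0 - w) * reactive_margin_term
    return f + mu * np.sum(sin_discretization_penalty(taps, tap_step))

# 1.0125 sits exactly on a tap step (no penalty); 1.019 does not
print(objective(1.23, 0.45, taps=np.array([1.0125, 1.019])))
```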
100

Préparation non paramétrique des données pour la fouille de données multi-tables / Non-parametric data preparation for multi-relational data mining

Lahbib, Dhafer 06 December 2012 (has links)
In multi-relational data mining, data are represented in relational form, where the individuals of the target table are potentially related to several records in secondary tables through one-to-many relationships. In order to take into account the secondary variables (those belonging to a non-target table), most existing approaches operate by propositionalization (flattening), thereby losing the naturally compact initial representation and possibly introducing statistical bias. In this thesis, our purpose is to assess directly the relevance of secondary variables with respect to the target one, in the context of supervised classification. We propose a family of non-parametric models to estimate the conditional probability density of secondary variables. This estimation provides an extension of the Naive Bayes classifier that takes such variables into account. The approach relies on a supervised pre-processing of the secondary variables, through discretization in the numerical case and value grouping in the categorical one. This pre-processing is achieved in two ways. In the first approach, the partitioning is univariate, i.e. a single secondary variable is considered at a time. In the second approach, we propose an itemset-based multivariate partitioning of the secondary variables in order to take into account any correlations that may occur between these variables. Data grid models are used to define Bayesian criteria evaluating the considered pre-processing, and combinatorial algorithms are proposed to efficiently optimize these criteria and find good models. We evaluated our approach on synthetic and real-world multi-relational databases. Experiments show that the evaluation criteria and the optimization algorithms are able to discover relevant secondary variables. In addition, the Naive Bayes classifier exploiting the proposed pre-processing achieves significant prediction rates.
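A minimal sketch of the pre-processing idea in the univariate case is given below: a numeric explanatory variable is partitioned into intervals and class-conditional interval probabilities are estimated for use in a Naive Bayes classifier. The thesis selects the partition with a Bayesian data grid criterion; this sketch substitutes plain equal-frequency bins with Laplace smoothing, so everything here is an illustrative stand-in.

```python
import numpy as np

def discretize_and_estimate(x, y, n_bins=4, n_classes=2):
    """Equal-frequency discretization + P(interval | class) estimates."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    bins = np.digitize(x, edges)            # interval index per record
    counts = np.ones((n_classes, n_bins))   # Laplace smoothing
    for b, c in zip(bins, y):
        counts[c, b] += 1.0
    return edges, counts / counts.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(2.0, 1.0, 500)])
y = np.array([0] * 500 + [1] * 500)
edges, p = discretize_and_estimate(x, y)
print(edges)
print(p)    # rows: classes; columns: P(interval | class)
```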
