1

A methodology for database management of time-variant encodings and/or missing information

Threlfall, William John January 1988
The problem presented is how to handle encoded data for which the encodings or decodings change with respect to time, and which contains codes indicating that certain data is unknown, invalid, or not applicable with respect to certain entities during certain time periods. It is desirable to build a database management system that is capable of knowing about and being able to handle the changes in encodings and the missing information codes by embedding such knowledge in the data definition structure, in order to remove the necessity of having applications programmers and users constantly worrying about how the data is encoded. The experimental database management language DEFINE is utilized to achieve the desired result, and a database structure is created for a real-life example of data which contains many examples of time-variant encodings and missing information. / Science, Faculty of / Computer Science, Department of / Graduate
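The idea of embedding knowledge of encoding changes and missing-information codes in the data definition can be sketched as follows. This is an illustration in Python, not DEFINE, and it is not the thesis's actual structure: the class `TimeVariantCode`, the `Missing` categories, and the example code tables are all hypothetical.

```python
from bisect import bisect_right
from datetime import date
from enum import Enum


class Missing(Enum):
    """Kinds of missing information named in the abstract (hypothetical encoding)."""
    UNKNOWN = "unknown"
    INVALID = "invalid"
    NOT_APPLICABLE = "not applicable"


class TimeVariantCode:
    """Decode a raw code with the code table that was in force on a given date."""

    def __init__(self, versions):
        # versions: iterable of (effective_from, {raw_code: meaning}) pairs
        self._versions = sorted(versions, key=lambda v: v[0])
        self._starts = [start for start, _ in self._versions]

    def decode(self, raw, on):
        # Locate the last code table whose effective date is <= `on`.
        idx = bisect_right(self._starts, on) - 1
        if idx < 0:
            # Assumption: before the first table, the attribute does not apply.
            return Missing.NOT_APPLICABLE
        table = self._versions[idx][1]
        # Assumption: a code absent from the table in force is treated as invalid.
        return table.get(raw, Missing.INVALID)


# Hypothetical attribute whose code 9 changes meaning in 1980.
status = TimeVariantCode([
    (date(1970, 1, 1), {1: "employed", 2: "unemployed", 9: Missing.UNKNOWN}),
    (date(1980, 1, 1), {1: "employed", 2: "unemployed", 3: "retired",
                        9: Missing.NOT_APPLICABLE}),
])
```

With the decoding rules held in one place, an application calls `status.decode(raw, on)` and never needs to know which code table applied in which period.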
2

The Gourmet Guide to Statistics: For an Instructional Strategy That Makes Teaching and Learning Statistics a Piece of Cake

Edirisooriya, Gunapala 01 January 2003
This article draws analogies between the activities of statisticians and of chefs. It suggests how these analogies can be used in teaching, both to help understanding of what statistics is about and to increase motivation to learn the subject.
3

Design of versatile, multi-channeled, data acquisition module

Gateno, Leon W. January 2011
Typescript (photocopy). / Digitized by Kansas Correctional Industries
4

Clutter-Based Dimension Reordering in Multi-Dimensional Data Visualization

Peng, Wei 11 January 2005
Visual clutter denotes a disordered collection of graphical entities in information visualization. It can obscure the structure present in the data. Even in a small dataset, visual clutter makes it hard for the viewer to find patterns, relationships and structure. In this thesis, I study visual clutter with four distinct visualization techniques, and present the concept and framework of Clutter-Based Dimension Reordering (CBDR). Dimension order is an attribute that can significantly affect a visualization's expressiveness. By varying the dimension order in a display, it is possible to reduce clutter without reducing data content or modifying the data in any way. Clutter reduction is a display-dependent task, so I apply the CBDR framework to each of the four visualization techniques separately. For each display technique, I determine what constitutes clutter in terms of display properties, then design a metric to measure visual clutter in this display. Finally, I search for an order that minimizes the clutter in a display. Different algorithms for the search process are discussed as well. To gather users' responses to the clutter measures used in the CBDR process and to validate the usefulness of CBDR, I also conducted an evaluation with two groups of users. The study results indicate that users find the approach helpful for visually exploring datasets. The users also offered many comments and suggestions on the CBDR approach and on visual clutter reduction in general. The content and results of the user study are included in this thesis.
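The general pattern described in this abstract (define a clutter measure for a display, then search for the dimension order that minimizes it) might be sketched as below. The clutter measure used here is an assumption made for illustration and is not taken from the thesis: it treats weakly correlated adjacent axes as clutter. The search is a plain exhaustive one, which is practical only for a handful of dimensions. The function names and the sample data are hypothetical.

```python
from itertools import permutations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def clutter(order, columns):
    """Assumed measure: sum of (1 - |correlation|) over neighbouring dimensions."""
    return sum(1.0 - abs(pearson(columns[a], columns[b]))
               for a, b in zip(order, order[1:]))


def best_order(columns):
    """Try every dimension order and keep one with the lowest clutter score."""
    names = sorted(columns)
    return min(permutations(names), key=lambda order: clutter(order, columns))


# Hypothetical data: "b" tracks "a" exactly, and "d" tracks "c" exactly.
columns = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "c": [4, 1, 3, 2],
    "d": [8, 2, 6, 4],
}
order = best_order(columns)
```

The search only permutes the order in which dimensions are laid out; the data values themselves are never changed, which matches the abstract's point that clutter can be reduced without modifying the data.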
5

Předzpracování dat pro systémy dobývání znalostí z databází / Data preprocessing for data mining systems

Falc, Václav January 2012
The main goal of this graduation thesis was to create a system for data preparation. The system was built using the programming languages C# and SQL, and partly XML and HTML.
6

Implementace procedur pro předzpracování dat v systému Rapid Miner / Implementation of data preparation procedures for RapidMiner

Černý, Ján January 2014
Knowledge Discovery in Databases (KDD) is gaining importance as the amount of collected data grows. Despite this, analytic software systems often provide only the basic and most commonly used procedures and algorithms. The aim of this thesis is to extend RapidMiner, one of the most frequently used systems, with some new procedures for data preprocessing. To understand and develop the procedures, it is important to be acquainted with KDD, with emphasis on the data preparation phase. It is also important to describe the analytical procedures themselves. To develop an extension for RapidMiner, one needs to become acquainted with the process of creating an extension and the tools that are used. Finally, the resulting extension is introduced and tested.
7

Data Preparation from Visually Rich Documents

Sarkhel, Ritesh January 2022
No description available.
8

Préparation non paramétrique des données pour la fouille de données multi-tables / Non-parametric data preparation for multi-relational data mining

Lahbib, Dhafer 06 December 2012
In multi-relational data mining, data are represented in a relational form where the individuals of the target table are potentially related to several records in secondary tables in a one-to-many relationship. In order to take into account the secondary variables (those belonging to a non-target table), most of the existing approaches operate by propositionalization, thereby losing the naturally compact initial representation and possibly introducing statistical bias. In this thesis, our purpose is to assess directly the relevance of secondary variables with respect to the target one, in the context of supervised classification. We propose a family of non-parametric models to estimate the conditional density of secondary variables. This estimation provides an extension of the Naive Bayes classifier to take into account such variables. The approach relies on a supervised pre-processing of the secondary variables, through discretization in the numerical case and value grouping in the categorical one. This pre-processing is achieved in two ways. In the first approach, the partitioning is univariate, i.e. it considers a single secondary variable at a time. In the second approach, we propose an itemset-based multivariate partitioning of secondary variables in order to take into account any correlations that may occur between these variables. Data grid models are used to define Bayesian criteria for evaluating the considered pre-processing. Combinatorial algorithms are proposed to efficiently optimize these criteria and find good models. We evaluated our approach on synthetic and real-world multi-relational databases. Experiments show that the evaluation criteria and the optimization algorithms are able to discover relevant secondary variables. In addition, the Naive Bayes classifier exploiting the proposed pre-processing achieves high prediction rates.
9

SVM-Based Negative Data Mining to Binary Classification

Jiang, Fuhua 03 August 2006
The properties of a training data set, such as size, distribution, and number of attributes, contribute significantly to the generalization error of a learning machine. A poorly distributed data set is prone to produce a partially overfitted model. Two approaches proposed in this dissertation for binary classification enhance useful data information by mining negative data. First, an error-driven compensating hypothesis approach is based on Support Vector Machines (SVMs) with (1+k)-iteration learning, where the base learning hypothesis is iteratively compensated k times. This approach produces a new hypothesis on the new data set in which each label is a transformation of the label from the negative data set, further producing the positive and negative child data subsets in subsequent iterations. This procedure refines the base hypothesis by the k child hypotheses created in k iterations. A prediction method is also proposed to trace the relationship between negative subsets and the testing data set by a vector similarity technique. Second, a statistical negative example learning approach based on theoretical analysis improves the performance of the base learning algorithm, the learner, by creating one or two additional hypotheses, audit and booster, to mine the negative examples output by the learner. The learner employs a regular Support Vector Machine to classify main examples and recognize which examples are negative. The audit works on the negative training data created by the learner to predict whether an instance is negative. The booster is applied when the audit does not have enough accuracy to judge the learner correctly; it works on the training data subsets on which the learner and the audit do not agree. The classifier for testing is the combination of learner, audit, and booster. For a specific instance, it returns the learner's result if the audit acknowledges the learner's result or the learner agrees with the audit's judgment; otherwise it returns the booster's result. The error of the classifier is decreased to O(e^2), compared to the error O(e) of a base learning algorithm.
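The decision rule that combines learner, audit, and booster can be illustrated with a minimal sketch. It assumes one reading of the abstract, namely that the audit "acknowledges" the learner's result when it does not flag the instance as a negative example. The three components are passed in as plain callables; the toy stand-ins below are hypothetical and are not trained SVMs.

```python
from typing import Callable, Sequence

Instance = Sequence[float]


def combined_predict(
    x: Instance,
    learner: Callable[[Instance], int],
    audit: Callable[[Instance], bool],
    booster: Callable[[Instance], int],
) -> int:
    """Return the learner's label unless the audit flags x as a negative example."""
    if not audit(x):       # audit accepts the learner's result
        return learner(x)
    return booster(x)      # flagged instances are decided by the booster


# Hypothetical stand-ins for the three trained models.
def toy_learner(x: Instance) -> int:
    return 1 if x[0] > 0 else -1


def toy_audit(x: Instance) -> bool:
    # Assumption: the learner is unreliable close to its decision boundary.
    return abs(x[0]) < 0.5


def toy_booster(x: Instance) -> int:
    return 1 if x[1] > 0 else -1
```

Calling `combined_predict([0.1, -1.0], toy_learner, toy_audit, toy_booster)` routes the instance to the booster because the toy audit flags it, while an instance far from the boundary keeps the learner's label.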
10

Automatizace předzpracování dat za využití doménových znalosti / Automation of data preprocessing using domain knowledge

Beskyba, Jan January 2014
In this work we propose a solution that helps automate part of knowledge discovery in databases. Domain knowledge plays an important role in the automation process and needs to be included in the proposed program for data preparation. In the introduction, we focus on the theoretical basis of knowledge discovery in databases, with an emphasis on domain knowledge. Next, we focus on the basic principles of data pre-processing and on the scripting language LMCL, which could be part of the design of newly developed applications for automated data preparation. Subsequently, we deal with the design of an application for data pre-processing, which is verified on data from the House of Commons.
