1 |
MINING STRUCTURED SETS OF SUBSPACES FROM HIGH DIMENSIONAL DATA
RAJSHIVA, ANSHUMAAN 01 July 2004 (has links)
No description available.
|
2 |
Application of Decision Diagrams for Information Storage and Retrieval
Komaragiri, Vivek Chakravarthy 11 May 2002 (has links)
Technology is improving at an amazing pace, and one reason for this advancement is the unprecedented growth in Information Technology and in digital integrated circuit technology over the past few decades. The size of a typical modern database is on the order of hundreds of gigabytes or even terabytes. Researchers have been successful in designing complex databases, but there is still a great deal of activity on making effective use of this stored information. There have been significant advances in both logic optimization and information storage and retrieval, yet very little transfer of methods between the two fields. The purpose of this study is to investigate the use of powerful Computer-Aided Design (CAD) techniques for efficient information storage and retrieval. The work presented in this thesis shows that decision diagrams can be used for efficient data storage and information retrieval. An efficient technique is proposed for each of two key areas of database research: query optimization and data mining. Encouraging results indicate that using hardware-oriented techniques for information processing can be a new approach to solving these problems. An SQL query is represented using a hardware data structure known as an AND/OR graph, and an SQL parser is interfaced with the AND/OR package to achieve query optimization. Optimization using AND/OR graphs works only in the Boolean domain; to make query optimization more complete, it has to be investigated in the multi-valued domain. The possibility of using a Multi-valued Decision Diagram (MDD) as a data structure to represent the query in the multi-valued domain is discussed, and a synthesis technique is developed to synthesize multi-valued logic networks using MDDs. Another useful data structure, the Binary Decision Diagram (BDD), can be used to store the large transaction files used in data mining applications very effectively.
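To illustrate the idea of treating a transaction file as a Boolean function that a decision diagram could store compactly, the following sketch encodes each transaction as a minterm over item variables and answers itemset-support queries as Boolean cover counts. It is an illustrative sketch only, with made-up item names; an actual implementation along these lines would hold the characteristic function in a BDD package rather than an explicit list of minterms.

```python
# Illustrative sketch (not from the thesis): viewing a transaction file as a
# Boolean characteristic function, so that itemset support counting becomes a
# Boolean cover query. A real implementation would store chi as a BDD;
# here an explicit set of minterms keeps the example self-contained.

ITEMS = ["bread", "milk", "beer", "diapers"]          # one Boolean variable per item

def encode(transaction):
    """Encode a transaction as a bit-vector (minterm) over ITEMS."""
    return tuple(1 if item in transaction else 0 for item in ITEMS)

# The transaction database as a characteristic function chi: {0,1}^n -> {0,1},
# represented explicitly by its set of satisfying minterms.
database = [
    {"bread", "milk"},
    {"bread", "milk", "beer"},
    {"milk", "diapers"},
    {"bread", "milk", "diapers"},
]
chi = [encode(t) for t in database]                   # one minterm per transaction

def support(itemset):
    """Support of an itemset = number of minterms of chi covered by the cube
    that fixes the itemset's variables to 1 and leaves the rest free."""
    positions = [ITEMS.index(item) for item in itemset]
    return sum(1 for minterm in chi if all(minterm[p] == 1 for p in positions))

print(support({"bread", "milk"}))   # -> 3
```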
|
3 |
Conception et réalisation d’un outil de traitement et analyse des données spatiales pour l'aide à la décision : application au secteur de la distribution / Design and implementation of a spatial data processing and analysis tool for decision support : application to the retail industry
Daras, Gautier 20 December 2017 (has links)
The tool conceptualized and developed during this thesis aims to: (1) take advantage of recent evolutions of Geographic Information Systems (GIS) by proposing new approaches for the treatment of problems having a spatial aspect; (2) apply theoretical approaches to real industrial problems, in order to propose approaches for the phases that are not addressed in theoretical research.

With this in mind, three modules have been developed: a spatial data integration and visualization module, a data pre-processing module, and a coverage optimization module.

The first part of the thesis addresses the implementation of the first module and proposes a conceptual framework for the development of similar tools. The integration and visualization module gives access to sales data via a dedicated web interface. The platform puts the sales data in context by displaying retailers on a map and giving access to the visualization of other data (e.g. socio-demographic or competitive data). The retailers displayed on the map can be filtered according to their characteristics and colored according to multiple criteria (e.g. comparison with previous years, comparison with objectives, etc.). Selecting elements on the map gives access to their detailed information. Together, these functionalities allow a better understanding of the market and make it possible to explore sales results from a new angle.

The second part deals with the spatial data pre-processing module. Our approach makes spatial data analysis accessible to users without GIS expertise. In addition, the pre-processing steps can be carried out more quickly, with guided choices in selecting which spatial relations to take into account. A functional implementation of the approach has been built on open-source tools, enabling a low-cost deployment of our solution. Using our implementation yields significant time savings when pre-processing spatial data for geospatial analyses.

The third and final part focuses on the coverage optimization module, which builds on the structure and modules implemented previously. It takes as input the datasets describing the potential of each zone and those describing the retailers and their catchment areas. From these data, the module proposes coverage-improvement solutions that take into account the catchment area of each retailer and the collaborative capture of demand.
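The coverage-optimization module is described only at a high level here. As a rough illustration of the kind of computation involved, the sketch below applies a simple greedy maximum-coverage heuristic over hypothetical zone potentials and catchment areas; it is an assumption for exposition, not the optimization method developed in the thesis.

```python
# Hedged sketch: a greedy maximum-coverage heuristic for suggesting new store
# locations. Illustrative assumption only; all names below are hypothetical.

zone_potential = {"z1": 120.0, "z2": 80.0, "z3": 200.0, "z4": 50.0}   # demand per zone

# Catchment areas: which zones each candidate location would cover.
candidate_catchment = {
    "site_A": {"z1", "z2"},
    "site_B": {"z2", "z3"},
    "site_C": {"z3", "z4"},
}

already_covered = {"z1"}            # zones already served by existing stores

def greedy_open(k):
    """Pick up to k candidate sites, each time choosing the site whose
    catchment adds the most uncovered potential."""
    covered = set(already_covered)
    chosen = []
    for _ in range(k):
        best_site, best_gain = None, 0.0
        for site, catchment in candidate_catchment.items():
            if site in chosen:
                continue
            gain = sum(zone_potential[z] for z in catchment - covered)
            if gain > best_gain:
                best_site, best_gain = site, gain
        if best_site is None:       # nothing left to gain
            break
        chosen.append(best_site)
        covered |= candidate_catchment[best_site]
    return chosen, covered

print(greedy_open(2))   # -> (['site_B', 'site_C'], ...) on this toy data
```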
|
4 |
Etude et Extraction de règles graduelles floues : définition d'algorithmes efficaces. / Survey and Extraction of Fuzzy gradual rules : Definition of Efficient algorithms
Ayouni, Sarra 09 May 2012 (has links)
Knowledge discovery in databases is a process that aims to extract a reduced set of high-value knowledge from a huge amount of data. Data mining, one step of this process, includes a number of tasks, such as clustering, classification, association rule mining, etc. The problem of mining association rules requires a frequent pattern extraction step. We distinguish several categories of frequent patterns: classical patterns, fuzzy patterns, gradual patterns, sequential patterns, etc. These patterns differ in the type of data from which they are extracted and in the type of correlation they express. The work of this thesis is set in the context of extracting closed fuzzy patterns and closed gradual patterns. Indeed, we define new closure systems of the Galois connection for fuzzy patterns and for gradual patterns, respectively. We thus propose algorithms for extracting a reduced set of fuzzy patterns and of gradual patterns. We also propose two approaches for extracting fuzzy gradual patterns, based on the automatic generation of the attributes' membership functions. Building on closed fuzzy patterns and closed gradual patterns, we define generic bases of all the fuzzy and gradual association rules, and we propose a complete and valid inference system to derive all the rules from these bases.
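As a concrete illustration of what evaluating a gradual pattern can look like, the sketch below scores a pattern such as "the higher the age, the higher the salary" by the fraction of concordant object pairs. This pair-based support is one common definition from the gradual-pattern literature and is used here only for exposition; it is not necessarily the exact measure adopted in the thesis.

```python
# Hedged sketch: evaluating a gradual pattern such as
# {"age" increases, "salary" increases} by the fraction of object pairs that
# respect all the stated variations (a concordant-pair support).

from itertools import combinations

# Toy numerical dataset: one dict per object.
data = [
    {"age": 25, "salary": 1800, "loans": 2},
    {"age": 32, "salary": 2400, "loans": 1},
    {"age": 40, "salary": 2300, "loans": 0},
    {"age": 51, "salary": 3100, "loans": 0},
]

def respects(pair, pattern):
    """True if the ordered pair (x, y) respects every (attribute, direction)
    of the gradual pattern, where direction is '+' (increases) or '-'."""
    x, y = pair
    for attribute, direction in pattern:
        delta = y[attribute] - x[attribute]
        if (direction == "+" and delta <= 0) or (direction == "-" and delta >= 0):
            return False
    return True

def gradual_support(pattern):
    """Fraction of object pairs (taken in either order) respecting the pattern."""
    pairs = list(combinations(data, 2))
    concordant = sum(1 for x, y in pairs
                     if respects((x, y), pattern) or respects((y, x), pattern))
    return concordant / len(pairs)

print(gradual_support([("age", "+"), ("salary", "+")]))   # -> 0.833... on this toy data
```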
|
5 |
Mapování PMML a BKEF dokumentů v projektu SEWEBAR-CMS / Mapping of PMML and BKEF documents using PHP in the SEWEBAR CMS
Vojíř, Stanislav January 2010 (has links)
In the data mining process, it is necessary to prepare the source dataset - for example, to choose how continuous attributes are binned or grouped - drawing on knowledge of the problem area. Such a preparation process can be guided by background (domain) knowledge obtained from experts. In the SEWEBAR project, we collect the knowledge from experts in a rich XML-based representation language called BKEF, using a dedicated editor, and save it into the database of our custom-tailored (Joomla!-based) CMS. Data mining tools are then able to generate, from this dataset, mining models represented in the standardized PMML format. It is then necessary to map a particular column (attribute) from the dataset (in PMML) to a relevant 'metaattribute' of the BKEF representation. This specific type of schema mapping problem is addressed in my thesis in terms of algorithms that automatically suggest mappings from columns to metaattributes and from the values of these columns to BKEF 'metafields'. Manual corrections of this mapping by the user are also supported. The implementation is written in PHP and was tested on datasets describing courses taught at 5 universities in the U.S.A., taken from the Illinois Semantic Integration Archive. On these datasets, the auto-mapping suggestion process achieved a precision of about 70% and a recall of about 77% on unknown columns; when mapping previously user-mapped data (using the implemented learning module), the recall is between 90% and 100%.
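As a simplified illustration of the name-similarity side of such auto-mapping suggestions, the sketch below ranks BKEF metaattributes against each PMML column by string similarity and proposes the best match above a threshold. The column and metaattribute names are hypothetical, and the thesis's PHP implementation additionally maps column values to metafields and learns from the user's manual corrections.

```python
# Hedged sketch of the name-similarity part of an auto-mapping suggestion:
# for each PMML column, rank BKEF metaattributes by string similarity and
# propose the best match above a threshold. Illustration of the general idea
# only; it is not the thesis's actual algorithm.

from difflib import SequenceMatcher

pmml_columns = ["course_title", "instructor_name", "credit_hours"]          # hypothetical
bkef_metaattributes = ["Title of course", "Instructor", "Credits", "Term"]  # hypothetical

def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def suggest_mapping(columns, metaattributes, threshold=0.4):
    """Suggest, for each column, the most similar metaattribute (or None)."""
    suggestions = {}
    for column in columns:
        best = max(metaattributes, key=lambda m: similarity(column, m))
        score = similarity(column, best)
        suggestions[column] = best if score >= threshold else None
    return suggestions

print(suggest_mapping(pmml_columns, bkef_metaattributes))
```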
|
6 |
Implementation of the Apriori algorithm for effective item set mining in VigiBaseTM : Project report in Teknisk Fysik 15 hp
Olofsson, Niklas January 2010 (has links)
No description available.
|
7 |
Reporting Management för den interna rapporterings processen med hjälp av verktyget Tivoli Decision Support : TDS / Reporting Management for the internal reporting process using the Tivoli Decision Support tool : TDS
Svensson, Christine, Strandberg, Susanne January 2001 (has links)
The report begins with a description of WM-data's Network Management structure and its Reporting Management needs. This is followed by a description of the two analysis techniques data mining and On-Line Analytical Processing (OLAP), which are the most widely used database-based techniques. The tool Tivoli Decision Support (TDS) is a decision-support system intended to assist decision-makers within the organization. TDS is based on the OLAP technique, and the report concludes by showing the possibilities the tool offers with regard to WM-data's Reporting Management. / Christine Svensson 0454-14926 Susanne Strandberg 0456-22824
|
8 |
Modul víceúrovňových asociačních pravidel systému pro dolování z dat / Multi-Level Association Rules Module of a Data Mining System
Pospíšil, Jan January 2010 (links)
This thesis focuses on the problem of implementing a multilevel association rule mining module for an existing data mining project. Two main algorithms are explained, Apriori and MLT2L1. The thesis continues with the implementation of the data mining module and the design of the DMSL elements. The final chapters deal with an example data mining task, a comparison of its results, and a summary of what the thesis achieved.
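For reference, a minimal level-wise Apriori sketch for frequent itemsets is shown below; this is the single-level core that multi-level algorithms such as MLT2L1 apply at each concept level. It is illustrative only and is not the module's actual code.

```python
# Minimal level-wise Apriori sketch for frequent itemset mining.
# Illustrative only; toy data and an absolute support threshold.

from itertools import combinations

transactions = [
    {"bread", "milk"},
    {"bread", "beer"},
    {"bread", "milk", "beer"},
    {"milk", "beer"},
]
min_support = 2   # absolute support threshold

def count(itemset):
    """Number of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def apriori():
    # Level 1: frequent single items.
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if count(frozenset([i])) >= min_support]
    frequent = list(level)
    k = 2
    while level:
        # Candidate generation: join frequent (k-1)-itemsets and keep size-k
        # unions whose every (k-1)-subset is frequent (the a priori pruning).
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(frozenset(s) in level
                                       for s in combinations(union, k - 1)):
                candidates.add(union)
        # Counting step: keep candidates meeting the support threshold.
        level = [c for c in candidates if count(c) >= min_support]
        frequent.extend(level)
        k += 1
    return frequent

for itemset in apriori():
    print(set(itemset), count(itemset))
```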
|
9 |
Discovering Frequent Episodes With General Partial Orders
Achar, Avinash 12 1900 (links) (PDF)
Pattern discovery, a popular paradigm in data mining, refers to a class of techniques that try to extract unknown or interesting patterns from data. The work carried out in this thesis concerns frequent episode mining, a popular framework within pattern discovery, with applications in alarm management, fault analysis, network reconstruction, etc. The data here take the form of a single long, time-ordered stream of events. The pattern of interest, the episode, is essentially a set of event-types with a partial order on it. The task is to unearth all patterns (episodes here) whose frequency is above a user-defined threshold, irrespective of pattern size. Most current discovery algorithms employ a level-wise apriori-based method for mining, which essentially adopts a breadth-first search strategy over the space of all episodes.
The episode literature has seen multiple ways of defining frequency, each definition having its own set of merits and demerits. The main reason different frequency definitions have been proposed is that, in general, counting all occurrences of a set of episodes is computationally very expensive. The first part of the thesis gives a unified view of all the apriori-based discovery algorithms for serial episodes (associated with a total order) under these various frequencies. Specifically, the various existing counting algorithms can be viewed as minor modifications of each other. We also provide some novel proofs of correctness for some of the serial episode counting schemes, which in turn can be generalized to episodes with general partial orders. Our unified view helps us derive quantitative relationships between the different frequencies. We also discuss all the anti-monotonicity properties satisfied by the various frequencies, crucial information needed for the candidate generation step.
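As a small illustration of one such frequency, the sketch below counts the non-overlapped occurrences of a serial episode (a totally ordered episode, e.g. A followed by B followed by C) with a single greedy left-to-right scan over the event stream. It ignores expiry-time constraints and is only meant to convey the flavor of occurrence-based counting, not the thesis's counting algorithms.

```python
# Hedged sketch: non-overlapped frequency of a *serial* episode in an event
# stream, counted with a greedy scan. Expiry-time constraints are ignored.

def non_overlapped_count(stream, serial_episode):
    """stream: list of event-types in time order;
    serial_episode: list of event-types, e.g. ['A', 'B', 'C']."""
    count = 0
    needed = 0                      # index of the next event-type to match
    for event in stream:
        if event == serial_episode[needed]:
            needed += 1
            if needed == len(serial_episode):   # one full occurrence finished
                count += 1
                needed = 0          # restart after the occurrence (no overlap)
    return count

stream = list("ABACBDCABC")
print(non_overlapped_count(stream, ["A", "B", "C"]))   # -> 2
```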
The second part of the thesis proposes discovery algorithms for episodes with general partial orders, for which no algorithms currently exist in the literature. The proposed discovery algorithm is apriori-based and generalizes the existing serial and parallel (associated with a trivial order) episode algorithms. It is a level-wise procedure involving the steps of candidate generation and counting at each level. In the context of general partial orders, a major problem in apriori-based discovery is to have an efficient candidate generation scheme. We present a novel candidate generation algorithm for mining episodes with general partial orders. The counting algorithm design for general partial order episodes draws ideas from the unified view of counting for serial episodes presented in the first part of the work. We formally show the correctness of the proposed candidate generation and counting steps for general partial orders. The proposed candidate generation algorithm is flexible enough to mine certain specialized classes of partial orders (those satisfying what we call the maximal subepisode property), of which the serial and parallel classes of episodes are two specific instances. Our algorithm design initially restricts itself to the class of general partial order episodes called injective episodes, in which repeated event-types are not allowed. We then generalize this to a larger class of episodes called chain episodes, where episodes can have some repeated event-types. The class of chain episodes contains all (including non-injective) serial and parallel episodes, and thus our method properly generalizes the existing methods for serial and parallel episode discovery. We also discuss some problems in extending our algorithms to episodes beyond the class of chain episodes. Also, we demonstrate that frequency alone is not a sufficient interestingness measure for episodes with unrestricted partial orders. To address this issue, we propose an additional measure called bidirectional evidence to assess interestingness, which, along with frequency, is found to be extremely effective in unearthing interesting patterns.
In the frequent episode framework, the choice of threshold is most often user-defined and arbitrary. To address this issue, the last part of the work deals with assessing the significance of partial order episodes in a statistical sense, based on ideas from classical hypothesis testing. We declare an episode to be significant if its observed frequency in the data stream is large enough to be very unlikely under a random i.i.d. model. The key step in the significance analysis involves computing the mean and variance of the time between successive occurrences of the pattern. This computation can be reformulated as solving for the mean and variance of the first visit time to a particular state in an associated Markov chain. We use a generating function approach to solve for this mean and variance. Using this and a Gaussian approximation to the frequency random variable, we can calculate a frequency threshold for any partial order episode, beyond which we infer it to be significant. Our significance analysis for general partial order episodes generalizes the existing significance analysis of serial episode patterns. We demonstrate the effectiveness of our significance thresholds on synthetic data.
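To make the threshold idea concrete, the sketch below uses a standard renewal-type Gaussian approximation: if mu and sigma2 denote the mean and variance of the inter-occurrence time under the null model, the occurrence count in a stream of length T is roughly Normal(T/mu, T*sigma2/mu^3), and the significance threshold is the corresponding upper quantile. The numbers are hypothetical, and this approximation only stands in for the exact Markov-chain generating-function computation described above.

```python
# Hedged sketch of the frequency-threshold idea under a renewal-theory CLT
# approximation: occurrence count in a stream of length T is roughly Gaussian
# with mean T/mu and variance T*sigma2/mu**3 under the i.i.d. null model.
# An episode is declared significant if its observed frequency exceeds the
# upper alpha-quantile of that Gaussian. Illustrative numbers only.

from scipy.stats import norm

def frequency_threshold(T, mu, sigma2, alpha=0.01):
    """Frequency above which an episode is deemed significant at level alpha."""
    mean_count = T / mu
    std_count = (T * sigma2 / mu ** 3) ** 0.5
    return mean_count + norm.ppf(1.0 - alpha) * std_count

# Example with hypothetical numbers: stream of length 100000, expected gap of
# 500 time units between occurrences under the null, gap variance 250000.
print(frequency_threshold(T=100_000, mu=500.0, sigma2=250_000.0))   # ~ 233
```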
|