121 |
Visualização de operações de junção em sistemas de bases de dados para mineração de dados. / Visualization of join operations in DBMS for data mining. Maria Camila Nardini Barioni, 13 June 2002
Over the last decades, companies' capacity to generate and collect information has grown rapidly. This explosion in data volume created the need for new techniques and tools that could not only process this enormous amount of data but also support its analysis, so that useful information could be discovered intelligently and automatically. This gave rise to a prominent research field for extracting information from databases, called Knowledge Discovery in Databases (KDD), in which data mining (DM) techniques generally play a predominant role. Obtaining good results in the data mining step depends heavily on how well the data are prepared. Accordingly, the knowledge-extraction (DM) step of the KDD process is normally preceded by a pre-processing step in which the data that may be submitted to DM are integrated into a single relation. An important problem faced in this step is that, most of the time, the user does not yet have a precise idea of which data should be extracted. Taking into account the great exploratory ability of the human mind, this work proposes a technique for visualizing data stored in multiple relations of a relational database, with the aim of helping the user prepare the data to be mined. This technique allows the DM step to be applied over multiple relations simultaneously, making join operations part of this step. In general, using joins in DM tools is not practical because of the high computational cost of join operations. However, the performance evaluations of the technique proposed in this work showed that it reduces this cost significantly, making it possible to explore multiple relations visually and interactively.
/ In recent decades, the capacity to generate and accumulate information has increased rapidly. With the explosive growth in data volume, new techniques and tools are being sought to process data and automatically discover useful information in it, leading to the field known as Knowledge Discovery in Databases (KDD), in which data mining (DM) techniques generally play an important role. The results of applying data mining techniques to datasets depend heavily on proper data preparation. Therefore, in traditional DM processes, data goes through a pre-processing step that produces a single table, which is then submitted to mining. An important problem faced during this step is that, most of the time, the analyst does not have a clear idea of which portions of the data should be mined. Building on the strong ability of human beings to interpret data represented graphically, this work develops a technique to visualize data from multiple tables, helping human analysts prepare data for DM. This technique allows the data mining process to be applied over multiple relations at once, making join operations part of this process. In general, using multiple tables in DM tools is not practical, due to the high computational cost required to explore them. Experimental evaluation of the proposed technique shows that it reduces this cost significantly, making it possible to explore data from multiple tables visually and interactively.
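The traditional pre-processing step described above integrates several relations into a single table before mining. As a minimal sketch of that integration step only (not the thesis's visualization technique, which is designed to avoid this cost), the following Python hash join flattens two hypothetical relations, `customers` and `orders`, into one table; the relation and column names are illustrative assumptions.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Equi-join two relations (lists of dicts) on a shared column `key`.

    Builds a hash table on `left`, then probes it with each row of `right`.
    Rows without a matching key are dropped (inner-join semantics).
    """
    index = defaultdict(list)
    for row in left:  # build phase
        index[row[key]].append(row)
    joined = []
    for row in right:  # probe phase
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical relations; in a real KDD pipeline these would come from the DBMS.
customers = [
    {"cust_id": 1, "city": "São Carlos"},
    {"cust_id": 2, "city": "Campinas"},
]
orders = [
    {"order_id": 10, "cust_id": 1, "amount": 50.0},
    {"order_id": 11, "cust_id": 1, "amount": 20.0},
    {"order_id": 12, "cust_id": 3, "amount": 99.0},
]

single_table = hash_join(customers, orders, "cust_id")
for row in single_table:
    print(row)
```

Each customer row is repeated once per matching order, so the flattened table can grow quickly as more relations are joined.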
122 |
Získávání znalostí z databází pohybujících se objektů / Knowledge Discovery in Databases of Moving Objects. Chovanec, Vladimír, January 2011
The aim of this master's thesis is to become familiar with the problems of data mining and classification. The thesis also continues work on the SUNAR application, which the practical part extends with SVM classification of persons passing between cameras. In the conclusion, we discuss ways to improve classification and person recognition in SUNAR.
123 |
A learning framework for zero-knowledge game playing agents. Duminy, Willem Harklaas, 17 October 2007
The subjects of perfect information games, machine learning and computational intelligence combine in an experiment that investigates a method to build the skill of a game-playing agent from zero game knowledge. The skill of a playing agent is determined by two aspects: the first is the quantity and quality of the knowledge it uses, and the second is its search capacity. This thesis introduces a novel representation language that combines symbolic and numeric elements to capture game knowledge. As far as search is concerned, an extension to an existing knowledge-based search method is developed. Empirical tests show an improvement over alpha-beta, especially in learning conditions where the knowledge may be weak. Current machine learning techniques as applied to game agents are reviewed, and from these techniques a learning framework is established. The data-mining algorithm ID3 and the computational intelligence technique Particle Swarm Optimisation (PSO) form the key learning components of this framework. The classification trees produced by ID3 are subjected to new post-pruning processes specifically defined for the representation language mentioned above. Different combinations of these pruning processes are tested, and a dominant combination is chosen for use in the learning framework. As an extension to PSO, tournaments are introduced as a relative fitness function. A variety of alternative tournament methods are described, and some experiments are conducted to evaluate them. The final design decisions are incorporated into the learning framework configuration, and learning experiments are conducted on Checkers and some variations of Checkers. These experiments show that learning has occurred, but also highlight the need for further development and experimentation. Some ideas in this regard conclude the thesis. / Dissertation (MSc)--University of Pretoria, 2007. / Computer Science / MSc / Unrestricted
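ID3, one of the framework's learning components, greedily chooses the attribute to split on by maximizing information gain. The sketch below shows that selection rule on a small set of hypothetical labelled positions; the attribute names and data are illustrative only and are not the representation language introduced in the thesis.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting `rows` on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_attribute(rows, attributes, target):
    """ID3's greedy choice: the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical training examples: boolean features of a position and its outcome.
rows = [
    {"piece_advantage": True, "controls_center": True, "outcome": "win"},
    {"piece_advantage": True, "controls_center": False, "outcome": "win"},
    {"piece_advantage": False, "controls_center": True, "outcome": "loss"},
    {"piece_advantage": False, "controls_center": False, "outcome": "loss"},
]
print(best_attribute(rows, ["piece_advantage", "controls_center"], "outcome"))
```

Here `piece_advantage` separates the outcomes perfectly (gain of 1 bit), while `controls_center` gives no gain, so ID3 would split on `piece_advantage` first.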
124 |
FCART: A New FCA-based System for Data Analysis and Knowledge Discovery. Neznanov, Alexey A., Ilvovsky, Dmitry A., Kuznetsov, Sergei O., 28 May 2013
We introduce a new software system called Formal Concept Analysis Research Toolbox (FCART). Our goal is to create a universal integrated environment for knowledge and data engineers. FCART is constructed upon an iterative data analysis methodology and provides a built-in set of research tools based on Formal Concept Analysis techniques for working with object-attribute data representations. The provided toolset allows for the fast integration of extensions on several levels: from internal scripts to plugins.
FCART was successfully applied in several data mining and knowledge discovery tasks. Examples of applying the system in medicine and criminal investigations are considered.
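FCART's toolset is based on Formal Concept Analysis of object-attribute data. As a hedged illustration of the underlying theory (not FCART's own code or API), this sketch enumerates all formal concepts of a tiny hypothetical context. It closes every subset of attributes with the two derivation operators. This brute-force approach is exponential in the number of attributes and is only suitable for toy examples.

```python
from itertools import combinations

# Hypothetical formal context: each object mapped to the attributes it has.
context = {
    "patient1": {"fever", "cough"},
    "patient2": {"fever", "rash"},
    "patient3": {"fever", "cough", "rash"},
}
attributes = set().union(*context.values())


def common_attributes(objects):
    """Derivation A -> A': attributes shared by all given objects."""
    if not objects:
        return set(attributes)
    return set.intersection(*(context[o] for o in objects))


def objects_having(attrs):
    """Derivation B -> B': objects that have every attribute in `attrs`."""
    return {o for o, has in context.items() if attrs <= has}


def formal_concepts():
    """All (extent, intent) pairs with extent' == intent and intent' == extent."""
    concepts = set()
    for size in range(len(attributes) + 1):
        for subset in combinations(sorted(attributes), size):
            extent = objects_having(set(subset))
            intent = common_attributes(extent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


for extent, intent in sorted(formal_concepts(), key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Each printed pair is a maximal group of objects together with exactly the attributes they share; ordered by inclusion, these concepts form the concept lattice that FCA tools display.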
125 |
Literature Study and Assessment of Trajectory Data Mining Tools / Litteraturstudie och utvärdering av verktyg för datautvinning från rörelsebanedata. Kihlström, Petter, January 2015
With the development of technologies such as Global Navigation Satellite Systems (GNSS), mobile computing, and Information and Communication Technology (ICT), sampling positional data has recently become significantly simpler. This enables the aggregation of large amounts of moving-object data (i.e., trajectories) containing potential information about the moving objects. Within Knowledge Discovery in Databases (KDD), automated processes for extracting this information, called trajectory data mining, have been implemented. The objectives of this study are to examine 1) how trajectory data mining tasks are defined at an abstract level, 2) what type of information can be extracted from trajectory data, 3) what solutions trajectory data mining tools implement for different tasks, 4) how tools use visualization, and 5) what the limiting aspects of input data are and how those limitations are treated. The topic is examined in a literature review, in which a large number of academic papers found through Google searches were screened for information relevant to the objectives stated above. The literature review found several challenges along the process of arriving at useful knowledge about moving objects. For example, the discrete modelling of movements as polylines carries an inherent uncertainty, since the location between two sampled positions is unknown. To reduce this uncertainty and prepare raw data for mining, the data often need to be processed in some way. The nature of the pre-processing depends on the sampling rate and accuracy of the raw input data, as well as on the requirements of the specific mining method. Another major challenge is to define relevant knowledge and effective methods for extracting it from the data. Furthermore, conveying mining results to users is an important function.
Presenting results in an informative way, both at the level of individual trajectories and of sets of trajectories, is a vital but far from trivial task, for which visualization is an effective approach. Abstractly defined instructions for data mining are formally denoted as tasks. There are four main categories of mining tasks: 1) managing uncertainty, 2) extrapolation, 3) anomaly detection, and 4) pattern detection. The review of tasks in this study provides a basis for assessing the tools used to execute them. To arrive at useful results, the dimensions of comparison were selected to cover the essential parts of the knowledge discovery process, and the measures were chosen so that the results reflect the 1) sophistication, 2) user-friendliness, and 3) flexibility of the tools. The thesis focuses on freely available tools, the range of which proved to be very small and fragmented. The tools found and reported on are MoveMine 2.0, MinUS, GeT_Move and M-Atlas. The tools are reviewed entirely on the basis of their documentation. Tool performance varies along all dimensions except visualization and graphical user interface, which all tools provide. Overall, the systems perform well in user-friendliness, moderately well in sophistication, and poorly in flexibility. However, since the tasks the tools are intended to solve vary, it may not be appropriate to rank the tools as better or worse. This thesis further provides some theoretical insights into what users need to know, both about the technical aspects of the tools and about the nature of the moving objects.
The thesis also discusses the future of trajectory data mining, in terms of constraints on information extraction and requirements for tool development, emphasising the need for a more robust open-source solution. Finally, this thesis as a whole can be regarded as guidance on which trajectory mining tools to use for a given application. Further work that complements this thesis by comparing the actual performance of the tools in use is desirable. / With the development of technologies such as Global Navigation Satellite Systems (GNSS), mobile computing and Information and Communication Technology (ICT), the procedures for collecting positional data have been drastically simplified. This development has made it possible to collect large amounts of data from moving objects (i.e., trajectories) containing potential information about these objects. Within Knowledge Discovery in Databases (KDD), automated processes are applied to realize such information; this is called trajectory data mining. This study aims to examine 1) how trajectory data mining tasks are defined at an abstract level, 2) what type of information can be extracted from trajectory data, 3) which solutions trajectory data mining tools implement for different tasks, 4) how tools use visualization, and 5) what the limiting aspects of input data are and how these limitations are handled. The topic of trajectory data mining is examined through a literature review, in which a large number of academic reports found through Google searches are screened for information relevant to the research questions stated above. The literature review showed, however, that the process leading to useful knowledge about moving objects involves several challenges.
For example, modelling movements as polylines carries an inherent uncertainty, since the position of an object between two measurements is unknown. To reduce this uncertainty and prepare raw data for extraction, the data often need to be processed in some way. The nature of the pre-processing is determined by the sampling frequency and accuracy of the raw input data, together with the requirements of the specific data mining methods. Another significant challenge is to define relevant knowledge and effective methods for extracting it from data. Furthermore, conveying mining results to users is an important function. Presenting results informatively, both at the level of individual trajectories and of groups of trajectories, is a vital but far from trivial task, for which visualization is an effective approach. Abstractly defined instructions for data extraction are formally denoted as tasks. There are four main categories of tasks: 1) handling uncertainty, 2) extrapolation, 3) anomaly detection, and 4) pattern detection. The summary of tasks given in this report forms a foundation for an assessment of the tools used to perform them. To arrive at a useful result, the bases of comparison were chosen to cover the most important parts of the knowledge acquisition process. The measures used to evaluate this were chosen to reflect 1) sophistication, 2) user-friendliness, and 3) flexibility of the tools. The focus of this study has been freely available tools, for which the range has proven to be small and fragmented. The tools found and evaluated were MoveMine 2.0, MinUS, GeT_Move and M-Atlas. The tools were evaluated entirely on the basis of their available documentation.
The performance of the tools varied along all bases of comparison except visualization and graphical user interface, which all tools provided. Overall, the systems performed well in user-friendliness, fairly well in sophistication, and poorly in flexibility. However, since the tasks the tools are intended to solve vary, it is not relevant to rank them against each other in this respect. This work further provides some theoretical insights for users regarding the demands placed on their knowledge, concerning both the technical aspects of the tools and the nature of moving objects. The future of trajectory data mining is also discussed, in terms of limitations on information extraction and requirements for tool development, with emphasis on a more robust open-source solution. Overall, this work can be regarded as providing guidance on which trajectory data mining tools to use depending on the application area. Work that complements this report by evaluating the tools' performance through actual use is desirable.
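The abstract notes that modelling movement as a polyline leaves the position between two samples unknown. A common simplification is to assume straight-line, constant-speed motion between consecutive samples. The sketch below estimates a position at an arbitrary time under that assumption. The function name and the `(t, x, y)` sample format are illustrative assumptions, and the estimate carries exactly the uncertainty the thesis describes.

```python
from bisect import bisect_right


def interpolate_position(trajectory, t):
    """Estimate (x, y) at time t from samples [(t, x, y), ...] sorted by time.

    Assumes linear, constant-speed movement between consecutive samples
    (the polyline model); the true path between samples may differ.
    """
    times = [s[0] for s in trajectory]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the sampled interval (that would be extrapolation)")
    i = bisect_right(times, t)
    if i == len(times):  # t equals the last sample time
        return trajectory[-1][1:]
    t0, x0, y0 = trajectory[i - 1]
    t1, x1, y1 = trajectory[i]
    frac = (t - t0) / (t1 - t0)
    return (x0 + frac * (x1 - x0), y0 + frac * (y1 - y0))


# Hypothetical GNSS fixes: (seconds, x metres, y metres).
track = [(0, 0.0, 0.0), (10, 100.0, 0.0), (20, 100.0, 50.0)]
print(interpolate_position(track, 5))
print(interpolate_position(track, 15))
```

Estimating positions outside the sampled interval is deliberately rejected here, since that corresponds to the separate extrapolation task category.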
126 |
Vytěžování databáze Poradny pro poruchy metabolismu / Data mining of the database of Consulting centre for metabolism disorders. Senft, Martin, January 2014
This thesis applies the decision-rule data mining method to data from the Consulting Centre for Metabolism Disorders at the University Hospital Pilsen. The LISp-Miner system, developed at the University of Economics, Prague, is used as the tool. The decision rules found are evaluated by a specialist. The main parts of this thesis are: an overview of the main data mining methods and result-evaluation methods, a description of how the data mining method is applied to the data, and a description and evaluation of the results.
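The abstract does not say how candidate rules were scored. One common approach in LISp-Miner's 4ft-Miner procedure is to evaluate a rule on a four-fold contingency table with a GUHA quantifier. This minimal sketch assumes the founded-implication quantifier (a/(a+b) >= p and a >= Base) with hypothetical thresholds and hypothetical patient records. It is not LISp-Miner's code, and whether the thesis used this quantifier is not stated.

```python
def four_fold_table(records, antecedent, succedent):
    """Count (a, b, c, d) for the rule antecedent => succedent.

    a: both hold, b: antecedent only, c: succedent only, d: neither.
    """
    a = b = c = d = 0
    for r in records:
        ant, suc = antecedent(r), succedent(r)
        if ant and suc:
            a += 1
        elif ant:
            b += 1
        elif suc:
            c += 1
        else:
            d += 1
    return a, b, c, d


def founded_implication(table, p=0.7, base=2):
    """Founded implication: confidence a/(a+b) >= p and support count a >= base."""
    a, b, _, _ = table
    return a + b > 0 and a / (a + b) >= p and a >= base


# Hypothetical patient records; attributes and thresholds are illustrative only.
patients = [
    {"bmi": 32, "high_cholesterol": True},
    {"bmi": 35, "high_cholesterol": True},
    {"bmi": 31, "high_cholesterol": False},
    {"bmi": 33, "high_cholesterol": True},
    {"bmi": 22, "high_cholesterol": False},
    {"bmi": 24, "high_cholesterol": True},
]
table = four_fold_table(patients,
                        antecedent=lambda r: r["bmi"] >= 30,
                        succedent=lambda r: r["high_cholesterol"])
print(table, founded_implication(table))
```

A rule accepted this way would still need the domain specialist's review, as the thesis describes.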
127 |
Sinkhole Hazard Assessment in Minnesota Using a Decision Tree Model. Gao, Yongli, Alexander, E. Calvin, 01 May 2008
An understanding of what influences sinkhole formation and the ability to accurately predict sinkhole hazards are critical to environmental management efforts in the karst lands of southeastern Minnesota. Based on the distribution of distances to the nearest sinkhole, sinkhole density, bedrock geology and depth to bedrock in southeastern Minnesota and northwestern Iowa, a decision tree model has been developed to construct maps of sinkhole probability in Minnesota. The decision tree model was converted into cartographic models and implemented in ArcGIS to create a preliminary sinkhole probability map for Goodhue, Wabasha, Olmsted, Fillmore, and Mower Counties. This model quantifies bedrock geology, depth to bedrock, sinkhole density, and neighborhood effects in southeastern Minnesota but excludes potential controlling factors such as structural control, topographic setting, human activities and land use. The sinkhole probability map needs to be verified and updated as more sinkholes are mapped and more information about sinkhole formation is obtained.
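One of the model's inputs is the distribution of distances to the nearest sinkhole. The sketch below computes those nearest-neighbour distances. It assumes projected planar coordinates in metres and uses a brute-force O(n²) search for clarity; a GIS such as ArcGIS would typically rely on spatial indexing instead. The coordinates are hypothetical.

```python
import math


def nearest_neighbor_distances(points):
    """For each point, the Euclidean distance to the closest other point."""
    distances = []
    for i, (x1, y1) in enumerate(points):
        best = math.inf
        for j, (x2, y2) in enumerate(points):
            if i != j:
                best = min(best, math.hypot(x2 - x1, y2 - y1))
        distances.append(best)
    return distances


# Hypothetical sinkhole locations in projected coordinates (metres).
sinkholes = [(0.0, 0.0), (30.0, 40.0), (300.0, 400.0)]
print(nearest_neighbor_distances(sinkholes))
```

A histogram of these values over a mapped region gives the kind of distance distribution the abstract refers to. Short distances indicate clustering, which is one way neighbourhood effects show up.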
128 |
Fouille de données textuelles et systèmes de recommandation appliqués aux offres d'emploi diffusées sur le web / Text mining and recommender systems applied to job postings. Séguéla, Julie, 03 May 2012
In recent years, the expansion of the Internet as a recruitment medium has led to a multiplication of channels dedicated to publishing job postings. In an economic context where cost control is paramount, evaluating and comparing the performance of the different recruitment channels has become a need for companies. This thesis aims to develop a decision-support tool intended to assist recruiters while they publish a job posting. It provides the recruiter with the expected performance on job boards for a given position to be filled. After identifying the potential explanatory factors of a recruitment campaign's performance, we apply text mining techniques to the postings in order to structure them and extract relevant information that enriches their description within an explanatory model. Second, we propose an algorithm for predicting the performance of job postings, based on a hybrid recommender system suited to the cold-start problem. This system, based on a supervised similarity measure, outperforms classical multivariate modelling approaches. Our experiments are carried out on a real dataset drawn from a database of postings published on job boards. / In recent years, the expansion of e-recruitment has led to a multiplication of web channels dedicated to job postings. In an economic context where cost control is fundamental, assessing and comparing the performance of recruitment channels has become necessary. The purpose of this work is to develop a decision-making tool intended to guide recruiters while they post a job on the Internet. This tool provides recruiters with the expected performance on job boards for a given job offer. First, we identify the potential predictors of recruiting campaign performance.
Then, we apply text mining techniques to the job offer texts in order to structure the postings and extract information that improves their description in a predictive model. The predictive algorithm for job offer performance is based on a hybrid recommender system suited to the cold-start problem. The hybrid system, based on a supervised similarity measure, outperforms standard multivariate models. Our experiments are conducted on a real dataset drawn from a job posting database.
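The thesis predicts a new posting's performance with a hybrid recommender built on a supervised similarity measure, which the abstract does not specify. As a stand-in, this sketch uses plain cosine similarity over hypothetical content features. Because it relies only on content, it also works for a brand-new posting with no history (the cold-start case). It predicts performance as the similarity-weighted average of the k most similar past postings. Feature names, values, and k are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_performance(new_features, history, k=2):
    """Similarity-weighted mean performance of the k most similar past postings."""
    scored = sorted(((cosine(new_features, feats), perf) for feats, perf in history),
                    reverse=True)[:k]
    total = sum(sim for sim, _ in scored)
    if total == 0:
        # No similar posting: fall back to the global mean performance.
        return sum(perf for _, perf in history) / len(history)
    return sum(sim * perf for sim, perf in scored) / total


# Hypothetical past postings: (features, observed number of applications).
# Features: [is_it_job, is_sales_job, requires_degree]
history = [
    ([1, 0, 1], 40.0),
    ([1, 0, 0], 60.0),
    ([0, 1, 0], 10.0),
]
print(predict_performance([1, 0, 1], history))
```

A supervised similarity measure, as used in the thesis, would instead learn how much each feature should count toward similarity from observed performance. It would replace `cosine` here.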
129 |
Dolovací moduly systému pro dolování z dat v prostředí Oracle / Mining Modules of the Data Mining System in Oracle. Mader, Pavel, January 2009
This master's thesis deals with data mining and with an extension of a data mining system in the Oracle environment developed at FIT. So far, this system cannot be applied in real-life conditions because no data mining modules are available for it. The system's core application design includes an interface for adding mining modules. Until now, this interface has been tested only on a sample mining module, which performs no mining itself and merely demonstrates the use of the interface. The main focus of this thesis is the study of this interface and the implementation of a functional mining module that tests the applicability of the implemented interface. An association rule mining module was selected for implementation.
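The abstract says an association rule mining module was implemented but not which algorithm it uses. This is therefore a generic, hedged sketch rather than the Oracle module's code. It finds frequent itemsets level by level in the Apriori style, then derives rules that meet hypothetical minimum support and confidence thresholds.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search for itemsets with support >= min_support."""
    n = len(transactions)
    support = {}
    candidates = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        support.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates (subset pruning omitted for brevity).
        candidates = {a | b for a in level for b in level if len(a | b) == k}
    return support


def association_rules(support, min_confidence):
    """Rules X -> Y with confidence support(X | Y) / support(X) >= min_confidence."""
    rules = []
    for itemset, sup in support.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, size)):
                conf = sup / support[lhs]  # every subset of a frequent set is frequent
                if conf >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), sup, conf))
    return rules


# Hypothetical transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(support, min_confidence=0.8)
print(rules)
```

With these thresholds only `{butter} -> {bread}` survives (support 0.5, confidence 1.0). Rules such as `{bread} -> {milk}` reach only about 0.67 confidence.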
130 |
A Study of Physicians' Serendipitous Knowledge Discovery: An Evaluation of Spark and the IF-SKD Model in a Clinical Setting. Hopkins, Mark E, 05 1900
This research study is conducted to test Workman, Fiszman, Rindflesch and Nahl's information flow-serendipitous knowledge discovery (IF-SKD) model of information behavior in a clinical care context. To date, there have been few attempts to model the serendipitous knowledge discovery of physicians. Due to the growth and complexity of the biomedical literature, as well as the increasingly specialized nature of medicine, there is a need for advanced systems that can quickly present information and help physicians discover new knowledge. The Semantic MEDLINE project of the National Library of Medicine's (NLM) Lister Hill National Center for Biomedical Communications focuses on identifying and visualizing semantic relationships in the biomedical literature to support knowledge discovery. This project led to the development of a new information discovery system, Spark. The aim of Spark is to promote serendipitous knowledge discovery by helping users make the most of their conceptual short-term memory as they iteratively search for, engage with, clarify and evaluate information from the biomedical literature. Using Spark, this study analyzes the IF-SKD model by capturing and analyzing physician feedback with McCay-Peet, Toms and Kelloway's Perception of Serendipity and Serendipitous Digital Environment (SDE) questionnaires. Results are evaluated to determine whether Spark contributes to physicians' serendipitous knowledge discovery and whether the IF-SKD model can capture physicians' information behavior in a clinical setting.