Využití data miningových metod při zpracování dat z demografických šetření / Using data mining methods for demographic survey data processing. Fišer, David. January 2015
Abstract: The goal of the thesis was to describe and demonstrate the principles of knowledge discovery in databases, or data mining (DM). The theoretical part describes selected data mining methods and the basic principles behind them. In the practical part, a DM task is carried out in accordance with the CRISP-DM methodology, using data from the American Community Survey (ACS). The practical part is divided into two sections. The first contains a classification task whose goal was to determine whether the selected DM techniques can be used to handle missing data in surveys. The success rate of the classifications, and of the subsequent prediction of data values for the selected attributes, was in the 55-80 % range. The second section focused on finding interesting knowledge using association rules and the GUHA method. Keywords: data mining, knowledge discovery in databases, statistical surveys, missing values, classification, association rules, GUHA method, ACS
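The association-rule part of this abstract can be illustrated with a small sketch. GUHA-style procedures evaluate a rule "antecedent implies succedent" on a four-fold contingency table (a, b, c, d), and the founded implication quantifier accepts the rule when the confidence a / (a + b) reaches a threshold p and at least Base records satisfy both sides. The Python below is a minimal illustration under those definitions, not the implementation used in the thesis; the records, attribute names and thresholds are hypothetical and are not taken from the ACS data.

```python
from typing import Callable, Dict, Iterable, Tuple

Record = Dict[str, str]
Predicate = Callable[[Record], bool]


def fourfold(records: Iterable[Record],
             antecedent: Predicate,
             succedent: Predicate) -> Tuple[int, int, int, int]:
    """Count the four-fold table (a, b, c, d) of a rule over the records.

    a: antecedent and succedent hold    b: antecedent only
    c: succedent only                   d: neither holds
    """
    a = b = c = d = 0
    for record in records:
        ant, suc = antecedent(record), succedent(record)
        if ant and suc:
            a += 1
        elif ant:
            b += 1
        elif suc:
            c += 1
        else:
            d += 1
    return a, b, c, d


def founded_implication(table: Tuple[int, int, int, int],
                        p: float, base: int) -> bool:
    """Accept the rule if confidence a/(a+b) >= p and a >= base."""
    a, b, _, _ = table
    return a >= base and (a + b) > 0 and a / (a + b) >= p


# Hypothetical toy records (not real survey data).
records = [
    {"tenure": "owner", "income": "high"},
    {"tenure": "owner", "income": "high"},
    {"tenure": "owner", "income": "low"},
    {"tenure": "renter", "income": "low"},
    {"tenure": "renter", "income": "high"},
]

table = fourfold(records,
                 antecedent=lambda r: r["tenure"] == "owner",
                 succedent=lambda r: r["income"] == "high")
print(table)                               # (2, 1, 1, 1)
print(founded_implication(table, 0.6, 2))  # True: confidence is 2/3
```

Raising p to 0.7 rejects the same rule, which shows how the two thresholds control how many candidate rules survive.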
Intelligent knowledge discovery on building energy and indoor climate data. Raatikainen, M. (Mika). 29 November 2016
A future vision of enabling technologies for energy conservation and energy efficiency is based on the most important megatrends identified, namely climate change, urbanization, and digitalization. In the United States and in the European Union, about 40% of total energy consumption goes into energy use by buildings. Moreover, indoor climate quality is recognized as a distinct health hazard. On account of these two factors, energy efficiency and healthy housing are active topics in international research.
The main aims of this thesis are to study which elements affect indoor climate quality, how energy consumption describes building energy efficiency and to analyse the measured data using intelligent computational methods. The data acquisition technology used in the studies relies heavily on smart metering technologies based on Building Automation Systems (BAS), big data and the Internet of Things (IoT).
The data refining process presented and used is called Knowledge Discovery in Databases (KDD). It contains methods for data acquisition, pre-processing, data mining, visualisation and interpretation of results, and transformation into knowledge and new information for end users. In this thesis, four examples of data analysis and knowledge deployment concerning small houses and school buildings are presented.
The results of the case studies show that the data mining methods used in building energy efficiency and indoor climate quality analysis have a great potential for processing a large amount of multivariate data effectively. An innovative use of computational methods provides a good basis for researching and developing new information services. In the KDD process, researchers should co-operate with end users, such as building management and maintenance personnel as well as residents, to achieve better analysis results, easier interpretation and correct conclusions for exploiting the knowledge. / Tiivistelmä
Tulevaisuuden visio energiansäästön sekä energiatehokkuuden mahdollistavista teknologioista pohjautuu tärkeimpiin tunnistettuihin megatrendeihin, ilmastonmuutokseen, kaupungistumiseen ja digitalisoitumiseen. Yhdysvalloissa ja Euroopan unionissa käytetään noin 40 % kokonaisenergiankulutuksesta rakennusten käytön energiatarpeeseen. Myös rakennusten sisäilmaston on havaittu olevan ilmeinen terveysriski. Perustuen kahteen edellä mainittuun tekijään, energiatehokkuus ja asumisterveys ovat aktiivisia tutkimusaiheita kansainvälisessä tutkimuksessa.
Tämän väitöskirjan päätavoitteena on ollut tutkia, mitkä elementit vaikuttavat sisäilmastoon ja rakennusten energiatehokkuuteen pääasiassa analysoimalla mittausdataa käyttäen älykkäitä laskennallisia menetelmiä. Tutkimuksissa käytetyt tiedonkeruuteknologiat perustuvat etäluentaan ja rakennusautomaatioon, big datan hyödyntämiseen ja esineiden internetiin (IoT).
Väitöskirjassa esiteltävä tietämyksen muodostusprosessi (KDD) koostuu tiedonkeruusta, datan esikäsittelystä, tiedonlouhinnasta, visualisoinnista ja tutkimustulosten tulkinnasta sekä tietämyksen muodostamisesta ja oleellisen informaation esittämisestä loppukäyttäjille. Tässä väitöstutkimuksessa esitellään neljän data-analyysin ja niiden pohjalta muodostetun tietämyksen hyödyntämisen esimerkkiä, jotka liittyvät pientaloihin ja koulurakennuksiin.
Esimerkkitapausten tulokset osoittavat, että käytetyillä tiedonlouhinnan menetelmillä sovellettuna rakennusten energiatehokkuus- ja sisäilmastoanalyyseihin on mahdollista jalostaa suuria monimuuttuja-aineistoja tehokkaasti. Laskennallisten menetelmien innovatiivinen käyttö antaa hyvät perusteet tutkia ja kehittää uusia informaatiopalveluja. Tutkijoiden tulee tehdä yhteistyötä loppukäyttäjinä toimivien kiinteistöhallinnan ja -ylläpidon henkilöstön sekä asukkaiden kanssa saavuttaakseen parempia analyysituloksia, helpompaa tulosten tulkintaa ja oikeita johtopäätöksiä tietämyksen hyödyntämiseksi.
From machine learning to learning with machines: remodeling the knowledge discovery process. Tuovinen, L. (Lauri). 19 August 2014
Knowledge discovery (KD) technology is used to extract knowledge from large quantities of digital data in an automated fashion. The established process model represents the KD process in a linear and technology-centered manner, as a sequence of transformations that refine raw data into more and more abstract and distilled representations. Any actual KD process, however, has aspects that are not adequately covered by this model. In particular, some of the most important actors in the process are not technological but human, and the operations associated with these actors are interactive rather than sequential in nature. This thesis proposes an augmentation of the established model that addresses this neglected dimension of the KD process.
The proposed process model is composed of three sub-models: a data model, a workflow model, and an architectural model. Each sub-model views the KD process from a different angle: the data model examines the process from the perspective of different states of data and transformations that convert data from one state to another, the workflow model describes the actors of the process and the interactions between them, and the architectural model guides the design of software for the execution of the process. For each of the sub-models, the thesis first defines a set of requirements, then presents the solution designed to satisfy the requirements, and finally, re-examines the requirements to show how they are accounted for by the solution.
The principal contribution of the thesis is a broader perspective on the KD process than what is currently the mainstream view. The augmented KD process model proposed by the thesis makes use of the established model, but expands it by gathering data management and knowledge representation, KD workflow and software architecture under a single unified model. Furthermore, the proposed model considers issues that are usually either overlooked or treated as separate from the KD process, such as the philosophical aspect of KD. The thesis also discusses a number of technical solutions to individual sub-problems of the KD process, including two software frameworks and four case-study applications that serve as concrete implementations and illustrations of several key features of the proposed process model. / Tiivistelmä
Tiedonlouhintateknologialla etsitään automoidusti tietoa suurista määristä digitaalista dataa. Vakiintunut prosessimalli kuvaa tiedonlouhintaprosessia lineaarisesti ja teknologiakeskeisesti sarjana muunnoksia, jotka jalostavat raakadataa yhä abstraktimpiin ja tiivistetympiin esitysmuotoihin. Todellisissa tiedonlouhintaprosesseissa on kuitenkin aina osa-alueita, joita tällainen malli ei kata riittävän hyvin. Erityisesti on huomattava, että eräät prosessin tärkeimmistä toimijoista ovat ihmisiä, eivät teknologiaa, ja että heidän toimintansa prosessissa on luonteeltaan vuorovaikutteista eikä sarjallista. Tässä väitöskirjassa ehdotetaan vakiintuneen mallin täydentämistä siten, että tämä tiedonlouhintaprosessin laiminlyöty ulottuvuus otetaan huomioon.
Ehdotettu prosessimalli koostuu kolmesta osamallista, jotka ovat tietomalli, työnkulkumalli ja arkkitehtuurimalli. Kukin osamalli tarkastelee tiedonlouhintaprosessia eri näkökulmasta: tietomallin näkökulma käsittää tiedon eri olomuodot sekä muunnokset olomuotojen välillä, työnkulkumalli kuvaa prosessin toimijat sekä niiden väliset vuorovaikutukset, ja arkkitehtuurimalli ohjaa prosessin suorittamista tukevien ohjelmistojen suunnittelua. Väitöskirjassa määritellään aluksi kullekin osamallille joukko vaatimuksia, minkä jälkeen esitetään vaatimusten täyttämiseksi suunniteltu ratkaisu. Lopuksi palataan tarkastelemaan vaatimuksia ja osoitetaan, kuinka ne on otettu ratkaisussa huomioon.
Väitöskirjan pääasiallinen kontribuutio on se, että se avaa tiedonlouhintaprosessiin valtavirran käsityksiä laajemman tarkastelukulman. Väitöskirjan sisältämä täydennetty prosessimalli hyödyntää vakiintunutta mallia, mutta laajentaa sitä kokoamalla tiedonhallinnan ja tietämyksen esittämisen, tiedon louhinnan työnkulun sekä ohjelmistoarkkitehtuurin osatekijöiksi yhdistettyyn malliin. Lisäksi malli kattaa aiheita, joita tavallisesti ei oteta huomioon tai joiden ei katsota kuuluvan osaksi tiedonlouhintaprosessia; tällaisia ovat esimerkiksi tiedon louhintaan liittyvät filosofiset kysymykset. Väitöskirjassa käsitellään myös kahta ohjelmistokehystä ja neljää tapaustutkimuksena esiteltävää sovellusta, jotka edustavat teknisiä ratkaisuja eräisiin yksittäisiin tiedonlouhintaprosessin osaongelmiin. Kehykset ja sovellukset toteuttavat ja havainnollistavat useita ehdotetun prosessimallin merkittävimpiä ominaisuuksia.
Porovnatelnost dat v dobývání znalostí z databází / Data comparability in knowledge discovery in databases. Horáková, Linda. January 2017
The master's thesis analyses the comparability and commensurability of data in datasets used to obtain knowledge with data mining methods. Data comparability is one aspect of data quality, and it is crucial for obtaining correct and applicable results from data mining tasks. The aim of the theoretical part of the thesis is to briefly describe the field of knowledge discovery and to define the specifics of mining aggregated data. The terms comparability and commensurability are also discussed. The main part of the theory focuses on the process of knowledge discovery. These findings are applied in the practical part of the thesis, whose main goal is to define a general methodology for discovering potential data comparability problems in the analysed data. The methodology is based on an analysis of a real dataset containing daily sales of products. Finally, the methodology is applied to data from the field of public budgets.
Empirické porovnání volně dostupných systémů dobývání znalostí z databází / Empirical comparison of free software suites for knowledge discovery from data. Kasík, Josef. January 2009
Both the topic and the main objective of the diploma thesis are a comparison of free data mining suites. The subjects of the comparison are six applications developed in university projects as experimental data mining tools and as tools for educational purposes. The comparison criteria are derived from four general aspects that form the basis for further analysis. Each system is evaluated as a tool for handling real-time data mining tasks, as a tool supporting the various phases of the CRISP-DM methodology, as a tool capable of practical use on particular data, and as an ordinary software system. These aspects yield 31 specific comparison criteria, each of which was evaluated through a thorough analysis of every system. The results of the comparison confirmed the initial expectation: the Weka data mining suite was rated the best tool. Its main advantages are the large number of machine learning algorithms, numerous data preparation tools and processing speed.
Génération de connaissances à l’aide du retour d’expérience : application à la maintenance industrielle / Knowledge generation using experience feedback: application to industrial maintenance. Potes Ruiz, Paula Andrea. 24 November 2014
Les travaux de recherche présentés dans ce mémoire s’inscrivent dans le cadre de la valorisation des connaissances issues des expériences passées afin d’améliorer les performances des processus industriels. La connaissance est considérée aujourd'hui comme une ressource stratégique importante pouvant apporter un avantage concurrentiel décisif aux organisations. La gestion des connaissances (et en particulier le retour d’expérience) permet de préserver et de valoriser des informations liées aux activités d’une entreprise afin d’aider la prise de décision et de créer de nouvelles connaissances à partir du patrimoine immatériel de l’organisation. Dans ce contexte, les progrès des technologies de l’information et de la communication jouent un rôle essentiel dans la collecte et la gestion des connaissances. L’implémentation généralisée des systèmes d’information industriels, tels que les ERP (Enterprise Resource Planning), rend en effet disponible un grand volume d’informations issues des événements ou des faits passés, dont la réutilisation devient un enjeu majeur. Toutefois, ces fragments de connaissances (les expériences passées) sont très contextualisés et nécessitent des méthodologies bien précises pour être généralisés. Etant donné le potentiel des informations recueillies dans les entreprises en tant que source de nouvelles connaissances, nous proposons dans ce travail une démarche originale permettant de générer de nouvelles connaissances tirées de l’analyse des expériences passées, en nous appuyant sur la complémentarité de deux courants scientifiques : la démarche de Retour d’Expérience (REx) et les techniques d’Extraction de Connaissances à partir de Données (ECD). 
Le couplage REx-ECD proposé porte principalement sur : i) la modélisation des expériences recueillies à l’aide d’un formalisme de représentation de connaissances afin de faciliter leur future exploitation, et ii) l’application de techniques relatives à la fouille de données (ou data mining) afin d’extraire des expériences de nouvelles connaissances sous la forme de règles. Ces règles doivent nécessairement être évaluées et validées par les experts du domaine avant leur réutilisation et/ou leur intégration dans le système industriel. Tout au long de cette démarche, nous avons donné une place privilégiée aux Graphes Conceptuels (GCs), formalisme de représentation des connaissances choisi pour faciliter le stockage, le traitement et la compréhension des connaissances extraites par l’utilisateur, en vue d’une exploitation future. Ce mémoire s’articule en quatre chapitres. Le premier constitue un état de l’art abordant les généralités des deux courants scientifiques qui contribuent à notre proposition : le REx et les techniques d’ECD. Le second chapitre présente la démarche REx-ECD proposée, ainsi que les outils mis en œuvre pour la génération de nouvelles connaissances afin de valoriser les informations disponibles décrivant les expériences passées. Le troisième chapitre présente une méthodologie structurée pour interpréter et évaluer l’intérêt des connaissances extraites lors de la phase de post-traitement du processus d’ECD. Finalement, le dernier chapitre expose des cas réels d’application de la démarche proposée à des interventions de maintenance industrielle. / The research work presented in this thesis relates to knowledge extraction from past experiences in order to improve the performance of industrial process. Knowledge is nowadays considered as an important strategic resource providing a decisive competitive advantage to organizations. 
Knowledge management (especially experience feedback) is used to preserve and enhance the information related to a company’s activities in order to support decision-making and create new knowledge from the intangible heritage of the organization. In that context, advances in information and communication technologies play an essential role in gathering and processing knowledge. The widespread implementation of industrial information systems such as ERPs (Enterprise Resource Planning) makes available a large amount of data related to past events or historical facts, whose reuse is becoming a major issue. However, these fragments of knowledge (past experiences) are highly contextualized and require specific methodologies to be generalized. Given the great potential of the information collected in companies as a source of new knowledge, we suggest in this work an original approach to generating new knowledge from the analysis of past experiences, relying on the complementarity of two scientific threads: Experience Feedback (EF) and Knowledge Discovery in Databases (KDD) techniques. The suggested EF-KDD combination focuses mainly on: i) modelling the collected experiences using a knowledge representation formalism in order to facilitate their future exploitation, and ii) applying data mining techniques in order to extract new knowledge in the form of rules. These rules must be evaluated and validated by experts of the industrial domain before their reuse and/or integration into the industrial system. Throughout this approach, we have given a privileged position to Conceptual Graphs (CGs), the knowledge representation formalism chosen to facilitate the storage, processing and understanding of the extracted knowledge by the user for future exploitation. This thesis is divided into four chapters. 
The first chapter reviews the state of the art, covering the general principles of the two scientific threads that contribute to our proposal: EF and KDD. The second chapter presents the suggested EF-KDD approach and the tools used to generate new knowledge, in order to exploit the available information describing past experiences. The third chapter proposes a structured methodology for interpreting and evaluating the usefulness of the extracted knowledge during the post-processing phase of the KDD process. Finally, the last chapter discusses real case studies from the industrial maintenance domain to which the proposed approach has been applied.
Použití metod dobývání znalostí v oblasti kardiochirurgie / Application of knowledge discovery methods in the field of cardiac surgery. Čech, Bohuslav. January 2014
This thesis demonstrates the practical use of knowledge discovery in the field of cardiac surgery. Tasks posed by the Department of Cardiac Surgery of University Hospital Olomouc are solved using the GUHA method and the LISp-Miner system. The mitral valve surgery data come from clinical practice between 2002 and 2011. The theoretical part includes a chapter on KDD (types of tasks, methods and methodology) and a chapter on cardiac surgery (anatomy and function of the heart, mitral valve disease, and diagnostic methods including quantification). The practical part presents solutions to the tasks, and the whole process is described in the spirit of CRISP-DM.
On biclusters aggregation and its benefits for enumerative solutions = Agregação de biclusters e seus benefícios para soluções enumerativas. Oliveira, Saullo Haniell Galvão de, 1988-. 27 August 2018
"Visualizações temporais em uma plataforma de software extensível e adaptável" / "Temporal visualizations in an extensible and adaptable software platform". Milton Hirokazu Shimabukuro. 05 July 2004
Repositórios com volumes de dados cada vez maiores foram viabilizados pelo desenvolvimento tecnológico, criando importantes fontes de informação em diversas áreas da atividade humana. Esses repositórios freqüentemente incluem informação sobre o comportamento temporal e o posicionamento espacial dos itens neles representados, os quais são extremamente relevantes para a análise dos dados. O processo de descoberta de conhecimento a partir de grandes volumes de dados tem sido objeto de estudo em diversas disciplinas, dentre elas a Visualização de Informação, cujas técnicas podem apoiar diversas etapas desse processo. Esta tese versa sobre o uso da Visualização Exploratória em conjuntos de dados com atributos temporais e espaciais, empregando a estratégia de múltiplas visualizações coordenadas para apoiar o tratamento de dados em estágios iniciais de processos de descoberta de conhecimento. São propostas duas novas representações visuais temporais denominadas Variação Temporal Uni-escala e Variação Temporal Multi-escala para apoiar a análise exploratória de dados temporais. Adicionalmente, é proposto um modelo de arquitetura de software AdaptaVis, que permite a integração dessas e outras representações visuais em uma plataforma de visualização de informação flexível, extensível e adaptável às necessidades de diferentes usuários, tarefas e domínios de aplicação a plataforma InfoVis. Sessões de uso realizadas com dados e usuários reais dos domínios de Climatologia e Negócios permitiram validar empiricamente as representações visuais e o modelo. O modelo AdaptaVis e a plataforma InfoVis estabelecem bases para a continuidade de diversas pesquisas em Visualização de Informação, particularmente o estudo de aspectos relacionados ao uso coordenado de múltiplas visualizações, à modelagem do processo de coordenação, e à integração entre múltiplas técnicas visuais e analíticas. 
/ Data repositories with ever increasing volumes have been made possible by the evolution of data collection technologies, creating important sources of information in several fields of human activity. Such data repositories often include information about both the temporal behavior and the spatial positioning of data items, which is highly relevant to later data analysis tasks. The process of discovering knowledge embedded in large volumes of data is a topic of study in several disciplines, including Information Visualization, which offers a range of techniques to support different stages of a discovery process. This thesis addresses the application of Exploratory Visualization techniques to datasets with temporal and spatial attributes, using the strategy of coordinating multiple data views to assist data treatment in the early stages of knowledge discovery processes. Two temporal visual representations, Uni-scale Temporal Behavior and Multi-scale Temporal Behavior, are proposed to support the exploratory analysis of temporal data. Moreover, a software architecture model called AdaptaVis is introduced. It allows the integration of these and other visualization techniques into InfoVis, a flexible, extensible and adaptable information visualization platform that may be tailored to meet the requirements of different users, tasks and application domains. Sessions conducted with real data and users from the Climatology and Business application domains allowed an empirical validation of both the visual representations and the model. The AdaptaVis model and the InfoVis platform establish the basis for further research on issues related to the coordinated use of multiple data views, the modeling of the coordination process, and the integration of multiple visual and analytical techniques.
Meta-učení v oblasti dolování dat / Meta-Learning in the Area of Data Mining. Kučera, Petr. January 2013
This paper describes the use of meta-learning in the area of data mining. It describes the problems and tasks of data mining where meta-learning can be applied, with a focus on classification. It provides an overview of meta-learning techniques and their possible application in data mining, especially in model selection. It describes the design and implementation of a meta-learning system to support classification tasks in data mining. The system uses statistics and information theory to characterize the data sets stored in the meta-knowledge base. The meta-classifier is built from this base and predicts the most suitable model for a new data set. The conclusion discusses the results of experiments with more than 20 data sets representing classification tasks from different areas and suggests possible extensions of the project.
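The model-selection idea in this abstract can be sketched in a few lines. The abstract does not say which meta-features or which meta-classifier the system uses, so everything specific below is an assumption: the three meta-features (log of the instance count, attribute count, class entropy), the 1-nearest-neighbour meta-classifier, and the contents of the hypothetical `meta_base`.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

MetaFeatures = Tuple[float, float, float]


def class_entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of the class label distribution."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def meta_features(rows: Sequence[Sequence[float]],
                  labels: Sequence[str]) -> MetaFeatures:
    """Characterize a data set by simple statistical and
    information-theoretic measures (an assumed, illustrative choice)."""
    return (math.log10(len(rows)), float(len(rows[0])), class_entropy(labels))


def recommend(meta_base: List[Tuple[MetaFeatures, str]],
              features: MetaFeatures) -> str:
    """1-nearest-neighbour meta-classifier: return the model recorded
    for the most similar data set in the meta-knowledge base."""
    _, model = min(meta_base, key=lambda entry: math.dist(entry[0], features))
    return model


# Hypothetical meta-knowledge base: (meta-features, best model found).
meta_base = [
    ((2.0, 4.0, 1.0), "decision tree"),
    ((4.0, 50.0, 3.0), "naive Bayes"),
]

# Hypothetical new data set: 100 rows, 4 attributes, two balanced classes.
rows = [[0.0, 1.0, 2.0, 3.0] for _ in range(100)]
labels = ["yes"] * 50 + ["no"] * 50
print(recommend(meta_base, meta_features(rows, labels)))  # decision tree
```

In practice the meta-features would need to be scaled before computing distances, since the attribute count can dominate the other two values.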
