Global ETD Search

141	Smart Meters Big Data: Behavioral Analytics via Incremental Data Mining and Visualization Singh, Shailendra January 2016 The big data framework applied to smart meters offers an exceptional platform for data-driven forecasting and decision making to achieve sustainable energy efficiency. Winning consumer confidence by respecting occupants' energy consumption behavior and preferences, and thereby improving participation in various energy programs, is imperative but difficult. The key elements for understanding and predicting household energy consumption are the activities occupants perform, the appliances used and the times at which they are used, and inter-appliance dependencies. This information can be extracted from the context-rich big data from smart meters, although this is challenging because: (1) it is not trivial to mine complex interdependencies between appliances from multiple concurrent data streams; (2) it is difficult to derive accurate relationships between interval-based events, where multiple appliance usages persist; (3) continuous generation of the energy consumption data can trigger changes in appliance associations with time and appliances. To overcome these challenges, we propose an unsupervised progressive incremental data mining technique using frequent pattern mining (appliance-appliance associations) and cluster analysis (appliance-time associations) coupled with a Bayesian network based prediction model. The proposed technique addresses the need to analyze temporal energy consumption patterns at the appliance level, which directly reflect consumers' behaviors and provide a basis for generalizing household energy models. Extensive experiments were performed on the model with real-world datasets and strong associations were discovered. 
In predicting the usage of multiple appliances, the proposed model outperformed a support vector machine at every stage, attaining accuracies of 81.65%, 85.90%, and 89.58% for 25%, 50%, and 75% of the training dataset size, respectively. Moreover, accuracies of 81.89%, 75.88%, 79.23%, 74.74%, and 72.81% were obtained for short-term (hours) and long-term (day, week, month, and season) energy consumption forecasts, respectively. Smart Grid Smart Meter Behavioral Analytics Data-Driven Approach Energy Consumption Patterns Frequent Pattern Correlation Pattern Cluster Analysis Online Data Mining Incremental Data Mining Distributed Data Mining Association Rules Appliance Usage Prediction Energy Consumption Prediction Prediction Visualization
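The appliance-appliance associations mentioned in the abstract above can be illustrated with a minimal sketch of pairwise support and confidence over time windows. This is not the thesis's incremental algorithm: the function name `pair_rules`, the window representation (a set of appliances active in each window), and the sample data are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(windows, min_support=0.5):
    """Return {(a, b): (support, confidence)} for rules a -> b.

    `windows` is a list of sets; each set holds the appliances that were
    active in one time window (a hypothetical input format).
    """
    n = len(windows)
    item_counts = Counter()
    pair_counts = Counter()
    for active in windows:
        items = sorted(set(active))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is not frequent enough
        # confidence(a -> b) = count(a and b) / count(a)
        rules[(a, b)] = (support, count / item_counts[a])
        rules[(b, a)] = (support, count / item_counts[b])
    return rules


# Illustrative data only: four time windows of co-active appliances.
windows = [
    {"washer", "dryer"},
    {"washer", "dryer", "tv"},
    {"tv"},
    {"washer"},
]
rules = pair_rules(windows, min_support=0.5)
```

Here the dryer never runs without the washer, so the rule dryer -> washer has confidence 1.0, while washer -> dryer is weaker because the washer also runs alone. An incremental variant would update the two counters as new windows arrive instead of recomputing them.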
142	Možnosti prezentace výsledků DZD na webu / Options of presentation of KDD results on Web Koválik, Tomáš January 2015 This diploma thesis covers KDD analysis of data and options for presenting KDD results on the Web. The thesis is divided into three main sections, which follow the course of the work. The first section covers the theoretical basics needed to understand the problem: the notions of data matrix and domain knowledge, the CRISP-DM methodology, the GUHA method, the LISp-Miner system, and the implementation of the GUHA method in LISp-Miner, including a description of the core procedures 4ft-Miner and CF-Miner. The second section is dedicated to the first goal of the thesis. It briefly summarizes the analysis made during the pre-analysis phase and then describes the process of analyzing domain knowledge in a given data set. The third section focuses on the second goal of the thesis, the presentation of KDD results on the Web. It gives a brief theoretical basis for the technologies used and then describes the development of an export script that automatically generates a website from results found with the LISp-Miner system, including a description of the structure of the output and recommendations for working in LISp-Miner.
143	Pokročilé dolování v datech v kardiologii / Advanced Data Mining in Cardiology Mézl, Martin January 2009 The aim of this master's thesis is to analyse and search for unusual dependencies in a database of patients from the Internal Cardiology Clinic, Faculty Hospital Brno. Part of the work is a theoretical overview of common data mining methods used in medicine, especially decision trees, the naive Bayesian classifier, artificial neural networks and association rules. The search for unusual dependencies between attributes is carried out with association rules and the naive Bayesian classifier. The output of this work is a complex system supporting the knowledge discovery in databases process for any data set. This work was carried out in collaboration with the Internal Cardiology Clinic, Faculty Hospital Brno. All programs were written in Matlab 7.0.1.
144	Database Support for 3D-Protein Data Set Analysis Lehner, Wolfgang, Hinneburg, Alexander 25 May 2022 Progress in genome research demands an adequate infrastructure to analyze the data sets. Database systems are a key technology for organizing data and speeding up the analysis process. This paper discusses the role of a relational database system using the problem of finding frequent substructures in multi-dimensional protein databases. The specific problem consists of producing a set of association rules regarding frequent substructures with different lengths and gaps between the amino acid residues of a protein. From a database point of view, the process of finding association rules, which form the basis for a more in-depth analysis of the data material, is split into two parts. The first part performs a discretization of the conformational angle space of a single amino acid residue by computing the nearest neighbor among a given set of representatives. The second part consists in adapting a well-known association rule algorithm to determine the frequent substructures. Both steps within this comprehensive analysis task require substantial support from the underlying database in order to reduce the programming overhead at the application level. info:eu-repo/classification/ddc/005 ddc:005
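The discretization step described in the abstract above (mapping a residue's conformational angles to the nearest of a given set of representatives) can be sketched as follows. This is an in-memory illustration only, not the paper's database-side implementation; the representative labels and angle values, the function names, and the use of a wrapped Euclidean distance over (phi, psi) are assumptions.

```python
import math


def angular_diff(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)  # account for wrap-around at +/-180


def nearest_representative(angles, representatives):
    """Return the label of the representative closest to (phi, psi)."""
    phi, psi = angles

    def dist(label):
        r_phi, r_psi = representatives[label]
        return math.hypot(angular_diff(phi, r_phi), angular_diff(psi, r_psi))

    return min(representatives, key=dist)


# Hypothetical representatives of the conformational angle space.
REPRESENTATIVES = {
    "A": (-60.0, -45.0),
    "B": (-120.0, 130.0),
    "L": (60.0, 45.0),
}

# Illustrative (phi, psi) pairs for three residues of one protein.
residues = [(-65.0, -40.0), (-115.0, 140.0), (-170.0, 175.0)]
symbols = [nearest_representative(r, REPRESENTATIVES) for r in residues]
```

After this step each protein becomes a sequence of discrete symbols, which is the form an association rule algorithm needs in order to count frequent substructures.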
145	Exploiting Graphic Card Processor Technology to Accelerate Data Mining Queries in SAP NetWeaver BIA Lehner, Wolfgang, Weyerhaeuser, Christoph, Mindnich, Tobias, Faerber, Franz 15 June 2022 Within business intelligence contexts, the importance of data mining algorithms is continuously increasing, particularly from the perspective of applications and users that demand novel algorithms on the one hand and an efficient implementation exploiting novel system architectures on the other hand. Within this paper, we focus on the latter issue and report our experience with the exploitation of graphic card processor technology within the SAP NetWeaver business intelligence accelerator (BIA). The BIA represents a highly distributed analytical engine that supports OLAP and data mining processing primitives. The system organizes data entities in column-wise fashion and its operation is completely main-memory-based. Since case studies have shown that classic data mining queries spend a large portion of their runtime on scanning and filtering the data as a necessary prerequisite to the actual mining step, our main goal was to speed up this expensive scanning and filtering process. In a first step, the paper outlines the basic data mining processing techniques within SAP NetWeaver BIA and illustrates the implementation of scans and filters. In a second step, we give insight into the main features of a hybrid system architecture design exploiting graphic card processor technology. Finally, we sketch the implementation and give details of our extensive evaluations. info:eu-repo/classification/ddc/004 ddc:004
146	Implementace části standardu SQL/MM DM pro asociační pravidla / Implementation of SQL/MM DM for Association Rules Škodík, Zdeněk Unknown Date This project deals with knowledge discovery in databases, specifically with association rules, which are part of data mining. The aim is to obtain knowledge that cannot be found directly in the database and that can be useful. The thesis describes SQL/MM DM, in particular all user-defined types that the standard specifies for association rules, as well as the common types that form the framework for data mining. Before describing the implementation of these types, it introduces the tools used for it: the PL/SQL programming language and Oracle Data Mining support. The correctness of the implementation is verified by a sample application. The conclusion evaluates the results achieved and outlines possible continuation of this work.
147	Metodika vývoje a nasazování Business Intelligence v malých a středních podnicích / Methodology of development and deployment of Business Intelligence solutions in Small and Medium Sized Enterprises Rydzi, Daniel January 2005 This dissertation deals with the development and implementation of Business Intelligence (BI) solutions for Small and Medium Sized Enterprises (SMEs) in the Czech Republic. It is the culmination of the author's effort to date to complete a methodological model for developing this kind of application for SMEs using in-house skills and a minimum of external resources and costs. The thesis can be divided into five major parts. The first part, which describes the technologies used, is divided into two chapters. The first chapter describes the contemporary state of the Business Intelligence concept and also contains an original taxonomy of Business Intelligence solutions. The second chapter describes the two Knowledge Discovery in Databases (KDD) techniques that were used to build the BI solutions introduced in the case studies. The second part describes the area of Czech SMEs, the environment in which the thesis was written and to which it is meant to contribute. This environment is covered in one chapter that defines how SMEs differ from large corporations. The author also explains why he personally focuses on this area. The third major part introduces the results of a survey conducted among Czech SMEs with the support of the Department of Information Technologies of the Faculty of Informatics and Statistics of the University of Economics in Prague. This survey had three objectives. The first was to map the readiness of Czech SMEs for BI solution development and deployment. 
The second was to determine the major problems and consequent decisions of Czech SMEs that could be supported by BI solutions, and the third was to determine the top factors preventing SMEs from developing and deploying BI solutions. The fourth part is the core of the thesis. Its two chapters describe the original methodology for the development and deployment of BI solutions by SMEs, as well as the other methodologies that were studied. The original methodology is partly based on the well-known CRISP-DM methodology. Finally, the last part describes a particular company that has become a testing ground for the author's theories and that supports his research. Further chapters introduce case studies of the development and deployment of BI solutions in this company, which were built using contemporary BI and KDD techniques in accordance with the original methodology. In that sense, these case studies verified the theoretical methodology in real use.
148	透過圖片標籤觀察情緒字詞與事物概念之關聯 / An analysis on association between emotion words and concept words based on image tags 彭聲揚, Peng, Sheng-Yang Unknown Date This study starts from psychology and investigates how emotional states can be classified. To link emotion with semantics, we treat images as the stimulus for emotional states and sample and observe content co-created and shared by the Flickr online community. Using basic emotion words from psychological research and their part-of-speech variants, we extracted 12,000 tagged photos and computed the co-occurrence of tag words with emotion-category words, as well as association rules. Based on the semantic differential scale, we also propose a new coordinate classification by bias and intensity. After frequency-threshold filtering, part-of-speech tagging, and merging words by stem, we obtained from 65,983 distinct text tags 272 concept words that carry an emotional bias, together with positively and negatively biased emotion association rules. To verify through images whether these words are related to the emotional states that image content evokes in people, we selected the final 42 photos for comparison through three query channels: Flickr single-word queries, Google Image single-word queries, and our own composite indicators based on photo tags (the proportion of emotion words and a community filtering parameter). Using the semantic differential scale, we measured whether the answers of 136 users for the three groups of photos matched the previously proposed intensity-bias model. The experiments show that our method returns results similar to those of Google Image; the user survey supports our method's determination of positive and negative bias, and our method separates strong from weak intensity better than Google. sentiment classification sentiment retrieval sentiment words word co-occurrence association rules image and emotion social network
149	Development of an intelligent analytics-based model for product sales optimisation in retail enterprises Matobobo, Courage 03 July 2016 A retail enterprise is a business organisation that sells goods or services directly to consumers for personal use. Retail enterprises such as supermarkets enable customers to go around the shop picking items from the shelves and placing them into their baskets. The basket of each customer is captured into transactional systems. In this research study, retail enterprises were classified into two main categories: centralised and distributed retail enterprises. A distributed retail enterprise is one that issues the decision rights to the branches or groups nearest to the data collection, while in centralised retail enterprises the decision rights of the branches are concentrated in a single authority. It is difficult for retail enterprises to ascertain customer preferences by merely observing transactions. This has led to quantifiable losses. Although some enterprises implemented classical business models to address these challenging issues, they still lacked analytics-based marketing programs to gain competitive advantage. This research study develops an intelligent analytics-based (ARANN) model for both distributed and centralised retail enterprises in the cross-demographics of a developing country. The ARANN model is built on association rules (AR), complemented by artificial neural networks (ANN) to strengthen the results of these two individual models. The ARANN model was tested using real-life and publicly available transactional datasets for the generation of product arrangement sets. In centralised retail enterprises, the data from different branches was integrated and pre-processed to remove data impurities. The cleaned data was then fed into the ARANN model. In distributed retail enterprises, on the other hand, data was collected and cleaned branch by branch. The cleaned data was fed into the ARANN model. 
According to experimental analytics, the ARANN model can generate improved product arrangement sets, thereby improving the confidence of retail enterprise decision-makers in competitive environments. It was also observed that the ARANN model performed faster in distributed than in centralised retail enterprises. This research is beneficial for sustainable businesses and consideration of the results is therefore recommended to retail enterprises. / Computing / M Sc. (Computing) Analytics Product sales optimisation Retail enterprises Model Association rules Artificial neural networks Data mining ARANN Business intelligence Management Marketing 006.32 Neural networks (Computer science) Computer science -- Data processing Retail trade -- Data processing Business -- Databases Business -- Computer network resources
150	Extraction optimisée de règles d'association positives et négatives intéressantes / Efficient mining of interesting positive and negative association rules Papon, Pierre-Antoine 09 June 2016 The purpose of data mining is to extract knowledge from large amounts of data. The extracted knowledge can take different forms. In this work, we seek to extract knowledge only in the form of positive and negative association rules. A negative association rule is a rule in which both the presence and the absence of a variable can be used. By considering the absence of variables, we expand the semantics of the knowledge and extract information that positive association rule mining methods cannot detect. This will, for example, allow doctors to find characteristics that prevent a disease from developing, in addition to characteristics that trigger it. Nevertheless, adding negation raises several challenges. Because the absence of a variable is generally more frequent than its presence, the computational costs increase exponentially, and so does the risk of extracting a prohibitive number of rules, most of which are redundant and uninteresting. To address these problems, our proposal, derived from the reference Apriori algorithm, does not rely on frequent itemsets as other methods do. We define a new type of itemset, the reasonably frequent itemsets, which improve the quality of the rules. We also rely on the M_G measure to determine which types of rules to extract and to remove uninteresting rules. We also use meta-rules that allow us to infer the interest of a negative rule from a positive one. Moreover, our algorithm extracts a new type of negative rule that seems interesting to us: rules whose antecedent and consequent are conjunctions of negative itemsets. Our study ends with a quantitative and qualitative comparison with other positive and negative association rule mining algorithms on various databases from the literature. Our software ARA (Association Rules Analyzer) facilitates the qualitative analysis of the algorithms by making it possible to compare them intuitively and to apply various quality measures in post-processing. Finally, our proposal improves the extraction in terms of the number and quality of the extracted rules, and also in terms of the rule search path. Data mining Positive and negative association rules Reasonably frequent itemsets M_G measure Meta-rules ARA Quality of rules
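To make the notion of a negative association rule in the abstract above concrete, the sketch below derives the support and confidence of rules with a negated side from positive supports alone, using the standard identities supp(A and not B) = supp(A) - supp(A and B) and supp(not A and not B) = 1 - supp(A) - supp(B) + supp(A and B). It does not implement the thesis's reasonably frequent itemsets, its quality measure, or its meta-rules; the function names and the sample transactions are assumptions.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def negative_rule_stats(transactions, a, b):
    """Support and confidence of A -> not B and not A -> not B."""
    s_a = support(transactions, a)
    s_b = support(transactions, b)
    s_ab = support(transactions, set(a) | set(b))

    s_a_notb = s_a - s_ab                 # A present, B absent
    s_nota_notb = 1.0 - s_a - s_b + s_ab  # both absent
    return {
        "supp(A -> not B)": s_a_notb,
        "conf(A -> not B)": s_a_notb / s_a if s_a else 0.0,
        "supp(not A -> not B)": s_nota_notb,
        "conf(not A -> not B)": s_nota_notb / (1.0 - s_a) if s_a < 1.0 else 0.0,
    }


# Illustrative transactions only.
transactions = [{"x", "y"}, {"x"}, {"x"}, {"y"}, {"z"}]
stats = negative_rule_stats(transactions, {"x"}, {"y"})
```

Because every negated support is obtained from positive supports, no extra pass over the data is needed for the negated forms; the cost problem the abstract describes comes from the much larger number of candidate rules once negation is allowed.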
